This is the final installment of a seven-part series demonstrating multithreading techniques and performance characteristics in VB.NET. Catch up on the previous installments:
- Part one: Introduction to multithreading
- Part two: The Application Skeleton
- Part three: Single Threaded Performance
- Part four: SyncLock Performance
- Part five: Mutex Performance
- Part six: Monitor Performance
This week’s post on multithreading takes us full circle, back to non-atomic operations. The first performance test in this series ran them in a single thread; this one is multithreaded. Although it reads from and writes to shared variables across threads, there is no thread safety whatsoever. As a result, in real code that depends on the contents of the shared variables, those contents cannot be trusted. The variable that holds the number of completed computations behaves reliably in these tests because every thread does exactly the same thing to it: add 1. Strictly speaking, even adding 1 is an unsynchronized read-modify-write, so an increment can be lost under contention. If the threads were doing something less predictable (such as multiplying by the iteration number or appending to a string of characters), the results would be chaotic at best. That is an important distinction to note: while this code is functional, in a real program it would probably not work, and would definitely not work as expected!
You can download an installer for this program here. The installer also installs the full source code for the project in a subdirectory of the installation path. Feel free to try it out and tinker with it, or just read through it to get a better understanding of multithreading techniques.
This is the code that launches the threads:
Public Sub NonAtomicMultiThreadComputation(ByVal Iterations As Integer, Optional ByVal ThreadCount As Integer = 0)
    Dim twNonAtomic As NonAtomicThreadWorker
    Dim IntegerIterationCounter As Integer
    Dim iOriginalMaxThreads As Integer
    Dim iOriginalMinThreads As Integer
    Dim iOriginalMaxIOThreads As Integer
    Dim iOriginalMinIOThreads As Integer

    twNonAtomic = New NonAtomicThreadWorker

    ' Remember the pool's limits so they can be restored afterwards.
    Threading.ThreadPool.GetMaxThreads(iOriginalMaxThreads, iOriginalMaxIOThreads)
    Threading.ThreadPool.GetMinThreads(iOriginalMinThreads, iOriginalMinIOThreads)

    ' A ThreadCount of 0 leaves the pool's own limits in place.
    If ThreadCount > 0 Then
        Threading.ThreadPool.SetMaxThreads(ThreadCount, ThreadCount)
        Threading.ThreadPool.SetMinThreads(ThreadCount, ThreadCount)
    End If

    ' Queue one work item per iteration. Double.Parse on an Integer relies on Option Strict Off.
    For IntegerIterationCounter = 1 To Iterations
        Threading.ThreadPool.QueueUserWorkItem(AddressOf twNonAtomic.ThreadProc, Double.Parse(IntegerIterationCounter))
    Next

    ' Busy-wait until every work item has reported completion.
    While NonAtomicThreadWorker.IntegerCompletedComputations < Iterations
    End While

    ' Restore the original pool limits.
    Threading.ThreadPool.SetMaxThreads(iOriginalMaxThreads, iOriginalMaxIOThreads)
    Threading.ThreadPool.SetMinThreads(iOriginalMinThreads, iOriginalMinIOThreads)
    twNonAtomic = Nothing
    IntegerIterationCounter = Nothing
End Sub
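The empty While loop in the launcher keeps one core spinning until the counter reaches Iterations. As a point of comparison only, here is a minimal sketch of a blocking wait. It assumes .NET 4.0 or later for System.Threading.CountdownEvent; the module name, the Worker routine, and the Math.Sqrt stand-in for the real computation are hypothetical and not part of the original project.

```vbnet
Imports System
Imports System.Threading

Module CountdownLauncherDemo
    ' Hypothetical completion signal; replaces polling a shared counter.
    Private Done As CountdownEvent

    Private Sub Worker(ByVal state As Object)
        ' Stand-in for the real computation (assumption, not the original Compute).
        Dim result As Double = Math.Sqrt(CDbl(state))
        ' Signal is thread safe; each call lowers the remaining count by one.
        Done.Signal()
    End Sub

    Sub Main()
        Const Iterations As Integer = 1000
        Done = New CountdownEvent(Iterations)

        For i As Integer = 1 To Iterations
            ThreadPool.QueueUserWorkItem(AddressOf Worker, i)
        Next

        ' Blocks the calling thread without consuming CPU until the count reaches zero.
        Done.Wait()
        Console.WriteLine("All {0} work items finished", Iterations)
        Done.Dispose()
    End Sub
End Module
```

A blocking wait changes what is being measured, so figures from a version like this would not be directly comparable to the tables in this series.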
And here is the code for the class itself that performs the work:
Public Class NonAtomicThreadWorker
    ' Shared across all threads and updated without any synchronization.
    Public Shared IntegerCompletedComputations As Integer = 0
    Private Shared DoubleStorage As Double

    Public Property Storage() As Double
        Get
            Return DoubleStorage
        End Get
        Set(ByVal value As Double)
            DoubleStorage = value
        End Set
    End Property

    Public Property CompletedComputations() As Integer
        Get
            Return IntegerCompletedComputations
        End Get
        Set(ByVal value As Integer)
            IntegerCompletedComputations = value
        End Set
    End Property

    Public Sub ThreadProc(ByVal StateObject As Object)
        Dim ttuComputation As ThreadTestUtilities

        ttuComputation = New ThreadTestUtilities

        ' Unsynchronized write: whichever thread writes last wins.
        Storage = ttuComputation.Compute(CDbl(StateObject))
        ' Unsynchronized read-modify-write on the shared counter.
        CompletedComputations += 1
        ttuComputation = Nothing
    End Sub

    Public Sub New()
    End Sub
End Class
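For contrast, the one piece of shared state the launcher depends on, the completed-computation counter, can be made safe without a full lock. The following is a minimal sketch rather than the project's code: the module and its counter are hypothetical stand-ins for IntegerCompletedComputations, and Volatile.Read assumes .NET 4.5 or later.

```vbnet
Imports System
Imports System.Threading

Module SafeCounterDemo
    ' Hypothetical counter standing in for IntegerCompletedComputations.
    Private CompletedComputations As Integer = 0

    Private Sub Worker(ByVal state As Object)
        ' Atomic read-modify-write: no increment can be lost,
        ' unlike an unsynchronized "+= 1".
        Interlocked.Increment(CompletedComputations)
    End Sub

    Sub Main()
        Const Iterations As Integer = 100000

        For i As Integer = 1 To Iterations
            ThreadPool.QueueUserWorkItem(AddressOf Worker, i)
        Next

        ' Poll with a short sleep instead of spinning on the counter.
        While Volatile.Read(CompletedComputations) < Iterations
            Thread.Sleep(1)
        End While

        Console.WriteLine(CompletedComputations)
    End Sub
End Module
```

Interlocked.Increment only protects the counter. The shared Storage value would still be overwritten by whichever thread finishes last, so this does not make the worker as a whole thread safe.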
Here are the results of our tests. All tests are for 1,000,000 iterations, and the results are in milliseconds per test run.
TEST 1
This test allows the ThreadPool to manage the total number of minimum and maximum threads on its own:
System | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average |
System A | 16578.125 | 17296.875 | 15359.375 | 14453.125 | 19265.625 | 16590.625 |
System B | 16296.666 | 16296.666 | 17562.275 | 15859.172 | 14140.444 | 16031.045 |
System C | 17328.347 | 19140.870 | 19625.251 | 19531.500 | 23125.296 | 19750.253 |
System D | 30250.194 | 30140.818 | 29531.439 | 30078.318 | 29500.189 | 29900.192 |
Average | 20568.029 |
TEST 2
In this test, we limit the maximum number of threads to one per logical processor:
System | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average |
System A | 11046.875 | 10796.875 | 10968.750 | 10906.250 | 10843.750 | 10912.500 |
System B | 18624.762 | 19796.622 | 26874.656 | 13359.204 | 14577.938 | 18646.636 |
System C | 12234.532 | 13390.796 | 26000.333 | 31641.030 | 12656.412 | 19184.621 |
System D | 29468.939 | 29297.063 | 29468.939 | 29500.189 | 29406.438 | 29428.314 |
Average | 19543.018 |
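The limit used in Test 2 can be derived at run time from Environment.ProcessorCount. The sketch below is an illustration under stated assumptions, not the project's code: the module name is hypothetical. It also checks the Boolean that ThreadPool.SetMaxThreads returns, since the .NET documentation notes that the maximum cannot be set below the number of processors on the machine, and a rejected request leaves the previous limits in place.

```vbnet
Imports System
Imports System.Threading

Module ThreadLimitDemo
    Sub Main()
        Dim cpus As Integer = Environment.ProcessorCount
        Dim origMaxWorkers, origMaxIo As Integer

        ' Save the current limits so they can be restored afterwards.
        ThreadPool.GetMaxThreads(origMaxWorkers, origMaxIo)

        ' Request one worker thread per logical processor; keep the I/O limit as it was.
        Dim applied As Boolean = ThreadPool.SetMaxThreads(cpus, origMaxIo)
        Console.WriteLine("Logical CPUs: {0}, limit applied: {1}", cpus, applied)

        ' Restore the original limits.
        ThreadPool.SetMaxThreads(origMaxWorkers, origMaxIo)
    End Sub
End Module
```

On a machine like System D, requests for 1, 2, or 4 threads fall below its 8 logical CPUs, so depending on the framework version it may be worth confirming that the requested limit actually took effect before comparing those runs.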
TEST 3
This test uses only one thread:
System | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average |
System A | 10812.500 | 11078.125 | 12265.625 | 10781.250 | 13296.875 | 11646.875 |
System B | 14749.811 | 14906.059 | 19718.498 | 38577.631 | 41999.462 | 25990.292 |
System C | 12812.664 | 12609.536 | 13453.297 | 16078.331 | 13234.544 | 13637.674 |
System D | 29406.438 | 29484.564 | 30234.569 | 29375.188 | 29468.939 | 29593.940 |
Average | 20217.195 |
TEST 4
This test uses two concurrent threads:
System | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average |
System A | 12937.500 | 13453.125 | 14218.750 | 15593.750 | 13718.750 | 13984.375 |
System B | 54249.306 | 30396.266 | 30036.824 | 21266.850 | 18468.986 | 30883.646 |
System C | 19531.500 | 17172.095 | 19656.502 | 18203.358 | 22312.786 | 19375.248 |
System D | 29437.688 | 29359.563 | 29625.190 | 29437.688 | 29468.939 | 29465.814 |
Average | 23427.271 |
TEST 5
Here we show four concurrent threads:
System | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average |
System A | 22468.750 | 20437.500 | 23703.125 | 22828.125 | 21203.125 | 22128.125 |
System B | 22719.041 | 32422.290 | 20484.637 | 30828.520 | 34328.564 | 28156.610 |
System C | 36047.336 | 39578.632 | 37469.230 | 40000.512 | 34203.563 | 37459.855 |
System D | 30297.069 | 29359.563 | 29343.938 | 29312.688 | 29359.563 | 29534.564 |
Average | 29319.789 |
TEST 6
This test uses eight concurrent threads:
System | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average |
System A | 24906.250 | 25046.875 | 24250.000 | 24812.500 | 24734.375 | 24750.000 |
System B | 37453.604 | 36453.125 | 29078.125 | 30562.500 | 33890.625 | 33487.596 |
System C | 32891.046 | 32266.038 | 32969.172 | 32516.041 | 32766.044 | 32681.668 |
System D | 29406.438 | 29422.063 | 29375.188 | 29390.813 | 29422.063 | 29403.313 |
Average | 30080.644 |
TEST 7
Finally, this test runs 16 simultaneous threads:
System | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average |
System A | 24937.500 | 25125.000 | 24859.375 | 24546.875 | 24734.375 | 24840.625 |
System B | 44749.427 | 43311.946 | 33749.568 | 36218.286 | 39358.871 | 39477.620 |
System C | 32937.922 | 32719.169 | 32953.547 | 32734.794 | 32812.920 | 32831.670 |
System D | 29468.939 | 45390.916 | 45765.918 | 34687.722 | 34578.346 | 37978.368 |
Average | 33782.071 |
System A: AMD Sempron 3200 (1 logical x64 CPU), 1 GB RAM
System B: AMD Athlon 3200+ (1 logical x64 CPU), 1 GB RAM
System C: Intel Pentium 4 2.8 GHz (1 logical x86 CPU), 1 GB RAM
System D: Two Intel Xeon 3.0 GHz (2 dual core, HyperThreaded CPUs providing 8 logical x64 CPUs), 2 GB RAM
It is extremely important to understand the following information and disclaimers regarding these benchmark figures:
- They are not to be taken as absolute numbers.
- They are taken on real-world systems with real-world OS installations, not clean benchmark systems.
- They are not to be used as any concrete measure of relative CPU performance; they simply illustrate the relative performance characteristics of different multithreading techniques on different numbers of logical CPUs, in order to show how different processors can perform differently with different techniques.
As you can see, running without locks results in significant performance gains over the tests that used various locking mechanisms. However, it is important to understand the ramifications of not using locking. If any data needs to be shared among threads, locking will have to come into play. Judicious use of locking should keep the performance hit from being too high. It is also important to note that in many situations, running your computations in a single thread will actually be faster than multithreading; it all depends on your hardware and on what the code is actually doing. As I have said before, “your mileage will vary.” Test, test, and test again to see which techniques work best for your particular application.
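For readers who want to run that kind of comparison themselves, here is a minimal timing sketch using System.Diagnostics.Stopwatch. It is not the harness used to produce the figures in this series: the module, TimeRun, and SampleWork are hypothetical names, and SampleWork is only a placeholder for whatever technique is under test.

```vbnet
Imports System
Imports System.Diagnostics

Module TimingDemo
    ' Runs the supplied routine once and returns the elapsed time in milliseconds.
    Private Function TimeRun(ByVal work As Action) As Double
        Dim sw As Stopwatch = Stopwatch.StartNew()
        work.Invoke()
        sw.Stop()
        Return sw.Elapsed.TotalMilliseconds
    End Function

    ' Placeholder workload (assumption); swap in the code being measured.
    Private Sub SampleWork()
        Dim total As Double = 0
        For i As Integer = 1 To 1000000
            total += Math.Sqrt(i)
        Next
    End Sub

    Sub Main()
        Const Runs As Integer = 5
        Dim sum As Double = 0

        ' Several runs per configuration, as in the tables in this post.
        For run As Integer = 1 To Runs
            Dim ms As Double = TimeRun(AddressOf SampleWork)
            Console.WriteLine("Run {0}: {1:F3} ms", run, ms)
            sum += ms
        Next

        Console.WriteLine("Average: {0:F3} ms", sum / Runs)
    End Sub
End Module
```

Reporting each run alongside the average makes outliers visible, such as the swings seen on System B in several of the tests.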
We have come to the end of this series. As always, feedback and comments are appreciated.
J.Ja