This is the sixth installment of a multi-part series demonstrating multithreading techniques and performance characteristics in VB.Net. Catch up on the previous installments:
- Part one: Introduction to multithreading
- Part two: The Application Skeleton
- Part three: Single Threaded Performance
- Part four: SyncLock Performance
- Part five: Mutex Performance
So far, we have covered single-threaded performance, plus SyncLock and Mutex for keeping shared data consistent between threads. This time, we will be using Monitor. The differences between the three methods are subtle, but important. SyncLock ensures that a particular block of code is only being run by one thread at a time. Mutex is a class of its own: each Mutex is an object with methods that ensure only one thread owns it at a time, so by acquiring the Mutex before an operation and releasing it when finished, we are assured that only one thread is performing that operation at a time. In contrast, Monitor is a class made up entirely of Shared methods (you never create an instance of it) that performs the locking itself, using whatever object you pass in as the key for the lock.
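To make the comparison concrete, here is a minimal sketch of the same increment protected three ways. The module and member names (LockStylesDemo, CounterLock, CounterMutex, Counter) are made up for illustration and are not part of the test application.

```vbnet
Imports System
Imports System.Threading

Module LockStylesDemo
    ' Hypothetical names, used only for this illustration.
    Private ReadOnly CounterLock As New Object()   ' key object for SyncLock and Monitor
    Private ReadOnly CounterMutex As New Mutex()   ' a Mutex is itself the thing you lock
    Private Counter As Integer = 0

    ' SyncLock: the lock is held for the whole block and released automatically.
    Public Sub IncrementWithSyncLock()
        SyncLock CounterLock
            Counter += 1
        End SyncLock
    End Sub

    ' Monitor: explicit Enter and Exit, keyed on an object.
    Public Sub IncrementWithMonitor()
        Monitor.Enter(CounterLock)
        Try
            Counter += 1
        Finally
            Monitor.Exit(CounterLock)
        End Try
    End Sub

    ' Mutex: acquire and release the Mutex object itself.
    Public Sub IncrementWithMutex()
        CounterMutex.WaitOne()
        Try
            Counter += 1
        Finally
            CounterMutex.ReleaseMutex()
        End Try
    End Sub

    Public Function CurrentCount() As Integer
        Return Counter
    End Function

    Sub Main()
        IncrementWithSyncLock()
        IncrementWithMonitor()
        IncrementWithMutex()
        Console.WriteLine(CurrentCount())   ' prints 3
    End Sub
End Module
```

In real code you would pick one mechanism per shared value. In this sketch the Mutex does not exclude a thread that holds CounterLock, so mixing the three styles on the same counter would not be safe across threads.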
One very important note about using the Monitor class: do not use a value type (a primitive such as an Integer) as the object to lock! Because the value is boxed into a new object every time it is passed to Monitor, each call sees a separate object, despite the fact that the same variable was passed in. As a result, Enter never blocks another thread, Exit throws a SynchronizationLockException because the object it receives was never locked, and concurrency is lost.
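The boxing problem is easy to reproduce. The sketch below is illustrative only (BoxingLockDemo and its members are hypothetical names): it locks on an Integer and then on a proper Object, and reports whether Monitor.Exit accepts each one.

```vbnet
Imports System
Imports System.Threading

Module BoxingLockDemo
    ' Hypothetical demo names; not part of the test application.
    Private ReadOnly RealLock As New Object()

    ' Returns True if Monitor.Exit rejects a lock taken on a boxed Integer.
    Public Function ExitFailsOnValueType() As Boolean
        Dim counter As Integer = 0
        Monitor.Enter(counter)        ' boxes counter into a temporary object
        Try
            Monitor.Exit(counter)     ' boxes again: a different object, never locked
            Return False
        Catch ex As SynchronizationLockException
            Return True
        End Try
    End Function

    ' Returns True if Enter and Exit pair up correctly on a reference type.
    Public Function ExitSucceedsOnObject() As Boolean
        Monitor.Enter(RealLock)
        Try
            Monitor.Exit(RealLock)    ' same object both times, so this succeeds
            Return True
        Catch ex As SynchronizationLockException
            Return False
        End Try
    End Function

    Sub Main()
        Console.WriteLine(ExitFailsOnValueType())   ' prints True
        Console.WriteLine(ExitSucceedsOnObject())   ' prints True
    End Sub
End Module
```

This is why the worker class in this article locks on dedicated Object fields rather than on the Double and Integer values being protected.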
Here is the code used for this test:
Public Sub MonitorMultiThreadComputation(ByVal Iterations As Integer, Optional ByVal ThreadCount As Integer = 0)
    Dim twMonitorLock As MonitorThreadWorker
    Dim IntegerIterationCounter As Integer
    Dim iOriginalMaxThreads As Integer
    Dim iOriginalMinThreads As Integer
    Dim iOriginalMaxIOThreads As Integer
    Dim iOriginalMinIOThreads As Integer

    twMonitorLock = New MonitorThreadWorker

    ' Remember the ThreadPool limits so they can be restored afterwards.
    Threading.ThreadPool.GetMaxThreads(iOriginalMaxThreads, iOriginalMaxIOThreads)
    Threading.ThreadPool.GetMinThreads(iOriginalMinThreads, iOriginalMinIOThreads)

    ' A ThreadCount of 0 leaves the ThreadPool to manage its own limits.
    If ThreadCount > 0 Then
        Threading.ThreadPool.SetMaxThreads(ThreadCount, ThreadCount)
        Threading.ThreadPool.SetMinThreads(ThreadCount, ThreadCount)
    End If

    ' Queue one work item per iteration, passing the counter as a Double.
    For IntegerIterationCounter = 1 To Iterations
        Threading.ThreadPool.QueueUserWorkItem(AddressOf twMonitorLock.ThreadProc, CDbl(IntegerIterationCounter))
    Next

    ' Spin until every queued work item has reported completion.
    While MonitorThreadWorker.IntegerCompletedComputations < Iterations
    End While

    Threading.ThreadPool.SetMaxThreads(iOriginalMaxThreads, iOriginalMaxIOThreads)
    Threading.ThreadPool.SetMinThreads(iOriginalMinThreads, iOriginalMinIOThreads)

    twMonitorLock = Nothing
    IntegerIterationCounter = Nothing
End Sub
And the MonitorThreadWorker class:
Public Class MonitorThreadWorker
    ' Separate lock objects, so the two shared values do not contend with each other.
    Private Shared ObjectStorageLock As New Object
    Private Shared ObjectComputationsLock As New Object
    Public Shared IntegerCompletedComputations As Integer = 0
    Private Shared DoubleStorage As Double

    Public Property Storage() As Double
        Get
            Threading.Monitor.Enter(ObjectStorageLock)
            Try
                Return DoubleStorage
            Finally
                ' Runs after Return; a plain Exit placed after Return would never execute.
                Threading.Monitor.Exit(ObjectStorageLock)
            End Try
        End Get
        Set(ByVal value As Double)
            Threading.Monitor.Enter(ObjectStorageLock)
            DoubleStorage = value
            Threading.Monitor.Exit(ObjectStorageLock)
        End Set
    End Property

    Public Property CompletedComputations() As Integer
        Get
            Return IntegerCompletedComputations
        End Get
        Set(ByVal value As Integer)
            IntegerCompletedComputations = value
        End Set
    End Property

    Public Sub ThreadProc(ByVal StateObject As Object)
        Dim ttuComputation As ThreadTestUtilities

        ttuComputation = New ThreadTestUtilities

        ' The computation itself runs outside the counter lock.
        Storage = ttuComputation.Compute(CDbl(StateObject))

        ' Only the shared counter update is guarded by this lock.
        Threading.Monitor.Enter(ObjectComputationsLock)
        CompletedComputations += 1
        Threading.Monitor.Exit(ObjectComputationsLock)

        ttuComputation = Nothing
    End Sub

    Public Sub New()
    End Sub
End Class
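One detail worth calling out when a property getter returns a value from inside a Monitor-protected region: statements placed after Return never run, so the Exit call has to live in a Finally block or the lock is never released. The following is a minimal sketch of that pattern, with a probe thread that checks the lock is free afterwards. All names here (GetterLockDemo, ValueLock, ReadValue, and so on) are invented for the example.

```vbnet
Imports System
Imports System.Threading

Module GetterLockDemo
    ' Hypothetical names, for illustration only.
    Private ReadOnly ValueLock As New Object()
    Private StoredValue As Double = 42.0
    Private ProbeAcquired As Boolean = False

    ' Reads the shared value under the lock; Finally guarantees the Exit.
    Public Function ReadValue() As Double
        Monitor.Enter(ValueLock)
        Try
            Return StoredValue
        Finally
            Monitor.Exit(ValueLock)
        End Try
    End Function

    ' Runs on a second thread: tries to take the lock without waiting.
    Private Sub ProbeLock()
        ProbeAcquired = Monitor.TryEnter(ValueLock)
        If ProbeAcquired Then
            Monitor.Exit(ValueLock)
        End If
    End Sub

    ' Returns True if another thread can take the lock after ReadValue returns.
    Public Function LockIsFreeAfterRead() As Boolean
        ReadValue()
        ProbeAcquired = False
        Dim probe As New Thread(AddressOf ProbeLock)
        probe.Start()
        probe.Join()
        Return ProbeAcquired
    End Function

    Sub Main()
        Console.WriteLine(LockIsFreeAfterRead())   ' prints True
    End Sub
End Module
```

The probe has to run on a different thread because Monitor is re-entrant: the thread that already holds a lock can always enter it again, which would hide a missing Exit.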
Here are the results of our tests. All tests are for 1,000,000 iterations, and the results are in milliseconds per test run.
TEST 1
This test allows the ThreadPool to manage the total number of minimum and maximum threads on its own:
System | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Average |
System A | 18609.375 | 21125.000 | 15187.500 | 16953.125 | 14859.375 | 17346.875 |
System B | 16890.301 | 13624.738 | 19702.747 | 19155.882 | 25280.765 | 18930.887 |
System C | 16265.625 | 28687.500 | 18109.375 | 15765.625 | 19015.625 | 19568.750 |
System D | 30468.945 | 30547.071 | 30422.070 | 30390.820 | 30484.570 | 30462.695 |
Average | 21577.302 |
TEST 2
In this test, we limit the maximum number of threads to one per logical processor:
System | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Average |
System A | 19000.000 | 17875.000 | 16109.375 | 19937.500 | 17546.875 | 18093.750 |
System B | 28765.073 | 20327.735 | 25983.876 | 30952.531 | 18812.139 | 24968.271 |
System C | 22406.250 | 34031.250 | 36984.375 | 45703.125 | 38093.750 | 35443.750 |
System D | 30453.320 | 30437.695 | 30484.570 | 30359.569 | 30515.820 | 30450.195 |
Average | 27238.992 |
TEST 3
This test uses only one thread:
System | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Average |
System A | 17625.000 | 13609.375 | 15921.875 | 18000.000 | 15890.625 | 16209.375 |
System B | 19218.381 | 14812.216 | 24437.031 | 20030.865 | 37702.401 | 23240.179 |
System C | 26562.500 | 22828.125 | 24218.750 | 34640.625 | 31171.875 | 27884.375 |
System D | 30453.320 | 30437.695 | 30406.445 | 30515.820 | 30562.696 | 30475.195 |
Average | 24452.281 |
TEST 4
This test uses two concurrent threads:
System | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Average |
System A | 13468.750 | 14687.500 | 15796.875 | 17312.500 | 13625.000 | 14978.125 |
System B | 29124.441 | 22187.074 | 21077.720 | 13640.363 | 16859.051 | 20577.730 |
System C | 16625.000 | 15687.500 | 18375.000 | 17406.250 | 17296.875 | 17078.125 |
System D | 30453.320 | 30265.819 | 30437.695 | 30468.945 | 30422.070 | 30409.570 |
Average | 20760.888 |
TEST 5
Here we show four concurrent threads:
System | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Average |
System A | 24515.625 | 25187.500 | 15546.875 | 26234.375 | 25125.000 | 23321.875 |
System B | 33061.865 | 34436.839 | 31327.524 | 32593.124 | 18484.020 | 29980.674 |
System C | 24375.000 | 21062.500 | 20656.250 | 23750.000 | 20531.250 | 22075.000 |
System D | 30406.445 | 30390.820 | 30468.945 | 30328.319 | 30531.445 | 30425.195 |
Average | 26450.686 |
TEST 6
This test uses eight concurrent threads:
System | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Average |
System A | 26156.250 | 25593.750 | 25328.125 | 25906.250 | 26109.375 | 25818.750 |
System B | 44889.763 | 34108.720 | 22905.810 | 19077.759 | 16843.427 | 27565.096 |
System C | 34796.875 | 34343.750 | 30812.500 | 33718.750 | 21296.875 | 30993.750 |
System D | 30625.196 | 30531.445 | 30328.319 | 30468.945 | 30406.445 | 30472.070 |
Average | 28712.417 |
TEST 7
Finally, this test runs 16 simultaneous threads:
System | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Average |
System A | 26109.375 | 25421.875 | 25640.625 | 25203.125 | 25281.250 | 25531.250 |
System B | 31296.274 | 22093.326 | 16359.061 | 42827.303 | 20687.103 | 26652.613 |
System C | 41296.875 | 32125.000 | 34078.125 | 32781.250 | 29984.375 | 34053.125 |
System D | 36890.861 | 50687.824 | 50672.199 | 50547.199 | 50578.449 | 47875.306 |
Average | 33528.074 |
System A: AMD Sempron 3200 (1 logical x64 CPU), 1 GB RAM
System B: AMD Athlon 3200+ (1 logical x64 CPU), 1 GB RAM
System C: Intel Pentium 4 2.8 GHz (1 logical x86 CPU), 1 GB RAM
System D: Two Intel Xeon 3.0 GHz (2 dual core, HyperThreaded CPUs providing 8 logical x64 CPUs), 2 GB RAM
It is extremely important to understand the following information and disclaimers regarding these benchmark figures:
They are not to be taken as absolute numbers. They are taken on real-world systems with real-world OS installations, not clean benchmark systems. They are not to be used as any concrete measure of relative CPU performance; they simply illustrate the different relative performance characteristics of different multithreading techniques on different numbers of logical CPUs, in order to show how different processors can perform differently with different techniques.
You will see that while Monitor is consistently a bit slower than SyncLock, it is nearly as fast, and is much faster than Mutex. So why have different methods at all? Each has its own purpose. Although SyncLock is marginally quicker than Monitor, SyncLock always holds its lock for an entire block of code, while Monitor's explicit Enter and Exit calls let you take and release the lock on a particular object exactly where you need to. For a single line where only one variable is being used and requires concurrency (such as an incrementer), SyncLock is the better choice. But for something more complex, such as many lines of code that deal with many variables, only a few of which need concurrency, Monitor is a better choice: it is preferable to let other threads run as much of that code as they can, blocking only on certain parts, than to have the entire block of code held up. Mutex has capabilities that do not exist in Monitor or SyncLock, such as synchronizing threads in different processes through a named Mutex; for further information, check the documentation.
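As a rough illustration of blocking only on the part that needs it, the sketch below does its computation with no lock held and enters the Monitor just long enough to update a shared total. ExpensiveWork is a hypothetical stand-in for a routine like Compute, and the other names (NarrowLockDemo, TotalLock, RunWorkers) are likewise invented for the example.

```vbnet
Imports System
Imports System.Threading

Module NarrowLockDemo
    ' Hypothetical names, for illustration only.
    Private ReadOnly TotalLock As New Object()
    Private Total As Double = 0.0

    ' Stand-in for a long computation that touches no shared state.
    Private Function ExpensiveWork(ByVal input As Double) As Double
        Return Math.Sqrt(input) * 2.0
    End Function

    Public Sub Worker(ByVal state As Object)
        ' No lock is held here, so other threads can compute at the same time.
        Dim result As Double = ExpensiveWork(CDbl(state))

        ' Lock only around the one statement that touches shared data.
        Monitor.Enter(TotalLock)
        Try
            Total += result
        Finally
            Monitor.Exit(TotalLock)
        End Try
    End Sub

    ' Starts the given number of threads, waits for them, and returns the total.
    Public Function RunWorkers(ByVal count As Integer) As Double
        Total = 0.0
        Dim threads(count - 1) As Thread
        For i As Integer = 0 To count - 1
            threads(i) = New Thread(AddressOf Worker)
            threads(i).Start(4.0)
        Next
        For Each t As Thread In threads
            t.Join()
        Next
        Return Total
    End Function

    Sub Main()
        Console.WriteLine(RunWorkers(8))   ' prints 32
    End Sub
End Module
```

Each worker contributes Sqrt(4) * 2 = 4, so eight workers total 32. The same structure works with SyncLock around the single update; the point is that the lock covers only the shared update, not the computation.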
J.Ja