
Multithreading tutorial, part six: Monitor performance


This is the sixth installment of a multi-part series demonstrating multithreading techniques and performance characteristics in VB.Net.

So far, we have covered single-thread performance, and SyncLock and Mutex for maintaining data concurrency. This time, we will be using Monitor for concurrency. The differences between the three methods are subtle, but important. SyncLock ensures that a particular block of code is run by only one thread at a time. Mutex is a class of its own, with methods to ensure that only one thread holds it at a time; by acquiring the Mutex before an operation and releasing it when finished, we are assured that only one thread is performing that operation at a time. In contrast, Monitor is never instantiated: its shared (static) methods perform the locking themselves, using an object you pass in as the key for the lock.
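To make the comparison concrete, here is a minimal sketch of the three approaches protecting the same counter. The module, field, and method names are hypothetical and are not part of the test code in this series. For brevity the demo routes all three through one counter; in real code, pick a single mechanism per piece of shared data, because a Mutex and a lock object do not exclude each other.

```vbnet
Imports System
Imports System.Threading

Module LockingStylesSketch
    ' Hypothetical shared state, for illustration only.
    Private ReadOnly CounterLock As New Object()
    Private ReadOnly CounterMutex As New Mutex()
    Private Counter As Integer = 0

    ' SyncLock: the lock is held for exactly the enclosed block.
    Public Sub IncrementWithSyncLock()
        SyncLock CounterLock
            Counter += 1
        End SyncLock
    End Sub

    ' Mutex: an instance that is acquired with WaitOne and released with ReleaseMutex.
    Public Sub IncrementWithMutex()
        CounterMutex.WaitOne()
        Try
            Counter += 1
        Finally
            CounterMutex.ReleaseMutex()
        End Try
    End Sub

    ' Monitor: shared methods that take the lock object as an argument.
    Public Sub IncrementWithMonitor()
        Monitor.Enter(CounterLock)
        Try
            Counter += 1
        Finally
            Monitor.Exit(CounterLock)
        End Try
    End Sub

    ' Reads the counter under the same lock object used by the SyncLock and Monitor paths.
    Public Function CurrentCount() As Integer
        SyncLock CounterLock
            Return Counter
        End SyncLock
    End Function

    Sub Main()
        IncrementWithSyncLock()
        IncrementWithMutex()
        IncrementWithMonitor()
        Console.WriteLine(CurrentCount())
    End Sub
End Module
```

Each call adds one to the counter, so the only visible difference between the three routines is how the lock is acquired and released.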

One very important note about using the Monitor class: do not use primitives (value types such as Integer or Double) as the object to lock! Because the primitive is automatically wrapped in a new object (boxed) each time it is passed to Monitor, every lock attempt sees a separate object, even though the same primitive was passed in. As a result, the lock does not actually exclude other threads, and concurrency will be lost.
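The boxing problem is easy to see in a small sketch. The module and function names below are hypothetical; the only library calls used are Object.ReferenceEquals, Monitor.Enter, and Monitor.Exit. The first function shows that two boxings of the same Integer are different objects, and the second shows one consequence: Monitor.Exit receives a different box than the one Monitor.Enter locked, so it raises a SynchronizationLockException.

```vbnet
Imports System
Imports System.Threading

Module BoxedLockSketch
    ' Returns True only if two separate boxings of the same Integer share one object.
    Public Function BoxesAreSameObject(ByVal value As Integer) As Boolean
        Dim firstBox As Object = value   ' boxing creates a new object here
        Dim secondBox As Object = value  ' and a second, separate object here
        Return Object.ReferenceEquals(firstBox, secondBox)
    End Function

    ' Returns True when Exit fails because it was handed a different box than Enter locked.
    Public Function ExitFailsOnValueType(ByVal value As Integer) As Boolean
        Monitor.Enter(value)        ' locks a freshly boxed copy
        Try
            Monitor.Exit(value)     ' boxes again, so this is an object nobody has locked
            Return False
        Catch ex As SynchronizationLockException
            Return True
        End Try
    End Function

    Sub Main()
        Console.WriteLine(BoxesAreSameObject(5))
        Console.WriteLine(ExitFailsOnValueType(5))
    End Sub
End Module
```

Locking on a dedicated reference-type field declared As New Object, as the worker class in this article does, avoids the problem entirely.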

Here is the code used for this test:

Public Sub MonitorMultiThreadComputation(ByVal Iterations As Integer, Optional ByVal ThreadCount As Integer = 0)
  Dim twMonitorLock As MonitorThreadWorker
  Dim IntegerIterationCounter As Integer
  Dim iOriginalMaxThreads As Integer
  Dim iOriginalMinThreads As Integer
  Dim iOriginalMaxIOThreads As Integer
  Dim iOriginalMinIOThreads As Integer

  twMonitorLock = New MonitorThreadWorker

  ' Remember the current ThreadPool limits so they can be restored afterwards.
  Threading.ThreadPool.GetMaxThreads(iOriginalMaxThreads, iOriginalMaxIOThreads)
  Threading.ThreadPool.GetMinThreads(iOriginalMinThreads, iOriginalMinIOThreads)

  ' A ThreadCount of 0 leaves the ThreadPool to manage its own limits.
  If ThreadCount > 0 Then
    Threading.ThreadPool.SetMaxThreads(ThreadCount, ThreadCount)
    Threading.ThreadPool.SetMinThreads(ThreadCount, ThreadCount)
  End If

  ' Queue one work item per iteration, passing the iteration number as a Double.
  ' Double.Parse on an Integer relies on Option Strict Off (implicit conversion to String);
  ' CDbl(IntegerIterationCounter) is the direct conversion.
  For IntegerIterationCounter = 1 To Iterations
    Threading.ThreadPool.QueueUserWorkItem(AddressOf twMonitorLock.ThreadProc, Double.Parse(IntegerIterationCounter))
  Next

  ' Busy-wait until every queued work item has finished.
  While MonitorThreadWorker.IntegerCompletedComputations < Iterations
  End While

  ' Restore the original ThreadPool limits.
  Threading.ThreadPool.SetMaxThreads(iOriginalMaxThreads, iOriginalMaxIOThreads)
  Threading.ThreadPool.SetMinThreads(iOriginalMinThreads, iOriginalMinIOThreads)

  twMonitorLock = Nothing
  IntegerIterationCounter = Nothing
End Sub

And the MonitorThreadWorker class:

Public Class MonitorThreadWorker
  ' Lock objects are reference types, one per piece of shared data.
  Private Shared ObjectStorageLock As New Object
  Private Shared ObjectComputationsLock As New Object
  Public Shared IntegerCompletedComputations As Integer = 0
  Private Shared DoubleStorage As Double

  Public Property Storage() As Double
    Get
      Threading.Monitor.Enter(ObjectStorageLock)
      Try
        Return DoubleStorage
      Finally
        ' Finally runs even after Return, so the lock is always released.
        Threading.Monitor.Exit(ObjectStorageLock)
      End Try
    End Get
    Set(ByVal value As Double)
      Threading.Monitor.Enter(ObjectStorageLock)
      DoubleStorage = value
      Threading.Monitor.Exit(ObjectStorageLock)
    End Set
  End Property

  Public Property CompletedComputations() As Integer
    Get
      Return IntegerCompletedComputations
    End Get
    Set(ByVal value As Integer)
      IntegerCompletedComputations = value
    End Set
  End Property

  ' Work item run on a ThreadPool thread; StateObject carries the value to compute on.
  Public Sub ThreadProc(ByVal StateObject As Object)
    Dim ttuComputation As ThreadTestUtilities

    ttuComputation = New ThreadTestUtilities

    ' The Storage setter takes ObjectStorageLock internally.
    Storage = ttuComputation.Compute(CDbl(StateObject))

    ' The read-modify-write on the counter is guarded by its own lock object.
    Threading.Monitor.Enter(ObjectComputationsLock)
    CompletedComputations += 1
    Threading.Monitor.Exit(ObjectComputationsLock)

    ttuComputation = Nothing
  End Sub

  Public Sub New()
  End Sub
End Class

Here are the results of our tests. All tests are for 1,000,000 iterations, and the results are in milliseconds per test run.

TEST 1

This test allows the ThreadPool to manage the total number of minimum and maximum threads on its own:

System    Run 1      Run 2      Run 3      Run 4      Run 5      Average
System A  18609.375  21125.000  15187.500  16953.125  14859.375  17346.875
System B  16890.301  13624.738  19702.747  19155.882  25280.765  18930.887
System C  16265.625  28687.500  18109.375  15765.625  19015.625  19568.750
System D  30468.945  30547.071  30422.070  30390.820  30484.570  30462.695
Average                                                          21577.302

TEST 2

In this test, we limit the maximum number of threads to one per logical processor:

System    Run 1      Run 2      Run 3      Run 4      Run 5      Average
System A  19000.000  17875.000  16109.375  19937.500  17546.875  18093.750
System B  28765.073  20327.735  25983.876  30952.531  18812.139  24968.271
System C  22406.250  34031.250  36984.375  45703.125  38093.750  35443.750
System D  30453.320  30437.695  30484.570  30359.569  30515.820  30450.195
Average                                                          27238.992

TEST 3

This test uses only one thread:

System    Run 1      Run 2      Run 3      Run 4      Run 5      Average
System A  17625.000  13609.375  15921.875  18000.000  15890.625  16209.375
System B  19218.381  14812.216  24437.031  20030.865  37702.401  23240.179
System C  26562.500  22828.125  24218.750  34640.625  31171.875  27884.375
System D  30453.320  30437.695  30406.445  30515.820  30562.696  30475.195
Average                                                          24452.281

TEST 4

This test uses two concurrent threads:

System    Run 1      Run 2      Run 3      Run 4      Run 5      Average
System A  13468.750  14687.500  15796.875  17312.500  13625.000  14978.125
System B  29124.441  22187.074  21077.720  13640.363  16859.051  20577.730
System C  16625.000  15687.500  18375.000  17406.250  17296.875  17078.125
System D  30453.320  30265.819  30437.695  30468.945  30422.070  30409.570
Average                                                          20760.888

TEST 5

Here we show four concurrent threads:

System    Run 1      Run 2      Run 3      Run 4      Run 5      Average
System A  24515.625  25187.500  15546.875  26234.375  25125.000  23321.875
System B  33061.865  34436.839  31327.524  32593.124  18484.020  29980.674
System C  24375.000  21062.500  20656.250  23750.000  20531.250  22075.000
System D  30406.445  30390.820  30468.945  30328.319  30531.445  30425.195
Average                                                          26450.686

TEST 6

This test uses eight concurrent threads:

System    Run 1      Run 2      Run 3      Run 4      Run 5      Average
System A  26156.250  25593.750  25328.125  25906.250  26109.375  25818.750
System B  44889.763  34108.720  22905.810  19077.759  16843.427  27565.096
System C  34796.875  34343.750  30812.500  33718.750  21296.875  30993.750
System D  30625.196  30531.445  30328.319  30468.945  30406.445  30472.070
Average                                                          28712.417

TEST 7

Finally, this test runs 16 simultaneous threads:

System    Run 1      Run 2      Run 3      Run 4      Run 5      Average
System A  26109.375  25421.875  25640.625  25203.125  25281.250  25531.250
System B  31296.274  22093.326  16359.061  42827.303  20687.103  26652.613
System C  41296.875  32125.000  34078.125  32781.250  29984.375  34053.125
System D  36890.861  50687.824  50672.199  50547.199  50578.449  47875.306
Average                                                          33528.074

System A: AMD Sempron 3200 (1 logical x64 CPU), 1 GB RAM
System B: AMD Athlon 3200+ (1 logical x64 CPU), 1 GB RAM
System C: Intel Pentium 4 2.8 GHz (1 logical x86 CPU), 1 GB RAM
System D: Two Intel Xeon 3.0 GHz (2 dual core, HyperThreaded CPUs providing 8 logical x64 CPUs), 2 GB RAM

It is extremely important to understand the following information and disclaimers regarding these benchmark figures:

They are not to be taken as absolute numbers. They are taken on real-world systems with real-world OS installations, not clean benchmark systems. They are not to be used as any concrete measure of relative CPU performance; they simply illustrate the different relative performance characteristics of different multithreading techniques on different numbers of logical CPUs, in order to show how different processors can perform differently with different techniques.

You will see that while Monitor is consistently a bit slower than SyncLock, it is nearly as fast, and is much faster than Mutex. So why have different methods at all? Each method has its own purpose. Although SyncLock is marginally quicker than Monitor, SyncLock always holds its lock for an entire block of code, while Monitor lets you choose exactly where the lock on an object is acquired and released. For a single line where only one variable is being used and requires concurrency (such as incrementing a counter), SyncLock is the better choice. But for something more complex, such as many lines of code that deal with many variables, only a few of which need concurrency, Monitor is a better choice. It is preferable to let other threads run as much of that code as they can, blocking only on certain parts, than to have the entire block of code held up. Mutex has capabilities that do not exist in Monitor or SyncLock; for further information, check the documentation.
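As a rough illustration of that extra control, the sketch below does its unshared work outside the lock and then uses Monitor.TryEnter with a timeout, so the caller can give up instead of blocking indefinitely. The SyncLock statement is documented to behave like Monitor.Enter and Monitor.Exit wrapped in Try...Finally, so options such as the timeout are where Monitor differs in practice. The module name, the TryStoreResult and ReadResult functions, and the use of Math.Sqrt as a stand-in computation are assumptions made for this example; they are not the test code used for the benchmarks.

```vbnet
Imports System
Imports System.Threading

Module MonitorTryEnterSketch
    ' Hypothetical shared state, for illustration only.
    Private ReadOnly ResultLock As New Object()
    Private LastResult As Double = 0.0

    ' Returns True if the result was stored, or False if the lock
    ' could not be acquired within timeoutMs milliseconds.
    Public Function TryStoreResult(ByVal input As Double, ByVal timeoutMs As Integer) As Boolean
        ' Work that touches no shared data runs outside the lock.
        Dim computed As Double = Math.Sqrt(input)

        ' Only the shared write is guarded, and only if the lock arrives in time.
        If Not Monitor.TryEnter(ResultLock, timeoutMs) Then
            Return False
        End If
        Try
            LastResult = computed
            Return True
        Finally
            Monitor.Exit(ResultLock)
        End Try
    End Function

    ' Reads the shared value under the same lock object.
    Public Function ReadResult() As Double
        SyncLock ResultLock
            Return LastResult
        End SyncLock
    End Function

    Sub Main()
        Console.WriteLine(TryStoreResult(16.0, 100))
        Console.WriteLine(ReadResult())
    End Sub
End Module
```

Because SyncLock and Monitor both lock on the object they are given, the ReadResult function can use SyncLock on the same ResultLock and still be excluded correctly by the Monitor calls.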

J.Ja

About

Justin James is the Lead Architect for Conigent.

2 comments
Me0001

Is the SyncLock statement the same as the C# lock statement? If so, would it also be implemented using a Monitor, as it is in C#? If this is the case, it surprises me that the Monitor and SyncLock statements do not have closer results.

Tony Hopkinson

Monitor is a better choice in some scenarios; it does more, so it costs more. In principle it gives you more granular control of locking. If you only have one grain it's pointless, if your grains are the size of boulders it's also pointless. :p If you are comparing these times to what you are getting now, well one or three things have changed since 2006....