Multithreading tutorial, part four: SyncLock Performance


This is the fourth installment of a multi-part series demonstrating multithreading techniques and performance characteristics in VB.Net. Catch up on previous installments: Introduction to multithreading, The Application Skeleton, and Single Threaded Performance.

In today's post, we will take a look at using SyncLock to maintain data integrity during operations, and we will also be using the .Net ThreadPool object. Initially, I planned to write my own version of the ThreadPool, but there is really no reason to do so within the scope of this series. The .Net ThreadPool works well for our needs: it manages the number of active threads intelligently, and it lets us set the maximum number of executing threads, which is exactly what we need to see how limiting or expanding the number of running threads affects performance. Writing a thread pool from scratch is a rather difficult task, and it is not needed unless your project has very specialized requirements.
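
Before diving into the worker class, it helps to remember what SyncLock actually does: it is essentially compiler shorthand for the Monitor class, wrapping the protected statements in a Monitor.Enter call and a Try/Finally block that guarantees Monitor.Exit. Here is a minimal sketch of that equivalence (the class and member names below are invented for illustration and are not part of the test code):

' Minimal illustration of what a SyncLock block boils down to.
Public Class SyncLockIllustration

  Private Shared objLock As New Object

  Private Shared DoubleValue As Double

  Public Shared Sub SetWithSyncLock(ByVal value As Double)
    SyncLock objLock
      DoubleValue = value
    End SyncLock
  End Sub

  ' Roughly what the compiler generates for the block above.
  Public Shared Sub SetWithMonitor(ByVal value As Double)
    Threading.Monitor.Enter(objLock)
    Try
      DoubleValue = value
    Finally
      Threading.Monitor.Exit(objLock)
    End Try
  End Sub

End Class

Every one of those Enter/Exit pairs, plus any contention on the lock object, contributes to the cost we are measuring in this installment.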

Here is the code for the SyncLockThreadWorker object that performs the computations and atomic operations:

Public Class SyncLockThreadWorker

  Public Shared objStorageLock As New Object

  Public Shared objCompletedComputationsLock As New Object

  Public Shared IntegerCompletedComputations As Integer = 0

  Private Shared DoubleStorage As Double

  Public Property Storage() As Double

    Get

      SyncLock objStorageLock

        Return DoubleStorage

      End SyncLock

    End Get

    Set(ByVal value As Double)

      SyncLock objStorageLock

        DoubleStorage = value

      End SyncLock

    End Set

  End Property

  Public Property CompletedComputations() As Integer

    Get

      Return IntegerCompletedComputations

    End Get

    Set(ByVal value As Integer)

      IntegerCompletedComputations = value

    End Set

  End Property

  Public Sub ThreadProc(ByVal StateObject As Object)

    Dim ttuComputation As ThreadTestUtilities

    ttuComputation = New ThreadTestUtilities

    Storage = ttuComputation.Compute(CDbl(StateObject))

    SyncLock objCompletedComputationsLock

      CompletedComputations += 1

    End SyncLock

    ttuComputation = Nothing

  End Sub

  Public Sub New()

  End Sub

End Class

Inspection of the class shows that we use SyncLock in two places: when updating the Storage property, and when updating the count of completed computations. Because we never read the Storage property while simultaneously writing to it, placing the lock inside the property accessors is acceptable. The CompletedComputations property, however, is read and written in a single statement (the += 1 increment), so we must hold the lock around the entire increment rather than relying on locks inside the accessors. A short sketch of the difference follows.
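
To make the distinction concrete, here is a minimal, hypothetical sketch (the CounterExample class and its members are invented for illustration and are not part of the test code). With the lock taken only inside the accessors, an increment can still lose updates, because two threads can both read the same value before either one writes back:

Public Class CounterExample

  ' Hypothetical counter, used only to illustrate where the lock must live.
  Private Shared objCounterLock As New Object

  Private Shared IntegerCount As Integer = 0

  Public Shared Property Count() As Integer
    Get
      SyncLock objCounterLock
        Return IntegerCount
      End SyncLock
    End Get
    Set(ByVal value As Integer)
      SyncLock objCounterLock
        IntegerCount = value
      End SyncLock
    End Set
  End Property

  ' NOT safe: the Get and the Set each take the lock, but another thread
  ' can increment between our read and our write, and that update is lost.
  Public Shared Sub UnsafeIncrement()
    Count += 1
  End Sub

  ' Safe: the lock is held across the entire read-modify-write.
  ' (SyncLock is reentrant, so re-taking it inside the accessors is fine.)
  Public Shared Sub SafeIncrement()
    SyncLock objCounterLock
      Count += 1
    End SyncLock
  End Sub

End Class

This is exactly why ThreadProc wraps the CompletedComputations += 1 line in its own SyncLock block.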

This is the code that creates and runs the threads and then waits for them all to finish. If we did not wait for all of the threads to finish, the Sub would exit as soon as the threads were queued.

Public Sub SyncLockMultiThreadComputation(ByVal Iterations As Integer, Optional ByVal ThreadCount As Integer = 0)

  Dim twSyncLock As SyncLockThreadWorker

  Dim IntegerIterationCounter As Integer

  Dim iOriginalMaxThreads As Integer

  Dim iOriginalMinThreads As Integer

  Dim iOriginalMaxIOThreads As Integer

  Dim iOriginalMinIOThreads As Integer

  twSyncLock = New SyncLockThreadWorker

  Threading.ThreadPool.GetMaxThreads(iOriginalMaxThreads, iOriginalMaxIOThreads)

  Threading.ThreadPool.GetMinThreads(iOriginalMinThreads, iOriginalMinIOThreads)

  If ThreadCount > 0 Then

    Threading.ThreadPool.SetMaxThreads(ThreadCount, ThreadCount)

    Threading.ThreadPool.SetMinThreads(ThreadCount, ThreadCount)

  End If

  For IntegerIterationCounter = 1 To Iterations

    Threading.ThreadPool.QueueUserWorkItem(AddressOf twSyncLock.ThreadProc, Double.Parse(IntegerIterationCounter))

  Next

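  ' Busy-wait (spin) until every queued work item has incremented the shared counter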
  While SyncLockThreadWorker.IntegerCompletedComputations < Iterations

  End While

  Threading.ThreadPool.SetMaxThreads(iOriginalMaxThreads, iOriginalMaxIOThreads)

  Threading.ThreadPool.SetMinThreads(iOriginalMinThreads, iOriginalMinIOThreads)

  twSyncLock = Nothing

  IntegerIterationCounter = Nothing

End Sub
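
For context, a hypothetical test harness (the Stopwatch timing, console output, and loop below are assumptions, not the article's actual harness, and they assume the Sub is in scope) would call this routine once per configuration, matching the tests that follow:

' Hypothetical harness, shown only for context.
Dim swTimer As New System.Diagnostics.Stopwatch

' Test 1: let the ThreadPool manage the thread count on its own.
SyncLockThreadWorker.IntegerCompletedComputations = 0
swTimer.Start()
SyncLockMultiThreadComputation(1000000)
swTimer.Stop()
Console.WriteLine("ThreadPool default: " & swTimer.ElapsedMilliseconds & " ms")

' Test 2 would pass Environment.ProcessorCount; Tests 3 through 7
' force 1, 2, 4, 8, and 16 concurrent threads.
For Each IntegerThreadCount As Integer In New Integer() {1, 2, 4, 8, 16}
  ' The shared counter must be reset when runs share a process,
  ' or the wait loop in SyncLockMultiThreadComputation exits early.
  SyncLockThreadWorker.IntegerCompletedComputations = 0
  swTimer.Reset()
  swTimer.Start()
  SyncLockMultiThreadComputation(1000000, IntegerThreadCount)
  swTimer.Stop()
  Console.WriteLine(IntegerThreadCount & " threads: " & swTimer.ElapsedMilliseconds & " ms")
Next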

Here are the results of our tests. All tests are for 1,000,000 iterations, and the results are in milliseconds per test run.

TEST 1

This test allows the ThreadPool to manage the minimum and maximum number of threads on its own:

          Test 1      Test 2      Test 3      Test 4      Test 5      Average
System A  17250.000   16562.500   13984.375   14875.000   15531.250   15640.625
System B  16234.375   20343.750   16718.750   16656.250   13906.250   16771.875
System C  17529.413   23174.478   17998.532   19030.594   17685.786   19083.761
System D  33734.590   33609.590   33281.463   33297.088   33328.338   33450.214
Overall average: 21236.619

TEST 2

In this test, we limit the maximum number of threads to one per logical processor:

          Test 1      Test 2      Test 3      Test 4      Test 5      Average
System A  13484.375   12987.500   14656.250   13437.500   14281.250   13769.375
System B  15906.250   19593.750   13953.125   15859.375   14265.625   15915.625
System C  17799.610   20312.852   25457.524   27521.648   17748.335   21767.994
System D  33203.337   33265.837   30765.821   31172.074   33203.337   32322.081
Overall average: 20943.769

TEST 3

This test uses only one thread:

          Test 1      Test 2      Test 3      Test 4      Test 5      Average
System A  13125.000   14093.750   15390.625   14046.875   14875.000   14306.250
System B  13031.250   11859.375   13000.000   14546.875   12015.625   12890.625
System C  16481.714   12681.850   14652.150   12728.762   14386.316   14186.158
System D  33343.963   21953.265   21687.638   22609.519   23625.151   24643.907
Overall average: 16506.735

TEST 4

This test uses two concurrent threads:

          Test 1      Test 2      Test 3      Test 4      Test 5      Average
System A  13656.250   14031.250   13515.625   14484.375   13953.125   13928.125
System B  17656.250   14453.125   17093.750   16828.125   18656.250   16937.500
System C  21673.297   22689.722   30649.108   20844.520   33792.205   25929.770
System D  22906.396   22922.021   22172.016   24515.781   25484.538   23600.150
Overall average: 20098.886

TEST 5

Here we show four concurrent threads:

          Test 1      Test 2      Test 3      Test 4      Test 5      Average
System A  27968.750   24359.375   24406.250   21515.625   22765.625   24203.125
System B  33078.125   23546.875   31687.500   34000.000   33953.125   31253.125
System C  30649.108   33510.733   31462.247   34011.127   24097.079   30746.059
System D  25390.787   25437.662   25484.538   25515.788   25609.538   25487.663
Overall average: 27922.493

TEST 6

This test uses eight concurrent threads:

          Test 1      Test 2      Test 3      Test 4      Test 5      Average
System A  25484.375   26031.250   26125.000   25312.500   25843.750   25759.375
System B  35812.500   36312.500   34875.000   21578.125   23328.125   30381.250
System C  34402.060   34730.443   34152.920   34726.065   34556.444   34513.586
System D  31234.574   31578.327   31328.325   31468.951   31359.575   31393.950
Overall average: 30512.040

TEST 7

Finally, this test runs 16 simultaneous threads:

          Test 1      Test 2      Test 3      Test 4      Test 5      Average
System A  25671.875   25562.500   25796.875   27156.250   26078.125   26053.125
System B  31750.000   13171.875   38953.125   43703.125   35500.000   32615.625
System C  33615.818   33710.465   33959.402   33974.959   34021.628   33856.454
System D  50422.197   52656.587   44015.906   50953.451   44859.662   48581.561
Overall average: 35276.691

System A: AMD Sempron 3200 (1 logical x64 CPU), 1 GB RAM

System B: AMD Athlon 3200+ (1 logical x64 CPU), 1 GB RAM

System C: Intel Pentium 4 2.8 GHz (1 logical x86 CPU), 1 GB RAM

System D: Two Intel Xeon 3.0 GHz (2 dual-core, Hyper-Threaded CPUs providing 8 logical x64 CPUs), 2 GB RAM

It is extremely important to understand the following information and disclaimers regarding these benchmark figures:

They are not to be taken as absolute numbers; they were gathered on real-world systems with real-world OS installations, not on clean benchmark systems. They are also not to be used as any concrete measure of relative CPU performance. They simply illustrate the relative performance characteristics of different multithreading techniques across different numbers of logical CPUs, to show how processors can behave quite differently depending on the technique used.

Compared to our previous test results, it is easy to see that these locked, atomic operations carry a very high cost. Additionally, each active thread above and beyond the number of logical processors degrades performance. This is because the OS (I say "the OS" instead of "Windows" because this applies to any multithreading OS that uses time slices to manage threads) has to perform quite a bit of work to manage every thread it runs per processor.

It is also easy to see that the ThreadPool object does not make the best possible decisions regarding the number of active threads. That is not evidence that the ThreadPool is poorly written, but it is evidence that it is wise to override its settings and tune performance to the application's needs, based on the type of work occurring in the running threads. An application whose threads wait on asynchronous I/O, for example, can sustain many more active threads than one like this that performs raw computation. It would also be interesting to force a particular number of concurrent threads instead of letting the ThreadPool manage it: it is difficult to tell whether the relatively static results beyond a few threads occur because the ThreadPool deliberately throttles the number of running threads, or because the SyncLock creates a traffic jam that no amount of pooling can resolve. The only way to test this would be to rewrite the test around a homegrown thread pool, and possibly to have each thread perform a much higher work-to-locking ratio (one way to raise that ratio is sketched below).

It is also interesting to see that the dual Xeon system still plods along behind the other systems. It may be that the Xeon only truly shines with a much higher number of forced concurrent threads.
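
To illustrate what a higher work-to-locking ratio might look like, here is a minimal, hypothetical variation on the worker's ThreadProc (the BatchedThreadProc name and the batch size of 100 are my own inventions, not part of the test code). Each thread performs a batch of computations using local variables and takes each lock only once per batch instead of once per computation; the harness would, of course, need to queue correspondingly fewer work items.

  ' Hypothetical addition to SyncLockThreadWorker, shown only to illustrate
  ' a higher work-to-locking ratio; the batch size of 100 is arbitrary.
  Public Sub BatchedThreadProc(ByVal StateObject As Object)

    Dim ttuComputation As New ThreadTestUtilities
    Dim DoubleLocalResult As Double
    Dim IntegerLocalCompleted As Integer
    Dim IntegerBatchCounter As Integer

    ' Do a batch of work using only local variables (no locking required).
    For IntegerBatchCounter = 1 To 100
      DoubleLocalResult = ttuComputation.Compute(CDbl(StateObject))
      IntegerLocalCompleted += 1
    Next

    ' Take each lock once per batch instead of once per computation.
    SyncLock objStorageLock
      DoubleStorage = DoubleLocalResult
    End SyncLock

    SyncLock objCompletedComputationsLock
      IntegerCompletedComputations += IntegerLocalCompleted
    End SyncLock

    ttuComputation = Nothing

  End Sub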

My next post will have extremely similar code, but it will use the Mutex object instead of SyncLock to manage the atomic operations.

J.Ja

About

Justin James is the Lead Architect for Conigent.
