
Implementing multithreading in .NET: Two major factors to consider

Justin James shares an interesting reader question about multithreading in .NET. Check out Justin's detailed response, which includes a downloadable PowerPoint presentation about the Parallel Extensions Library and sample code.


A reader recently e-mailed me a really good question about multithreading, and I thought other developers might find our exchange useful. Here's the reader's question:

I recently learned about multithreading and found it's extremely interesting. I came across your articles about Multithreading applications and think they are very useful. However, I want more complicated examples and projects so that I can really explore this subject and can make full use of its functionality. Actually, I have several past projects that were programmed single-threaded, but would be much better off if programmed multi-threaded. I have a feeling that it's not easy to implement but would be easier if I had better examples. So would you like to share some of yours with me?

Here's my response (it's slightly edited, since the e-mail was in plain text format):

You are right that it is not easy to implement. However, it does not have to be as hard as it used to be, either.

You may want to check out my slide deck and example code from a presentation I have given on the Parallel Extensions Library. It contains some sample code and explanations in a format that is fairly generic and may help you to get going.

While the examples in there are simple, they represent the most common patterns of code that can easily be turned into effective, efficient multithreaded code.

In general, your initial challenges will fall into two major categories: thread management and data integrity.

Thread management is getting easier and easier. In the .NET Framework, you can pull threads from the ThreadPool instead of creating them yourself; the ThreadPool maintains a set of reusable threads and queues additional work items until a thread becomes free, which keeps thread creation under control. But how do you make sure that there are not too many threads running at any given time? After all, if each thread is able to consume 100% of a CPU core, having more threads running than there are CPU cores will simply cause the OS to start timeslicing the threads, and the resulting context switching creates overhead. In other words, two CPU-bound threads on the same core will not take twice as long to finish; they will probably take two times as long plus another 10% or so. Three threads on the same core, each trying to hit 100% CPU usage, will probably take 3.25 - 3.5 times as long to finish as one thread. My experience has been that the degradation is worse than linear: with more than a few CPU-bound threads per core, none of them will ever finish.

So, how to manage how many threads are running?

One way to do this is to have a Semaphore object shared amongst the threads. Before the thread starts to run, it calls the WaitOne method of the Semaphore, and when it is finished, it releases the Semaphore. Set the Semaphore's limit to the number of cores on the CPU (using the Environment.ProcessorCount property to determine it); this will keep your system from running more threads at a time than you have cores. Note that both the Semaphore's initial count and its maximum count should be set to the core count; an initial count of zero would block every call to WaitOne forever. At the same time, pulling the threads from the ThreadPool will ensure that you are not creating too many threads at the same time. Creating too many threads at once, even if they are not running, is an easy way to waste system resources, since each thread consumes resources. Using a Semaphore, the general pattern will look like this:

static Semaphore threadBlocker;

static void Execute(object state)
{
    threadBlocker.WaitOne();
    //Do work
    threadBlocker.Release();
}

static void RunThreads()
{
    //Initial count and maximum count both equal the core count;
    //an initial count of 0 would block every WaitOne call forever
    threadBlocker = new Semaphore(Environment.ProcessorCount,
        Environment.ProcessorCount);
    for (int x = 0; x <= 2000; x++)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(Execute));
    }
}

There are other ways of going about this as well. One approach that I tried some time ago was to maintain a List<T> of objects. Each object represented the full state of a "worker" object. The worker object would be populated with data for when it executed, and when it finished, it would set a property to indicate that it was done. The main thread would scan that list of objects, and if the number of running threads was low enough, it would start another. To be honest, while this system worked, it was a nightmare to code and debug, and I do not recommend it in the slightest.

There is now an even easier way to accomplish this. Microsoft has the Parallel Extensions Library in CTP. This library makes using these types of patterns very simple. Instead of worrying about the thread management yourself, it takes care of that for you. However, you will still need to handle data integrity issues on your own. An article in the October 2008 edition of MSDN Magazine has some very good information on the topic.
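To give a sense of what that looks like in practice, here is a minimal sketch using Parallel.For. (The Parallel Extensions Library was a CTP at the time; the API later shipped in the .NET Framework as the Task Parallel Library in System.Threading.Tasks. The squaring work is a stand-in for whatever per-item work you would actually do.)

```csharp
using System;
using System.Threading.Tasks;

class ParallelDemo
{
    //Squares each index in parallel. Parallel.For partitions the
    //iterations across the available cores and manages the underlying
    //threads for you; no Semaphore or ThreadPool bookkeeping needed.
    public static int[] SquareAll(int count)
    {
        int[] results = new int[count];
        Parallel.For(0, count, i =>
        {
            results[i] = i * i;  //stand-in for real per-item work
        });
        return results;
    }

    static void Main()
    {
        int[] results = SquareAll(2000);
        Console.WriteLine(results[10]);  //prints 100
    }
}
```

Note that each iteration writes only to its own slot of the array, so no locking is needed; the data integrity concerns below arise only when iterations share state.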

Overall, the big issues that you will need to worry about with data integrity are race conditions and deadlocks. A race condition occurs when multiple threads read and update the same object at the same time, so that updates are lost or the result depends on timing. Imagine the following piece of code:

int x = 5;
x = x + 10;

Now, what happens if Thread A and Thread B run this code at the same time? It could work just fine, or it could work incorrectly. How would it work incorrectly? The statement is not atomic: a thread does not execute the entire read-modify-write as one indivisible step, so the other thread can interleave with it. So, we could have the following order of operations:

  1. Thread A retrieves the value of x (5).
  2. Thread B retrieves the value of x (5).
  3. Thread A assigns x + 10 (15) to x.
  4. Thread B assigns x + 10 (15) to x.
  5. x is now equal to 15.

Hmm. Alternatively, the exact same code could follow a different sequence:

  1. Thread A retrieves the value of x (5).
  2. Thread A assigns x + 10 (15) to x.
  3. Thread B retrieves the value of x (15).
  4. Thread B assigns x + 10 (25) to x.
  5. x is now equal to 25.

The easiest, most common way to work around race conditions in the .NET Framework is to use "critical sections." In VB.NET, the statement is "SyncLock" and in C# it is "lock". Both statements take an Object as a parameter, and any other critical section (including this same one, entered from a different thread) that tries to lock on the same Object instance will block until the lock is released, so only one of those sections runs at a time. Our previous piece of code now looks like this:

int x = 5;
object lockObject = new object();

lock (lockObject)
{
    x = x + 10;
}

Now, Thread A (or Thread B) must completely finish executing the contents of the block before any other thread can enter it. There are other approaches to race conditions, and you may find a need for them as your projects become more advanced, but the critical section approach is more than adequate for probably 80% of your needs.
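For simple numeric updates like this one, the Interlocked class is one of those other approaches worth knowing: it performs the whole read-modify-write as a single atomic operation, with no explicit lock at all. A minimal sketch (the thread count and amount are arbitrary illustration values):

```csharp
using System;
using System.Threading;

class InterlockedDemo
{
    static int counter = 0;

    //Adds to the shared counter from many threads at once.
    //Interlocked.Add performs the read-modify-write atomically,
    //so no update can be lost to the interleaving shown above.
    public static int AddFromManyThreads(int threadCount, int amount)
    {
        counter = 0;
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            workers[i] = new Thread(() => Interlocked.Add(ref counter, amount));
            workers[i].Start();
        }
        foreach (Thread t in workers)
        {
            t.Join();  //wait for every worker to finish
        }
        return counter;
    }

    static void Main()
    {
        Console.WriteLine(AddFromManyThreads(4, 10));  //prints 40
    }
}
```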

Of the remaining 20%, 19% of it can be covered by the Monitor class. Monitor has a method called Enter; once a thread has entered the monitor on a given object, any other call to Enter with that same object will block until the owning thread calls Exit. Like the critical section statements, Monitor requires an object to be passed to it (in fact, the lock/SyncLock statement is implemented on top of Monitor). So again, our code would look like this:

int x = 5;
object lockObject = new object();

Monitor.Enter(lockObject);
try
{
    x = x + 10;
}
finally
{
    //the finally block guarantees the lock is released
    //even if the work throws an exception
    Monitor.Exit(lockObject);
}

What does Monitor give us that critical sections do not? Nothing, unless you need more fine-grained control over when the lock is ended. Some pieces of complex code can either need a lock for a long time or a short time, depending upon conditions not known until run time, such as a variable value. In those circumstances, Monitor is a better choice than critical sections.
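As an illustration of that finer-grained control, Monitor.TryEnter lets a thread give up after a timeout instead of blocking indefinitely, which the lock statement cannot express. A sketch, with an arbitrary timeout value:

```csharp
using System;
using System.Threading;

class TryEnterDemo
{
    static readonly object lockObject = new object();

    //Attempts the update, but gives up if the lock cannot be
    //acquired within the timeout; returns whether the update ran.
    public static bool TryUpdate(ref int x, int timeoutMs)
    {
        if (!Monitor.TryEnter(lockObject, timeoutMs))
        {
            return false;  //another thread held the lock too long
        }
        try
        {
            x = x + 10;
            return true;
        }
        finally
        {
            Monitor.Exit(lockObject);  //always release what we acquired
        }
    }

    static void Main()
    {
        int x = 5;
        Console.WriteLine(TryUpdate(ref x, 100));  //prints True
        Console.WriteLine(x);                      //prints 15
    }
}
```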

The other really big concern in data integrity is a deadlock, which occurs when multiple threads have locked resources in such a way that none of them can continue. For example:

Thread A:

Monitor.Enter(object1);
Monitor.Enter(object2);
//Do work
Monitor.Exit(object1);
Monitor.Exit(object2);

Thread B:

Monitor.Enter(object2);
Monitor.Enter(object1);
//Do work
Monitor.Exit(object1);
Monitor.Exit(object2);

If Thread A and Thread B each complete their first Enter call at the same moment, neither will ever be able to complete its second: Thread A holds object1 while waiting for object2, and Thread B holds object2 while waiting for object1. That is a deadlock. Being careful while writing your code, and really thinking hard about how your threads acquire locks, pays off here. In my experience, deadlocks are frequently caused by a novice at threading over-engineering the locking strategy and getting too granular. Code with nested locks usually needs to be seriously examined; where nesting is genuinely necessary, acquiring the locks in the same order everywhere avoids this trap.
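A sketch of the consistent-ordering remedy, using the same two lock objects as above: if every thread acquires object1 before object2, no thread can ever hold object2 while waiting for object1, so the circular wait behind the deadlock cannot form.

```csharp
using System;
using System.Threading;

class LockOrderDemo
{
    static readonly object object1 = new object();
    static readonly object object2 = new object();
    static int sharedTotal = 0;

    //Every worker takes object1 first, then object2. With a single
    //global acquisition order, the circular wait that causes the
    //deadlock above is impossible.
    static void Worker()
    {
        Monitor.Enter(object1);
        Monitor.Enter(object2);
        try
        {
            sharedTotal = sharedTotal + 1;  //Do work
        }
        finally
        {
            Monitor.Exit(object2);
            Monitor.Exit(object1);
        }
    }

    public static int Run(int threadCount)
    {
        sharedTotal = 0;
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(Worker);
            threads[i].Start();
        }
        foreach (Thread t in threads)
        {
            t.Join();  //all workers finish; none deadlocks
        }
        return sharedTotal;
    }

    static void Main()
    {
        Console.WriteLine(Run(8));  //prints 8
    }
}
```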

Hope this helps!

J.Ja

Disclosure of Justin's industry affiliations: Justin James has a working arrangement with Microsoft to write an article for MSDN Magazine. He also has a contract with Spiceworks to write product buying guides.



About

Justin James is the Lead Architect for Conigent.
