A beginner's guide to threading in C#

Threading can increase your application's performance and scalability if it is implemented correctly. Find out the basic concepts and see how you can use them in your applications.

Using multiple threads can help you achieve greater performance, scalability, and responsiveness in your applications—but you need to be careful. This article begins a series on the tools and techniques involved in threading. I'll start with an introduction to the concept and a survey of some of the more common constructs and how to use them.

The yin and yang of threading
It's been said that one of the best things about Java is that it makes threading easy, and, at the same time, that one of the worst things about Java is that it makes threading easy. When Microsoft developed C#, it brought this ease-of-use dilemma to a whole new platform. There are more primitives to play with in C# than in Java, but the basic Java primitives of the Thread object and synchronization monitors are there in similar form and function, and they provide more than enough material to hoist yourself with your own petard. So be very, very careful when deciding to use explicit multithreading in your application.

Why not to multithread
The first point to remember is that, unless you are doing weekend play-coding, you should not use threading simply because it is cool. It gets hot soon enough, and if you’re not careful, your boss will, too. Second, you should not use multithreading to make things faster until you have proven to yourself (and hopefully a few others around you) that a single-threaded implementation is unacceptably slow. And finally, before venturing into an explicitly multithreaded mechanism, remember that Microsoft provides an apartment model that allows an object, written as a single-threaded construct, to run in a multithreaded environment. So you may not need to explicitly code for multithreading. The apartment model is a subject for another article.

If not done right, multithreading opens a Pandora’s box of ill effects. With no apparent repeatability, values can turn to utter garbage. Counters can fail to increment. Your application can suddenly freeze. Resources such as database connections can unexpectedly close or become exhausted. Some of the most challenging puzzles in a senior developer’s career arise from sleuthing a threading issue. The big problem is that these puzzles usually take time to solve, which can have a serious effect on product delivery dates, or worse, on product reliability.

Why to multithread
You may have a good candidate for multithreading if your application has operations with all of the following characteristics:
  • Run in series, they can take an unacceptably long time to complete
  • They can be made parallel
  • They may spend an appreciable amount of their execution time waiting for network, file system, user, or other I/O response

But before crossing the Rubicon, make sure that each of the above three circumstances holds.

If your code is fast enough, but you think you could make it really fast (you do have performance specifications, don’t you?), resist the urge. If you aren’t sure that you can make your operations parallel (such as performing simultaneous database updates into the same table, when your database does table-level locking), fight the temptation. If you don’t know whether your application is spending a lot of time waiting for input or output to complete, determine that first. On a single processor, three threads, each performing a computation of pi to the millionth place, will actually take longer to complete than repeating the computation three times in the same thread. This scenario fails the third criterion above: there are no idle cycles during one computation that a second parallel computation might be able to use.

The one exception to this rule is that if you are writing for a multiprocessor machine, you might stand to gain by making suitable operations parallel, even if each of the operations is a CPU hog.

Basic thread management tools
Having provided ample warning and set the stage for when and when not to use multiple threads, I will now describe some of the tools you have at your disposal if you choose to do so.

Thread
The .NET libraries provide an object called System.Threading.Thread, which represents a single thread of execution. You can start a thread, seeking to accomplish a task in that thread while the current thread continues. This would be useful for an application that needed to print a document or save a large file but wanted to acknowledge the user’s request and return control to the user. We demonstrate this mechanism in Listing A.

We first create a method, SayHello, that does what we want to accomplish. Its signature must match that of the System.Threading.ThreadStart delegate. Note the Thread.Sleep(int numMillisecs) call in the SayHello method. This is a useful construct and will appear often in these samples.

In the Main routine, we create a new thread with a ThreadStart delegate made from the SayHello method, and call Start on that thread. The thread we created is started, and our main thread continues on to completion in this example.
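As a minimal sketch of the pattern just described (not the article's own listing), the following shows a work method, a ThreadStart delegate built from it, and a call to Start. The class name HelloThreadDemo, the messages, and the half-second sleep are assumptions made for illustration.

```csharp
using System;
using System.Threading;

public static class HelloThreadDemo
{
    // Matches the ThreadStart delegate: no parameters, no return value.
    public static void SayHello()
    {
        Thread.Sleep(500); // pause this thread for half a second
        Console.WriteLine("Hello from the worker thread");
    }

    public static void Main()
    {
        Thread worker = new Thread(new ThreadStart(SayHello));
        worker.Start();

        // Main does not wait for the worker; it carries straight on.
        Console.WriteLine("Main thread continues");
    }
}
```

Because of the sleep, the main thread's message will normally appear first. A thread created this way is a foreground thread by default, so the process stays alive until SayHello returns even though Main has already finished.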

Many times, you will have a slightly different task to perform in each thread and will want to pass each thread a parameter of some sort to differentiate its task from that of the others. While there are several reasonable ways of doing this, the most straightforward is to create a Task object that holds the thread, the unique parameter, and the work method that provides the ThreadStart delegate. From the work method, you can read the supplied parameter, as it is a member of the Task object and is therefore unique to that thread. By making the thread a public field, you have full access to all the thread’s members without having to write additional wrapper code. See Listing B for an example of this technique.

You can even provide a return value of sorts from the Task object by defining a field in the task to hold it, setting the value before the thread completes, and reading it from the thread that started the task after the task completes.
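One way the task-object idea might look in code is sketched below. The class is named GreetingTask here (a hypothetical name, chosen to avoid confusion with the framework's own Task type), and the Name parameter, the GreetingLength result field, and the sample names are all assumptions. Join is used only to wait for the threads to finish before reading the results.

```csharp
using System;
using System.Threading;

// Hypothetical helper: bundles a thread with its input and its result.
public class GreetingTask
{
    public Thread Worker;        // public field, so callers can Start/Join directly
    public string Name;          // the per-thread parameter
    public int GreetingLength;   // the "return value", set by Work()

    public GreetingTask(string name)
    {
        Name = name;
        Worker = new Thread(new ThreadStart(Work));
    }

    // Work method: reads the parameter from the instance it belongs to.
    private void Work()
    {
        string greeting = "Hello, " + Name;
        Console.WriteLine(greeting);
        GreetingLength = greeting.Length;
    }
}

public static class GreetingTaskDemo
{
    public static void Main()
    {
        GreetingTask first = new GreetingTask("Ada");
        GreetingTask second = new GreetingTask("Grace");

        first.Worker.Start();
        second.Worker.Start();

        // Wait for both threads so the results are safe to read.
        first.Worker.Join();
        second.Worker.Join();

        Console.WriteLine(first.GreetingLength + second.GreetingLength);
    }
}
```

Because Work is an instance method, each thread sees only the fields of its own GreetingTask, so no extra plumbing is needed to keep the parameters apart.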

You can pause one thread, waiting for other threads to complete what they are doing. You might do this when you want to collect return values as described above or when you spread a database update across three separate threads but don't want to proceed until all threads are done. This technique is shown in Listing C.

Here, we build on the code from Listing A. This time, we launch two threads, each with the same task as before. After calling Start() on both threads, though, we call Join() on each. Calling Join() on a thread causes the calling thread to pause execution until the called thread has completed. So the thread1.Join() call causes the main thread to pause until thread1 has completed. We then do the same thing to thread2. As a result, the main thread does not complete until both thread1 and thread2 have completed.

Here are two parting thoughts on this example. First, a thread may not call Join on another thread until that thread has been started. Second, there are two more forms of Join that allow specification of a timeout after which the calling thread will continue even if the called thread is still running.
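The start-then-join sequence can be sketched as follows. This is an illustration under assumed names (JoinDemo, the messages, the half-second sleep), not a reproduction of the article's listing.

```csharp
using System;
using System.Threading;

public static class JoinDemo
{
    public static void SayHello()
    {
        Thread.Sleep(500);
        Console.WriteLine("Hello");
    }

    public static void Main()
    {
        Thread thread1 = new Thread(new ThreadStart(SayHello));
        Thread thread2 = new Thread(new ThreadStart(SayHello));

        // Start both first so they run at the same time.
        thread1.Start();
        thread2.Start();

        thread1.Join(); // blocks until thread1 has finished
        thread2.Join(); // returns at once if thread2 is already done

        Console.WriteLine("Both threads have completed");
    }
}
```

Starting both threads before joining either one matters: joining thread1 before starting thread2 would run the two tasks one after the other and lose the benefit of the second thread.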

Computer science frequently employs the concept of a watchdog—an entity whose responsibility is to ensure the correct function, or handling of incorrect functions, in another entity. A common pattern is the watchdog timer, usually responsible for making sure that another task completes in a reasonable time. Listing D shows a simple mechanism for implementing a watchdog timer.

After thread1 is started, we join with it, but provide a 10-second timeout. Since thread1 has a 15-second pause built in, it will still be alive when the join expires. The main thread tests thread1.IsAlive and, if it's still alive, terminates the thread.
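A watchdog along these lines might be sketched as below, using Join with a timeout and the IsAlive property. One substitution should be stated plainly: forcibly terminating a thread with Thread.Abort is not supported on .NET Core and later, so this sketch asks the worker to stop through a shared flag instead. The flag, the 50-millisecond work slices, and the names WatchdogDemo, SlowWork, and RunWithWatchdog are all assumptions.

```csharp
using System;
using System.Threading;

public static class WatchdogDemo
{
    // Checked by the worker; volatile so a write from the watchdog is seen.
    private static volatile bool stopRequested;

    // Simulates a long task in small slices so it can notice a stop request.
    private static void SlowWork(int totalMs)
    {
        for (int elapsed = 0; elapsed < totalMs && !stopRequested; elapsed += 50)
        {
            Thread.Sleep(50);
        }
    }

    // Returns true if the worker finished on its own within the timeout.
    public static bool RunWithWatchdog(int workMs, int timeoutMs)
    {
        stopRequested = false;
        Thread thread1 = new Thread(() => SlowWork(workMs));
        thread1.Start();

        bool finishedInTime = thread1.Join(timeoutMs);
        if (thread1.IsAlive)
        {
            stopRequested = true; // ask the worker to give up
            thread1.Join();       // wait for it to wind down
        }
        return finishedInTime;
    }

    public static void Main()
    {
        // The article's figures: 15 seconds of work, 10-second timeout.
        Console.WriteLine(RunWithWatchdog(15000, 10000) ? "Completed" : "Timed out");
    }
}
```

The cooperative flag only works if the task checks it regularly; a worker stuck in a single long blocking call would not notice the request until that call returned.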

Synchronization and monitors
Synchronization refers to the practice of ensuring that only one thread executes in a section of code at a time. A surprising number of constructs must run inside such a single-threaded block to be reliably safe, although discussing all of them is beyond the scope of this article. Unfortunately, most of them work fine almost all of the time outside such a block, so the old “If it compiles and I get the answer I expect, it’s right” mantra doesn’t hold here. This is part of why multithreading is so dangerous.

A monitor is the most basic synchronization construct. Any object can have a monitor associated with it, and no monitor can be associated with more than one object. Monitors have a “lock,” which may be acquired by only one thread at a time. It must be released by that thread before another thread can acquire it. You can guard a section of code by declaring an object that is visible to all threads, such as a class field, and having a section of code acquire the lock from that monitor before performing some operation and then release the lock when it completes. This construct is demonstrated in Listing E.

We declare an object, myLockObject, whose sole purpose is to provide a monitor for synchronization. In the SayHello method, we allow both threads to print “Hello” whenever they want. However, we control the printing of “Wonderful” and “World” with the monitor associated with myLockObject so that one thread must complete both prints before another is allowed to begin.
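The guarded section described above could be sketched like this. The class name MonitorDemo, the try/finally wrapper, and the short sleep between the two guarded prints (added to make any interleaving easier to spot) are assumptions rather than details taken from the article's listing.

```csharp
using System;
using System.Threading;

public static class MonitorDemo
{
    // Exists only to supply a monitor shared by every thread.
    private static readonly object myLockObject = new object();

    public static void SayHello()
    {
        Console.WriteLine("Hello"); // unguarded: may appear at any point

        Monitor.Enter(myLockObject);
        try
        {
            // Only one thread at a time runs these lines.
            Console.WriteLine("Wonderful");
            Thread.Sleep(100);
            Console.WriteLine("World");
        }
        finally
        {
            Monitor.Exit(myLockObject); // release even if an exception is thrown
        }
    }

    public static void Main()
    {
        Thread thread1 = new Thread(new ThreadStart(SayHello));
        Thread thread2 = new Thread(new ThreadStart(SayHello));
        thread1.Start();
        thread2.Start();
        thread1.Join();
        thread2.Join();
    }
}
```

A “Hello” from one thread can still land between another thread's “Wonderful” and “World”, because that print is outside the guarded section; what the monitor rules out is one thread's “Wonderful” slipping between the other's “Wonderful” and “World”.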

Two other techniques are available for accomplishing the same thing: the lock keyword and the MethodImplAttribute attribute. See Listing F for an example.

We replace the Monitor.Enter(…) and Monitor.Exit(…) constructs with a lock(…){ … } construct. The two are identical in effect; the latter is shorthand for the former wrapped in a try/finally block, so the lock is released even if the guarded code throws an exception. We also add a method, SayHello2(), which has the MethodImpl attribute attached to it with the MethodImplOptions.Synchronized option. This option specifies that the entire method is to be synchronized. It is equivalent to forcing the calling code to acquire a lock before it is allowed to make the call: on the instance for an instance method, or on the type object that contains the method for a static method. This is more concise than enclosing the method body in a lock(){…} statement. Note that the documentation calls the attribute MethodImplAttribute, while the code refers to it as MethodImpl. This is not an inconsistency: C# lets you omit the Attribute suffix when applying an attribute, so both names refer to the same class.
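Both shorthand forms might be sketched as follows. The class name LockDemo, the RunTwice helper, and the short sleeps are assumptions for illustration; SayHello2 is made static here, so the runtime takes its lock on the containing type.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public static class LockDemo
{
    private static readonly object myLockObject = new object();

    // lock is shorthand for Monitor.Enter/Monitor.Exit in a try/finally.
    public static void SayHello()
    {
        lock (myLockObject)
        {
            Console.WriteLine("Wonderful");
            Thread.Sleep(100);
            Console.WriteLine("World");
        }
    }

    // Synchronizes the whole method. Because the method is static,
    // the runtime takes the lock on the containing type.
    [MethodImpl(MethodImplOptions.Synchronized)]
    public static void SayHello2()
    {
        Console.WriteLine("Wonderful");
        Thread.Sleep(100);
        Console.WriteLine("World");
    }

    // Hypothetical helper: runs the same work on two threads and waits for both.
    private static void RunTwice(ThreadStart work)
    {
        Thread thread1 = new Thread(work);
        Thread thread2 = new Thread(work);
        thread1.Start();
        thread2.Start();
        thread1.Join();
        thread2.Join();
    }

    public static void Main()
    {
        RunTwice(SayHello);
        RunTwice(SayHello2);
    }
}
```

The two methods use different locks (myLockObject and the LockDemo type), so they would not exclude each other if run at the same time; the sketch runs them in separate rounds for that reason.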

Summary
This article has covered a lot of ground. I have discussed the reasons for and against explicitly using multiple threads, as well as shown some of the primitive constructs you will need if you choose to do threading. I introduced the Thread object and explained how to run several threads to accomplish the task of your choice. I described the monitor concept and showed how to use it to achieve synchronization around a block of code. I also described two shorthand means of accomplishing the same thing in specific cases, the lock keyword and the MethodImpl attribute.

In future articles, I will describe several other basic constructs, implement a thread pool, and explore more advanced constructs, such as thread-local storage and overlapped I/O.