Software Development

How to use the Parallel class for simple multithreading

In this programming tutorial about Parallel Extensions in .NET 4, Justin James focuses on imperative parallelism as presented by the Parallel class.

Multithreaded application development is a topic I have written about on TechRepublic a number of times. I published a seven-part series on how to write multithreaded code in VB.NET, and I discussed the Parallel Extensions (which shipped with .NET 4), but never with any code samples for whatever reason.

At the request of a reader who asked for some more up-to-date multithreading code, in this column I will highlight some features of Parallel Extensions in .NET 4 with hands-on code. (The code is from my presentation on Parallel Extensions that I've given in the Carolinas the last two years.) There is a lot to cover, so in this column, I will focus on what is called imperative parallelism as presented by the Parallel class (which is part of the System.Threading.Tasks namespace).

The Parallel class

The static Parallel class contains three very useful methods: For, ForEach, and Invoke. For and ForEach operate on an Action object at their heart; Invoke works on an array of Actions. For and ForEach mimic the functionality of the loops for which they are named. A minimal sketch of all three call shapes follows the list below.

  • Parallel.For accepts a start boundary, an end boundary, and an Action<int> as arguments. The Action will be called once for each number from the start boundary (inclusive) up to, but not including, the end boundary, and each of those numbers will be passed into the Action as its argument.
  • Parallel.ForEach works with an IEnumerable<T> and an Action<T> (where T is the same for both), and calls the Action once for each item in the IEnumerable, passing in that item to the Action.
  • Parallel.Invoke is a little less complex; it simply calls each Action in the array once.
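To make those shapes concrete, here is a minimal sketch of the three calls. The bodies and the names list are throwaway placeholders rather than the samples that follow, and the snippet assumes using directives for System, System.Collections.Generic, and System.Threading.Tasks:

// Placeholder bodies -- just the call shapes, not the article's samples.
Parallel.For(0, 10, i => Console.WriteLine(i)); // runs the Action for i = 0..9 (the upper bound is exclusive)

var names = new List<string> { "a", "b", "c" };
Parallel.ForEach(names, name => Console.WriteLine(name)); // runs the Action once per item

Parallel.Invoke(
    () => Console.WriteLine("first action"),
    () => Console.WriteLine("second action")); // runs each Action once, possibly concurrently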

For all three of these methods, the order of execution is not guaranteed. It may be completely random, or in order, or partially in order. If your code requires that it be run in a particular order, it is not a good candidate for parallel operation. Let's look at Parallel.For first.

A normal for loop usually looks something like this:

static void SequentialGeneration(int startNumber, int endNumber)
{
    int iteration = -1;
    DateTime startTime = DateTime.Now;

    for (int counter = startNumber; counter <= endNumber; counter++)
    {
        iteration++;
        System.Console.WriteLine("Iteration " + iteration + ": BEGIN");
        long fibValue = Fibonacci(counter);
        System.Console.WriteLine("Iteration " + iteration + ": " + fibValue);
        System.Console.WriteLine("Time since execution began: " + new TimeSpan(DateTime.Now.Ticks - startTime.Ticks).TotalSeconds);
    }
}
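The Fibonacci helper itself isn't shown here; any implementation will do. For reference, a minimal sketch along the lines of the classic recursive definition (an assumption, not the code used to produce the timings discussed later) might look like this:

// A plausible Fibonacci helper -- not part of the original samples, so treat this
// naive recursive version as an assumption rather than the code used above.
static long Fibonacci(int n)
{
    if (n <= 1)
    {
        return n;
    }
    return Fibonacci(n - 1) + Fibonacci(n - 2);
}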

This loop is going to spit out the Fibonacci numbers for each item in the range of startNumber to endNumber. Typical enough, right? Well, if we wanted to do this with the traditional threading model, we would need to do a lot of work relative to the size of the loop body. We would need to create threads, start them with a delegate to a function, and maybe add in code using a Semaphore to keep our active thread count limited to the number of logical CPU cores in the system.

In another example I use that is nearly identical, the code around this loop goes from 18 LOC (including whitespace and braces) to a whopping 79 LOC, which involves two functions and a class with five properties, a function, and a constructor. That's a massive amount of code bloat! Even worse, because of the use of a delegate, the code is completely abstract and indirect -- it is very difficult to trace the execution of a piece of code back to its caller. Every person I have talked to who has worked within this model really does not like it. With Parallel.For, we only go to 24 LOC with no substantial increase in complexity. Here's what the code looks like:

static void ParallelGeneration(int startNumber, int endNumber)
{
    object lockObject = new object(); // Needed for atomic updates to iteration
    int iteration = -1;
    DateTime startTime = DateTime.Now;

    Action<int> forLoop = counter => // Begin definition of forLoop
    {
        lock (lockObject) // Lock iteration
        {
            iteration++;
        }
        System.Console.WriteLine("Iteration " + iteration + ": BEGIN");
        long fibValue = Fibonacci(counter);
        System.Console.WriteLine("Iteration " + iteration + ": " + fibValue);
        System.Console.WriteLine("Time since execution began: " + new TimeSpan(DateTime.Now.Ticks - startTime.Ticks).TotalSeconds);
    }; // End definition of forLoop

    Parallel.For(startNumber, endNumber + 1, forLoop); // The upper bound is exclusive, so add 1 to include endNumber
}

You might not be familiar with how Actions work or the locking that is happening, so we'll walk through the changes.

  • Introduction of the lockObject variable: This is needed because we want to make sure that only one thread updates the value of iteration at a time, to prevent what is called a race condition. Race conditions can be intermittent and hard to reproduce, but they lead to data corruption.
  • The declaration of forLoop: Where we originally had the for statement, we now declare an Action<int>. Notice the lambda syntax; the lambda is simply a compact way of writing the Action. All we did was use the contents of the for loop as the body of the Action and end that block with a semicolon, since that is the end of the declaration statement.
  • The lock block: We wrapped the code that updates the iteration value in a lock block (SyncLock in VB.NET). The lockObject is the "key" to the lock. Two lock blocks using different keys can execute simultaneously, but if multiple blocks using the same key try to execute at the same time, they go one at a time. This prevents the potential race condition around the iteration++ statement. A good rule of thumb is to use lock every time you update a variable that is shared across threads. If you need more granular control over the release of the lock, use Monitor.Enter() and Monitor.Exit() instead (see the short sketch after this list).
  • The calling of Parallel.For(): Instead of having a for statement, we pass the start number, end number, and the Action<int> to Parallel.For().
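As a rough sketch of the Monitor alternative mentioned in the third bullet (equivalent in effect to the lock block, just with the enter and exit made explicit, and assuming a using directive for System.Threading):

// Roughly equivalent to the lock block above, with explicit enter/exit.
Monitor.Enter(lockObject);
try
{
    iteration++;
}
finally
{
    Monitor.Exit(lockObject);
}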

As you can see, it required very little effort to convert our existing for loop to use Parallel.For(). If you are using .NET 4, you can do this right now and see nearly effortless performance gains with existing and new applications.

Parallel.ForEach is even easier. Here is the original code, which creates a list of numbers in order and then prints them to the screen:

static void SequentialGeneration(int startNumber, int endNumber)
{
    var numbers = new List<int>();
    for (var counter = startNumber; counter <= endNumber; counter++)
    {
        numbers.Add(counter);
    }

    foreach (var number in numbers)
    {
        Console.WriteLine(number);
    }
}

And here is a version that runs in parallel:

static void ParallelGeneration(int startNumber, int endNumber)
{
    var numbers = new List<int>();
    for (var counter = startNumber; counter <= endNumber; counter++)
    {
        numbers.Add(counter);
    }

    Action<int> forEachLoop = number => // Begin definition of forEachLoop
    {
        Console.WriteLine(number);
    };

    Parallel.ForEach(numbers, forEachLoop);
}

This is about as easy as it gets, folks!
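As a small stylistic variation (not shown in the sample above), the named Action isn't even required for a body this short; the lambda can be passed straight to Parallel.ForEach:

// Same behavior as the ParallelGeneration sample, with the lambda passed inline.
Parallel.ForEach(numbers, number => Console.WriteLine(number));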

Parallel.Invoke() is a slightly more confusing idea, but just as easy in practice. Parallel.For and Parallel.ForEach are used when you have the exact same code that needs to be called multiple times with different input parameters. Parallel.Invoke is more useful for calling different pieces of code at the same time. For example, if you are reading data from a network location, calling a Web service, and drawing something on the screen, this is a good way to do them all at once and save time. All you do is create an array of Action objects and pass the array to Parallel.Invoke():

static void InvokeExample()
{
    Action action1 = () =>
    {
        var rng = new Random();
        var endNumber = rng.Next(500);
        for (var counter = 0; counter <= endNumber; counter++)
        {
            Console.WriteLine(counter);
        }
    };

    Action action2 = () =>
    {
        var client = new WebClient(); // WebClient lives in the System.Net namespace
        var data = client.DownloadString("http://www.techrepublic.com");
        // Store data somewhere
    };

    Action[] actions = { action1, action2 };
    Parallel.Invoke(actions);
}

This will download from the URL, while simultaneously printing to the screen. It is a trivial example, but I think it is clear how this can be very useful in daily programming. Again, with only a light amount of effort to wrap the code into Action objects, we are getting parallel operation. Not bad for a few minutes' worth of work!
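One related knob, which comes up in the comments below: the Parallel methods also have overloads that accept a ParallelOptions object, and its MaxDegreeOfParallelism property caps how many operations run concurrently. A minimal sketch, reusing the numbers list from the ForEach example:

// Caps concurrency at two operations at a time; the default (-1) lets the scheduler decide.
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
Parallel.ForEach(numbers, options, number => Console.WriteLine(number));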

J.Ja

Disclosure of Justin's industry affiliations: Justin James has a contract with Spiceworks to write product buying guides; he has a contract with OpenAmplify, which is owned by Hapax, to write a series of blogs, tutorials, and articles; and he has a contract with OutSystems to write articles, sample code, etc.

About

Justin James is the Lead Architect for Conigent.

14 comments
Tony Hopkinson

this in my general work. Then in a new pet project, I was struggling... Cheers, I is gonna give it a go. Load up Multiple Lists from 2 - n databases (partitioned from paradox days, now in SQL Server) And then do a compare, deduplication, renumber and relink in preparation for a merge. Should be a head stretcher. :D

wrapper

Thanks Justin for the cool thread. I haven't done much multi-threading lately since my clients didn't need such apps. However, I couldn't help but notice that when I run my supposedly single-threaded apps I get all my CPU cores busy. I'm not sure what triggers it, but it seems that .NET 3.5 and 4.0 parallelize apps to some extent. So, I put the theory to the test, tried both of your code examples and changed the endNumber to 40 so that it would calculate up to the 40th Fibonacci number. I had to modify the code a bit since you got the number of loops wrong in the parallel code example. And I removed all the unnecessary write-to-console commands, which crunched the process to 1/5 the time! The classic loop did it in an average of 19 sec, while the parallel loop did it in an average of 11 sec. Note however that the Fibonacci function itself is not parallelized; it implements a classic recursion model. I observed the CPU usage very carefully, and found that the classic code pushes all cores at an average of 25% for 19 sec, while the parallel implementation pushes all cores at an average of 50% for 11 sec. Apparently, that's because we know what to parallelize better than whatever engine it is that tries to parallelize the classic code. Note, however, that up to the 25th Fibonacci number the classic code was faster than the parallel code; only then did the tables start turning.

aikimark

Since your original article was VB.Net, why change languages with this article?

dawgit

Thanks. ;) Now if this method could be applied to other Languages... ]:)

Tony Hopkinson

Aside from the elements in my xml document coming out in a different order, about which I could care less, it was slightly slower. But I'm not doing all the work yet, hardly any test data, and it's on my dev machine which is also my database server. Got some speed back by adding an attribute to my datamodel classes, so for the trivial ones I could turn parallelisation off.

if (_parallelise)
{
    Action<MergeBase> foreachloop = item =>
    {
        lock (sb)
        {
            sb.AppendLine(item.AsXml);
        }
    };
    Parallel.ForEach(_items, foreachloop);
}
else
{
    foreach (MergeBase item in _items)
    {
        sb.AppendLine(item.AsXml);
    }
}

Justin James

Yup, you'll notice some oddities with performance. At lower numbers, the threads finish so quickly that the performance hit to create threads with Parallel Extensions is higher than the gain from running parallel. I bet if you started at, say, 50 and ran to 100, you'd see that the serial code never has a performance advantage. I've also noticed that .NET code, at least in version 4 of the CLR, does *not* keep all of the work from the main process thread on the same core. It is definitely spreading out over the cores to begin with, but I have no clue how or why, or for what reason. It could be Windows 7, and not .NET, that is doing it. Perhaps W7 automatically parallelizes long-running math or loops? J.Ja

Justin James

I think it was early 2008 when I switched from VB.NET to C#, both personally and for code samples in my articles. At the time, my only reason for the change was that I was working on a project which called for C#. Since then, I've gotten the impression (perhaps incorrect) that the majority of the readers here prefer C# to VB.NET. Also, up until .NET 4, this particular example didn't really work well with VB.NET, because it couldn't handle multi-line lambdas. In fact, the sample project I did AGES ago to really learn multithreading originally was in VB.NET, but then I converted the core processing to C# when Parallel Extensions was first released as a CTP, so I could replace the old threading with Par. Ext. J.Ja

Justin James

... what you mean by "other languages". It will work in VB.NET and I bet it works in F#, and quite probably in any (if not all) .NET languages. But if you mean Java or Python (non-.NET Python), it would be nice but I doubt the devs will copy it any time soon. This was in development for a LONG time (the first time I played with the CTP was early 2008, or perhaps sometime in 2007), it would be a lot of work to do something similar for other systems. It's not the basic aspect of it (which is simple enough), it's the subsystem that does things like handle how many threads run at once, the cancellation and message passing system (which I didn't discuss), and a ton of other items which they put into the package that would be a lot of work to replicate. :( That being said, I know that Mono has this as well, or is planning to have it. That's what I was told about a year ago, at least. It's funny, I've written about this before, but it's not until you see the code that you realize how sweet this is. It's the first system I've seen that gets you from non-parallel to parallel REALLY easily, and allows you to retrofit existing code without massive headaches. J.Ja

Justin James

Tony - Glad that you like it! It really is stupid simple. Unless you've got a quad core (or more) PC and a great disk I/O system in it, running two DB queries at the same time will indeed be slower than one at a time when the DBs are on the same machine. When you are in production you should see the expected advantages. Poke around a bit in the docs, look for "degree of parallelization" or something similar. I am not 100% sure if it works with the Parallel class, but it allows you to modify the number of threads used, which will be preferable to your if statement. :) J.Ja

wrapper

Yeah, maybe it's W7. I've been using W7 x64 on my development machine. Yet, the multi-threaded-like performance doesn't only occur when you have math or loops. I've noted the behavior in all my different applications. I think it's something similar to what RapidMind's compiler does (RapidMind has been acquired by Intel!) http://software.intel.com/en-us/data-parallel/ It uses some algorithm to multi-thread the code across all available CPUs AND GPUs (pretty cool for coders not familiar with OpenGL/OpenCL.) The first time I tried to run the application I used 50, but it seemed it would take too long, so I lowered the scale. I did however expect that classic code would run faster at smaller numbers, and that the advantage of parallelism would only show as you push for more processing demand. I guess that's because serially-executed code fills the CPU pipeline more efficiently at low processing loads, while when the CPU is overwhelmed, parallel code fills the pipeline more efficiently.

bart haesen

In my opinion it's always good practice to show snippets or small apps in both languages. There are some constructs in C# that are hard to translate to VB.NET.

dawgit

I've seen where you had posted this concept before. (ok, it was a while ago) But that you're now laying it out was terrific, thanks. Anyway, it does look like a good way, in general to solve a lot of multi-threading woes. (and there seems to be many)

Tony Hopkinson

A lot of the blurb in terms of parallel queries is about how sql server optimises over multiple processors, which is an internal decision algorithm. In terms of the Parallel class itself, most of it is engineering around using a collection (with its inherent lock for the enumerator) or an array, which doesn't have that overhead. This was more about the cost of being parallel than how parallel I could be. I think I may be guilty of trying to optimise too early; having read some of the blurb I can see some other ways to go if required. Consider my hand slapped, first get it working, that's going to be hard enough as it is.

Tony Hopkinson

C++, Ruby, Python, Asp.net, ECMA script, F#... bugger it, let's see the IL...