Learn how to have .NET process LINQ statements in parallel with 13 characters, and how to use Tasks to create more complex multithreaded applications.
Last week I showed how to turn the common for and foreach loops into marvels of parallel processing and how to run blocks of code in parallel with minimal effort. This week, I explain how to have .NET process your LINQ statements in parallel with 13 characters' worth of effort, and how to use Tasks to create more complex multithreaded applications. I know that you're intrigued by the 13-character statement, so we will start with that tip and then move on to Tasks.
The .NET Parallel Extensions introduce a technology called PLINQ, which stands for Parallel LINQ. PLINQ spreads the processing of your LINQ statements across the CPUs in your computer as appropriate.
Using PLINQ is a snap: In your normal LINQ statement, when you refer to your IEnumerable<T> object in the "in" clause, append ".AsParallel()" at the end (that's our 13 characters). As soon as you do that, PLINQ will run the query in parallel if possible. This is known as declarative parallelism because it uses LINQ's declarative syntax.
Here's an example that selects the even numbers from a list:
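// Requires: using System; using System.Collections.Generic; using System.Linq;
static void ParallelSelect(int startNumber, int endNumber)
{
    var numbers = new List<int>();
    for (var counter = startNumber; counter <= endNumber; counter++)
        numbers.Add(counter); // fill the list with the requested range
    var results = from item in numbers.AsParallel() // the 13 characters
                  where item % 2 == 0
                  select item;
    foreach (var result in results)
        Console.WriteLine(result); // assumed output step; order is not guaranteed
}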
If this sounds too good to be true, here are four caveats to keep in mind:
- If your code relies on LINQ always returning the results in the same order, you need to call AsOrdered() to preserve that order. This does have a performance cost.
- It is fairly tricky to tell if your query will really run in parallel; the engine makes that decision based on a number of factors (you can force parallelism with WithExecutionMode(ParallelExecutionMode.ForceParallelism) if you are convinced that it really does work better with it). MSDN has the full details about this issue.
- PLINQ only works with LINQ to Objects.
- You need to test, test, test; just because a query can use multiple cores does not mean that it will work better running in parallel. Multithreading has costs of its own, such as the time and overhead of firing up and shutting down threads, and in many cases, those costs outweigh the benefits. As a rule of thumb, the more CPU intensive the query contents, the more speed gain you will get out of PLINQ.
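The first two caveats each map to a single extra call in the query. Here is a minimal sketch, assuming a plain range of integers as the data source; the class and method names are made up for illustration, while AsOrdered() and WithExecutionMode() are the PLINQ operators themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PlinqOptionsSketch
{
    // Returns the even numbers in [start, end], preserving source order.
    // Assumes end >= start.
    public static List<int> OrderedEvens(int start, int end)
    {
        return Enumerable.Range(start, end - start + 1)
            .AsParallel()
            .AsOrdered() // keep results in source order (costs some speed)
            .Where(n => n % 2 == 0)
            .ToList();
    }

    // Same filter, but asks PLINQ to parallelize even if its own
    // heuristics would have chosen to run the query sequentially.
    public static int ForcedEvenCount(int start, int end)
    {
        return Enumerable.Range(start, end - start + 1)
            .AsParallel()
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .Count(n => n % 2 == 0);
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", OrderedEvens(1, 10))); // 2, 4, 6, 8, 10
        Console.WriteLine(ForcedEvenCount(1, 10));                 // 5
    }
}
```

Forcing parallelism on a query this small will almost certainly be slower than running it sequentially, which is exactly why the last caveat says to measure.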
Tasks are the most complex part of the new .NET 4 multithreading. Tasks are a big enough topic that Microsoft groups the functionality under the Task Parallel Library umbrella. But don't let that scare you -- Tasks are really not that difficult. Tasks, which are very similar to the old Thread model, allow the code to be treated as a unit of work with a variety of control and communications options; this is a huge improvement over Thread.
The foundation is the Task class. A Task gets created along with a delegate (an Action, or a Func for the generic Task<TResult>), which represents the Task's code when run. One of the best benefits of Task compared to Thread is that, as a result of using Func, a Task gets properly typed output through its Result property, and the lambda you hand it can capture typed inputs directly. (Task still has overloads that accept a single state object if you want one.) With Thread, all you could do was pass in a single state object (typed as Object to make matters worse), which would transport values in and act as a repository for any output.
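To make the input and output story concrete, here is a small sketch under my own assumptions: the class and method names are invented for illustration. The first method feeds typed values in through the lambda's captured variables; the second uses the StartNew overload that takes a state object, and the output is strongly typed in both cases.

```csharp
using System;
using System.Threading.Tasks;

static class TaskInputOutputSketch
{
    // Typed input is captured by the lambda; typed output comes back
    // through Task<int>.Result.
    public static Task<int> StartSum(int first, int last)
    {
        return Task.Factory.StartNew(() =>
        {
            var sum = 0;
            for (var i = first; i <= last; i++) sum += i;
            return sum;
        });
    }

    // The Func<object, TResult> overload still takes a single state
    // object as input, but the return value stays strongly typed.
    public static Task<int> StartLength(string text)
    {
        return Task.Factory.StartNew(state => ((string)state).Length, text);
    }

    static void Main()
    {
        Console.WriteLine(StartSum(1, 100).Result);     // 5050
        Console.WriteLine(StartLength("hello").Result); // 5
    }
}
```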
In addition, Tasks can be queried for their status. Tasks can also have a CancellationToken object passed into them, which allows you to signal a Task to be shut down, and the code inside the Task can query that object to safely stop execution. This is much better than the old way of doing things, which was to simply abort a Thread.
Tasks support continuations, which are pieces of code that get run when the task is finished (like a callback delegate). If you create your Tasks through a TaskFactory, it can create them in batches with some common factors and special tricks around continuations.
For more in-depth details, MSDN has a number of useful examples that illustrate these concepts very well.
Task basics
You can create a task from the TaskFactory using the StartNew() method, or you can create one by calling the constructor. If you create it through the factory, it is started and running as soon as it comes out; if you instantiate it manually, you will need to start it with the Start() method. In either case, this is when you pass in the input parameters.
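Here is a rough sketch of both creation paths. SlowSquare is a hypothetical stand-in for real work, and the class name is mine. Main also reads the output with Wait() and Result, which the next paragraph describes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class TaskBasicsSketch
{
    // Hypothetical stand-in for real work.
    static int SlowSquare(int n)
    {
        Thread.Sleep(100);
        return n * n;
    }

    // Factory path: the Task is already scheduled when StartNew returns.
    public static Task<int> ViaFactory(int n)
    {
        return Task.Factory.StartNew(() => SlowSquare(n));
    }

    // Constructor path: the Task does nothing until Start() is called.
    public static Task<int> ViaConstructor(int n)
    {
        var task = new Task<int>(() => SlowSquare(n));
        task.Start();
        return task;
    }

    // A Task built with the constructor reports Created until it is started.
    public static TaskStatus StatusBeforeStart()
    {
        var task = new Task<int>(() => 42);
        return task.Status;
    }

    static void Main()
    {
        var a = ViaFactory(6);
        var b = ViaConstructor(7);

        // Wait up to five seconds for the first Task; false means it timed out.
        var finishedInTime = a.Wait(TimeSpan.FromSeconds(5));
        Console.WriteLine(finishedInTime);

        // Result blocks until each Task has finished, then returns its value.
        Console.WriteLine(a.Result + b.Result); // 85
        Console.WriteLine(StatusBeforeStart()); // Created
    }
}
```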
Once the Task is started, you can call its Wait() method, which blocks until the Task is finished; you can also have Wait() move on after a specified period of time, even if the Task is not done. When you need the output from the Task, the Result property has it. The neat thing here is that if you try to get Result before it is done running, the call blocks until the execution is finished. So you really do not need to keep checking the Task's status if you are ready for the output.
Continuations
You can specify an Action object as a continuation. The continuation gets run when the Task is finished, regardless of the reason it finished (completion, cancellation, internal exception, etc.); you might think of it as a "finally" block for the Task. If you have a need for post-processing, this is a great way to do it and encapsulate it all within the Task structure.
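A minimal sketch of the "finally" behavior, using ContinueWith with its default options; the class and method names are assumptions of mine. The first method attaches a continuation that runs whether the Task succeeded or threw. The second uses the factory's ContinueWhenAll, which the following paragraph covers.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class ContinuationSketch
{
    // Runs the work, then a continuation that reports how the Task ended.
    // The continuation fires whether the antecedent succeeded or faulted.
    public static string RunAndDescribe(Func<int> work)
    {
        Task<int> task = Task.Factory.StartNew(work);
        Task<string> summary = task.ContinueWith(antecedent =>
        {
            if (antecedent.IsFaulted)
            {
                // Exception is an AggregateException wrapping what was thrown.
                return "faulted: " + antecedent.Exception.InnerException.Message;
            }
            return "result: " + antecedent.Result;
        });
        return summary.Result;
    }

    // Starts one Task per input, then sums the results once all are done.
    public static int SumOfSquares(params int[] inputs)
    {
        Task<int>[] tasks = inputs
            .Select(n => Task.Factory.StartNew(() => n * n))
            .ToArray();
        Task<int> total = Task.Factory.ContinueWhenAll<int, int>(
            tasks, finished => finished.Sum(t => t.Result));
        return total.Result;
    }

    static void Main()
    {
        Console.WriteLine(RunAndDescribe(() => 21 * 2)); // result: 42
        Console.WriteLine(RunAndDescribe(
            () => { throw new InvalidOperationException("boom"); })); // faulted: boom
        Console.WriteLine(SumOfSquares(1, 2, 3)); // 14
    }
}
```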
If you are creating Tasks with the TaskFactory, you can specify ContinueWhenAny, which runs when any of the started Tasks complete, or ContinueWhenAll, which runs when all of the started Tasks complete. Again, this provides a useful organization of logic and structure that is not available with Thread.
Task status and exceptions
You can poll a Task for its status with the IsCanceled, IsCompleted, and IsFaulted properties. For more granular information, the Status property returns a TaskStatus enumeration, which lets you know exactly where the Task is in the processing timeline. If a Task has thrown an exception, IsFaulted will be true, and the Exception property will contain an AggregateException that wraps the unhandled exception. The unhandled exceptions get rethrown, wrapped in that AggregateException, when you call Wait() or WaitAll() on your Tasks, or if you try getting Result. (WaitAny() returns without rethrowing a faulted Task's exception, so inspect the Task it reports.) If your Task could throw an exception, you'll need to wrap these calls in try/catch.
Cancellation
You can give Tasks a CancellationToken, which can be polled to find out if the Task should be cancelled. If you use a CancellationToken, you look at the IsCancellationRequested property, and if it is true, your Task should take the appropriate steps to exit cleanly and quickly. CancellationTokens are created by the CancellationTokenSource object. You also use CancellationTokenSource to issue the cancellation request, using the Cancel() method.
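Here is a sketch of that polling pattern, with names of my own choosing and a Thread.Sleep() call standing in for real work. After noticing the request, the loop exits and calls ThrowIfCancellationRequested() so that the Task ends up Canceled rather than looking like a normal completion. For contrast, the second method shows a Task that faults, with the exception surfacing from Wait().

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CancellationSketch
{
    // Starts a polling loop, cancels it, and reports the Task's final status.
    public static TaskStatus RunThenCancel()
    {
        var source = new CancellationTokenSource();
        CancellationToken token = source.Token;

        Task worker = Task.Factory.StartNew(() =>
        {
            while (!token.IsCancellationRequested)
            {
                Thread.Sleep(10); // hypothetical unit of work
            }
            // Clean-up would go here; then mark the Task as cancelled.
            token.ThrowIfCancellationRequested();
        }, token);

        source.Cancel();

        try
        {
            worker.Wait();
        }
        catch (AggregateException ex)
        {
            // Cancellation arrives as an OperationCanceledException
            // (or a subclass) inside the aggregate; anything else is rethrown.
            ex.Handle(inner => inner is OperationCanceledException);
        }
        return worker.Status;
    }

    // A Task whose body throws ends up Faulted; Wait() rethrows the
    // exception wrapped in an AggregateException.
    public static string RunFaulting()
    {
        Task worker = Task.Factory.StartNew(() =>
        {
            throw new InvalidOperationException("boom");
        });
        try
        {
            worker.Wait();
        }
        catch (AggregateException ex)
        {
            return worker.Status + ": " + ex.InnerException.Message;
        }
        return worker.Status.ToString();
    }

    static void Main()
    {
        Console.WriteLine(RunThenCancel()); // Canceled
        Console.WriteLine(RunFaulting());   // Faulted: boom
    }
}
```

Passing the same token to StartNew matters here: it lets the Task be cancelled before it even starts, and it is how the runtime recognizes the thrown exception as a cancellation instead of a fault.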
The new multithreading options in .NET 4 are quite exciting and, in my opinion, some of the greatest parts of the platform. The Parallel class is your best bet for retrofitting existing, serial code or getting a "quick hit" of parallelism in a project. PLINQ, while even easier at the code level, requires a bit more testing and twiddling to ensure that the performance is better. Tasks give you the most control possible over your threading and operations, but require the most work and learning to take full advantage of them. Without a doubt though, if you were intimidated by Thread, now is the time to take a fresh look at parallel programming on the .NET platform.
Disclosure of Justin's industry affiliations: Justin James has a contract with Spiceworks to write product buying guides; he has a contract with OpenAmplify, which is owned by Hapax, to write a series of blogs, tutorials, and articles; and he has a contract with OutSystems to write articles, sample code, etc.