I think multithreading is the wave of the future in application development, even though not every application lends itself to being multithreaded.
In a blog post on MSDN about symmetric multicore processing being a 'dead end,' the author makes a lot of excellent points, not the least of which is that all too many applications still need one big core to work effectively. However, the author fails to see where the CPU market is headed: the days of Moore's Law applying to individual cores are over for the time being.
I have been working on a very nifty little application for an article I am writing that will take me about a month's worth of weekends. The application does some image editing, but the editing it performs is rather CPU intensive. The CPU crunch shows a rather intriguing pattern.
As you can see in this Task Manager screenshot (Figure 1), one of the cores on my Core 2 Duo system is cranking pretty hard. The other core is working, but not nearly as hard. The difference between the two is that the hard-working core (the one on the right) is running my application. The "lazy" core is handling the rest of my work on the PC, as well as Visual Studio's overhead from running my application within it (I am still debugging). For example, the two big spikes on the "lazy" core are two occasions when UAC kicked in (I was moving items on the Program menu).
Figure 1
What can I do? My first option is to leave things as they stand. Sure, the application is taking 10+ hours to run through a relatively lightweight operation, but it is rock solid on reliability, and the RAM usage is extremely stable, which avoids a ton of page file hits. My next choice is to try multithreading some portion of the work. As you can see, I have one core working hard while the other is doing nothing to help me; there is a lot of untapped power in my hardware. The system is still extremely responsive, and I am not seeing any slowdown. Looking at the graph (and the process list), my application is hovering around 48% of total CPU, which is nearly 100% of the core that it is on. Spinning up a second thread would probably make the system really slow (since I would be taking 100% of both cores), but it might let me finish the task in half the time. That is a tough tradeoff to consider.
I think there is a middle road in this case. Looking at the workload, there is no one particular portion of the workflow that could be performed asynchronously. Even if there were, none of the operations is long enough to offset the cost of context switching. That being said, the overall operations are performed in near total isolation from one another, and while they work with a common area of memory, that access is fairly quick and is only a small portion of the functionality. In other words, it is a low contention area. So, having identified only one potential contention area (and one where a block should rarely occur), we have found our middle road.
I am going to try to split up the loop, following my general rule on the number of threads: the number of logical cores divided by the fraction of one logical core that a single thread uses. So, if it takes 50% of one logical core to run the processing thread, you do not want more than four threads (2 / 0.5) running on a system with two logical cores; this puts your workload at 100% of all cores. I often decrement that number by 1 to ensure that the base OS and other applications are not completely squeezed. After all, does your customer care that your application is as efficient as possible if it effectively renders his/her PC useless while it is running? Probably not.
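To make the arithmetic concrete, here is a minimal sketch of that rule of thumb in Python. The function name recommended_threads and its parameters are my own placeholders for illustration, and the "decrement by 1" habit is modeled as a reserve argument; treat it as one reading of the rule, not a definitive formula.

```python
import os


def recommended_threads(single_thread_usage, logical_cores=None, reserve=1):
    """Estimate how many worker threads to run.

    single_thread_usage: fraction of one logical core that a single
        worker thread consumes (0.5 means 50%).
    logical_cores: defaults to what the OS reports.
    reserve: threads held back so the OS and other applications are
        not completely squeezed (the "decrement by 1" habit).
    """
    if not 0 < single_thread_usage <= 1:
        raise ValueError("single_thread_usage must be in (0, 1]")
    if logical_cores is None:
        logical_cores = os.cpu_count() or 1
    # Number of threads that would put every core at roughly 100%.
    saturating = int(logical_cores / single_thread_usage)
    # Always leave at least one worker thread.
    return max(1, saturating - reserve)


# The example from the text: 50% of a core per thread, two logical cores.
print(recommended_threads(0.5, logical_cores=2, reserve=0))  # 4
print(recommended_threads(0.5, logical_cores=2))             # 3
```

Note that a thread using close to 100% of its core on a two-core machine comes out at two threads before the reserve and only one after it, which is exactly the squeeze that makes a compromise worth looking for.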
In this case, however, I am going to try something different. I usually split the loop evenly and have each thread do its share of the work in parallel. This time, I am going to have one thread marked as a special "stutter thread" (I am making up words here; I am not sure if there is a proper term for this). The work will not be evenly divided; the "stutter thread" will have only half of its "fair share" of work to do, and the other threads will divide the rest of the work, including the half that the "stutter thread" gave up, evenly amongst themselves. In exchange for having less work to do, the "stutter thread" will only "work" 50% of the time. The other 50% of the time, it will remain idle, freeing up a touch more CPU for the OS and other applications. I will most likely use a semaphore tied to another thread (say, thread 0 signals the "stutter thread" on every other iteration). I could also use a monitor (thread 0 enters the monitor on every other iteration, and the "stutter thread" also tries entering the monitor). I think either approach would be viable.
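The semaphore variant of the "stutter thread" idea could be sketched roughly as follows. This is only an illustration of the coordination, written in Python: the names split_with_stutter, run, and process are hypothetical, and I am assuming that "half of its fair share" means half of an even split of the items. Also keep in mind that CPython threads do not run CPU-bound Python code in parallel, so the sketch shows the signaling pattern rather than a real speedup.

```python
import threading


def split_with_stutter(items, n_threads):
    """Give the stutter thread half of an even share; split the rest evenly."""
    if n_threads < 2:
        raise ValueError("need at least two threads")
    stutter_count = len(items) // (2 * n_threads)
    split = len(items) - stutter_count
    full_speed = items[:split]
    # Deal the remaining items round-robin to the full-speed threads.
    chunks = [full_speed[i::n_threads - 1] for i in range(n_threads - 1)]
    return chunks, items[split:]


def run(items, n_threads, process):
    chunks, stutter_items = split_with_stutter(items, n_threads)
    results = []
    lock = threading.Lock()        # guards the shared results list
    gate = threading.Semaphore(0)  # permits are issued by thread 0

    def record(item):
        value = process(item)
        with lock:                 # short, low-contention critical section
            results.append(value)

    def worker(chunk, signals):
        try:
            for i, item in enumerate(chunk):
                record(item)
                if signals and i % 2 == 1:
                    gate.release()  # every other iteration, wake the stutter thread
        finally:
            if signals:
                # Thread 0 is finished: leave enough permits that the
                # stutter thread can never block forever.
                for _ in stutter_items:
                    gate.release()

    def stutter():
        for item in stutter_items:
            gate.acquire()          # idle until thread 0 signals
            record(item)

    threads = [threading.Thread(target=worker, args=(chunk, i == 0))
               for i, chunk in enumerate(chunks)]
    threads.append(threading.Thread(target=stutter))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


squares = run(list(range(8)), 2, lambda x: x * x)
```

With eight items and two threads, thread 0 gets six items and the stutter thread gets two. Results arrive in whatever order the threads finish, so sort them if order matters. The extra permits released when thread 0 finishes are my own addition to keep the sketch from deadlocking.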
I will let you know in a few weeks how this goes. (I am going out of town this weekend, so I will not be working on it.)
Justin James is the Lead Architect for Conigent.