The world of threads and parallelism is a topic that I keep coming back to, over and over again. In the same day, George Ou sent me a link to an interesting (yet all too short) article on an automatic parallelizing compiler, and I sent him a link to an article about FreeBSD slapping Linux silly in MySQL multithreaded performance. The two are much more closely related than you would think.

The fact of the matter is, if you are not writing parallelized code, or learning how to write it, you are quickly being cut off from one of the most important performance techniques out there. And unlike many of the traditional code hot-rodding tricks, this one is only getting more important as dual core machines become more commonplace and quad core machines are poised to hit the mainstream very soon.

As useful as I feel that a traditional theory-oriented education can be, the fact is, no one ever has to know how to write a sort algorithm because every language out there already has an optimized algorithm built into its library. Unless, of course, you have a special case scenario in which you know your needs are different from what the compiler or library has built in. And that is where the theory knowledge can be a decided advantage.

Parallelism is a black art in code writing. Every piece of code is a special case scenario. Even worse, every hardware combination can alter the decisions you make. Up until recently, the vast majority of programmers could assume that their code would be running on a single CPU machine with more or less the same constant ratios of costs associated with multithreaded operations. Now, that assumption simply cannot be made any longer. A cheap desktop from the local shop with a single core x86 or x64 CPU is going to perform radically differently from a system running a Core 2 Duo CPU, which will in turn perform differently from a 1U server with dual Xeon DP CPUs.

There really are no hard and fast, immutable laws out there on the performance end. Sure, there are some guidelines. Heck, I did a series on it a while back. Indeed, the overriding principle is simply this: if the cost of context switching is less than the benefit gained, then multithread. That is a really obvious statement, and of little help without knowing an awful lot about the runtime environment.
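Since the rule only means something once you measure on the actual machine, here is a minimal sketch of that measurement in Python. The helper names (`timed`, `compare`) are made up for illustration, and the pooled timing deliberately includes creating and tearing down the pool, since that is part of the cost being weighed. Keep in mind that in CPython, threads only speed things up when the work releases the interpreter lock (I/O, for example), so treat the numbers as specific to the workload and machine you run it on.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn):
    """Return (result, elapsed seconds) for a single call to fn()."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare(items, work, workers):
    """Time a serial pass and a pooled pass over the same items (hypothetical helper)."""
    items = list(items)
    serial, serial_s = timed(lambda: [work(i) for i in items])

    def pooled_run():
        # Pool setup and teardown are timed too; they are part of the overhead.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(work, items))

    pooled, pooled_s = timed(pooled_run)
    if serial != pooled:
        raise ValueError("serial and pooled results differ")
    return {
        "serial_s": serial_s,
        "pooled_s": pooled_s,
        "threading_wins": pooled_s < serial_s,
    }
```

Calling `compare(range(1000), some_function, workers=4)` on two different machines can easily give opposite answers for `threading_wins`, which is exactly the problem.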

The idea of a compiler which automatically parallelizes code has a seductive allure for me. But I know that it is doomed to failure in the current CPU environment. Maybe in five years, when CPUs all scale about the same and you can take a core count to determine the size of your thread pool at run time, you will be fine. But right now, you would need to fork huge portions of your code and switch which chunk you will use at run time, because in my experience even the cost of lifting your entire code into one thread can take a heavy toll on performance on a single core CPU.
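To make the "core count decides the pool size, and the code path switches at run time" idea concrete, here is a rough sketch in Python. The function names (`pick_worker_count`, `process_all`) are hypothetical, and sizing the pool at one worker per core is an assumption for illustration, not a tuned recommendation. As with any CPython thread pool, the pooled branch only pays off for work that releases the interpreter lock.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_worker_count(cores=None):
    """Return a pool size from the core count (assumes one worker per core)."""
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cores)


def process_all(items, work, cores=None):
    """Apply work() to every item: serially on one core, pooled otherwise."""
    workers = pick_worker_count(cores)
    if workers == 1:
        # Single core: skip the pool entirely and avoid its overhead.
        return [work(item) for item in items]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(work, items))
```

Even this toy version has two code paths to maintain, and a real program would have to make the same split at every place where the threading overhead matters.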

The FreeBSD versus Linux comparison illustrates this point perfectly. George started asking around, and he got responses back like, “oh, did they use such-and-such configuration?