
More multithreaded thoughts


The world of threads and parallelism is a topic that I keep coming back to, over and over again. On the same day, George Ou sent me a link to an interesting (yet all too short) article on an automatic parallelizing compiler, and I sent him a link to an article about FreeBSD slapping Linux silly in MySQL multithreaded performance. The two are much more closely related than you would think.

The fact of the matter is, if you are not writing, or learning how to write, parallelized code, you are quickly being cut off from one of the most important performance techniques out there. And unlike many of the traditional code hot-rodding tricks, this one is only getting more important as dual-core machines become commonplace and quad-core machines are poised to hit the mainstream very soon.

As useful as I feel a traditional theory-oriented education can be, the fact is, no one ever has to know how to write a sort algorithm, because every language out there already has an optimized algorithm built into its library. Unless, of course, you have a special case scenario in which you know your needs are different from what the compiler or library has built in. And that is where theory knowledge can be a decided advantage.

Parallelism is a black art in code writing. Every piece of code is a special case scenario. Even worse, every hardware combination can alter the decisions you make. Up until recently, the vast majority of programmers could assume that their code would be running on a single-CPU machine with more or less the same constant ratios of costs associated with multithreaded operations. Now, that assumption simply cannot be made any longer. A cheap desktop from the local shop with a single-core x86 or x64 CPU is going to perform radically differently from a system running a Core 2 Duo CPU, which in turn will perform differently from a 1U server with dual Xeon DP CPUs.

There really are no hard-and-fast, immutable laws out there on the performance end. Sure, there are some guidelines; heck, I did a series on them a while back. The overriding principle is simply this: if the cost of context switching is less than the benefit gained, then multithread. That is a really obvious statement, and it is of little help without knowing an awful lot about the runtime environment.
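To make that trade-off concrete, here is a minimal sketch (in Java, with a made-up workload and an arbitrary problem size, neither of which comes from anything above) of the only honest way to answer the question on a particular box: time the single-threaded version against a two-thread version of the same work and let the numbers on that machine decide.

    import java.util.concurrent.*;

    // A minimal, hypothetical harness: time the same CPU-bound work
    // single-threaded and split across two threads, and let the numbers
    // on *this* machine decide whether the threading overhead pays off.
    public class ThreadCostBenefit {

        // Stand-in for real CPU-bound work; any pure calculation would do.
        static long work(long from, long to) {
            long sum = 0;
            for (long i = from; i < to; i++) sum += i * i;
            return sum;
        }

        public static void main(String[] args) throws Exception {
            final long N = 50000000L;

            long t0 = System.nanoTime();
            long single = work(0, N);                      // one thread does it all
            long t1 = System.nanoTime();

            ExecutorService pool = Executors.newFixedThreadPool(2);
            Future<Long> lower = pool.submit(() -> work(0, N / 2));
            Future<Long> upper = pool.submit(() -> work(N / 2, N));
            long split = lower.get() + upper.get();        // same work, two threads
            long t2 = System.nanoTime();
            pool.shutdown();

            System.out.printf("one thread: %d ms, two threads: %d ms, results match: %b%n",
                    (t1 - t0) / 1000000, (t2 - t1) / 1000000, single == split);
            // On a single-core CPU, the two-thread run is often the slower one,
            // which is exactly the cost-versus-benefit trade-off in question.
        }
    }

The point is not the particular numbers; it is that the crossover point moves with the hardware, which is why no one-size-fits-all rule exists.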

The idea of a compiler which automatically parallelizes code holds a seductive allure for me. But I know that it is doomed to failure in the current CPU environment. Maybe in five years, when CPUs all scale about the same and you can simply take a core count at run time to determine the size of your thread pool, it will work fine. But right now, you would need to fork huge portions of your code and switch which chunk gets used at run time, because in my experience, even the cost of lifting your entire code into one thread can take a heavy toll on performance on a single-core CPU.
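For what it is worth, that run-time approach looks roughly like the sketch below (Java; the two worker methods are hypothetical placeholders, not anything from a real library): check the core count when the program starts, size the thread pool from it, and fall back to the plain sequential path on a single-core machine rather than paying the threading overhead for nothing.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical sketch: choose a code path and a pool size at run time
    // instead of hard-coding assumptions about the machine the code runs on.
    public class AdaptivePool {

        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();

            if (cores < 2) {
                // Single-core machine: the threaded path would only add
                // context-switching overhead, so run the plain version.
                doWorkSequentially();
                return;
            }

            // Multi-core machine: one worker per core is a common starting
            // point for CPU-bound work; measure and adjust from there.
            ExecutorService pool = Executors.newFixedThreadPool(cores);
            try {
                for (int i = 0; i < cores; i++) {
                    final int worker = i;
                    pool.submit(() -> doWorkChunk(worker, cores));
                }
            } finally {
                pool.shutdown();   // let queued tasks finish, accept no new ones
            }
        }

        // Placeholder: the single-threaded version of the job.
        static void doWorkSequentially() { }

        // Placeholder: one worker's slice of the job.
        static void doWorkChunk(int index, int totalWorkers) { }
    }
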

The FreeBSD versus Linux comparison illustrates this point perfectly. George started asking around, and he got responses back like, "oh, did they use such-and-such configuration?"

About

Justin James is the Lead Architect for Conigent.

22 comments
mdhealy

A crucial distinction that must be kept in mind is the one between high-performance computing and high-throughput computing. The classic analogy I've heard for this is a sports car versus a bus: for taking one or two people 60 miles (100 km), a sports car is very fast (assuming you don't get a speeding ticket), but for carrying a group of 30 people over the same distance, a bus is better. High-performance computing is the sports car -- doing one task as fast as possible. High-throughput computing is the bus -- doing as many tasks in 24 hours as possible.

For high-performance computing, multithreaded applications are the way to go, but for high-throughput applications it can be much simpler to run multiple processes in parallel. It all depends on the nature of the workflow being optimized. There are jobs that can only be done with the fine-grained parallelism of multithreaded code, but this performance does come at a price. Many of our data-mining tasks can be done just as effectively with coarse-grained parallelism: divide the query and the database into chunks, then parcel out the chunks to N processes. For really big jobs, the processes should be given a low priority so they can run in the background for however long they take without taking CPU time away from higher-priority jobs.

Another key issue in high-throughput computing is whether to have one big honking symmetric multiprocessing server with, say, 64 dual-core CPUs and umpteen gigabytes of shared RAM, or 64 computers with one dual-core CPU each. Lots of commodity boxen will cost much less and therefore deliver more bang for the buck, but system administration and job management will be a major pain. In my department we also found that with the "server farm" approach, keeping data synchronized so all of the machines were using the same version of the search database was a major pain in the rear. We use our farms for certain jobs, mostly doing many, many, many similar searches of databases that get updated at predictable intervals. But for general-purpose data mining, we find a big SMP box is vastly easier to use and therefore worth the added cost.

We also have some custom FPGA systems that can do massively parallel calculations, which are superb for certain tasks, but these lack the flexibility of general-purpose computers, so they are a supplement to our main servers rather than our primary computational resources.

olgalucia

For all of you who think this is new, think again: mainframes have been doing this for decades, and old programmers never thought about non-parallel programming. On the contrary, its absence was a nuisance for us in the PC environment, and it was about time they copied the advanced features of mainframes on a smaller scale...

Wayne M.

I see the true benefit of multiple processors not in helping an individual application run faster, but in allowing multiple applications to run side by side. I would prefer my virus scanners not be written to use multiple processors; I would prefer they go their merry way using one processor and leave the other(s) to me. I would prefer that report generation tools not use up all of the processing power, but rather leave processors free for normal queries. In short, I think I would prefer it if most applications limited themselves to a single processor, leaving another free to run different applications.

Mark Miller

From a purely theoretical standpoint I agree with this notion. It would be better to have the VM decide what to multithread. With the way things are done now, multithreading is something that has to be designed into the app. from the get-go. I think it's going to take different expectations on the part of developers, though. Most developers are trained to use a linear model of program execution: one thing happens after the next. The problem this causes is, what if it would be best to multithread a portion of the app. that is CPU intensive? What if there are side-effects in that code? The side-effects don't matter if the code is executed in a linear fashion, but will likely cause problems if it's executed concurrently.

I've had a little experience with this. Years ago I worked on an MFC app. with a mysterious bug. It was a GUI app, of course. There was a database operation in it that was taking a long time (it would run for about a minute), but for some reason the UI was not locked during this operation. This did not occur with any other database operation in the app. The problem was that users could get impatient waiting for this operation to finish, start selecting other operations from the pull-down menus, and crash the app.

At first I looked for the obvious. I figured this operation must've been put on a background thread. I could not find any threading code (at the application level) related to this operation. The background operation appeared to start right when the database access code was executed. It was really odd, and the only solution I could think of was to disable (grey out) every menu option I could think of that would cause the app. to crash, and then re-enable them when it was finished. I didn't like this solution, but it was the only one I had access to.

For whatever reason, it appeared to me that the library code was on its own deciding to run the database operation in the background. What was most likely happening was that the database layer was allowing the Windows message pump to continue to send messages to the UI. The reason I say this is that it wasn't as if the routine that initiated the database call was allowed to continue to execute after the call was made; that didn't continue execution until the call was finished. However, menu options could continue to be selected, and the message handlers were able to respond. It was multithreading of a different sort... more like the way Windows 3 handled cooperative multitasking. I have a vague memory of talking to a fellow developer about this, and he said something about how, yes, there had been situations where the database layer had mysteriously allowed the message pump to continue to send messages to the app. He couldn't explain why either. A bug, perhaps. Anyway, this was a case of a low-level layer deciding to "multithread" something when nobody had given it explicit permission to do so, and no provision was made for what if another database operation was initiated during the process.

If the VM is going to run code concurrently, then the expectation needs to be set, I think at the language level, that this is going to happen. I think that things at the system level would need to be concurrency-friendly as well, like database access, file access, maybe even memory access. Otherwise developers are still going to have to worry about concurrency issues. I think it would also be imperative for the VM to allow the developer to opt out of concurrency in some situations.
Of course some discipline on the developers' part will be required. There will need to be a larger focus on encapsulation and loose coupling, to prevent side-effects from causing problems when things are run concurrently.
