
More multithreaded thoughts


The world of threads and parallelism is a topic that I keep coming back to, over and over again. On the same day, George Ou sent me a link to an interesting (yet all too short) article on an automatic parallelizing compiler, and I sent him a link to an article about FreeBSD slapping Linux silly in MySQL multithreaded performance. The two are much more closely related than you would think.

The fact of the matter is, if you are not writing, or learning how to write, parallelized code, you are quickly being cut off from one of the most important performance techniques out there. And unlike many of the traditional code hot-rodding tricks, this one is only getting more important as dual core machines become more commonplace and quad core machines are poised to hit the mainstream very soon.

As useful as I feel that a traditional theory-oriented education can be, the fact is, no one ever has to know how to write a sort algorithm because every language out there already has an optimized algorithm built into its library. Unless, of course, you have a special case scenario in which you know your needs are different from what the compiler or library has built in. And that is where the theory knowledge can be a decided advantage.

Parallelism is a black art in code writing. Every piece of code is a special case scenario. Even worse, every hardware combination can alter the decisions you make. Up until recently, the vast majority of programmers could assume that their code would be running on a single CPU machine with more or less the same constant ratios of costs associated with multithreaded operations. Now, that assumption simply cannot be made any longer. A cheap desktop from the local shop with a single core x86 or x64 CPU is going to perform radically differently from a system running a Core 2 Duo CPU, which will in turn perform differently from a 1U server with dual Xeon DP CPUs.

There really are no hard and fast, immutable laws out there on the performance end. Sure, there are some guidelines. Heck, I did a series on it a while back. Indeed, the overwhelming principle simply is: if the cost of context switching is less than the benefit gained, then multithread. Which is a really obvious statement, and of little help without knowing an awful lot about the runtime environment.
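The "context switching cost versus benefit" rule is easiest to internalize by measuring it rather than guessing. Here is a minimal Python sketch (my example, not from the article): the same CPU-bound job run serially and through a process pool, showing that for tiny work items the parallelization overhead dominates, while for larger ones it can pay off.

```python
import time
from multiprocessing import Pool

def work(n):
    # CPU-bound toy task: sum of squares below n
    return sum(i * i for i in range(n))

def serial(jobs):
    return [work(n) for n in jobs]

def parallel(jobs, procs=4):
    with Pool(procs) as pool:
        return pool.map(work, jobs)

if __name__ == "__main__":
    small = [1_000] * 4      # process startup and IPC overhead dominates
    large = [300_000] * 4    # enough work per job to amortize the overhead

    for jobs in (small, large):
        t0 = time.perf_counter(); serial(jobs);   t1 = time.perf_counter()
        parallel(jobs);                           t2 = time.perf_counter()
        print(f"serial {t1 - t0:.4f}s  parallel {t2 - t1:.4f}s")
```

The exact crossover point depends entirely on the machine, which is the author's point: measure on your target hardware before committing to a design.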

The idea of a compiler which automatically parallelizes code has a seductive allure for me. But I know that it is doomed to failure in the current CPU environment. Maybe in five years, when CPUs all scale about the same, you will be able to take a core count at run time, use it to size your thread pool, and be fine. But right now, you would need to fork huge portions of your code and switch which chunk you use at run time, because even the overhead of running your code multithreaded can take a heavy toll on performance on a single core CPU, in my experience.
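Sizing a pool from a core count discovered at run time, as described above, is straightforward in most languages today. A minimal Python sketch (the `process_item` helper is a hypothetical stand-in for real work):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Discover the core count at run time instead of hard-coding a number
# that only suits one machine.
cores = os.cpu_count() or 1

def process_item(item):
    # Hypothetical per-item work; here just a trivial computation.
    return item * item

items = list(range(20))
with ThreadPoolExecutor(max_workers=cores) as pool:
    results = list(pool.map(process_item, items))
```

The catch the post identifies still applies: the right pool size is not always the core count, and on a single-core machine even this modest amount of thread machinery is pure overhead.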

The FreeBSD versus Linux comparison illustrates this point perfectly. George started asking around, and he got responses back like, “oh, did they use such-and-such configuration?”

About

Justin James is the Lead Architect for Conigent.

22 comments
mdhealy

A crucial distinction that must be kept in mind is the one between high performance computing and high throughput computing. The classic analogy I've heard for this is a sports car versus a bus: for taking one or two people 60 miles (100 km), a sports car is very fast (assuming you don't get a speeding ticket). But for carrying a group of 30 people over the same distance, a bus is better. High-performance computing is the sports car: doing one task as fast as possible. High-throughput computing is the bus: doing as many tasks in 24 hours as possible.

For high-performance computing, multithreaded applications are the way to go, but for high-throughput applications it can be much simpler to run multiple processes in parallel. It all depends on the nature of the workflow being optimized. There are jobs that can only be done with the fine-grained parallelism of multithreaded code, but this performance does come at a price. Many of our data-mining tasks can be done just as effectively with coarse-grained parallelism: divide the query and the database into chunks, then parcel out the chunks to N processes. For really big jobs the processes should be given a low priority so they can run in the background for however long they take without taking CPU time away from higher-priority jobs.

Another key issue in high-throughput computing is whether to have one big honking symmetric multiprocessing server with, say, 64 dual-core CPUs and umpteen gigabytes of shared RAM, or to have 64 computers with one dual-core CPU each. Lots of commodity boxen will cost much less and therefore deliver more bang for the buck, but system administration and job management will be a major pain. In my department we also found that with the "server farm" approach, keeping data synchronized so all of the machines were using the same version of the search database was a major pain in the rear.
We use our farms for certain jobs, mostly doing many many many similar searches of databases that get updated at predictable intervals. But for general purpose data mining, we find a big SMP box is vastly easier to use and therefore worth the added cost. We also have some custom FPGA systems that can do massively-parallel calculations, which are superb for certain tasks, but these lack the flexibility of general-purpose computers, so they are a supplement to our main servers rather than being our primary computational resources.
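mdhealy's coarse-grained recipe (divide the database into chunks, parcel the chunks out to N processes) can be sketched in a few lines of Python. The toy "database" and the modulo "query" below are my stand-ins, not anything from the comment; the commenter's note about low priority could be implemented with `os.nice()` on POSIX systems (not shown):

```python
from multiprocessing import Pool

DATABASE = list(range(1_000))   # stand-in for a large search database
N = 4                           # number of worker processes

def chunks(seq, n):
    """Split seq into n roughly equal contiguous chunks."""
    k, r = divmod(len(seq), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        out.append(seq[start:end])
        start = end
    return out

def search_chunk(chunk, query=7):
    # Each worker scans only its own slice of the database.
    return [x for x in chunk if x % 100 == query]

if __name__ == "__main__":
    with Pool(N) as pool:
        partial = pool.map(search_chunk, chunks(DATABASE, N))
    hits = [x for part in partial for x in part]  # recombine the results
```

No locking is needed because each process owns its chunk outright, which is exactly why the coarse-grained approach is so much simpler than fine-grained multithreading.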

olgalucia

For all of you who think this is new, think again: mainframes have been doing this for decades, and old programmers never thought about non-parallel programming. On the contrary, its absence was a nuisance for us in the PC environment; it was about time the PC world copied this advanced feature of mainframes on a smaller scale...

Justin James

You are 100% right. The issue is that the client/server world has been mostly stuck in this single threaded mindset. At best, multiple threads are only used to the extent needed to implement a "cancel" button (and many apps prove that even a "cancel" button is too hard!). Meanwhile, the mainframe guys do this all day. It is one reason why the T1 CPU is so weird: it is really built for "a ton of light threads," but too many developers program for "a few heavy threads" to take advantage of it, so it lags in tests. J.Ja

TheGooch1

We really do. Look at me, this is me, caring. So, do you care that I care?

MadestroITSolutions

I believe there is a difference between multi-processing and multi-core. Both have certainly been around for a while. I think the subject of the conversation, though, is how software (and we developers) can benefit from having a multi-core system. From an Intel article titled "Dual vs. Multiprocessor chips: What's the difference?": "The primary target of the dual processor chip is computationally intensive workloads: those that benefit from high clocks and fast buses. While the Intel Xeon processor MP targets systems that are database and transaction capable." You can find the article here: http://www.intel.com/cd/ids/developer/asmo-na/eng/52515.htm?page=1 They have some interesting numbers as a result of benchmark testing. Make sure you look at those.

So basically what we are saying is that it really depends on what you are doing. As far as development is concerned, however, we still need to make efficient use of the resources, whatever the case may be. I don't really agree with the author of this blog, simply because ultimately the runtime does not have any knowledge of the business logic. The decision to multi-thread is (in my opinion and experience) usually based on the task at hand. The runtime cannot handle concurrency issues because it does not know what a "concurrency issue" is in the context of the application being executed. Operating systems can handle multiple processes and threads, but only at a very basic, raw level. I think they actually manage them based on their physical attributes more than their purpose (except for hierarchy situations such as layer management).

Take the Intel article as an example: I can have a beautiful piece of hardware running software which does not fully utilize its capabilities. I could get a far cheaper machine and get the same performance.

Justin James

You are absolutely correct in that there is a difference. The multicore CPUs tend to run at a much lower clock speed. The key differentiator is that very few apps actually spike the CPU, but there tend to be a lot of threads going at once. Doubling the core count cuts the context switching in half (hopefully), so more clock cycles go into actually running the processes instead of managing them. In other words, multiproc is for apps that intensely pound the CPU in a few threads; multicore is for a lot of lighter threads. That's why I say that now is a great time to learn it. The CPUs are finally out there where it is cheap to do things like keep the UI fresh AND do your processing, or do a lot of asynchronous I/O or background processing/analysis as the user sits there. Things like Clippy (not suggesting he be revived!) can be done without bringing the system to a halt just with the switching costs. J.Ja

wdewey@cityofsalem.net

A lot of things appear to be going back to mainframe concepts (Citrix, thin clients). The only problem with this is people aren't willing to spend the money on the development and engineering that went into mainframes. They want a mainframe at a fraction of the cost and then wonder why they have bugs and inconsistencies. Bill

Justin James

For some reason, people think it makes "sense" to run another OS using 1.5 GB worth of RAM in a VM to isolate 30 MB worth of application... meanwhile, mainframes have been doing this since forever. I agree, in too many ways we are trying to replicate the mainframe environment, but doing it with the overhead of client/server application ecosystems, and it is really, really wasteful. J.Ja

adornoe

The mainframers, you know, those dinosaurs that wrote COBOL and a few other legacy languages that are now deemed obsolete and wordy and cumbersome, were the original users and coders of multi-threaded applications. As an example, the Tandem Computer platform (now a part of HP) was easy to code for, and most applications were developed using COBOL85. Many of those applications are still in use, and many more are still coming online. ATMs use Tandem Computers in the background for processing many thousands of transactions concurrently. Credit card authorization requires the use of the NonStop, massively parallel machines from HP. Those machines can process hundreds of thousands of transactions per minute. Obviously, if the number of transactions is that high, then the machine has to be able to manage a huge number of parallel transactions.

But the neat part of the NonStop Tandem machines, now HP NonStop, was that the programmer didn't have to code for the parallelism. The OS and middleware handled the multi-threading for the applications. There was no need to learn specialized multi-threading skills. So it didn't matter if the NonStop machine came with 2 or 4 or 8 or 16 CPUs; the OS and the middleware handled the transactions and the load balancing on the CPUs. The programs, whether written in COBOL or other platform-specific languages, didn't have to contain specialized multi-threading coding techniques.

Wayne M.

I see the true benefit of multiple processors not in helping an individual application run faster, but in allowing multiple applications to run side by side. I would prefer my virus scanners not be written to use multiple processors. I would prefer they go their merry way using one processor and leave the other(s) to me. I would prefer that report generation tools not use up all of the processing power, but rather leave processors free for normal queries. In short, I think I would prefer it if most applications limited themselves to a single processor, leaving another free to run different applications.
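Restricting an application to one processor, as Wayne suggests, is something the operating system can enforce through CPU affinity. A hedged Python sketch: `os.sched_setaffinity` exists only on Linux, so this falls back to doing nothing on other platforms rather than assuming the call is available.

```python
import os

def pin_to_cpu(cpu):
    """Restrict the current process to a single CPU (Linux-only API)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})      # 0 means the calling process
        return os.sched_getaffinity(0)      # report the new allowed set
    return None  # not supported on this platform (e.g. Windows, macOS)

allowed = pin_to_cpu(0)
```

On Windows the equivalent is done through the Task Manager's "Set affinity" option or the SetProcessAffinityMask API, so the preference Wayne describes can be imposed by an administrator even on applications that never asked for it.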

Scott

I would say that most applications should operate in at least two threads - one for GUI and another for processing. Nobody wants an application with a "frozen" screen and the words "Not Responding" next to it in the Task Manager. And both of your examples would benefit from being multi-threaded. Don't you want to use some of the other tools bundled into your virus software (like configure firewall settings) while a scan is taking place? Do you ever want to run more than one report at a time? The GUI should remain responsive while these other tasks that you've requested take place. At least, that's my preference.
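Scott's two-thread split (one thread for the GUI, one for the long-running work) can be simulated without any GUI toolkit. A minimal Python sketch, where the polling loop stands in for an event loop repainting the window and handling input:

```python
import threading
import time

done = threading.Event()

def long_scan():
    # Stand-in for a virus scan or report run on a worker thread.
    time.sleep(0.2)
    done.set()

worker = threading.Thread(target=long_scan, daemon=True)
worker.start()

# The "GUI" thread stays responsive while the scan runs in the background.
ticks = 0
while not done.is_set():
    ticks += 1          # e.g. repaint the window, process input events
    time.sleep(0.01)
worker.join()
```

In a real toolkit the worker would signal completion back to the UI thread through the toolkit's event queue rather than a bare `Event`, but the division of labor is the same.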

MadestroITSolutions

Or any other serious application, for that matter. Just imagine what Excel, Access, or Outlook would be like if they didn't multi-thread.

C_Tharp

Multiple processors are best used by an operating system that is designed to use them. Jobs are assigned to processors based on load and run completely on one processor. The operating system manages how jobs are scheduled, prioritized, and time sliced. The user benefits from the increased computational power of multiple processors with a limited increase in expense.

Most applications do not need or benefit substantially from multithreading. It does not make sense to spend the effort to make them operate in that fashion. To do so increases their complexity tremendously with no appropriate gain for the user. The cost of development, maintenance, and enhancement is increased substantially. And, of course, the user must pay for this.

There are few developers who understand the complexity of multithreading. They should work on operating systems and on sophisticated applications that truly need multithreading, not the typical business or personal applications which are the majority. Fortunately, the market will control this development. Software companies will not make the investment unless there is a substantial return. I don't think that we will have to worry about a big change in the near future.

cedkhader

Agreed. Multithreading theory is much harder to put to use in the software development process.

Kirk W.

I worked for one of the pioneers of SMP, and the philosophy that worked best was to allow the operating system to manage the parallelism. The true advantage of multiple processors comes into play when you have multiple processes. If a single application tries to take advantage of multiple processors, it will more than likely bring the overall performance of the machine to a slow crawl. Do software engineers need to know and understand multi-processing? Definitely; they need to know when and where it is appropriate so that they don't create a nightmare for the rest of us.

mdhealy

Was the "pioneer of SMP" for whom you worked perchance SGI? I've consumed Lord knows how many CPU-months on SGI systems over the last 8-plus years...

Mark Miller

From a purely theoretical standpoint I agree with this notion. It would be better to have the VM decide what to multithread. With the way things are done now, multithreading is something that has to be designed into the app. from the get-go. I think it's going to take different expectations on the part of developers though. Most developers are trained to use a linear model of program execution. One thing happens after the next. The problem this causes is what if it would be best to multithread a portion of the app. that is CPU intensive? What if there are side-effects in that code? The side-effects don't matter if the code is executed in a linear fashion, but will likely cause problems if it's executed concurrently.

I've had a little experience with this. Years ago I worked on an MFC app. with a mysterious bug. It was a GUI app of course. There was a database operation on it that was taking a long time (it would run for about a minute), but for some reason the UI was not locked during this operation. This did not occur with any other database operation in the app. The problem was that users could get impatient waiting for this operation to finish, and start selecting other operations from the pull-down menus, and crash the app.

At first I looked for the obvious. I figured this operation must've been put on a background thread. I could not find any threading code (at the application level) related to this operation. The background operation appeared to start right when the database access code was executed. It was really odd, and the only solution I could think of was to disable every menu option (grey them out) I could think of that would cause the app. to crash, and then re-enable them when it was finished. I didn't like this solution, but it was the only one I had access to. For whatever reason, it appeared to me that the library code was on its own deciding to run the database operation in the background.
What was most likely happening was that the database layer was allowing the Windows message pump to continue to send messages to the UI. The reason I say this is that it wasn't as if the routine that initiated the database call was allowed to continue to execute after the call was made. That didn't continue execution until the call was finished. However, menu options could continue to be selected, and the message handlers were able to respond. It was multithreading of a different sort... more like the way Windows 3 handled cooperative multitasking. I have a vague memory of talking to a fellow developer about this, and he said something about how, yes, there had been situations where the database layer had mysteriously allowed the message pump to continue to send messages to the app. He couldn't explain why either. A bug perhaps. Anyway, this was a case of a low-level layer deciding to "multithread" something when nobody had given it explicit permission to do so, and no provision was made for what if another database operation was initiated during the process.

If the VM is going to run code concurrently, then the expectation needs to be set, I think at the language level, that this is going to happen. I think that things at the system level would need to be concurrency-friendly as well, like database access, file access, maybe even memory access. Otherwise developers are still going to have to worry about concurrency issues. I think it would also be imperative for the VM to allow the developer to opt out of concurrency in some situations. Of course some discipline on the developers' part will be required. There will need to be a larger focus on encapsulation and loose coupling, to prevent side-effects from causing problems when things are run concurrently.
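The side-effect problem Mark describes is the classic reason concurrency cannot be bolted on invisibly: a plain read-modify-write on shared state is only safe if every concurrent path agrees to serialize it. A small Python sketch of the standard fix, a lock around the shared update (my example, not from the comment):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # Without the lock, this read-modify-write is a race:
        # two threads can read the same value and lose an update.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is exactly 40,000 only because the lock serialized the updates
```

A library that silently ran `increment` concurrently, the way Mark's database layer silently pumped messages, would break any caller that assumed linear execution, which is why he argues the expectation has to be set at the language level.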

TheGooch1

"And unlike many of the traditional code hot-rodding tricks, this one is only getting more important as dual core machines become more commonplace, and quad core machines are poised to hit the mainstream very soon" This applies in some cases, but in the last 2 companies that I have worked for, hardware is the last thing they upgrade, and only when it is so far behind the technology curve that is takes 10 minutes to open up a plain text email. Corporations are all about bottom line, e.g. "how much money did we save this quarter?" aka "next quarter can take care of itself" ). I do agree that in the future, auto-threadsafing apps at the compiler level and making it simple to auto-scale at runtime would be good features to have in the future. Its just that its more in the future for some than others. JohnG

MadestroITSolutions

I totally agree. Hardware is the last thing to get upgraded...

Justin James

... it means that a slow business (one with a 5 year refresh cycle) will be having dual cores at an 80% deployment rate in 3 more years, since it has been about a year since the dual core Intel CPUs became pretty common. Three years coincides nicely with the major version releases of many, many applications. In other words, one more release to be ready for even the slowest adopters to be overwhelmingly dual (or more) core. J.Ja

Dr Dij

I've been ticked off for years that they have been increasing processing speed but not adding dual or more processors to the bulk of PCs out there. I think system stability is enhanced in addition to speed. And today's GUI OSes use so much processing power. Even stupid things on web sites: on one of my office PCs, non-dual and somewhat old, I went to a developer website where they have scrolling text. For some idiotic reason they did this in a way that takes LOTS of processor time, with my old system going to 90 to 100% in use. The system was really dragging, and this is the kind of stuff that multiprocessors will help with. I've seen other sites with scrolling text that don't max out my PC, so the way they did it must have been different. The multicore processors will speed things up.

We put dual (separate) processor workstations into certain heavy production jobs years ago, and it seems that even tho the jobs we run don't multitask, having the OS parcel out the job to the less used processor, and having a separate processor for system stuff, speeds things up quite a bit, more so than simply having a very fast processor does. So end users should see quite a benefit with the new dual and quad core processors, tho of course not as much as if their applications were compiled for multiple processors.