Multithreading and parallel processing projects are of special interest to me because they present a unique challenge: as a developer, I get to focus on the code rather than on peripheral issues like UI design. When I wrote the Mongoose engine that powers my Rat Catcher application, I used the .NET Parallel Extensions Library (PFx) to speed up the execution of searches. While Mongoose is not computationally intensive, it does perform a large number of network requests that can have a long wait period, so running them in parallel can deliver substantial performance gains.
I recently looked into rewriting the Mongoose engine in a language other than C#. Even though I’m very happy with the existing C# implementation, I’ve been thinking about moving Mongoose onto a cloud provider, and I’m seeing enough of a price difference between Linux and Windows cloud servers to justify the rewrite. While using Mono is an option now that 2.8 has been released with support for PFx, I also thought it would be a nice opportunity to try Python or Ruby for this purpose. Along the way, I learned a bit about how threading works in these languages, and how it may affect a developer’s decision to use them for a project.
In the standard implementations of Ruby and Python (MRI and CPython, respectively), the interpreter makes use of a Global Interpreter Lock (GIL). The GIL is a lock that only one thread can hold at a time, and a thread must hold it to execute interpreted code, so in effect it timeslices the threads. Here’s how it works: Each thread holds the GIL exclusively for a bit of time, does some work, and releases it. Meanwhile, every other thread is on hold, waiting for its chance to acquire the GIL. When the GIL is released, one of the waiting threads acquires it and starts running; which thread that is lies outside your code’s control.
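The turn-taking described above can be imitated with an ordinary lock. The sketch below is a toy model and not how CPython or MRI implement the GIL internally: the `simulate_gil` function and the `gil` lock inside it are hypothetical names made up for illustration. It simply shows that when every thread must hold one shared lock to do a step of work, no two threads are ever working at the same moment.

```python
import threading


def simulate_gil(n_threads, steps):
    """Toy model: threads take turns holding one lock, much like a GIL."""
    gil = threading.Lock()  # hypothetical stand-in for the interpreter's lock
    state = {"running": 0, "max_running": 0}
    steps_done = [0] * n_threads

    def worker(index):
        for _ in range(steps):
            with gil:  # block until the lock is free, then hold it exclusively
                state["running"] += 1
                state["max_running"] = max(state["max_running"], state["running"])
                steps_done[index] += 1  # the "work" done during this turn
                state["running"] -= 1
            # Lock released here: any waiting thread may acquire it next.

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["max_running"], steps_done


if __name__ == "__main__":
    max_running, steps_done = simulate_gil(n_threads=4, steps=1000)
    print("most threads inside the lock at once:", max_running)
    print("steps completed per thread:", steps_done)
```

Every thread finishes all of its steps, but the highest number of threads observed inside the locked region at once is one, which is the essence of the behavior described above.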
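There are two major advantages to using this system. The first is that you can write code in these languages that uses threading, and it will run with no modifications on an operating system that does not natively support threading, at least in implementations that schedule threads inside the interpreter, as MRI 1.8 does with its green threads. The second is that, because only one thread is running interpreted code at a time, the interpreter’s own data structures stay consistent, and many non-thread safe libraries can be used without corrupting them. Note that this does not make your own code automatically thread safe: a multi-step update to shared data can still be interrupted partway through, so it still needs a lock.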
There are some major downsides, though. The biggest one is that multiple threads will never run interpreted code at the same time. While your application may look like it is running in parallel, in reality only one thread is executing at any given moment, and the interpreter is just rapidly bouncing between threads doing different things. This brings us to our second issue, which is speed. For CPU-bound work, you will not see any speed advantage on multicore or multiprocessor machines because only one thread is running at a time; you may even see a slowdown due to the context switching costs. Threads that spend most of their time blocked on I/O are a partial exception, because the lock is released while a thread waits.
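A rough way to observe this in CPython is to time the same CPU-bound job run sequentially and then in threads. The sketch below is illustrative only: `count_primes`, `run_in_threads`, `run_sequentially`, and `fake_request` are hypothetical helpers, the workload sizes are arbitrary, and actual timings will vary by machine and interpreter version. The `fake_request` helper uses `time.sleep` as a stand-in for a slow network call; sleeping releases the GIL, so those waits can overlap even though the computation cannot.

```python
import threading
import time


def count_primes(limit):
    """CPU-bound toy workload: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def fake_request(seconds):
    """Stand-in for a network call (assumption: the wait dominates the work)."""
    time.sleep(seconds)  # the GIL is released while the thread sleeps
    return seconds


def run_sequentially(func, args_list):
    """Call func(arg) one after another; return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = [func(arg) for arg in args_list]
    return results, time.perf_counter() - start


def run_in_threads(func, args_list):
    """Call func(arg) in one thread per argument; return (results, elapsed_seconds)."""
    results = [None] * len(args_list)

    def worker(index, arg):
        results[index] = func(arg)

    threads = [threading.Thread(target=worker, args=(i, arg))
               for i, arg in enumerate(args_list)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for label, func, args in [("CPU-bound", count_primes, [20000] * 4),
                              ("waiting", fake_request, [0.2] * 4)]:
        _, sequential = run_sequentially(func, args)
        _, threaded = run_in_threads(func, args)
        print(f"{label}: sequential {sequential:.2f}s, threaded {threaded:.2f}s")
```

On a GIL-based interpreter, the CPU-bound job should take about as long in threads as it does sequentially, while the four simulated waits should finish in roughly the time of one.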
The use of the GIL makes a threaded application a bad idea in many (if not most) cases where the goal is true parallelism. Fortunately, there are options. For one thing, the GIL is not mandated by the language specifications. There are some implementations that do not use the GIL (JRuby and IronRuby for Ruby, and Jython and IronPython for Python, for example). Also, you can easily fall back on the process model that Ruby and Python both support, using the traditional fork/join mechanisms. While it may not be ideal (or possible) to use a different implementation or write your application to rely upon forking, it is good that there are alternatives to make truly parallel programs possible in Ruby and Python.
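As a minimal sketch of the fork/join approach in Python, the example below forks one child process per input, has each child send its result back through a pipe, and then joins by reading the results and waiting on each child. It assumes a POSIX system such as Linux, since `os.fork` is not available on Windows. The `fork_join` and `count_primes` helpers are hypothetical names for illustration, and passing results as JSON is just one simple choice.

```python
import json
import os


def count_primes(limit):
    """CPU-bound toy workload: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def fork_join(func, inputs):
    """Run func(x) in a separate child process per input and collect the results."""
    children = []
    for x in inputs:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: compute, write the result to the pipe, and exit
            # without returning into the parent's code path.
            status = 1
            try:
                os.close(read_fd)
                with os.fdopen(write_fd, "w") as out:
                    json.dump(func(x), out)
                status = 0
            finally:
                os._exit(status)
        # Parent process: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as src:
            results.append(json.load(src))  # reads until the child closes its end
        os.waitpid(pid, 0)  # the "join": reap the finished child
    return results


if __name__ == "__main__":
    print(fork_join(count_primes, [10, 100, 1000]))
```

Because each child is a separate process with its own interpreter and its own GIL, the children can run on different cores at the same time. In practice, Python’s standard `multiprocessing` module wraps this pattern in a higher-level interface, so hand-rolled forking like this is mainly useful for seeing what happens underneath.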
Disclosure of Justin’s industry affiliations: Justin James has a contract with Spiceworks to write product buying guides; he has a contract with OpenAmplify, which is owned by Hapax, to write a series of blogs, tutorials, and articles; and he has a contract with OutSystems to write articles, sample code, etc.