As the need to distribute software across multiple machines has grown over the last 10 years, the system architect’s task has become steadily more difficult. I remember developing the first set of architectural recommendations for an early client-server system. These systems consisted primarily of fat clients talking to fat servers on a local area network. The most expensive resource was time, so the standard recommendation was to open a database connection and hold it open until you closed the application. Doing so made systems perform extremely well, as long as you didn’t need more than a couple of hundred users. As the number of users increased, the load on the servers rose so dramatically that it was virtually impossible to scale the system.
.NET case study
This article continues a series concentrating on .NET development. Check out other titles in the series:
- Case study: Designing scalable presentation services
- Case study: Developing scalable and reliable .NET services
- .NET business services help power the WiredChurch system
When three-tier systems became the rage, it was necessary to change the data access methodology from “open once and hold” to “open and close as many times as required.” This methodology allowed increased scale, but at the cost of both real and perceived performance. It became clear to Microsoft that it would never be able to bring applications to Internet scale as long as its systems required synchronous processing. Microsoft Message Queue Server (MSMQ) was released as an add-on product but very quickly found its way into the operating system. Now, it is a critical element of the .NET strategy.
The curse of synchronicity
Whether most of us realize it or not, we no longer live in a synchronous world. In a synchronous world, everyone answers the phone when you call. Also, most work is accomplished in meetings and not via e-mail conversations or discussion groups. The problem with synchronicity is that you and at least one other person must be in the same place (physically or electronically) at the same time to get work done. A natural bottleneck occurs whenever work can occur only in a synchronous fashion.
Let’s look at it from a system perspective (see Figure A). If you’re running an online commerce system and every database lookup, every order, and every acknowledgement runs through your middle tier to a single database, you have a natural bottleneck. Maybe it performs well now, but if you’re really successful, you have no way to scale the system other than counting on hardware upgrades, which is a scale-up strategy.
Figure A: The database becomes the bottleneck as the system grows.
The blessing of asynchronous processing
Let’s look at the online commerce example as an asynchronous implementation. In this implementation, orders received from the Web site are added to a queue and processed as database cycles become available (see Figure B). Because the time needed to place an object on the queue is negligible compared with the time it takes to process the transaction in the database, the perceived speed of the system increases dramatically. The objects that place orders in the queue can deliver an immediate confirmation back to the customer, such as “your order has been received.”
Figure B: Requests are placed on the queue and processed by the database when the next time slice is available.
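The queued order flow can be sketched in a few lines. This is a minimal illustration in Python rather than .NET: the standard library `queue.Queue` stands in for an MSMQ queue, a short sleep stands in for the database write, and the names `place_order`, `process_orders`, and `run_demo` are hypothetical, not taken from any real system.

```python
import queue
import threading
import time

order_queue = queue.Queue()   # stand-in for an MSMQ queue (assumption)
processed = []                # stand-in for rows committed to the database


def place_order(order_id, item):
    """Front end: enqueue the order and confirm immediately."""
    order_queue.put({"order_id": order_id, "item": item})
    return f"Your order {order_id} has been received."


def process_orders():
    """Back end: drain the queue as database capacity allows."""
    while True:
        order = order_queue.get()
        if order is None:          # sentinel value: stop the worker
            order_queue.task_done()
            break
        time.sleep(0.01)           # simulated slow database write
        processed.append(order["order_id"])
        order_queue.task_done()


def run_demo(order_count=5):
    """Place several orders, then wait for the back end to catch up."""
    worker = threading.Thread(target=process_orders)
    worker.start()
    confirmations = [place_order(i, "widget") for i in range(order_count)]
    order_queue.put(None)
    order_queue.join()             # block until every queued item is handled
    worker.join()
    return confirmations
```

The point to notice is that `place_order` returns without waiting for the simulated database write, which is why the customer sees a fast response even when the back end is busy.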
The other advantage of this queued configuration is that you can scale the back end in one of two ways. You can employ the same scale-up strategy from the synchronous design and simply add more power to a single back end. But the asynchronous design also gives you the opportunity to employ a scale-out strategy. Instead of having a single back end that processes the queue, you can have multiple back ends picking off queue data and processing it.
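A scale-out back end amounts to several consumers competing for messages on one queue. The sketch below assumes Python threads as stand-ins for separate back-end machines, so it shows how the work is divided rather than any real throughput gain. The function name `run_back_ends` is hypothetical.

```python
import queue
import threading


def run_back_ends(orders, back_end_count):
    """Process every order exactly once using several competing consumers."""
    work = queue.Queue()
    for order in orders:
        work.put(order)

    # Each back end records what it handled in its own list.
    handled = {n: [] for n in range(back_end_count)}

    def back_end(name):
        while True:
            try:
                order = work.get_nowait()
            except queue.Empty:
                return             # queue drained, this back end stops
            handled[name].append(order)

    threads = [
        threading.Thread(target=back_end, args=(n,))
        for n in range(back_end_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled
```

Because the queue hands each message to only one consumer, adding a back end changes how the work is split but never causes an order to be processed twice.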
It’s important to point out the primary disadvantage of this kind of system design. Asynchronous systems count on near-real-time access to data, not real-time access. For example, our online commerce customers would have to be content with placing an order based on our best current knowledge of stock levels and not an actual transacted exchange in which we committed inventory at the exact point in time that the customer placed the order. In reality, the cost of achieving this kind of precision would make the system too expensive to develop or deploy anyway, so why not take advantage of the scale benefits of asynchronous processing?
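To make the trade-off concrete, here is a small sketch of what near-real-time stock checking can look like. It is an assumption-laden illustration: the front end checks a snapshot of stock levels, the back end settles against the real inventory later, and the “backordered” outcome is a hypothetical policy of mine, not something the system described here is known to do.

```python
from collections import deque


def accept_order(stock_snapshot, item, quantity):
    """Front end: decide from the last known stock level, without locking inventory."""
    return stock_snapshot.get(item, 0) >= quantity


def settle_orders(pending, inventory):
    """Back end: commit each queued order against the real inventory, in order."""
    results = []
    while pending:
        order_id, item, quantity = pending.popleft()
        if inventory.get(item, 0) >= quantity:
            inventory[item] -= quantity
            results.append((order_id, "shipped"))
        else:
            # The snapshot was stale by the time this order was processed.
            results.append((order_id, "backordered"))
    return results
```

If two customers each order the last widget against the same snapshot, both are accepted up front, and only the back end discovers that the second order cannot be filled from stock.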
Asynchronous implementations in WiredChurch
Queuing theory was never that difficult to grasp, but implementing it using Visual Studio 6.0 and Visual Basic was something of a black art. Now that queuing is built into the .NET Framework at a system level, the process is much more straightforward. But once you get the queue opened, how do you decide the format of the data you place into the queue? Before .NET, this decision was problematic. You had to write both the serialization code that placed data on the queue at one end and the deserialization code that pulled it off at the other. With .NET, you can either use the native XML objects to do the serialization, or you can place .NET objects in the queue directly, and the Framework’s message formatter (the XML formatter by default) will take care of the serialization automatically. .NET was built from the ground up to support the development of asynchronous, disconnected systems.
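For a sense of what hand-written serialization involves, here is a minimal round trip in Python using the standard library XML module. It is a sketch of the general idea only, not the .NET formatter. The `EventNotice` type and its fields are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class EventNotice:
    """Hypothetical message type; the fields are illustrative only."""
    event_id: int
    recipient: str
    kind: str


def to_xml(notice):
    """Sending side: turn the object into an XML message body."""
    root = ET.Element("EventNotice")
    ET.SubElement(root, "EventId").text = str(notice.event_id)
    ET.SubElement(root, "Recipient").text = notice.recipient
    ET.SubElement(root, "Kind").text = notice.kind
    return ET.tostring(root, encoding="unicode")


def from_xml(body):
    """Receiving side: rebuild the object from the XML message body."""
    root = ET.fromstring(body)
    return EventNotice(
        event_id=int(root.findtext("EventId")),
        recipient=root.findtext("Recipient"),
        kind=root.findtext("Kind"),
    )
```

Both ends have to agree on the element names and types, which is exactly the bookkeeping that automatic object serialization removes.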
At WiredChurch.com, we architected the engine processing to take place asynchronously for the express purpose of allowing us to scale as the number of transactions grew. For example, the WorkFlow engine (implemented as a Windows service) scans the Event database looking for event notifications, invitations, and reminders that need to be delivered in the near future, and places a message object for each one on the Event queue. The Communications engine (also implemented as a Windows service) picks message objects off the Event queue and delivers them. In our load testing, we could dramatically increase the number of objects we could pull from the Event queue by simply starting another instance of the Communications engine service on another machine and pointing it at the Event queue.
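The scanning half of that pipeline might look something like the sketch below. It is a guess at the general shape, not the WiredChurch code: the event records are plain dictionaries, the `queued` flag and the one-hour window in the usage note are assumptions, and `enqueue_due_notices` is a hypothetical name.

```python
import queue
from datetime import datetime, timedelta


def enqueue_due_notices(events, event_queue, now, window):
    """Workflow pass: queue every not-yet-queued notice due within `window` of `now`."""
    queued_count = 0
    for event in events:
        if not event["queued"] and event["due"] <= now + window:
            event_queue.put({"event_id": event["id"], "kind": event["kind"]})
            event["queued"] = True     # avoid queuing the same notice twice
            queued_count += 1
    return queued_count
```

Calling `enqueue_due_notices(events, queue.Queue(), datetime.now(), timedelta(hours=1))` on a timer would give the delivery side a steady stream of small messages to pick up, independent of how fast they can actually be sent.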
In this particular implementation, we want a guarantee of delivery from the queue. By default, MSMQ holds queued messages in memory. Thus, a machine failure could mean the loss of messages queued but never delivered. By simply setting the Recoverable property of a Message object to true, we can force MSMQ to write that message to disk so that it survives a restart, which gives us the ability to guarantee delivery. We could also have used transactional queues if we needed the queue to participate in an externally controlled transaction, but we chose not to wrap functions in transactions. By architecting the system with queuing at its core, we were able to create a system with a great deal of deployment flexibility (from 1 to n boxes) and scalability (up to millions of transactions).
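The difference between in-memory and recoverable messages is easy to demonstrate with a toy disk-backed queue. The sketch below uses SQLite from the Python standard library purely as an illustration of the idea. It is not how MSMQ stores messages, and `DurableQueue` and `simulate_restart` are hypothetical names.

```python
import os
import sqlite3
import tempfile


class DurableQueue:
    """Toy first-in, first-out queue whose messages are committed to disk."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.conn.commit()

    def send(self, body):
        self.conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))
        self.conn.commit()         # the message is on disk once this returns

    def receive(self):
        row = self.conn.execute(
            "SELECT id, body FROM messages ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None            # queue is empty
        self.conn.execute("DELETE FROM messages WHERE id = ?", (row[0],))
        self.conn.commit()
        return row[1]

    def close(self):
        self.conn.close()


def simulate_restart():
    """Send two messages, drop the queue object, then reopen and read them back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "event_queue.db")
        before = DurableQueue(path)
        before.send("reminder:42")
        before.send("invitation:43")
        before.close()             # pretend the machine went down here

        after = DurableQueue(path) # a fresh start against the same file
        recovered = [after.receive(), after.receive(), after.receive()]
        after.close()
        return recovered
```

The cost of this durability is a disk write per message, which is the same trade you make when you mark a message as recoverable instead of leaving it in memory.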