Web site scalability: Procedures are key

Having all the right hardware and software to enhance the scalability of your Web site doesn't guarantee a smooth process. You have to establish the correct procedures, too. TechRepublic's Kevin Brown explains why.


When someone asks about Web site scalability, pictures of expensive computer equipment, custom software, and even hordes of highly skilled workers may come to mind. While at least some of those things are necessary, one element is often missing from the typical “scalability vision.”

The procedures for developing and maintaining a Web site solution can be just as important for achieving scalability goals. An actively managed Web site requires proper procedures for code changes, monitoring, load testing, content replication, and machine installs to ensure effective implementation.

Managing changes
Every programmer understands change management. Keeping tight control over changes helps increase the accuracy of estimates and allows the programmer to focus on the problem. But change management can also be an important function for scalability. Proper change procedures are necessary for any scalable Web site solution because they provide a record of what changed and when, which helps in determining how each change affected scalability.

Often, a new code change causes issues with the load that individual servers can manage. Other changes, such as configuration, system upgrades, and additional memory, can adversely affect the performance and scalability of the components that make up the entire Web site. Controlling and documenting change creates an important trail to aid in investigation of scalability issues.
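
As a rough sketch of what such a trail might look like, the Python below records each change with a timestamp and lets you ask which changes landed shortly before a performance problem appeared. The record fields, the two-day window, and the function name changes_before are illustrative assumptions, not part of any particular change-management product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeRecord:
    """One entry in a hypothetical change log."""
    applied_at: datetime
    component: str    # e.g. "web-01", "database"
    kind: str         # e.g. "code", "configuration", "memory upgrade"
    description: str


def changes_before(log, incident_time, window=timedelta(days=2)):
    """Return changes applied within `window` before the incident, newest first."""
    earliest = incident_time - window
    recent = [c for c in log if earliest <= c.applied_at <= incident_time]
    return sorted(recent, key=lambda c: c.applied_at, reverse=True)


if __name__ == "__main__":
    log = [
        ChangeRecord(datetime(2000, 6, 1, 9, 0), "web-01", "code", "New search page"),
        ChangeRecord(datetime(2000, 6, 5, 14, 0), "database", "configuration",
                     "Raised connection limit"),
        ChangeRecord(datetime(2000, 6, 6, 8, 30), "web-01", "memory upgrade",
                     "Added memory"),
    ]
    incident = datetime(2000, 6, 6, 16, 0)
    for change in changes_before(log, incident):
        print(change.applied_at, change.component, change.kind, change.description)
```

Even a log this simple narrows an investigation from "something changed" to a short list of suspects, provided every code, configuration, and hardware change is actually recorded.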

Testing
Testing is also important for scalability. Every change to a complex environment requires testing to ensure that regression errors do not occur. (A regression error is the introduction of an issue into a previously problem-free area of the system.)

The part of testing that most directly applies to scalability is load or stress testing. Any high-traffic Web site must invest in suitable stress testing. I once inherited some code that, according to the owner, had been tested for several thousand users. The source of this information was the programmer who had written the code in question. Upon investigation, it turned out that the tests to verify the multiple-user functionality had been performed using no more than two machines. One of the machines was the Web server and database; the other machine apparently simulated over a thousand users! This is obviously not a valid stress test: a single client machine is unlikely to reproduce the connections and timing of that many separate users, and a combined Web and database server does not resemble a typical production layout.

Proper stress testing requires a simulated environment that can accurately duplicate the system and the users of that system. The important point about stress testing is that it must be part of the procedures used to control changes. As new code is released, it must be tested in the stressed environment before it can be allowed on the production Web servers.
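
To make the idea concrete, here is a minimal load-generator sketch in Python. It runs a number of simulated users concurrently and reports the error count and the slowest response. The function names and the fake_request stand-in are assumptions for illustration; a real test would issue HTTP requests against a staging copy of the site.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_user(request_fn, requests_per_user):
    """Issue requests one after another, as one user would; return (latencies, errors)."""
    latencies, errors = [], 0
    for _ in range(requests_per_user):
        start = time.perf_counter()
        try:
            request_fn()
            latencies.append(time.perf_counter() - start)
        except Exception:
            errors += 1  # failed requests are counted, not timed
    return latencies, errors


def run_load_test(request_fn, users, requests_per_user):
    """Run `users` simulated users concurrently and summarize the results."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(simulate_user, request_fn, requests_per_user)
                   for _ in range(users)]
        results = [future.result() for future in futures]
    latencies = sorted(t for user_latencies, _ in results for t in user_latencies)
    errors = sum(user_errors for _, user_errors in results)
    return {
        "requests": users * requests_per_user,
        "errors": errors,
        "slowest_seconds": latencies[-1] if latencies else None,
    }


if __name__ == "__main__":
    # Stand-in for a real HTTP request against a staging server (assumption).
    def fake_request():
        time.sleep(0.01)

    print(run_load_test(fake_request, users=20, requests_per_user=5))
```

A script like this only approximates real traffic. To avoid the two-machine trap described earlier, it would need to run from several client machines at once so that the load generator itself does not become the bottleneck.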

Systems monitoring
Hand in hand with stress testing comes the monitoring of systems. Proper monitoring helps ensure that scalability is maintained. A complex system must be continuously monitored, and historical data must be kept in order to analyze potential weaknesses in the system. A Web site does not start out with the ability to handle the largest load of user traffic. Instead, it grows to that ability over time.

One key to that growth is the proper monitoring of critical performance statistics. As the traffic increases, the performance data can show the weak points in the system implementation and allow adjustments. As the adjustments are made, the performance statistics can be used to show proof of improvement or lack thereof.
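
The before-and-after comparison can be sketched in a few lines of Python. The example assumes you have already collected response times (in milliseconds) for the same page before and after an adjustment; the function names and the choice of mean and 95th percentile as the statistics are illustrative assumptions.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Reduce raw response times to a few statistics worth keeping historically."""
    return {"mean": mean(samples), "p95": percentile(samples, 95)}


def compare(before, after):
    """Return the change in each statistic; a negative value means faster."""
    old, new = summarize(before), summarize(after)
    return {name: new[name] - old[name] for name in old}


if __name__ == "__main__":
    before = [100, 120, 110, 400, 130]   # response times in ms (made-up data)
    after = [90, 100, 95, 150, 105]
    print(compare(before, after))
```

Tracking a high percentile alongside the mean is a design choice: an occasional very slow response can hide inside a healthy-looking average.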

You might want to consider an ASP (application service provider) for system monitoring. Several companies, such as Luminate, offer monitoring services that handle all the collection and reporting of data. Using an ASP for monitoring can free valuable resources on the operations team to concentrate on making system adjustments instead of becoming slaves to all the various logs and reports related to performance. But no matter what methods you use to monitor the system, some monitoring must be in place in order to support and continue to achieve scalability goals.
Replicating content
Any scalable Web architecture is going to be designed around the concept of adding Web servers in order to scale out the solution. Scaling out offers more room for growth than the alternative of scaling up, which involves adding more capacity, such as processors or memory, to existing equipment. (For a detailed look at some of these hardware issues, see “Achieving scalability for your Web infrastructure.”) Running multiple Web servers increases the need for proper procedures to handle content replication.

Regardless of the solution, the ability to increase the number of copies of Web site content dramatically increases scalability. From the operations perspective, the maintenance of multiple copies is more difficult. The best time to start thinking about content replication is early in the design before the need is urgent. There are many software products on the market to assist with the replication of files from one machine to another. Usually, these products will be more than adequate for the distribution of HTML and graphics files.
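
For readers curious about what such replication products do at their core, the Python sketch below copies new or changed files from a master content directory to a replica, using a checksum to skip files that are already current. The function names are assumptions for illustration, and the sketch deliberately leaves out things a real product would handle, such as removing deleted files and copying over the network.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(master_dir, replica_dir):
    """Copy new or changed files from master to replica; return relative paths copied."""
    master, replica = Path(master_dir), Path(replica_dir)
    copied = []
    for source in sorted(master.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(master)
        target = replica / relative
        if target.exists() and file_digest(target) == file_digest(source):
            continue  # replica already has this version
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(str(relative))
    return copied
```

Calling replicate once per Web server, and logging the list it returns, gives the operations team a record of exactly which files went to which machine.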

Procedures are needed most for the distribution of code changes. With interpreted languages such as Java, JavaScript, Perl, and VBScript, the replication issues are less complex. However, with compiled code such as C++ and Visual Basic, there may be additional issues, such as having to restart the Web server program. With either interpreted or compiled languages, procedures must be in place to handle the addition of new code.

Watch out for data corruption
In a highly interactive environment, a new piece of code on one machine out of a dozen could result in data corruption. Consider a new piece of code that uses a new field in the database. What happens if this code is mixed with the current version and both are running against the same database? Unless the Web site can afford to take down all machines for such upgrades, the answer to this question will drastically influence the way that upgrades are written and distributed to the Web farm. It is vital to maintain the scalability and integrity of the Web site by documenting proper procedures for replication of code and content to the Web farm.
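
One possible answer to the mixed-version question, sketched here with Python's built-in sqlite3 module, is to upgrade the database first in a backward-compatible way and only then roll the new code out to the Web farm one machine at a time. The table name customers, the new field region, and its default value are hypothetical, and this is only one approach under those assumptions, not the only valid upgrade procedure.

```python
import sqlite3


def old_version_insert(conn, name):
    """Current release: knows nothing about the new field."""
    conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))


def new_version_insert(conn, name, region):
    """New release: writes the new field as well."""
    conn.execute("INSERT INTO customers (name, region) VALUES (?, ?)",
                 (name, region))


def upgrade_schema(conn):
    """Add the new field with a default so rows written by old code stay valid."""
    conn.execute(
        "ALTER TABLE customers ADD COLUMN region TEXT NOT NULL DEFAULT 'unknown'"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    old_version_insert(conn, "Ann")
    upgrade_schema(conn)                     # step 1: database changes first
    old_version_insert(conn, "Bob")          # servers still on the old code
    new_version_insert(conn, "Cy", "east")   # servers already upgraded
    rows = conn.execute("SELECT name, region FROM customers ORDER BY id").fetchall()
    print(rows)  # [('Ann', 'unknown'), ('Bob', 'unknown'), ('Cy', 'east')]
```

Because the old code names its columns explicitly and the new field has a default, both versions can write to the same table during the rollout without corrupting each other's rows.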

Conclusion
A truly scalable Web site is a complex beast that can only be maintained by establishing procedures for every piece of the system. Without these procedures, the Web site will not be able to meet scalability requirements and may not function with acceptable uptime.