Enterprise application (EA) architects often face challenges in designing scalable applications that can accommodate a growing number of users and provide 24/7 availability. In this article, I’ll explain the meaning of scalability and availability in EA and explore different architectures for achieving them.
As an example, I’ll use a fictitious company, PetWebStore (www.petWebStore.biz), which is opening a pet store in North America and has a goal to become the largest Web-based pet store in the world. PetWebStore naturally wants its initial investment to be in the hardware and software that can support its North American customers. At later stages, the company wants to add hardware and software to serve customers in other regions. Essentially, the company’s goal is to scale its EA to a changing consumer base without interrupting 24/7 availability.
For this kind of application, I’d use a Web server clustering architecture. A cluster is a logical group of servers that run Web applications simultaneously and appear to the world as a single server. The servers may or may not communicate with their peers in the cluster. You can dynamically add servers to the cluster or remove them from it, depending on the load (or customer base).
Load balancers distribute customer requests among the servers in the cluster so that no single server carries the whole load. Load balancers come in several varieties: hardware based, software based, or a combination of the two. Some manufacturers also sell a firewall and a load balancer combined in a single piece of hardware.
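The core job of a software load balancer can be sketched in a few lines of Java. This is a minimal, hypothetical illustration that assumes plain round-robin selection; the class and server names are invented, and real products typically add health checks, session affinity, and other policies. It also shows a server being added while the cluster keeps running.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical software load balancer: round-robin over a cluster whose
// membership can change at run time.
class RoundRobinBalancer {
    // Copy-on-write list, so servers can be added or removed while requests flow.
    private final List<String> servers = new CopyOnWriteArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    void addServer(String server) { servers.add(server); }

    void removeServer(String server) { servers.remove(server); }

    // Picks the server for the next client request.
    String pick() {
        List<String> snapshot = List.copyOf(servers); // consistent view for this call
        if (snapshot.isEmpty()) {
            throw new IllegalStateException("no servers in the cluster");
        }
        // floorMod keeps the index valid even after the counter overflows.
        return snapshot.get(Math.floorMod(next.getAndIncrement(), snapshot.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer();
        lb.addServer("web-1");
        lb.addServer("web-2");
        for (int r = 0; r < 4; r++) {
            System.out.println("request " + r + " -> " + lb.pick());
        }
        lb.addServer("web-3"); // scale out as the customer base grows
        for (int r = 4; r < 7; r++) {
            System.out.println("request " + r + " -> " + lb.pick());
        }
    }
}
```

Taking a snapshot on each call means a server added or removed mid-stream simply changes the rotation from the next request on.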
Scalability is an application's ability to support a growing number of users. It's measured by a range of factors, including the number of simultaneous users a cluster can support and the time it takes to process a request. If PetWebStore takes 10 to 30 milliseconds (ms) to respond to one request, how long will it take to respond to 10,000 concurrent requests? Infinite scalability would allow it to respond to all of them in 10 to 30 ms, but in the real world the answer lies somewhere between 10 ms and a logjam. A well-designed application keeps meeting its performance goals as the number of users grows.
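To make the question concrete, here is a back-of-the-envelope estimate in Java. It rests on hypothetical assumptions: requests are spread evenly, each server works on a fixed number of requests at once (the `parallelPerServer` parameter, an assumed figure), and service time stays at 30 ms regardless of load. It therefore ignores the queuing, contention, and database bottlenecks that produce the real-world logjam.

```java
// Back-of-the-envelope burst estimate. Hypothetical assumptions: requests are
// spread evenly, each server handles a fixed number of requests in parallel,
// and service time per request stays constant (no queuing or contention).
class BurstEstimate {
    static long estimateMillis(int concurrentRequests, int servers,
                               int parallelPerServer, long serviceTimeMs) {
        long slots = (long) servers * parallelPerServer;        // requests in flight at once
        long waves = (concurrentRequests + slots - 1) / slots;  // ceiling division
        return waves * serviceTimeMs;                           // when the last wave finishes
    }

    public static void main(String[] args) {
        // 10,000 concurrent requests at the 30 ms upper bound, 50 parallel requests per server (assumed).
        for (int servers : new int[] {1, 20, 200}) {
            System.out.println(servers + " server(s): about "
                + estimateMillis(10_000, servers, 50, 30) + " ms");
        }
    }
}
```

With these made-up parameters, one server needs 200 waves (about 6,000 ms), 20 servers need 10 waves (about 300 ms), and 200 servers handle the whole burst in a single 30 ms wave, which corresponds to the "infinite scalability" case.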
High availability is achieved largely through redundancy. In PetWebStore’s case, if one server fails while handling requests, the other servers in the cluster should take over those requests as transparently as possible. A failed server is removed from the cluster as soon as it fails, so that future requests are not routed to it. In EA, failover resiliency and availability are of the utmost importance.
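Failover can be sketched by extending the round-robin idea: if the chosen server throws while handling a request, the balancer drops it from rotation and retries the same request on the next server. In this hypothetical Java sketch, the `send` function stands in for the real network call, and the server names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical failover-aware balancer: round-robin over healthy servers; a server
// that throws is removed from rotation and the request is retried elsewhere.
class FailoverBalancer {
    private final List<String> healthy;
    private int next = 0;

    FailoverBalancer(List<String> servers) { this.healthy = new ArrayList<>(servers); }

    synchronized List<String> healthyServers() { return List.copyOf(healthy); }

    // 'send' stands in for the real network call and throws if the server is down.
    // Synchronized for simplicity; a real balancer would not hold a lock during I/O.
    synchronized String handle(String request, BiFunction<String, String, String> send) {
        while (!healthy.isEmpty()) {
            int i = next % healthy.size();
            String server = healthy.get(i);
            try {
                String response = send.apply(server, request);
                next = i + 1;          // rotate past the server that answered
                return response;
            } catch (RuntimeException e) {
                healthy.remove(i);     // route no future requests to the failed server
                next = i;              // index i now holds the following server
            }
        }
        throw new IllegalStateException("no healthy servers left");
    }

    public static void main(String[] args) {
        FailoverBalancer lb = new FailoverBalancer(List.of("app-1", "app-2", "app-3"));
        // Simulated transport in which app-2 is down.
        BiFunction<String, String, String> send = (server, req) -> {
            if (server.equals("app-2")) throw new RuntimeException(server + " is down");
            return server + " handled " + req;
        };
        for (String req : List.of("r1", "r2", "r3")) {
            System.out.println(lb.handle(req, send));
        }
        System.out.println("in rotation: " + lb.healthyServers());
    }
}
```

In this run, request r2 is first routed to app-2, which fails; the balancer removes app-2 and answers r2 from app-3, so the client never sees the failure. Retrying on another server is only safe for requests that can be repeated without side effects; silently re-sending a purchase, for example, risks charging the customer twice.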
In EA, we achieve scalability and availability by using load balancers and by clustering applications into different tiers.
Enterprise application tiers
At the server end, an enterprise application can be divided into several logical “tiers.” These tiers are logical divisions of the application’s services, not necessarily physical divisions between hardware and software. In some cases, all the tiers may run on the same machine.
The Web tier provides static content (static HTML pages) to the client/customer and is normally the EA’s front end. A simple EA has a Web tier of one or more machines running Apache, Netscape Enterprise Server, or IIS. The load balancer passes requests to the Web tier.
The presentation tier provides dynamic content (e.g., servlets or JSPs) to EA clients. Generally, the presentation tier comprises a cluster of servers (e.g., Tomcat, WebLogic, WebSphere) that host servlets and/or JSPs. If the cluster also serves your application’s static HTML pages, it encompasses both the Web tier and the presentation tier.
The object tier provides Java objects (e.g., EJBs, RMI classes, JDBC connection pools) and their associated business logic to an enterprise application. A cluster of J2EE-compliant servers that host EJBs (such as WebLogic, WebSphere, or JBoss) provides the object tier.
By logically grouping these tiers into one or more clusters, we can come up with various clustering architectures. Which architecture suits an EA depends largely on its usage pattern and the type of application. PetWebStore has both static and dynamic content and heavy transactional (e-commerce) activity, so it will make strong use of all three tiers.
Single-tier clustering (basic clustering)
Single-tier clustering is the simplest form of clustering. In this architecture, every machine in the cluster runs all the tiers (Figure A), and the load balancer distributes requests among the servers in the cluster. Basic clustering is easy to administer. Performance is high because all the tiers are collocated, so intertier communication generates no network traffic. It’s also very easy to scale out: you simply add another machine to the cluster.
The load balancer typically distributes client requests to the servers by round-robin distribution, without considering each server’s resource usage. As a result, servers may at times be unevenly loaded, which can degrade performance. And a server failure during request processing can be visible to the client, because the failing server can’t transfer its in-flight requests to other servers.
So this architecture offers fewer opportunities for load balancing and failover resiliency, and it is less often used in enterprise applications.
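The following hypothetical Java sketch shows why ignoring resource usage matters. It assigns requests with made-up costs (a heavy checkout of 50 units alternating with a light page view of 5 units) to two servers, once by round-robin and once by a least-loaded policy. Least-loaded is a swapped-in, resource-aware alternative shown only for comparison; it is not part of the architecture described above.

```java
import java.util.Arrays;

// Hypothetical comparison of how much total work each server receives.
class UnevenLoadDemo {
    // Round-robin: request i goes to server i % n, regardless of current load.
    static long[] roundRobin(int[] costs, int servers) {
        long[] load = new long[servers];
        for (int i = 0; i < costs.length; i++) {
            load[i % servers] += costs[i];
        }
        return load;
    }

    // Least-loaded (swapped-in, resource-aware): each request goes to the server
    // with the smallest accumulated load so far; ties go to the lowest index.
    static long[] leastLoaded(int[] costs, int servers) {
        long[] load = new long[servers];
        for (int cost : costs) {
            int target = 0;
            for (int s = 1; s < servers; s++) {
                if (load[s] < load[target]) target = s;
            }
            load[target] += cost;
        }
        return load;
    }

    public static void main(String[] args) {
        int[] costs = {50, 5, 50, 5, 50, 5}; // made-up: heavy checkout vs. light page view
        System.out.println("round-robin:  " + Arrays.toString(roundRobin(costs, 2)));
        System.out.println("least-loaded: " + Arrays.toString(leastLoaded(costs, 2)));
    }
}
```

Because the heavy requests happen to line up with the rotation, round-robin leaves one server with 150 units and the other with 15, while the least-loaded policy ends at 105 and 60.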
Two-tier clustering
In two-tier clustering, the three basic tiers are grouped into two logical tiers. Because a request always passes through the tiers in the same order (Web, then presentation, then object), only adjacent tiers can be grouped, which leaves two possible arrangements: combine the Web and presentation tiers, or combine the presentation and object tiers.
In the first arrangement, the Web and presentation tiers run together on each machine of a cluster called the Web server cluster, while the object tier runs on every machine of a separate cluster (Figure B). Because the presentation and object tiers are on different machines, we have the opportunity to balance the load on the object tier. This is achieved with “replica-aware” stubs, which the application server creates when objects (EJBs) are deployed. A replica-aware stub knows which other servers in the cluster host the object and contains a load-balancing algorithm, so every method call to the object tier is load balanced by the stub. In effect, these stubs act as the load balancer between the presentation and object tiers.
Figure B: Two-tier clustering with tiers on individual machines
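Application servers generate replica-aware stubs automatically, and their internals are vendor specific. As a rough, in-process illustration of the idea only, the Java sketch below uses the standard `java.lang.reflect.Proxy` API to build a stub that knows every replica of a hypothetical `InventoryService` and round-robins each method call among them. There is no network, no failover, and no real EJB container; `CountingReplica` simply stands in for a remote server.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ReplicaAwareStubDemo {
    // Hypothetical business interface, deployed on every object-tier server.
    interface InventoryService {
        int stockFor(String sku);
    }

    // Stand-in for one remote replica; it just counts the calls it receives.
    static final class CountingReplica implements InventoryService {
        int calls = 0;
        public int stockFor(String sku) { calls++; return 7; }
    }

    // Builds a client-side stub that knows every replica and round-robins each call.
    @SuppressWarnings("unchecked")
    static <T> T createStub(Class<T> type, List<? extends T> replicas) {
        List<T> known = List.copyOf(replicas);   // the stub's view of the cluster
        AtomicInteger next = new AtomicInteger();
        InvocationHandler balancer = (proxy, method, args) -> {
            T target = known.get(Math.floorMod(next.getAndIncrement(), known.size()));
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();              // surface the replica's own exception
            }
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, balancer);
    }

    public static void main(String[] args) {
        CountingReplica a = new CountingReplica();
        CountingReplica b = new CountingReplica();
        InventoryService stub = createStub(InventoryService.class, List.of(a, b));
        for (int i = 0; i < 5; i++) {
            stub.stockFor("dog-food-1kg");
        }
        System.out.println("replica A calls: " + a.calls + ", replica B calls: " + b.calls);
    }
}
```

Five calls through the stub land three times on the first replica and twice on the second; the calling code only ever sees the `InventoryService` interface, which is the property that lets the presentation tier stay unaware of how many object-tier servers exist.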
Two-tier clustering is widely used in enterprise applications. Let’s say we’re implementing a Web bookstore. How do visitors use it? They go to the home page, which is mostly static content, so we’ll use the Web tier. They search for a book, so we’ll use the presentation tier to create a dynamic page. They might read other readers’ book reviews, which may be static or dynamic content. Then they may decide to make a purchase, so we’ll use the object tier (EJBs, etc.) at this point.
From this usage pattern, we can see that most clients will use the Web and presentation tiers within a session, so we can combine those two tiers into one cluster. Because usage of the object tier is independent of the Web and presentation tiers, we can keep the object tier on a separate cluster. This architecture gives us two levels of load balancing (the load balancer in front of the Web server cluster and the replica-aware stubs in front of the object tier), so we can achieve high scalability and availability for a bookstore-type application.
In the second arrangement (Figure C), the Web tier runs on every server in the Web server cluster, while the presentation and object tiers run together on every machine in the application server cluster. The load balancer distributes client requests to the Web server tier, which serves static content only. Requests for dynamic content are proxied via the proxy plug-in (PPI) to the application server cluster.
Figure C: Two-tier clustering with tiers on one machine
In Figure C, the PPI does load balancing (round-robin distribution) between the Web and presentation tiers. But the presentation and object tiers are running on the same machine, so in this architecture opportunities for load balancing between the presentation and object tiers are reduced. Because of this, the architecture in Figure B is preferred.
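Vendor proxy plug-ins are configured through the Web server, and their routing rules differ by product. The following Java sketch models only the behavior described above, under assumptions of its own: a request counts as static if its path ends in one of a few hard-coded extensions, and everything else is forwarded round-robin to hypothetical application servers named `app-1` and `app-2`.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a Web-tier proxy plug-in (PPI): static files are served
// locally; everything else is forwarded round-robin to the application servers.
class ProxyPlugInSketch {
    // Assumed rule for "static": the path ends in one of these extensions.
    private static final Set<String> STATIC_EXTENSIONS =
        Set.of(".html", ".css", ".js", ".png", ".jpg");

    private final List<String> appServers;
    private final AtomicInteger next = new AtomicInteger();

    ProxyPlugInSketch(List<String> appServers) { this.appServers = List.copyOf(appServers); }

    static boolean isStatic(String path) {
        int dot = path.lastIndexOf('.');
        return dot >= 0
            && STATIC_EXTENSIONS.contains(path.substring(dot).toLowerCase(Locale.ROOT));
    }

    // Describes where the request ends up.
    String route(String path) {
        if (isStatic(path)) {
            return "web tier (local): " + path;
        }
        String target = appServers.get(Math.floorMod(next.getAndIncrement(), appServers.size()));
        return "proxied to " + target + ": " + path;
    }

    public static void main(String[] args) {
        ProxyPlugInSketch ppi = new ProxyPlugInSketch(List.of("app-1", "app-2"));
        for (String path : List.of("/index.html", "/search.jsp", "/logo.png", "/cart.jsp", "/checkout")) {
            System.out.println(ppi.route(path));
        }
    }
}
```

Only dynamic requests advance the rotation, so static traffic never consumes application-server capacity.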
Multitier clustering
In multitier clustering, every tier (Web, presentation, and object) runs on its own machines, and each tier forms a cluster of its own. This architecture has three levels of load balancing:
- Load balancing prior to the Web server cluster
- Load balancing between the Web and presentation tiers via the PPI
- Load balancing between the presentation and object tiers via replica-aware stubs
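The three levels can be traced end to end with a small hypothetical model. Each cluster is a round-robin pool (a simplifying assumption; real balancers may use other policies), and a request enters only the tiers it needs: static pages stop at the Web tier, dynamic pages continue to the presentation tier, and purchases also reach the object tier. Server names are made up.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical trace of requests through the three load-balancing levels.
class MultitierTrace {
    enum Kind { STATIC, DYNAMIC, PURCHASE }

    // One cluster, balanced round-robin (a simplifying assumption for every level).
    static final class Pool {
        private final List<String> members;
        private final AtomicInteger next = new AtomicInteger();

        Pool(List<String> members) { this.members = List.copyOf(members); }

        String pick() {
            return members.get(Math.floorMod(next.getAndIncrement(), members.size()));
        }
    }

    private final Pool web;          // level 1: load balancer in front of the Web server cluster
    private final Pool presentation; // level 2: PPI between the Web and presentation tiers
    private final Pool object;       // level 3: replica-aware stubs in front of the object tier

    MultitierTrace(List<String> web, List<String> presentation, List<String> object) {
        this.web = new Pool(web);
        this.presentation = new Pool(presentation);
        this.object = new Pool(object);
    }

    // Returns the path a request takes; each tier is entered only if the request needs it.
    String route(Kind kind) {
        String path = web.pick();
        if (kind == Kind.STATIC) return path;
        path += " -> " + presentation.pick();
        if (kind == Kind.DYNAMIC) return path;
        return path + " -> " + object.pick();
    }

    public static void main(String[] args) {
        MultitierTrace cluster = new MultitierTrace(
            List.of("web-1", "web-2", "web-3"),
            List.of("pres-1", "pres-2"),
            List.of("obj-1", "obj-2"));
        for (Kind kind : List.of(Kind.STATIC, Kind.DYNAMIC, Kind.PURCHASE, Kind.DYNAMIC)) {
            System.out.println(kind + ": " + cluster.route(kind));
        }
    }
}
```

Because each level keeps its own rotation, the three clusters can be sized independently, which is what makes this architecture the most flexible to scale.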
Figure D shows the most complex and difficult-to-administer architecture discussed in this article, but it’s also the most scalable and highly available one. For an application in which each tier’s usage differs and there are large numbers of clients, this architecture may be the best fit.
In PetWebStore, the usage pattern would be something like this: The client comes to the home page and reads about the different species of pet and pet food supplies. This is largely static content, so use the Web tier. The client may then search for food supplies or check the availability of pets. This is dynamic content, so use the presentation tier. After searching, the client may decide to make a purchase. Use the object tier at this point.
In this application, every tier’s usage is different. We anticipate a large number of clients, so multitier architecture is the best fit for PetWebStore’s deployment. As we can see, for this application it’s likely that the largest number of clients will hit the Web tier; a lesser number will hit the presentation tier; and fewer still will hit the object tier. So the number of servers in each cluster may vary depending upon the usage pattern of each tier.
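One way to turn the usage pattern into cluster sizes is a simple capacity calculation per tier. The Java sketch below is hypothetical throughout: the peak load, the percentage of requests reaching each tier, the per-server capacities, and the one spare server per tier (kept so that a single failure doesn't overload the rest) are illustrative numbers, not PetWebStore measurements.

```java
// Hypothetical sizing sketch: servers per tier from peak traffic, the share of
// requests that reach the tier, per-server capacity, and spare servers.
class TierSizing {
    // Ceiling division for non-negative values.
    private static long ceilDiv(long a, long b) { return (a + b - 1) / b; }

    static int serversNeeded(int peakRequestsPerSec, int percentReachingTier,
                             int perServerCapacity, int spares) {
        // Requests per second that actually reach this tier (rounded up).
        long tierLoad = ceilDiv((long) peakRequestsPerSec * percentReachingTier, 100);
        // Enough servers to carry that load, plus spares for failover headroom.
        return (int) ceilDiv(tierLoad, perServerCapacity) + spares;
    }

    public static void main(String[] args) {
        int peak = 5_000; // made-up peak requests per second across the whole site
        System.out.println("Web tier:          " + serversNeeded(peak, 100, 500, 1));
        System.out.println("Presentation tier: " + serversNeeded(peak, 40, 250, 1));
        System.out.println("Object tier:       " + serversNeeded(peak, 5, 100, 1));
    }
}
```

With these made-up inputs the sketch suggests 11 Web servers, 9 presentation servers, and 4 object servers; in practice the per-server capacities would come from load testing each tier.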