Scaling an Internet architecture
August 3, 2005, 10:12pm PDT | Length: 00:05:34
Any great website eventually runs out of capacity. Horizontal scalability with load balancers between the tiers offers redundancy -- keeping the site up and running.
Hi, I'm Ted Cahall, Chief Information Officer and Senior Vice President at CNET Networks, and today I'd like to talk about scaling an Internet architecture.
One of the big problems with building a great Internet site is that eventually you run out of capacity, and most people start their site out on one single box. In this case I'll talk about an open source solution, or at least a free solution, where all the parts are available on the Internet; you download them, and all you have to do is buy the actual piece of hardware. In this case let's say we're using a version of Linux, which is free, and on that version of Linux we've put an Apache web server, which will serve the static pages off a disk, and then we're using a Java Virtual Machine running some type of J2EE container for servlets, connected to a MySQL database, to serve the pages. Eventually what'll happen is we get enough users, enough people that want to come to our site, and we run out of capacity.
So how do we address that, how do we figure out what ran out of capacity, and how do we take it to the next level? A common way to do that is to take this same site and build it out, but move each one of these processes onto a separate machine, so that we're able to scale those machines independently. That would look something along these lines: we'd make an Apache tier, which is our web server. We then make that connect down to a Java tier, which would be our application level. It could be PHP if we wanted, which would stay up in the Apache tier, but this is more of a classical three-tier physical architecture. Then it goes down to a data tier where we're holding our data; for this we chose a MySQL database. We've now got three independent tiers, which we can start to scale, or at least give different types of horsepower in terms of vertical scalability by adding CPUs, adding more RAM, adding faster disks, etc. But eventually even this scalability method falls down and we need to further scale the system.
The type of scalability I'll describe to you now is called horizontal scalability. Horizontal scalability means I'll take each one of these tiers and I'll move them out horizontally and scale them. Now to do that, I need a piece of hardware called a load balancer. So we'll look at this infrastructure: we've got the Internet coming in, and the first thing it hits is this box called a load balancer. The load balancer has these virtual IPs, each of which is one IP address that fans out to multiple IP addresses in the tier below it. In this case that would be multiple Apache boxes, which are the web servers. So let's say we took three initially, and we have three Apache boxes, and the load balancer, although it looks like one IP address to the Internet, is distributing this load across these three Apache boxes.
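As a rough illustration of what a virtual IP does, here is a minimal Python sketch of one public address fanning out to three backends in round-robin order. The class name `VirtualIP`, the addresses, and the choice of round-robin are all assumptions made for the example; the talk does not say which distribution policy the load balancer uses, and a real load balancer is a network device, not application code.

```python
from itertools import cycle


class VirtualIP:
    """Hypothetical model: one public address that fans out to several backends."""

    def __init__(self, address, backends):
        self.address = address
        # Round-robin is an assumed policy, chosen only for simplicity.
        self._rotation = cycle(list(backends))

    def route(self):
        """Return the backend address that should handle the next request."""
        return next(self._rotation)


# Illustrative addresses: one public VIP in front of three Apache boxes.
vip = VirtualIP("203.0.113.10", ["10.0.1.1", "10.0.1.2", "10.0.1.3"])
picked = [vip.route() for _ in range(6)]
print(picked)
```

Six requests arriving at the single public address are spread evenly, two per Apache box, while clients only ever see the one address.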
Now the Apache boxes will serve the static content off their local drives, which we're pushing out with rsync or some other method, but eventually we want some dynamic content, and that dynamic content comes through the Java Virtual Machine with a J2EE servlet container on it. So we know we need another load balancer, because we need to scale. We've scaled the Apache tier horizontally; we now need to scale the Java and J2EE tier horizontally.
So let's say we'll start again with three boxes, and we put the Java tier in, and they're connected to the load balancer, and the Apache tier connects back into the load balancer to move towards the Java tier. Now we've got the Java tier scaled out horizontally, as you can see, and the question is how do we get the data tier scaled horizontally. Well, that's a little trickier, but again we connect back to a load balancer. This picture is shown more logically than physically; this may well just be three different VIPs in the very same load balancer, but I'm trying to give you a logical representation of how all this is connected. And we say we have three MySQL replica slaves that hold this data. The slave term is a MySQL term; I'm not just using that. It's not some crazy computer science term, it's really a vendor term. So we've now got it to where we can pull the data from MySQL up into the Java Virtual Machines.
The question now is how do we update all three of these databases. Well, MySQL itself has a master-slave relationship: the machine that you write to is always referred to as the master, and through its infrastructure it automatically updates these slaves through a process called replication. So all of the writes of the data, through the tools that maybe some of the tech producers or some of the business people are using, go into this master. Maybe there are some feeds coming from some merchants, and these feeds are also going into this master database. MySQL is replicating that up to the slaves, the load balancer is distributing the load to the MySQL databases, the Java tier is able to pull that, and we've now not only got horizontal scalability, but we have redundancy. If any of these boxes were to fail, our site will still be up, as the load balancer will take the failed box out of rotation, continue to distribute the load to the boxes that are still working, and keep our site completely up and functional.
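The write path and the failover behavior described here can be sketched in a few lines of Python. Everything in the block is a toy model with hypothetical names (`Database`, `Master`, `ReadBalancer`, the key `price:42`): it copies writes to replicas immediately, whereas real MySQL replication is asynchronous, and it does not model a failed replica catching up after recovery.

```python
class Database:
    """Hypothetical in-memory stand-in for one MySQL server."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.healthy = True


class Master(Database):
    """The only server that accepts writes; it pushes changes to replicas."""

    def __init__(self, name, replicas):
        super().__init__(name)
        self.replicas = replicas

    def write(self, key, value):
        self.rows[key] = value
        # Simplification: copy synchronously to every healthy replica.
        for replica in self.replicas:
            if replica.healthy:
                replica.rows[key] = value


class ReadBalancer:
    """Spreads reads across replicas and skips any that have failed."""

    def __init__(self, replicas):
        self.replicas = replicas
        self._next = 0

    def read(self, key):
        # Try each replica at most once, starting where the last read stopped.
        for _ in range(len(self.replicas)):
            replica = self.replicas[self._next]
            self._next = (self._next + 1) % len(self.replicas)
            if replica.healthy:
                return replica.name, replica.rows.get(key)
        raise RuntimeError("no healthy replicas")


replicas = [Database("replica-1"), Database("replica-2"), Database("replica-3")]
master = Master("master", replicas)
balancer = ReadBalancer(replicas)

master.write("price:42", "19.99")   # e.g. a merchant feed writing to the master
replicas[1].healthy = False          # simulate one box failing
served_by = [balancer.read("price:42")[0] for _ in range(4)]
print(served_by)
```

With the second replica marked as failed, reads keep succeeding from the remaining two, which is the redundancy the load balancer provides on top of the extra capacity.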
So, the key component to scaling out an Internet architecture is to have load balancers between the tiers that are scaled out horizontally.