
A server virtualization project success story

Westminster College's VMware ESX-based server consolidation project is in place and running very smoothly. The college's CIO details their solution and discusses why it's a success.

Westminster College's recently completed virtualization project is the second phase of what began, quite a while back, as an ad hoc way to retire some critically aging servers. Those servers were still hosting Web applications we were in the process of phasing out, so we didn't want to buy new hardware and completely redeploy the services. Instead, we put a couple of VMware ESX (3 and 3.5) servers into place and used PlateSpin's physical-to-virtual (P2V) software to take the risk of hardware failure out of the equation.

I'll start at the beginning

In early 2007, shortly after my arrival at Westminster College, it became apparent that my plan to phase out an existing portal application was going to take much longer than I had hoped. The supported services were intertwined with many different processes; in fact, three years later, we're still running one of the applications in production, though it's the last one. Supporting this portal application were a couple of very old servers that were well past their warranty expiration dates. On top of that, completely redeploying the portal application was one of the last things I wanted to do: the solution was only tenuously held together, and the people who had implemented it were long gone, leaving behind only basic documentation. I also wanted to reduce the number of servers in our small data center; even the older servers were running at only a fraction of their capacity, yet they still had to be replaced on some cycle and still consumed power.

The desire to move to newer hardware without breaking the bank, reduce electrical consumption, and not have to redeploy all of our existing services led to the phase one virtualization rollout. Once we had that solution in place, we ran in that configuration for a while. Over time, we virtualized a number of newer servers as well, also using the P2V method. As new services were brought on line, we generally deployed them on one of the virtual hosts.

The hosts were simple containers to house virtual machines and were not connected to a SAN; all of the storage was local. That said, these were Westminster's first steps into VMware, and they accomplished the necessary goals at the time.

On to the next steps

Over the years, I've become a big believer in the "virtualize everything whenever possible" motto. The great success of the first phase led me to decide to expand virtualization to encompass everything that we could, but I wanted to do so in a much more robust way.

Our initial foray did not implement any availability methods, which was fine for the purpose, but as we moved into our "virtualize everything" mode, we needed SAN-backed ESX servers and a bit more robustness. To achieve our availability goals, we wanted to make sure that we didn't have any single points of failure. To that end, everything is redundant, and we've deployed more servers than are necessary to support our current virtual workloads. We have room for growth, which we will need.

Again, we're a small environment, so the architecture is pretty simple, but here's what we have:

  • An EMC AX4 SAN: iSCSI, dual controllers, 12 x 300 GB SAS + 12 x 750 GB SATA drives. Fully redundant.
  • 3 x Dell M600 blade servers, 2 x 2.66 GHz Quad Core Intel Xeon processors, 32 GB RAM each, 6 NICs each (the chassis houses 6 x Dell M6220 switches, one for each NIC in each server)
    • 2 x NICs for front-end connectivity
    • 2 x NICs for connectivity to AX4 (iSCSI)
      • Each of these is connected to a separate Ethernet switch.
      • Each NIC connects to a different storage processor on the AX4.
      • Each storage connection resides on a different physical network card.
    • 1 x NIC for vMotion
    • 1 x NIC for Fault Tolerance
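
As a rough sketch, the per-host networking above could be laid out from the ESX service console along these lines. The vSwitch names, vmnic numbers, port group labels, and IP addresses here are illustrative assumptions, not Westminster's actual values, and enabling vMotion and Fault Tolerance on the VMkernel ports is done through vCenter rather than on the command line:

```shell
# Front-end connectivity: one vSwitch with two teamed uplinks
esxcfg-vswitch -a vSwitch0
esxcfg-vswitch -L vmnic0 vSwitch0
esxcfg-vswitch -L vmnic1 vSwitch0
esxcfg-vswitch -A "VM Network" vSwitch0

# iSCSI: two vSwitches, one NIC each, each NIC cabled to a separate
# physical switch and pathed to a different AX4 storage processor
esxcfg-vswitch -a vSwitch1
esxcfg-vswitch -L vmnic2 vSwitch1
esxcfg-vswitch -A "iSCSI-A" vSwitch1
esxcfg-vmknic -a -i 10.10.1.11 -n 255.255.255.0 "iSCSI-A"

esxcfg-vswitch -a vSwitch2
esxcfg-vswitch -L vmnic3 vSwitch2
esxcfg-vswitch -A "iSCSI-B" vSwitch2
esxcfg-vmknic -a -i 10.10.2.11 -n 255.255.255.0 "iSCSI-B"

# Dedicated NICs for vMotion and Fault Tolerance traffic
esxcfg-vswitch -a vSwitch3
esxcfg-vswitch -L vmnic4 vSwitch3
esxcfg-vswitch -A "vMotion" vSwitch3
esxcfg-vmknic -a -i 10.10.3.11 -n 255.255.255.0 "vMotion"

esxcfg-vswitch -a vSwitch4
esxcfg-vswitch -L vmnic5 vSwitch4
esxcfg-vswitch -A "FT" vSwitch4
esxcfg-vmknic -a -i 10.10.4.11 -n 255.255.255.0 "FT"
```

Keeping iSCSI, vMotion, and FT on their own uplinks, as the bullet list describes, means VM traffic can never starve storage or cluster traffic of bandwidth.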
We're running 28 virtual machines across these three hosts. Of the processing resources in this three-host cluster, we're using, on average, about 10% of the available computing power (Figure A), so there is plenty of room for growth, and we have no worries about performance if one of the physical hosts fails. On the RAM side, we're using just over 30% of the total RAM available in the cluster, but I think we can bring that down by paying more attention to how individual virtual machines are provisioned (Figure B).

Figure A: We're using about 10% of our computing resources.

Figure B: We're using a bit over 30% of the RAM resources of the cluster.

In Figures A and B, note that there are two periods during which we experienced a problem with vCenter that affected statistics gathering. Also, while each host has 32 GB of RAM, one of them has Dell's RAM RAID capability turned on, which helps protect the host in the event of a memory problem; as a result, that server reports only 24 GB of available RAM. Because we now have host-level redundancy, we'll disable this feature during a maintenance window to get the benefit of the full 32 GB of RAM.
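
The headroom claim above can be sanity-checked with a little arithmetic. Here is a minimal sketch, using the host specs from the article and the rounded utilization figures reported in Figures A and B, that tests whether the cluster still fits the workload if its largest host fails:

```python
# N+1 capacity check: can the cluster absorb the loss of its largest host?
# Host specs are from the article; utilization percentages are the rounded
# cluster averages reported above (about 10% CPU, just over 30% RAM).

hosts = [
    {"cpu_ghz": 2 * 4 * 2.66, "ram_gb": 32},  # 2 quad-core 2.66 GHz Xeons
    {"cpu_ghz": 2 * 4 * 2.66, "ram_gb": 32},
    {"cpu_ghz": 2 * 4 * 2.66, "ram_gb": 24},  # RAM RAID on: reports 24 GB
]

total_cpu = sum(h["cpu_ghz"] for h in hosts)  # ~63.8 GHz
total_ram = sum(h["ram_gb"] for h in hosts)   # 88 GB

used_cpu = 0.10 * total_cpu  # ~6.4 GHz in use
used_ram = 0.31 * total_ram  # ~27.3 GB in use

# Worst case: lose the host that contributes the most CPU and RAM.
worst = max(hosts, key=lambda h: (h["cpu_ghz"], h["ram_gb"]))
remaining_cpu = total_cpu - worst["cpu_ghz"]
remaining_ram = total_ram - worst["ram_gb"]

cpu_ok = used_cpu <= remaining_cpu
ram_ok = used_ram <= remaining_ram
print(f"Survives a one-host failure: CPU={cpu_ok}, RAM={ram_ok}")
```

With these numbers, the two surviving hosts cover the workload comfortably on both axes, which backs up the "no worries about performance" statement; disabling RAM RAID on the third host only widens the margin.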

In Figure C, you'll see the full infrastructure. The 50 and 51 are simply internal identifiers.

Figure C: The whole ESX environment.

This summer, we'll make some changes to our environment to increase overall availability, including:

  • A migration from our single-server (physical) Exchange 2007 system to a multi-server (virtual) Exchange 2010 environment. The only service that will remain physical is unified messaging.
  • We're using SharePoint for a lot of stuff, including our public-facing Web site. Our existing SharePoint environment consists of two servers: a dedicated database server and the MOSS server running the other components. As we explore SharePoint 2010, we'll more likely than not migrate away from the physical SharePoint infrastructure as well.

Even if I have to add additional ESX hosts to support newer initiatives (though I don't think I will), the availability advantages are too great to ignore.

The virtualization project at Westminster exceeded all of my original goals. We've been able to very easily extend the life of aging applications, reduce power consumption, increase availability, and make a huge dent in the budget for equipment replacement in the data center.

About

Since 1994, Scott Lowe has been providing technology solutions to a variety of organizations. After spending 10 years in multiple CIO roles, Scott is now an independent consultant, blogger, author, owner of The 1610 Group, and a Senior IT Executive w...

7 comments
myrdhrin

Hi, the memory and CPU stats are quite interesting, but depending on the application model and software being used, you normally need to take into account the I/O of those VMs too. Do you have any stats on the I/O for your platform? Thanks!

kalev

Just consider me a slight bit paranoid after reading about various break-ins, but it would seem that providing too much detail makes it easier for hackers. Thus, your security software needs to be robust.

RGRinc

What metrics were used for measuring the economic impact of virtualization, such as the drop in power consumption for computing and/or cooling?

Michael Kassner

Yet, as a security nut, I see that everything is in one basket. I would love to learn what you did to provide privacy and protection for the members. My concern is that once a bad person breaches the network, they have the keys to the kingdom, whereas if it was not virtualized, that would not be so.

Scott Lowe

For just that reason, there are some key details missing from the diagram. These details don't affect the overview.

Scott Lowe

Michael, I'm running out of the office in a minute, so I'll post a couple of minor (and pretty basic, I know) things for now:

  • Both iSCSI storage networks are physically separate from our production network; only hosts with a need to be on the iSCSI network have access to it. Other ports are disabled. Our iSCSI storage networks don't route to anything.
  • The vMotion network and HA networks are similarly isolated.
  • We did not take significant steps toward isolating the VMs any more than we would have for physical systems, but we are in the planning stages for implementing additional security mechanisms (another firewall layer, etc.) to help improve the security of the data center as a whole.
  • Individual VMs are VLANed as necessary for their function.
  • We stay current with patches across both hosts (ESX) and VMs (Windows).

I hope this provides some clarification. Scott

jsimonelli

I always explain it like this: every blade/rack-mount server you add to the cluster acts only as a worker bee, providing processor and memory. Systems are added on demand to keep up with the load, the iSCSI/SAN takes care of the data, and the VMs sit on different networks, which makes everything more fluid.
