It would be nice to install a new server fresh out of the box and simply trust that it will work. As Ronald Reagan used to say, "Trust, but verify." Here's how to verify that your new server will work properly.
When I first got started in IT, I was working as a network administrator trainee for a large insurance company. Even though I was only eighteen years old at the time, I was very ambitious and did everything in my power to prove to my boss that I was ready for bigger challenges. I was finally rewarded for my hard work by being put in charge of deploying a new server with a brand new application. Although I had been given three weeks to complete the project, I did it in two days, rushing to get the job done so that I could impress my boss. For a while, everything was fine, but a couple of days later, the new server crashed. As it turned out, the new server had a defective processor. Although I had performed the deployment correctly, I had rushed through it. Had I taken my time, I could have discovered the issue before it had a chance to cause a production server to crash.
Looking back on that incident, I can see how ignorant I really was. Experience over the years has made me realize that there was a reason why my boss had given me three weeks to complete the deployment. It wasn’t that it takes three weeks to load a network operating system and install an application. Instead, it was because there are a lot of tests that need to be done on a server prior to placing it onto a production network. You need to verify that the hardware is functional, that the application runs properly, and that the server can handle the demands that will be placed upon it. Here are the various tests that you should consider performing prior to deploying a new server.
Most of the time when I order a new server, Windows comes preloaded on it. I have never been a big fan of preloaded operating systems and all of the other garbage that gets preloaded at the factory, so I always reformat every volume on the entire system and load Windows from scratch. That way, I know exactly what is installed on the server and there are no surprises.
Since I am going to reformat the server anyway, I like to start my testing process by doing a little informal testing. What I like to do is to order a couple of pizzas and call a few friends over for a video game night. While this probably sounds like an excuse to goof off (it is), there is some solid reasoning behind having a game night. Very few applications are as demanding as a cutting-edge video game. If a new server has trouble keeping up with a video game or exhibits other types of problems, then it’s a sign that you need to watch for other potential performance or stability problems. The exception to this is video problems. Many people order servers with very low-end video cards, which were never designed for playing video games, so poor graphics performance alone is not a warning sign.
Unless you are going to give your new server the video game test, the first thing that you should do when you get a new server is to run a burn-in test. A burn-in is simply a series of tests designed to push the hardware to the max and use every available component. The idea behind this type of testing is that if you perform a series of hardware intensive tests for a period of twelve hours or more, any component that is defective would show up during the burn-in period. I recommend performing the burn-in test before you go through the trouble of reformatting the system and performing a clean installation of Windows.
There are lots of burn-in utilities available for free on the Internet. Figure A shows one particular burn-in utility, BurnInTest 4.0, running multiple simultaneous tests on one of my servers.
Figure A: Burn-in testing is designed to help spot defective components.
Up to this point, you have unboxed and set up the server, performed a burn-in test, and maybe played a few video games. Here is where the testing gets a bit more intense and a lot scarier though. You need to test your server’s fault tolerant systems.
The exact technique that you would use to test fault tolerance varies widely depending on the hardware. A low-end server may have hardly any fault tolerant hardware, while a high-end server may offer full redundancy. For the purpose of this article, I will assume that you have a mid-grade server with partial redundancy.
I recommend beginning the process by launching a high-demand application. Video games work well, but so does burn-in software. Once the software is running, test the power supplies. If your server has multiple power supplies, pull one power cord; the second power supply should take over instantaneously. Plug the cord back in and then pull the other power cord to make sure that both power supplies are functional.
While you are at it, disconnect your UPS from the electrical outlet. Make sure that the battery is capable of powering the server. You should let the battery completely discharge so that you can see how long the server is able to run under battery power. You should also verify that the UPS is able to communicate with the server and shuts the server down gracefully before the battery goes dead.
Next, try testing your RAID array. Assuming that you have hot swappable hard drives arranged in a fault tolerant configuration, try pulling out one of the hard drives while the application is running (make sure that the application is running from the hard disk array). You want to confirm that the server doesn’t crash when you pull the drive. You should also pay attention to how much, if at all, the application slows down with a drive missing. Wait a few minutes and then re-insert the drive. Make sure that the server recognizes the drive and that all drives in the RAID array resynchronize.
Finally, if your server has multiple network cards, try disconnecting them one at a time and make sure that you are still able to browse the network and surf the Internet. Your server should continue to have network access as long as at least one NIC has a good connection to the network.
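Clicking around the network after each cable pull works, but a small script makes the check repeatable. The following is only a minimal sketch in Python, not a tool named in this article: the function names are hypothetical, and the host names and ports you pass in are whatever machines on your own network you expect the server to reach.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_targets(targets, timeout: float = 3.0) -> dict:
    """Map each (host, port) pair to whether it was reachable."""
    return {(host, port): can_reach(host, port, timeout) for host, port in targets}
```

Run `check_targets` once with every NIC connected and again after each cable is pulled. Every target should stay reachable as long as one NIC still has a good connection.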
At this point, it’s a good idea to perform a stress test against the server. A stress test (also called a load test) is a test that’s designed to see how the server will behave when subjected to a heavy workload.
There are a lot of different types of stress tests that you can run. Some focus on software and others focus on hardware. I’m trying to keep this article general enough that it will be relevant to a wide audience, so I don’t want to spend too much time talking about specific software stress tests. What I will tell you, though, is that there are tools available for stress testing Exchange Server, Internet Information Server, and SQL Server. Best of all, many of these tools can be downloaded from the Internet for free.
Regardless of what software is running on the server, I strongly recommend doing a hardware stress test. A hardware stress test is similar to a burn-in except that your goal isn’t to find out if the system works, but rather how well it holds up under a load.
There are a lot of different utilities available for stress testing, but I personally like a shareware utility called Passmark Performance Test. What I like about this software is that it offers the standard burn-in type tests, but also has customizable tests that are much more relevant to a real world situation.
For example, many server level applications tend to be very database intensive. If you are expecting the server to have a heavy workload then it’s important to find out how well the server will be able to keep up with database requests. In high demand environments, database requests can flood the server more quickly than they can be committed to disk.
Just about any stress testing or burn-in software will clock the rate at which the hard disk can be accessed. What makes the Passmark software different though is that you can specify things like the size of the test file, the size of the data blocks, whether you want sequential or random disk access, and the actual access method, as shown in Figure B.
Figure B: You can configure the disk test to behave similarly to the database that your server will be hosting.
This is important because these factors make a huge difference in the test’s outcome. For example, a large block size usually means that there are fewer requests and the disk will appear to perform better. Likewise, if the data is sequential, the disk will also tend to perform better. Many stress testing applications do not give you these kinds of choices and you are left wondering how valid the results really are.
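To see the effect of block size and access pattern for yourself, a rough sketch such as the following Python script can help. It is not how the Passmark software works internally; the function names are hypothetical, and the file size and block size are whatever you choose to pass in. Because the test file has just been written, the operating system's cache will usually inflate the read figures, so treat the numbers as a comparison between settings rather than as absolute disk speed.

```python
import os
import random
import tempfile
import time


def make_test_file(directory: str, file_size: int, chunk: int = 1 << 20) -> str:
    """Create a file of file_size random bytes in directory and return its path."""
    fd, path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        remaining = file_size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data out of Python and OS write buffers
    return path


def disk_read_test(path: str, file_size: int, block_size: int,
                   random_access: bool, seed: int = 0):
    """Read the file in block_size chunks; return (bytes_read, seconds).

    A trailing partial block (when file_size is not a multiple of
    block_size) is skipped.
    """
    offsets = list(range(0, file_size - block_size + 1, block_size))
    if random_access:
        random.Random(seed).shuffle(offsets)  # same blocks, random order
    bytes_read = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for offset in offsets:
            f.seek(offset)
            bytes_read += len(f.read(block_size))
    return bytes_read, time.perf_counter() - start
```

Running `disk_read_test` twice against the same file, once with `random_access=False` and once with `random_access=True`, and then again with a different `block_size`, gives a feel for how much these settings change the result.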
Another test that’s important to run is a networking test. The fastest hard disk in the world won’t do you a bit of good if the network connection is a bottleneck for inbound requests. The Passmark software allows you to set up the server to accept requests and to then set up a workstation to flood the server with data. The software will then measure data throughput and the impact on the CPU.
Again, there are lots of different applications that can perform this type of testing. I often rely on NetIQ’s Qcheck. What I like about the Passmark software is that you can specify a variable block size, a port number, and the duration of the test as shown in Figure C.
Figure C: You can run tests to determine your new server’s network throughput.
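The mechanics of such a throughput test are simple enough to sketch in Python. The version below is an illustration only and makes one big simplifying assumption: it runs both the receiver and the sender in a single process over the loopback address, so it measures nothing about a real network. In a real test, the receiving half would run on the server and the sending half on a workstation. The function names and default values are hypothetical.

```python
import socket
import threading
import time


def _sink(server_sock: socket.socket, result: dict) -> None:
    """Accept one connection and count every byte received until it closes."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            total += len(data)
    result["received"] = total


def throughput_test(host: str = "127.0.0.1", port: int = 0,
                    block_size: int = 8192, duration: float = 1.0):
    """Send fixed-size blocks for roughly `duration` seconds.

    Returns (bytes_sent, bytes_received, megabits_per_second).
    Port 0 lets the operating system pick a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    block = b"\0" * block_size
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while time.perf_counter() - start < duration:
            client.sendall(block)
            sent += block_size
    receiver.join()  # wait until the receiver has drained the connection
    elapsed = time.perf_counter() - start
    server.close()
    return sent, result["received"], sent * 8 / elapsed / 1e6
```

The three parameters mirror the settings mentioned above: block size, port number, and test duration.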
Two other types of tests that are important to run are memory and CPU tests. There really isn’t anything special about testing the server’s memory other than making sure that the memory works and that it can be accessed in a timely fashion.
When you test the CPU, you want to make sure that the CPU is functional and that it doesn’t bog down easily under a heavy workload. The Passmark software contains an excellent CPU test, as shown in Figure D, that involves running multiple, high-demand applications simultaneously. What’s nice about this test is that the workload will be automatically distributed across multiple processors, giving you a chance to find out how your machine will really perform under stress.
Figure D: You can test your machine’s processors to find out how they will perform under a heavy load.
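The idea of spreading a workload across every processor can be illustrated with a short Python sketch. This is not the Passmark CPU test, just a hypothetical stand-in: `cpu_work` is arbitrary integer busy-work, and `run_on_all_cores` starts one worker process per logical processor so that the operating system can schedule them on separate cores.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def cpu_work(iterations: int) -> int:
    """Integer busy-work; returns a checksum so results can be compared."""
    total = 0
    for i in range(iterations):
        total = (total + i * i) % 1_000_003
    return total


def run_on_all_cores(iterations: int, workers=None):
    """Run cpu_work in one process per logical processor.

    Returns (list_of_checksums, seconds). Every checksum should be
    identical; a mismatch would suggest a faulty processor or memory.
    """
    workers = workers or os.cpu_count() or 1
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(cpu_work, [iterations] * workers))
    return results, time.perf_counter() - start
```

On Windows, call `run_on_all_cores` from inside an `if __name__ == "__main__":` block, because worker processes there re-import the script when they start. Comparing the elapsed time with `workers=1` against the default gives a rough picture of how well the machine scales across its processors.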
Up until now the server should have been running a default installation of Windows and maybe a couple of test applications. The reason is that many of the tests that you have been performing could potentially crash the server. If the server were running a live application with real data, the database could potentially become corrupted.
The time for cautious testing has passed though. Now it’s time to reformat the server and reinstall Windows. You should configure Windows as it will be configured when the server enters a production environment. You should also go ahead and install any applications that will run on the server. Once the server is all loaded up, you need to perform a few simple tests to make sure that the application is running correctly and that clients are able to connect to it.
I am assuming that the application has been thoroughly tested prior to placing it on this server. If the application is a commercial one such as Microsoft Exchange Server or Microsoft SQL Server, then it has already been thoroughly tested and should run fine, provided that you have all of the service packs installed and that your server hardware meets the minimum requirements and is listed on the Windows hardware compatibility list.
If the application was developed in house, there are no guarantees as to its stability. Hopefully, the application has been thoroughly tested in a lab environment prior to your placing it onto a production server. Even if the application has been tested though, it has yet to be tested on your new server and you should schedule a testing phase before going completely live with the application. In such a situation, the testing phase usually consists of a limited deployment in which a small subset of users are given access to the application and are asked to start using it, but to report anything strange that may happen. This allows you to see how the application will behave in a real-world environment, but to do so in a way that minimizes the damage that will be caused should the application fail.
A dress rehearsal
Now that the server has been thoroughly tested, and you have loaded your network operating system and applications, it’s time to do what I like to call a dress rehearsal. The idea is that while a small subset of users are using the application, you should test the server under the absolute worst possible conditions to see how well it holds up. Before you perform these tests though, I strongly recommend backing up your server. I also recommend that those who are helping to test the server enter only test data during these tests to prevent the loss of real data.
Begin by running a disk test similar to what you did earlier. The difference is that this time you are trying to stress the hard disk while people are actually using the server. If the server has fault tolerant hard drives, then I recommend removing and replacing one drive at a time during the test. The idea is that you want to see just how much the server will slow down should a hard disk fail during an intense period of operations.
Next, try running the CPU test again. You are trying to make sure that the CPU does not bog down to the point that those using the server application are unable to work. When the CPU test completes, try doing the network test. Verify that your users can still access the server even when it is being bombarded with requests.
Finally, conclude the tests by unplugging the server’s UPS from the electrical outlet. Verify that the server shuts down gracefully before the battery runs out. Yes, you did do this test earlier, but at that time no applications were loaded on the server. Some applications take a really long time to shut down and it’s important to find out now whether or not you need to adjust the shutdown time threshold so that the applications and Windows have ample time to shut down.
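The arithmetic behind the shutdown threshold is worth writing down. As a minimal sketch in Python, assume that you have measured two numbers yourself: how long the battery runs the server under load, and how long a full shutdown takes with the applications loaded. The function name, the safety factor of 1.5, and the example figures are all hypothetical.

```python
def shutdown_trigger_seconds(runtime_s: float, shutdown_s: float,
                             safety_factor: float = 1.5) -> float:
    """Latest point, in seconds on battery, at which shutdown must begin.

    runtime_s:     measured battery runtime under load
    shutdown_s:    measured time for applications and Windows to shut down
    safety_factor: padding for an aging battery or a slow shutdown (assumed)

    A result of zero or less means the battery cannot cover a padded
    shutdown at all.
    """
    return runtime_s - shutdown_s * safety_factor


# Hypothetical figures: 15 minutes of battery, 4 minutes to shut down.
print(shutdown_trigger_seconds(900, 240))  # 540.0
```

With these example figures, the UPS software would need to start the shutdown no later than nine minutes after the power fails.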