When I first got started in IT, I was working as a network administrator trainee for a large insurance company. Even though I was only
eighteen years old at the time, I was very ambitious and did everything in my
power to prove to my boss that I was ready for bigger challenges. I was finally
rewarded for my hard work by being put in charge of deploying a new server with
a brand new application. Although I had been given three weeks to complete the
project, I did it in two days, rushing to get the job done so that I could
impress my boss. For a while, everything was fine, but a couple of days later,
the new server crashed. As it turned out, the new server had a defective
processor. Although I had performed the deployment correctly, I had rushed
through it. Had I taken my time, I could have discovered the issue before it had
a chance to crash a production server.

Looking back on that incident, I can see how ignorant I
really was. Experience over the years has made me realize that there was a
reason why my boss had given me three weeks to complete the deployment. It
wasn’t that it takes three weeks to load a network operating system and install
an application. Instead, it was because there are a lot of tests that need to
be done on a server prior to placing it onto a production network. You need to
verify that the hardware is functional, that the application runs properly, and
that the server can handle the demands that will be placed upon it. Here are
the various tests that you should consider performing prior to deploying a new
server.

Informal testing

Most of the time when I order a new server, Windows comes
preloaded on it. I have never been a big fan of preloaded operating systems and
all of the other garbage that gets pre-loaded at the factory, so I always
reformat every volume on the entire system and load Windows from scratch. My
reason is that by doing so, I know exactly what is installed on the server and
there are no surprises.

Since I am going to reformat the server anyway, I like to
start my testing process by doing a little informal testing. What I like to do
is to order a couple of pizzas and call a few friends over for a video game
night. While this probably sounds like an excuse to goof off (it is), there is
some solid reasoning behind having a game night. Very few applications are as
demanding as a cutting edge video game. If a new server has trouble keeping up
with a video game or exhibits other types of problems, then it’s a sign that
you need to watch for other potential performance or stability problems. The
exception to this is video problems. Many people order servers with very
low-end video cards, which were never designed for playing video games, so
choppy graphics alone are not a sign of a faulty server.

Burn-in

Unless you are going to give your new server the video game
test, the first thing that you should do when you get a new server is to run a
burn-in test. A burn-in is simply a series of tests designed to push the
hardware to the max and use every available component. The idea behind this
type of testing is that if you perform a series of hardware intensive tests for
a period of twelve hours or more, any component that is defective would show up
during the burn-in period. I recommend performing the burn-in test before you
go through the trouble of reformatting the system and performing a clean
installation of Windows.

There are lots of burn-in utilities available for free on
the Internet. Figure A shows one particular burn-in utility, BurnInTest
4.0, running multiple simultaneous tests on one of my servers.

Figure A

Burn-in testing is designed to help spot defective components.
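
To make the idea concrete, here is a minimal sketch of what a burn-in loop does, written in Python. It is not a substitute for a dedicated burn-in utility, and it exercises only the CPU and memory. The function names and the short default duration are illustrative assumptions. The loop repeats a calculation whose answer is known in advance and counts any run that produces a different answer.

```python
import hashlib
import time


def cpu_check(rounds: int = 2000) -> str:
    """Hash a fixed input repeatedly; healthy hardware yields the same digest every time."""
    data = b"burn-in"
    for _ in range(rounds):
        data = hashlib.sha256(data).digest()
    return data.hex()


def memory_check(size: int = 1 << 20) -> bool:
    """Fill a buffer with a known bit pattern and confirm it reads back unchanged."""
    pattern = bytes([0xAA, 0x55]) * (size // 2)
    buf = bytearray(pattern)
    return bytes(buf) == pattern


def burn_in(duration_seconds: float) -> dict:
    """Repeat both checks until the time is up, counting passes and failures."""
    expected = cpu_check()
    deadline = time.monotonic() + duration_seconds
    passes = failures = 0
    while time.monotonic() < deadline:
        if cpu_check() == expected and memory_check():
            passes += 1
        else:
            failures += 1
    return {"passes": passes, "failures": failures}


if __name__ == "__main__":
    # Short demonstration run. A real burn-in would last twelve hours or more,
    # for example burn_in(12 * 60 * 60).
    print(burn_in(1.0))
```

Any nonzero failure count on a run like this would be a reason to look much harder at the hardware before going any further.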

Fault tolerance

Up to this point, you have unboxed and set up the server,
performed a burn-in test, and maybe played a few video games. Here is where the
testing gets a bit more intense and a lot scarier though. You need to test your
server’s fault tolerant systems.

The exact technique that you would use to test fault
tolerance varies widely depending on the hardware. A low-end server may have
hardly any fault tolerant hardware, while a high-end server may offer full
redundancy. For the purpose of this article, I will assume that you have a mid-grade
server with partial redundancy.

I recommend beginning the process by launching a high-demand
application. Video games work well, but so does burn-in software. Once the
software is running, test the power supply by pulling the server’s plug. If
your server has multiple power supplies, the second power supply should take
over instantaneously. Plug the cord back in and try pulling the other power
cord just to make sure that both power supplies are functional.

While you are at it, disconnect your UPS from the electrical
outlet. Make sure that the battery is capable of powering the server. You
should let the battery completely discharge so that you can see how long the
server is able to run under battery power. You should also verify that the UPS
is able to communicate with the server and shuts the server down gracefully
before the battery goes dead.
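
One way to measure the battery runtime without sitting next to the server with a stopwatch is to have the server log a heartbeat while it runs on battery. The following is a rough Python sketch under my own assumptions: the file name, the interval, and the function names are hypothetical, and the script knows nothing about the UPS itself. It simply writes a timestamp to disk every few seconds, so that the gap between the first and last entries approximates how long the server stayed up.

```python
import os
import time
from pathlib import Path
from typing import Optional


def write_heartbeat(log_path: Path, now: float) -> None:
    """Append one timestamp and force it to disk so it survives a power loss."""
    with log_path.open("a") as f:
        f.write(f"{now:.3f}\n")
        f.flush()
        os.fsync(f.fileno())


def run_heartbeat(log_path: Path, interval_seconds: float,
                  max_beats: Optional[int] = None) -> None:
    """Write a heartbeat every interval; loops until power is lost if max_beats is None."""
    beats = 0
    while max_beats is None or beats < max_beats:
        write_heartbeat(log_path, time.time())
        beats += 1
        time.sleep(interval_seconds)


def runtime_seconds(log_path: Path) -> float:
    """Seconds between the first and last heartbeat in the log."""
    stamps = [float(line) for line in log_path.read_text().split()]
    return stamps[-1] - stamps[0] if stamps else 0.0
```

You would start something like run_heartbeat(Path("ups_heartbeat.log"), 10) just before unplugging the UPS, and call runtime_seconds on the same file once the server is back up.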

Next, try testing your RAID array. Assuming that you have hot
swappable hard drives arranged in a fault tolerant configuration, try pulling
out one of the hard drives while the application is running (make sure that the
application is running off of the hard disk array). The main thing to confirm
is that the server doesn’t crash when you pull the drive. You should
also pay attention to how much (if at all) the application slows down with a drive
missing. Wait a few minutes and then re-insert the drive. Make sure that the
server recognizes the drive and that all drives in the RAID array
resynchronize.

Finally, if your server has multiple network cards, try
disconnecting them one at a time and make sure that you are still able to
browse the network and surf the Internet. Your server should continue to have network
access as long as at least one NIC has a good connection to the network.
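
If you would rather not keep refreshing a browser while you pull cables, a small script can do the checking for you. This Python sketch simply attempts a TCP connection to each target and reports whether it succeeded. The host names and ports in the example are placeholders that I made up; substitute a machine on your own network and a site on the Internet.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def probe(targets, timeout: float = 2.0) -> dict:
    """Check every (host, port) pair and map 'host:port' to True or False."""
    return {f"{host}:{port}": can_reach(host, port, timeout)
            for host, port in targets}


if __name__ == "__main__":
    # Hypothetical targets: one internal file server and one Internet site.
    targets = [("fileserver.example.internal", 445), ("www.example.com", 80)]
    for name, ok in probe(targets).items():
        print(f"{name}: {'reachable' if ok else 'UNREACHABLE'}")
```

Run it once with every cable connected, then again after disconnecting each NIC in turn. Every target should remain reachable as long as one NIC is still connected.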

Stress test

At this point, it’s a good idea to perform a stress test
against the server. A stress test (also called a load test) is a test that’s
designed to see how the server will behave when subjected to a heavy workload.

There are a lot of different types of stress tests that you
can run. Some focus on software and others focus on hardware. I’m trying to
keep this article general enough that it will be relevant to a wide audience,
so I don’t want to spend too much time talking about specific software stress
tests. What I will tell you, though, is that there are tools available for
stress testing Exchange Server, Internet Information Server, and SQL Server.
Best of all, many of these tools can be downloaded from the Internet for free.

Regardless of what software is running on the server, I
strongly recommend doing a hardware stress test. A hardware stress test is
similar to a burn-in except that your goal isn’t to find out if the system
works, but rather how well it holds up under a load.

There are a lot of different utilities available for stress
testing, but I personally like a shareware utility called Passmark Performance Test.
What I like about this software is that it offers the standard burn-in type
tests, but also has customizable tests that are much more relevant to a real
world situation.

For example, many server level applications tend to be very
database intensive. If you are expecting the server to have a heavy workload
then it’s important to find out how well the server will be able to keep up
with database requests. In high demand environments, database requests can
flood the server more quickly than they can be committed to disk.

Just about any stress testing or burn-in software will clock
the rate at which the hard disk can be accessed. What makes the Passmark
software different though is that you can specify things like the size of the
test file, the size of the data blocks, whether you want sequential or random
disk access, and the actual access method, as shown in Figure B.

Figure B

You can configure the disk test to behave similarly to the database that
your server will be hosting.

This is important because these factors make a huge
difference in the test’s outcome. For example, a large block size usually means
that there are fewer requests and the disk will appear to perform better.
Likewise, if the data is sequential, the disk will also tend to perform better.
Many stress testing applications do not give you these kinds of choices and you
are left wondering how valid the results really are.
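
To see for yourself how much block size and access pattern matter, you can run a crude experiment along these lines. This Python sketch is my own illustration, not how the Passmark software works internally. It writes a temporary file using a block size and access pattern that you choose, and reports the throughput. Keep in mind that the operating system’s cache can inflate the numbers, so treat the results as a relative comparison rather than a true measure of the disk.

```python
import os
import random
import tempfile
import time
from typing import Optional


def disk_write_test(file_size: int, block_size: int, sequential: bool,
                    directory: Optional[str] = None) -> float:
    """Write file_size bytes in block_size chunks and return throughput in MB/s."""
    block = os.urandom(block_size)
    n_blocks = file_size // block_size
    offsets = [i * block_size for i in range(n_blocks)]
    if not sequential:
        random.shuffle(offsets)  # visit the same blocks, but in random order

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            f.truncate(file_size)  # reserve the full file size up front
            for offset in offsets:
                f.seek(offset)
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # include the time needed to reach the disk
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)
    return (n_blocks * block_size) / elapsed / 1_000_000


if __name__ == "__main__":
    size = 64 * 1024 * 1024  # 64 MB test file
    for block_size in (4 * 1024, 64 * 1024, 1024 * 1024):
        for sequential in (True, False):
            mode = "sequential" if sequential else "random"
            rate = disk_write_test(size, block_size, sequential)
            print(f"{block_size:>8} byte blocks, {mode:<10}: {rate:8.1f} MB/s")
```

Pass the directory argument to point the test at the volume you actually care about, such as the RAID array, rather than the default temporary folder.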

Another test that’s important to run is a networking test.
The fastest hard disk in the world won’t do you a bit of good if the network
connection is a bottleneck for inbound requests. The Passmark software allows
you to set up the server to accept requests and to then set up a workstation to
flood the server with data. The software will then measure data throughput and
the impact on the CPU.

Again, there are lots of different applications that can
perform this type of testing; I often rely on NetIQ’s Qcheck, for example. What
I like about the Passmark software, though, is that you can specify a variable
block size, a port number, and the duration of the test, as shown in Figure C.

Figure C

You can run tests to determine your new server’s network throughput.
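
The mechanics of a throughput test are simple enough to sketch in a few lines of Python. In the example below, a receiver accepts a connection and counts the bytes that arrive while a sender pushes a fixed number of blocks at it. To keep the example self-contained, both ends run on the same machine over the loopback address, which measures nothing about your network; in a real test the sender would run on a workstation. The function names and the choice of a block count rather than a duration are my own assumptions.

```python
import socket
import threading
import time


def _drain(server_sock: socket.socket, result: dict) -> None:
    """Accept one connection and count every byte until the sender disconnects."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            total += len(chunk)
    result["received"] = total


def throughput_test(block_size: int, blocks: int,
                    host: str = "127.0.0.1", port: int = 0) -> dict:
    """Send `blocks` blocks of `block_size` bytes and report bytes received and MB/s."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind((host, port))  # port 0 lets the OS pick a free port
    server_sock.listen(1)
    actual_port = server_sock.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_drain, args=(server_sock, result))
    receiver.start()

    payload = b"\0" * block_size
    start = time.perf_counter()
    with socket.create_connection((host, actual_port)) as client:
        for _ in range(blocks):
            client.sendall(payload)
    receiver.join()  # wait until the receiver has read everything
    elapsed = max(time.perf_counter() - start, 1e-9)
    server_sock.close()

    received = result["received"]
    return {"received": received, "mb_per_s": received / elapsed / 1_000_000}


if __name__ == "__main__":
    print(throughput_test(block_size=8192, blocks=10_000))
```

Watching the server’s CPU usage in Task Manager while a test like this runs gives you a rough idea of the processing cost of handling the traffic.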

Two other types of tests that are important to run are
memory and CPU tests. There really isn’t anything special about testing the
server’s memory other than making sure that the memory works and that it can be
accessed in a timely fashion.

When you test the CPU, you want to make sure that the CPU is
functional and that it doesn’t bog down easily under a heavy workload. The
Passmark software contains an excellent CPU test, as shown in Figure D,
that involves running multiple, high-demand applications simultaneously. What’s
nice about this test is that the workload will be automatically distributed
across multiple processors, giving you a chance to find out how your machine
will really perform under stress.

Figure D

You can test your machine’s processors to find out how they will perform
under a heavy load.
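
If you are curious what spreading a workload across processors looks like, here is a small Python sketch. It is only an illustration and is far less thorough than a commercial CPU test. It starts one worker process per processor and has each of them count prime numbers, which is a purely arbitrary piece of busy work that I chose because the correct answer is easy to verify.

```python
import multiprocessing
import os
import time


def count_primes(limit: int) -> int:
    """Count the primes below limit using simple trial division (deliberately slow)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_cpu_load(workers: int, limit: int) -> dict:
    """Run the same prime count in `workers` processes at once and time the batch."""
    start = time.perf_counter()
    with multiprocessing.Pool(workers) as pool:
        results = pool.map(count_primes, [limit] * workers)
    return {"seconds": time.perf_counter() - start, "results": results}


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    single = run_cpu_load(1, 200_000)
    loaded = run_cpu_load(cpus, 200_000)
    print(f"1 worker:  {single['seconds']:.2f} s")
    print(f"{cpus} workers: {loaded['seconds']:.2f} s")
    # Every worker should report the same count; a mismatch suggests a problem.
    print("results agree:", len(set(loaded["results"])) == 1)
```

On a healthy multiprocessor machine, the fully loaded run should take only slightly longer than the single worker run, even though it does several times as much work.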

Application integrity

Up until now the server should have been running a default
installation of Windows and maybe a couple of test applications. The reason is
that many of the tests that you have been performing could potentially crash
the server. If the server were running a live application with real data, the
database could potentially become corrupted.

The time for cautious testing has passed though. Now it’s
time to reformat the server and reinstall Windows. You should configure Windows
as it will be configured when the server enters a production environment. You
should also go ahead and install any applications that will run on the server.
Once the server is all loaded up, you need to perform a few simple tests to
make sure that the application is running correctly and that clients are able
to connect to it.

I am assuming that the application has been thoroughly
tested prior to placing it on this server. If the application is a commercial
one such as Microsoft Exchange Server or Microsoft SQL Server, then the
application has already been thoroughly tested and should run fine as long as
you have all of the service packs installed and your server hardware meets the
minimum requirements and is listed on the Windows hardware compatibility list.

If the application was developed in house, there are no
guarantees as to its stability. Hopefully, the application has been thoroughly
tested in a lab environment prior to your placing it onto a production server.
Even if the application has been tested though, it has yet to be tested on your
new server and you should schedule a testing phase before going completely live
with the application. In such a situation, the testing phase usually consists
of a limited deployment in which a small subset of users is given access to
the application and asked to start using it, but to report anything strange
that may happen. This allows you to see how the application will behave in a
real-world environment, but to do so in a way that minimizes the damage that
would be caused should the application fail.

A dress rehearsal

Now that the server has been thoroughly tested, and you have
loaded your network operating system and applications, it’s time to do what I
like to call a dress rehearsal. The idea is that while a small subset of users
are using the application, you should test the server under the absolute worst
possible conditions to see how well it holds up. Before you perform these tests
though, I strongly recommend backing up your server. I also recommend that
those who are helping to test the server enter only test data during these
tests to prevent the loss of real data.

Begin by running a disk test similar to what you did
earlier. The difference is that this time you are trying to stress the hard
disk while people are actually using the server. If the server has fault
tolerant hard drives, then I recommend removing and replacing one drive at a
time during the test. The idea is that you want to see just how much the server
will slow down should a hard disk fail during an intense period of operations.

Next, try running the CPU test again. You are trying to make
sure that the CPU does not bog down to the point that those using the server
application are unable to work. When the CPU test completes, try doing the
network test. Verify that your users can still access the server even when it
is being bombarded with requests.

Finally, conclude the tests by unplugging the server’s UPS
from the electrical outlet. Verify that the server shuts down gracefully before
the battery runs out. Yes, you did do this test earlier, but at that time no
applications were loaded on the server. Some applications take a really long
time to shut down and it’s important to find out now whether or not you need to
adjust the shutdown time threshold so that the applications and Windows have ample
time to shut down.
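
The arithmetic behind choosing a shutdown threshold is straightforward once you have measured both numbers. The sketch below assumes that you know the battery runtime from the discharge test and have timed a full shutdown with your applications loaded. The safety factor of 1.5 is an arbitrary margin that I picked for illustration, not a vendor recommendation.

```python
def shutdown_trigger_minutes(battery_runtime_min: float,
                             shutdown_duration_min: float,
                             safety_factor: float = 1.5) -> float:
    """Return how many minutes the server may run on battery before shutdown must begin."""
    # Reserve enough battery for the shutdown itself, padded by the safety factor.
    reserve = shutdown_duration_min * safety_factor
    if reserve >= battery_runtime_min:
        raise ValueError("battery runtime cannot cover the shutdown with this margin")
    return battery_runtime_min - reserve


if __name__ == "__main__":
    # Example figures: 20 minutes of battery, 4 minutes for a full shutdown.
    print(shutdown_trigger_minutes(20, 4))  # prints 14.0
```

If the function raises an error with your measured numbers, the battery is too small for the load, and no threshold setting will fix that.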