Imagine for a moment that it’s a Tuesday morning after a long, relaxing, three-day weekend. You come back into the office and discover that no one can get onto the network. You quickly try to see which servers are functioning as you ignore the constant ring of the telephone and the people who keep sticking their heads into your office to ask when the network will be back up. It looks like it’s going to be Monday, but on a Tuesday.

Anyone who’s managed a network for any length of time is all too familiar with this situation. In the past, when a single server was enough to run an entire office, a situation like this was fairly easy to diagnose. However, today there are very few things that can bring down an entire office. One thing that can cause such a catastrophe, though, is a failed DHCP server. In this Daily Drill Down, I’ll explain why a failed DHCP server is a big deal. I’ll also explain how you can troubleshoot and correct the problem.

What’s the big deal?
Right now, you might be wondering, “What’s the big deal about a DHCP failure?” Consider the nature of what a DHCP server does: It issues IP addresses to network clients. In the situation I described above, where everyone is coming back from a long weekend and the DHCP server has failed sometime during the course of the weekend, no one would be able to log into the network because they wouldn’t be able to get an IP address.

It’s obvious that if the DHCP server failed at night or on the weekend, it would cause problems, but what would happen if the DHCP server were to fail in the middle of the day while everyone was logged on? Well, if a client is presently logged on to the network, then it has already been issued an IP address. Keep in mind, though, that DHCP servers don’t usually give a client an IP address; they lease the address to the client for a period of time. Therefore, if a DHCP server were to fail in the middle of the day, the clients would probably be okay until their leases expired. What happens then would depend on the actual operating systems involved, but the client would probably relinquish the IP address lease and, unable to get another address or a lease extension, would be left unable to reach the network.
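To put some numbers on the lease timing, here’s a small Python sketch. It assumes the default timers from the DHCP specification (RFC 2131): a client first tries to renew with its own server at 50 percent of the lease (T1) and starts asking any server for an extension at 87.5 percent (T2). The function name, the eight-day lease, and the start date are illustrative assumptions, not values taken from any particular server.

```python
from datetime import datetime, timedelta

def lease_milestones(lease_start, lease_length):
    """Return when a client renews (T1), rebinds (T2), and loses its lease.

    Assumes the RFC 2131 default timers: T1 = 50% and T2 = 87.5% of the lease.
    """
    return {
        "renew_T1": lease_start + lease_length * 0.5,
        "rebind_T2": lease_start + lease_length * 0.875,
        "expires": lease_start + lease_length,
    }

# Hypothetical example: an eight-day lease granted on a Friday at 9:00 a.m.
start = datetime(2024, 1, 5, 9, 0)
for name, when in lease_milestones(start, timedelta(days=8)).items():
    print(name, when)
```

A client that picked up a fresh lease just before the server died has days of breathing room, while a client whose lease was already near its end will drop off the network much sooner.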

You can counter the total nightmare situation that I just described by having more than one DHCP server online. If one server fails, another server picks up the slack. Unfortunately, though, DHCP servers don’t share pools of IP addresses. Therefore, to provide ample fault tolerance through multiple DHCP servers, each server must have enough IP addresses available to service every network client.

Microsoft recommends using something called the 75/25 rule. As you might know, DHCP servers require you to implement a separate scope for each subnet. The 75/25 rule states that 75 percent of the IP addresses available through each DHCP server in a multiple DHCP server environment should be available to the local subnet. The remaining 25 percent of the available addresses should be for remote subnets. The idea is that clients should always be able to get an IP address no matter which DHCP server might crash.
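As a rough illustration of the arithmetic, the Python sketch below splits one subnet’s address range so that the local DHCP server hands out the first 75 percent and a remote server holds the remaining 25 percent. This is one way to read the rule, not an official procedure; the function name and the address range are made up for the example.

```python
import ipaddress

def split_scope(first, last, local_share=0.75):
    """Split an inclusive address range into a local part and a remote part."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    total = end - start + 1
    # Number of addresses the local server keeps (rounded down).
    local_count = int(total * local_share)
    local = (ipaddress.IPv4Address(start),
             ipaddress.IPv4Address(start + local_count - 1))
    remote = (ipaddress.IPv4Address(start + local_count),
              ipaddress.IPv4Address(end))
    return local, remote

# Hypothetical 200-address range on a single subnet.
local, remote = split_scope("192.168.1.10", "192.168.1.209")
print("Local server scope: ", local[0], "-", local[1])
print("Remote server scope:", remote[0], "-", remote[1])
```

Because the two ranges never share an address, either server can answer a client on that subnet without the risk of handing out a duplicate.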

DHCP failure prevention
While it’s important to have some troubleshooting techniques on hand that you can use to recover from a DHCP failure, it’s even more important to take precautions to prevent such a problem from occurring in the first place. Let me explain some preventative measures you can use to make sure DHCP keeps running.

One of the major causes of DHCP server failure is that DHCP servers may not be able to keep up with the demand. In such a situation, the server never actually drops offline or runs out of addresses, but it is simply so bogged down that it can’t deal with the requests as they come in. Fortunately, there are several things that you can do to improve the performance of a DHCP server.

First, look at the server’s hard disk performance. Remember that a DHCP server maintains a database of which IP addresses have been assigned and to whom they have been assigned. Like just about any other database, the one used by the Windows 2000 version of DHCP is disk intensive. Therefore, make sure the server running DHCP has a fast hard disk, preferably a RAID array, so it can keep up with the demand.

Another performance-related problem might be that the DHCP server seems sluggish. This may be related to a lack of bandwidth or a lack of processing power. One way to correct this is to disable the server-side conflict detection. DHCP contains a function that can check for the presence of an IP address on the network before it assigns that address to a client. By disabling server-side conflict detection, you can ease the workload on the server. If your clients are routinely being assigned IP addresses that are already in use, however, it might not be a bad idea to keep the server-side conflict detection enabled until you’ve fixed the problem.

This method for easing the burden on a DHCP server may sound counterproductive, but it might fix your problem: Try running the DNS services on the same server that is running DHCP. Of course, doing so consumes more processing power, memory, and hard-disk time than simply running the DHCP services by themselves, but you can free up some bandwidth. Remember that Windows 2000 uses dynamic DNS. Dynamic DNS makes an entry into the DNS database every time that a computer comes online. When a client comes online, it must obtain an IP address before it can make a DNS entry. Why not let one server take care of the entire process?

Troubleshooting a DHCP failure
There are many different types of DHCP failures. These failures range anywhere from the occasional client not getting an IP address to the infamous Blue Screen of Death. There’s no one set procedure for fixing all DHCP-related problems. Instead, you have to look at the symptoms of the problem and work from there.

A great place to begin is looking at the event logs. By default, Windows 2000 logs service-related events pertaining to the DHCP services. Therefore, if the DHCP services started behaving badly, it would show up in the event logs. You might be surprised by what you find in these logs. Even if a log entry seems vague, try searching the Microsoft Knowledge Base for the event number. Many times, there will be a Knowledge Base article explaining the event in detail with discussions on how to fix the problem.

One of the biggest challenges that you may face when diagnosing DHCP problems is the case of duplicate IP addresses. In this situation, the DHCP server tries to assign an address to a client, but the address is already in use on the network. What makes this an even bigger problem is that in many cases, the actual DHCP server is working fine. The actual problem might be that someone hard-coded an IP address onto a network client, that another DHCP server has an overlapping scope, or, worse yet, that someone has set up an unauthorized DHCP server. You can get around the problem by implementing the server-side conflict detection. However, doing so is nothing more than putting a Band-Aid on the real problem. Even if you can prevent your DHCP server from duplicating the address, you need to figure out why the address is already being used.
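If you suspect overlapping scopes, a quick sanity check is to compare the address ranges configured on each server. The Python sketch below does that comparison for two ranges that you’ve written down by hand; the function name and the sample ranges are hypothetical, and the script doesn’t query any DHCP server.

```python
import ipaddress

def scopes_overlap(scope_a, scope_b):
    """Return True if two inclusive (first, last) address ranges share any address."""
    a_first, a_last = (int(ipaddress.IPv4Address(x)) for x in scope_a)
    b_first, b_last = (int(ipaddress.IPv4Address(x)) for x in scope_b)
    # Two ranges overlap when each one starts before the other one ends.
    return a_first <= b_last and b_first <= a_last

# Hypothetical scopes copied from two DHCP servers.
server1 = ("192.168.1.10", "192.168.1.159")
server2 = ("192.168.1.150", "192.168.1.209")
print(scopes_overlap(server1, server2))
```

In this made-up case, both servers could lease the addresses from .150 through .159, which is exactly the kind of mistake that produces duplicate addresses.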

Finding an address
The first step in finding an address is to determine whether the address is really in use. The easiest way of doing this is to open a command prompt window and ping the address. If the ping is returned, then you know that the address is in use and that the PC using the address is currently on.

Now comes the fun part: locating the PC. One way to do this is to enter the following command at the server’s command prompt:
ARP -A IPaddress

Just replace IPaddress with the TCP/IP address of the PC you’re trying to find. When you do, Windows will display some information about the IP address. Among this information is the physical address of the computer that’s using the IP address. You can see a sample output from this command below:
C:\>ARP -A
Interface: on Interface 0x2
  Internet Address      Physical Address      Type
                        00-a0-4b-05-a6-b6     dynamic

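As you can see from the sample output, the command returns the physical adapter address along with a Type column that reads static or dynamic. Be cautioned, however, that this column describes the ARP cache entry itself (dynamic entries are learned automatically; static entries are added by hand), so it isn’t a reliable indicator of whether the PC’s IP address was hard-coded or assigned by DHCP.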
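If you have a long ARP table to wade through, a short script can pull out the physical address for you. The Python sketch below parses text in the layout that the Windows ARP -A command prints. The IP addresses in the sample text are made up for illustration, and the function name is my own.

```python
import re

# One table row: IP address, physical address (xx-xx-xx-xx-xx-xx), entry type.
ARP_ROW = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\s+(\w+)\s*$"
)

def parse_arp_table(text):
    """Map each IP address in ARP -A output to (physical address, type)."""
    table = {}
    for line in text.splitlines():
        match = ARP_ROW.match(line)
        if match:
            ip, mac, entry_type = match.groups()
            table[ip] = (mac.lower(), entry_type)
    return table

# Hypothetical captured output; the IP addresses are invented for this example.
sample = """\
Interface: 192.168.1.5 on Interface 0x2
  Internet Address      Physical Address      Type
  192.168.1.25          00-a0-4b-05-a6-b6     dynamic
"""
print(parse_arp_table(sample).get("192.168.1.25"))
```

To run it against a live table, you could capture the command’s output with Python’s subprocess module and pass the resulting text to parse_arp_table.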

Another option is to use the PING -A IPaddress command, again replacing IPaddress with the address of the target PC. Doing this works the same as any other ping operation, but it also returns the host name that’s associated with the machine. In the sample output below, I’ve pinged the same IP address that I ran the ARP -A command on earlier; notice that in the output, the command told me that the host name of this machine was HENRY.
C:\>ping -a
Pinging HENRY [] with 32 bytes of data:
Reply from bytes=32 time<10ms TTL=128
Reply from bytes=32 time<10ms TTL=128
Reply from bytes=32 time<10ms TTL=128
Reply from bytes=32 time<10ms TTL=128
Ping statistics for
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum =  0ms, Average =  0ms

Now you know the physical adapter address and the host name of the machine that’s using the unauthorized IP address. From here, I recommend using a protocol analyzer to track down the offending machine. If you don’t have access to a protocol analyzer or you need more help, try using the TRACERT command to narrow down the search. The TRACERT command traces the path that TCP/IP uses to communicate with the machine. You can see a sample of the use of TRACERT below:
Tracing route to over a maximum of 30 hops
  1   <10 ms   <10 ms   <10 ms
Trace complete.

In this sample, there was only one hop recorded. This means that the PC in question exists on my local subnet. However, if the offending PC were farther away, TRACERT would have displayed a list of all routers that it had to pass through to reach the PC. This would give you a clue as to the subnet on which the PC actually exists.
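You can cross-check the one-hop result with a little subnet arithmetic. The Python sketch below tests whether a target address falls inside the same subnet as your own machine. The addresses, the /24 prefix, and the function name are assumptions chosen for the example.

```python
import ipaddress

def on_local_subnet(local_interface, target_ip):
    """Return True if target_ip is inside the subnet of local_interface.

    local_interface is written as address/prefix, for example "192.168.1.5/24".
    """
    network = ipaddress.ip_interface(local_interface).network
    return ipaddress.ip_address(target_ip) in network

# Hypothetical addresses: the first target shares the subnet, the second does not.
print(on_local_subnet("192.168.1.5/24", "192.168.1.25"))
print(on_local_subnet("192.168.1.5/24", "192.168.7.25"))
```

If the check comes back false, expect TRACERT to show at least one router between you and the offending PC.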

Once you track down the PC, check to see if the IP address has been hard-coded. If so, simply switch the PC to use a dynamic IP address.

DHCP can be a very handy service to have running. It can minimize the amount of time it takes to administer TCP/IP addresses in a large networking environment. However, a failed DHCP server can quickly bring an organization to its knees. In such a situation, you need to know how to diagnose the actual cause of a DHCP failure and repair it.