
SolutionBase: Troubleshooting a DNS server failure on Windows Server 2003

DNS is a vital service on your network. When it goes wrong, you've got big problems. Here are some strategies you can use to diagnose and repair DNS problems on your network.

DNS servers are true network workhorses. They are essential both for resolving Internet domain names and for the functionality of Active Directory. When a DNS server fails, it can be disruptive, to say the least. In this article, I will walk you through some steps that you can use to troubleshoot DNS failures.

Narrow down the problem

When you first notice that names are not being resolved, I recommend taking a few minutes to narrow down the problem. DNS problems come in many varieties, so knowing the exact symptoms is essential to resolving the problem quickly.

Initially, there are two things that you will want to check for. First, determine whether all of the computers in the office are having name resolution problems, or whether the problem is isolated to a particular network segment. Keep in mind that resolved names can be cached, so test with a computer name that the machine in question would not normally need to resolve.

Second, check whether the computers on your network are having trouble resolving internal names, external names, or both.

Performing these two tests will give you a good starting point from which to begin diagnosing the problem. For example, if you determine that the problem is isolated to a particular network segment, then the next logical step in the troubleshooting process would be to check to see if there is a dedicated DNS server that services that segment, or if the users on that network segment use the same DNS server as everyone else. If the workstations on the segment that is having problems use the same DNS server as workstations on other network segments, then there is most likely a communications problem of some sort. There might be a router down or a firewall may have been inadvertently configured to block name resolution traffic.

The test to determine whether machines on your network have trouble resolving internal names, external names, or both is also designed to help you figure out where to start troubleshooting. For example, if you are having trouble resolving the names of other computers on your internal network, then you most likely have a true DNS failure. If, on the other hand, internal names resolve correctly and only external names fail, then the DNS server itself is functional. It could be that the forwarder is set incorrectly or that an external DNS server is down.
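The internal-versus-external test can be scripted. The following is a minimal Python sketch rather than part of the original procedure; the function names can_resolve and next_step are invented for illustration. Note that socket.getaddrinfo goes through the operating system's resolver, so the local cache and hosts file can influence the result.

```python
import socket


def can_resolve(name):
    """Return True if the operating system's resolver finds an address for name."""
    try:
        return bool(socket.getaddrinfo(name, None))
    except socket.gaierror:
        return False


def next_step(internal_ok, external_ok):
    """Map the two test results to the starting point suggested in the text."""
    if not internal_ok:
        # Internal names failing points to a true DNS failure.
        return "check the internal DNS server"
    if not external_ok:
        # The DNS server answers, so the fault is further upstream.
        return "check the forwarders and the external DNS server"
    return "name resolution works from this machine"
```

On a workstation you might call next_step(can_resolve("fileserver.corp.example"), can_resolve("www.example.com")), where both host names are placeholders for one internal and one external name that the machine does not normally resolve.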

Problems with your DNS server

Let's pretend that you have done these two tests and you determine that both internal and external name resolutions are failing for every computer on the network. If that's the case, then all signs point to a problem with your DNS server.

As simple as it sounds, the first thing that I recommend doing is taking a quick glance at your DNS server to make sure that the monitor is not displaying the Blue Screen of Death or some other type of catastrophic error message.

If the DNS server appears at first glance to be functional, then select the Services command from the server's Administrative Tools menu to open the Services console. Scroll through the list of services and make sure that the DNS Server service is running. If for some reason it is not running, right-click the service and select the Start command from the resulting shortcut menu. Hopefully, doing so will start the service and fix your problem. Even if it does, take some time to look through the server's System log for clues as to why the service stopped.

Assuming that the server appears to be functional and the DNS service is running, then the next step is to test for a communications problem between the DNS server and other machines on your network. The easiest way of accomplishing this is to go to one of the workstations that is experiencing name resolution problems and open a Command Prompt window.

When the Command Prompt window opens, enter the IPCONFIG /ALL command. This will display the IP configuration for each network adapter on the system. There are two things that you need to look for when this information is displayed. You need to make sure that the workstation itself has a valid IP address, and that the IP address of the DNS server is correct.

This is important because in most companies IP configurations are assigned by DHCP servers. If the workstation has an invalid IP address or if the IP address of the DNS server is incorrect, then the problem might be with your DHCP server, not with your DNS server. If your DHCP server is failing, or if it has been misconfigured and is assigning an incorrect IP address scope or an incorrect IP address for the DNS server, then the symptoms can mimic those of a DNS server failure.
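If you need to run this check on many workstations, the IPCONFIG /ALL output can be parsed by a script. The following Python sketch rests on several assumptions: the field labels in FIELDS vary between Windows versions, the sample text is illustrative rather than captured from a real machine, only IPv4 addresses are handled, and treating a 169.254.x.x (automatic private) address as a sign of DHCP trouble is my own interpretation of "invalid IP address."

```python
import ipaddress

# Labels as printed by ipconfig /all. The exact wording is an assumption
# and differs between Windows versions.
FIELDS = {
    "ip address": "ip",
    "ipv4 address": "ip",
    "autoconfiguration ip address": "ip",
    "dns servers": "dns",
}

# Illustrative sample only, not captured from a real machine.
SAMPLE = """
Ethernet adapter Local Area Connection:

   Dhcp Enabled. . . . . . . . . . . : Yes
   IP Address. . . . . . . . . . . . : 192.168.1.25
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.1.1
   DNS Servers . . . . . . . . . . . : 192.168.1.10
                                       192.168.1.11
"""


def _ipv4(text):
    """Return an IPv4Address parsed from text, or None if it is not one."""
    try:
        return ipaddress.IPv4Address(text.split("(")[0].strip())
    except ValueError:
        return None


def parse_ipconfig(output):
    """Collect the IP addresses and DNS servers listed in the output."""
    found = {"ip": [], "dns": []}
    current = None  # field that a continuation line belongs to
    for line in output.splitlines():
        label, sep, value = line.partition(": ")
        if sep:
            current = FIELDS.get(label.strip(" .").lower())
        else:
            value = line  # may be a continuation line with another address
        address = _ipv4(value)
        if current and address is not None:
            found[current].append(address)
        elif not sep:
            current = None
    return found


def find_problems(found, expected_dns):
    """List configuration problems; expected_dns is the address you know is right."""
    problems = []
    if not found["ip"]:
        problems.append("no IP address assigned")
    for address in found["ip"]:
        if address.is_link_local or address.is_unspecified:
            problems.append(f"{address} is not a usable address; suspect DHCP")
    if ipaddress.IPv4Address(expected_dns) not in found["dns"]:
        problems.append(f"{expected_dns} is not among the configured DNS servers")
    return problems
```

Calling find_problems(parse_ipconfig(SAMPLE), "192.168.1.10") returns an empty list for the sample above. A non-empty list suggests looking at the DHCP server before the DNS server.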

If the workstation's TCP/IP configuration appears to be correct, then the next step in the troubleshooting process is to ping the DNS server's IP address. If the ping is returned, then it means that there is a functional communications path between the workstation and the DNS server.

If the ping fails, it doesn't automatically mean that there is a communications problem. It could be that a firewall between the workstation and the DNS server is blocking ICMP traffic. One way of testing this is to ping a known good server (by IP address, not by fully qualified domain name) that is in close proximity to the DNS server.

If the ping continues to fail and you have ruled out firewall restrictions as the cause, then there is most likely a communications problem of some sort going on. I recommend returning to the DNS server, opening a Command Prompt window, and entering the IPCONFIG /ALL command.

Doing so will display the TCP/IP configuration for each of the server's network interfaces. You should verify that the IP address that is bound to the server's primary network interface matches the address that network workstations are configured to use as their DNS server.

If everything checks out, then try pinging the server's primary IP address from the Command Prompt window. Since you are pinging the server's own IP address, the ping won't verify network connectivity. What it will do is verify the integrity of the TCP/IP stack. If this ping fails, then either some of the files that make up the TCP/IP stack are corrupt, or the IP address is not being bound to the network adapter correctly.

If the self ping succeeds, then try pinging some other IP addresses on your network (especially the IP address of the workstation that was unable to ping the DNS server earlier). If these pings fail, then a communications problem is definitely to blame. Start by making sure that the patch cable is connected securely to the server's network adapter. If that doesn't solve the problem, try plugging the DNS server into a different port on your switch, or replacing the network adapter and patch cable.
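The ping sequence described so far amounts to a small decision tree. The Python sketch below is one reading of it, not a definitive procedure. The function name and the wording of the returned strings are invented here, and you supply the ping results yourself as True or False (or leave them as None if a test has not been run yet).

```python
def diagnose_pings(workstation_to_dns, self_ping=None, server_to_others=None):
    """Suggest the next step from the ping results gathered so far."""
    if workstation_to_dns:
        return "path to the DNS server works; continue with NSLOOKUP"
    if self_ping is None:
        # A failed ping alone is inconclusive: a firewall may block ICMP.
        return "rule out a firewall blocking ICMP, then test from the DNS server"
    if not self_ping:
        return "suspect a corrupt TCP/IP stack or an address not bound to the adapter"
    if server_to_others is None:
        return "ping other addresses from the DNS server"
    if not server_to_others:
        return "check the patch cable, switch port, and network adapter"
    return "the server can reach the network; continue with NSLOOKUP"
```

For example, diagnose_pings(False, self_ping=True, server_to_others=False) points to the cabling and hardware checks.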

What if the ping tests are successful though? In a situation like that, communications are definitely functioning. I recommend going to one of the workstations that is having problems, and opening a Command Prompt window. Upon doing so, try using the NSLOOKUP command to resolve some names on your network to IP addresses. This might seem pointless at first since we have already established that name resolutions are failing, but I like doing the NSLOOKUP test anyway because it allows you to gather a little bit more information about the problem. For example, when you perform the NSLOOKUP, name resolution might fail entirely, or the name might be resolved to an incorrect IP address.
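To make the two outcomes concrete, here is a rough Python sketch of what an NSLOOKUP-style query against one specific DNS server involves. It is an approximation written for illustration, not a replacement for NSLOOKUP: it sends a single A-record query over UDP port 53, handles only IPv4 answers, and does not retry or fall back to TCP. All function names are invented here.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a DNS query packet asking for the A record of name."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def skip_name(packet, offset):
    """Return the offset just past an encoded (possibly compressed) name."""
    while True:
        length = packet[offset]
        if (length & 0xC0) == 0xC0:  # two-byte compression pointer ends the name
            return offset + 2
        if length == 0:
            return offset + 1
        offset += 1 + length


def parse_a_records(packet):
    """Return (rcode, list of IPv4 address strings) from a DNS response."""
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10]
        )
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength
    return flags & 0x000F, addresses


def query_server(server_ip, name, timeout=3.0):
    """Ask server_ip for the A records of name; raises socket.timeout on silence."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server_ip, 53))
        data, _ = sock.recvfrom(512)
    return parse_a_records(data)


def describe(rcode, addresses, expected_ip):
    """Classify the result the way the text does: failed, wrong, or correct."""
    if rcode != 0 or not addresses:
        return "resolution failed"
    if expected_ip not in addresses:
        return "resolved to an unexpected address"
    return "resolved correctly"
```

A call such as describe(*query_server("192.168.1.10", "fileserver.corp.example"), "192.168.1.40") uses placeholder addresses and names. A result of "resolution failed" and a result of "resolved to an unexpected address" lead to different causes.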

If the name used in the NSLOOKUP query is resolved to an incorrect IP address, then there are a couple of different things that could be going on. One possible cause is that the DNS server involved contains one or more typos in its records. Normally, this should only be a problem if the IT department manually creates DNS records though. This is fortunate because the only real way to test for this problem is to manually review the various host records and make sure that they are correct.

Another situation that could cause an NSLOOKUP query to return an incorrect IP address is that dynamic updates may be failing. Dynamic updates are used because the majority of the workstations on a corporate network typically receive their TCP/IP configuration from a DHCP server. As such, a workstation's IP address may change frequently. That being the case, the DNS server simply cannot use static host records for these machines. Instead, dynamic updates are used to ensure that a computer's host record matches its current IP address.

If dynamic updates are failing, then the DNS server database will contain outdated (often invalid) IP addresses for various host records. The easiest way of forcing an update of a host record is to go to one of the machines that has an outdated host record associated with it and open a Command Prompt window. At the command prompt, enter the IPCONFIG /REGISTERDNS command, which should force a host record update. If updates continue to fail, you should make sure that your DNS server is configured to accept dynamic updates.

One other issue that can cause the DNS database to contain incorrect IP addresses is that zone transfers might be failing. Normally though, this will only be an issue if the DNS server is incorrectly resolving names from a secondary zone. If a zone transfer failure occurs then outdated host records will remain in the secondary zone database file.

If you suspect a zone transfer problem, then you can try to manually force a zone transfer or try rebooting all of the DNS servers involved. If zone transfers have never worked between the two servers, then it could be that you have incompatible DNS server types. Although DNS name resolution itself is universal, some types of DNS servers use different compression formats or resource record types than others.

If none of these techniques help the DNS server to start supplying the correct IP addresses, then it could be that the DNS server has cached the incorrect IP addresses. You can manually clear the DNS server's cache by opening the DNS console, right-clicking on the DNS server in question, and selecting the Clear Cache command from the resulting shortcut menu.

External name resolution failures

In some cases, a DNS server will have no trouble resolving names on your local network, but may be unable to resolve Internet domain names. If that is the case, the problem is most likely related to your forwarders, to a failure of your ISP's DNS server, or to a failure of a router between you and your ISP's DNS server.

To troubleshoot this problem, open the DNS console, right-click on the listing for your DNS server, and select the Properties command from the resulting shortcut menu. When you do, Windows will display the server's properties sheet. Select the properties sheet's Forwarders tab and make note of the list of forwarding IP addresses.

You can try pinging these IP addresses to make sure that there is a communications path to them. If nothing seems amiss then you could contact your ISP to verify that they are still using those addresses for their DNS servers.

While you are looking at the server's properties sheet, you might also check the Root Hints tab to make sure that it is populated. The Root Hints tab lists the IP addresses of the root DNS servers. On a Windows-based DNS server, the root hints are prepopulated, and the root addresses rarely, if ever, change. Even so, it's worth making sure that the root hints have not been accidentally removed.

2 comments
serendip80

We had two workstations that could not reach our web site's new IP address. They kept trying to open the old IP. Everyone else was fine. Went through basically your list of resolution attempts, and nothing worked or illuminated the problem. We decided to force the address for the web site by putting an entry in C:\Windows\system32\drivers\etc\hosts, where we found that there was already an entry forcing the web site to the old address. Deleted the entry, and all fixed.

rmaleshri

Excellent resource for MCSE. The author has explained DNS troubleshooting clearly and precisely.