Network Address Translation, or NAT for short, has become one of the most common networking technologies over the last several years. NAT technology, which is designed to compensate for a global shortage of IP addresses, is used in everything from Microsoft's Internet Connection Sharing component to all of the consumer grade broadband routers that crowd the shelves at the electronics stores. What's odd about NAT though is that even though it is a mainstream technology, and consumer grade NAT enabled devices are designed to be easy to use, NAT can be especially tricky to troubleshoot. In this article, I will discuss some troubleshooting techniques that you can use to solve NAT related problems.
Before I begin
Before I get started, I want to mention that there are countless varieties of NAT. Some common variations include static NAT, dynamic NAT, overloading NAT, and overlapping NAT. Furthermore, every NAT enabled device has its own set of requirements and its own way of doing things. For example, Windows Server 2003 supports both the Network Address Translation Service (part of RRAS) and Internet Connection Sharing. Although both of these services are based on NAT technology, they couldn't be more different. Table A below illustrates some of the differences:
The point is that there are huge differences in the way that NAT is implemented, even among NAT products from a common manufacturer. Even so, there is some common ground that all NAT enabled products share. It's this common ground that I will focus on in my troubleshooting techniques.
How NAT works
Before I begin discussing troubleshooting techniques, I want to briefly discuss how NAT works. After all, it's tough to troubleshoot something if you don't understand how it works. Since there are so many different varieties of NAT, and I have to pick one, I am going to focus my discussion on overloading NAT since that seems to be the most common type of NAT implementation. I will keep this discussion as general and non product specific as possible.
The idea behind overloading NAT is that the network contains more hosts than it has public IP addresses for. Since each node on the network requires its own IP address, the nodes on the private network will typically either be assigned non-routable addresses or addresses that the company doesn't actually own. In either case, these addresses can not be used on the Internet. Instead, the company's one legitimate address is assigned to the NAT router. It's the NAT router's job to make all outbound traffic appear to have originated from the company's one valid address, regardless of what address the host that sent the traffic is actually using. When the recipient replies to the message, the reply is sent to the NAT router, and the NAT router must figure out which computer on the private network the reply was intended for, and forward the reply accordingly.
There are all sorts of variations to this type of NAT configuration. For example, sometimes a company may have more than one legitimate address that they can use. The basic concept remains the same though.
To understand how NAT accomplishes this feat, you need to understand a little bit about how the TCP/IP protocol uses ports. There are actually two types of ports used by TCP/IP. The type of port that you hear about most often is the destination port. The destination port is the port number used by the Web service that the host is trying to communicate with. Destination ports are typically well known port numbers ranging from zero to 1023. For example, the HTTP protocol uses port 80 to access Web site content. Although there are a few exceptions, typically any time a computer is using HTTP to access a Web site, port 80 will be used.
The source port works very differently though. Whereas the destination port tends to be consistent and well documented, the source port is for the most part random. If you've ever shopped on the Internet, you've probably noticed that the online store remembers who you are and what items are in your shopping cart as you move from one screen to the next. At any given time, there are probably also dozens of other people shopping at the same store. So how does the store differentiate you from other shoppers, and retain your identity as you move from page to page?
There are a couple of different ways that Web developers can design a site so that it remembers the user as that user moves from page to page. At the network level, though, each connection is treated as a session, and this is where the source port number comes into play. Since the source port number is usually random, it can be used to keep track of a session and to differentiate one connection from another, even when many users are talking to the same destination port on the same server.
Now that I've talked a little bit about how ports are used, let's go back to talking about NAT. When a host on the private network needs to communicate with the outside world, the host first checks to see if the destination server is on the local network. Assuming that it isn't, the host forwards the outbound packet to the default gateway. The default gateway is usually the NAT device's internal network address.
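As a rough illustration of the local-versus-remote check described above, the sketch below uses Python's standard ipaddress module. The function name next_hop and all of the addresses are hypothetical, chosen only for the example; a real host makes this decision inside its TCP/IP stack using its routing table.

```python
import ipaddress

def next_hop(host_ip, netmask, default_gateway, destination):
    """Return the address a host would hand the packet to.

    Hypothetical helper: if the destination falls inside the host's own
    subnet, the packet is delivered directly; otherwise it is forwarded
    to the default gateway (normally the NAT device's internal address).
    """
    local_net = ipaddress.ip_network(f"{host_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(destination) in local_net:
        return destination
    return default_gateway

# A neighbor on the same subnet is reached directly.
print(next_hop("192.168.1.25", "255.255.255.0", "192.168.1.1", "192.168.1.40"))
# prints 192.168.1.40

# An Internet address is handed to the default gateway.
print(next_hop("192.168.1.25", "255.255.255.0", "192.168.1.1", "203.0.113.10"))
# prints 192.168.1.1
```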
Most NAT devices are multihomed. This means that they have two different network interfaces. One interface connects to the private network, while the other interface connects to the Internet. Each host on the local network would normally have its default gateway setting configured to point to the NAT interface on the private network.
Once the outbound packet reaches the NAT device, the NAT device must perform the translation on the packet to make the packet appear to have originated from the company's public IP address. This public IP address is assigned to the NAT device's interface that is connected to the Internet.
When the NAT device receives the outbound packet, it writes the sender's private IP address and source port to an address translation table. At this point, the NAT router replaces the computer's private IP address with the company's one and only public IP address. The NAT router then replaces the computer's source port with a source port of its own. The idea is that even though many different computers will be sending packets through the NAT device, and all of the outbound traffic will bear the same IP address as a result, the source port number will always correspond to the host that actually sent the packet. Keep in mind that the outbound packet no longer bears the host's original source port number, but rather a source port number that is assigned to the packet by the NAT router.
When a reply comes back, the reply packet is addressed to the port number that the NAT router assigned. When the NAT router receives the reply, it looks up that port number in the address translation table to find the original source port number and the private IP address of the computer that originally sent the packet. Once the NAT router has this information, it restores the original address and port and forwards the reply to the correct host on the local network.
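The translation table can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how any particular router implements it: the class name OverloadingNat, the starting port of 50000, and the example addresses are all made up for illustration, and real devices also track the protocol, time out idle entries, and check that a reply is addressed to the public IP.

```python
import itertools

class OverloadingNat:
    """Toy model of an overloading NAT translation table (illustrative only)."""

    def __init__(self, public_ip, first_port=50000):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self._by_private = {}      # (private ip, private port) -> assigned port
        self._by_public_port = {}  # assigned port -> (private ip, private port)

    def translate_outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Record the sender, then rewrite source address and source port."""
        key = (src_ip, src_port)
        if key not in self._by_private:
            assigned = next(self._next_port)
            self._by_private[key] = assigned
            self._by_public_port[assigned] = key
        return (self.public_ip, self._by_private[key], dst_ip, dst_port)

    def translate_inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Map a reply back to the private host, or None if no entry exists."""
        entry = self._by_public_port.get(dst_port)
        if entry is None:
            return None
        private_ip, private_port = entry
        return (src_ip, src_port, private_ip, private_port)

nat = OverloadingNat("203.0.113.5")
# Two private hosts happen to pick the same source port.
print(nat.translate_outbound("192.168.1.10", 1025, "198.51.100.7", 80))
# prints ('203.0.113.5', 50000, '198.51.100.7', 80)
print(nat.translate_outbound("192.168.1.11", 1025, "198.51.100.7", 80))
# prints ('203.0.113.5', 50001, '198.51.100.7', 80)
# The reply to port 50001 is steered back to the second host.
print(nat.translate_inbound("198.51.100.7", 80, "203.0.113.5", 50001))
# prints ('198.51.100.7', 80, '192.168.1.11', 1025)
```

Note that a reply arriving on a port with no table entry returns None in this model, which is one reason unsolicited inbound traffic does not reach hosts behind this kind of NAT.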
There are two primary areas in which NAT can malfunction. You can have problems with the connection from the client to the NAT router, or with the connection from the NAT router to the Internet. In most cases, diagnosing the cause of a connection problem between the NAT router and the Internet is simple. The problem is almost always related to either a physical connectivity problem or an authentication problem. Most NAT routers offer a status screen within their user interface. This screen will almost always tell you whether or not the router has successfully connected to the Internet, and if not, then why.
By comparison, there are a whole lot more reasons why a connection between a client on your local network and the NAT router might fail. If you're having problems with clients on your network not being able to access the Internet, and you have confirmed that your NAT router has established Internet connectivity, then it's time to begin troubleshooting the client side of your network.
The first thing that I recommend checking is physical connectivity. You have to make sure that a physical link exists between the switch that all of your workstations are plugged into and the NAT router. Most administrators are knowledgeable enough to know that there has to be a connection between the switch and the NAT router, but the NAT router contains a lot of ports and it's easy to accidentally plug the cable into the wrong port.
After you have verified that physical connectivity exists and that your cables are good, the next thing that I recommend checking is the IP address configuration of your workstations. As I explained earlier, your NAT router is multihomed. As such, the NAT router will have two different IP addresses.
One of the addresses is the address assigned by your ISP, and the other address is valid only on your private network. You must make sure that these addresses are assigned to the proper interfaces within the NAT router, and that workstations are configured with a default gateway address that matches your NAT router's private address. Unless the default gateway address is set correctly on the client machines, the machines will be unable to access the Internet, even if the NAT router is working correctly.
After you have verified that the NAT router's IP addresses are bound to the correct interfaces, and that workstations are using the correct default gateway, I recommend turning your attention to your DNS server. As I'm sure you already know, the DNS server's job is to resolve host names into IP addresses. When a DNS server is being used in a NAT environment, there are a couple of special things that need to be configured.
When a client attempts to resolve a host name, it checks the DNS server for a resolution. If the DNS server does not contain information related to the host name, it has to get the resolution information from a higher level DNS server. What this means is that the DNS server needs to contact another DNS server outside of your organization. In order to accomplish this, the DNS server must know the address of your default gateway (in this case, the NAT router) and it must know the address of the higher level DNS server.
You would configure the DNS server's default gateway in the same way that you would configure the default gateway for a workstation. As for the address of the higher level DNS server, there are a couple of different ways that you can set it up. I personally tend to get the job done by adding the external DNS server's IP address to the DNS server's list of forwarders.
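One way to confirm from a workstation that name resolution is working is a small check like the sketch below. The function name check_name_resolution is hypothetical; it wraps Python's standard socket.gethostbyname call, and the resolver parameter exists only so that the check can be exercised without a live network.

```python
import socket

def check_name_resolution(hostname, resolver=socket.gethostbyname):
    """Return (True, ip_address) if the name resolves, else (False, reason).

    Hypothetical helper. By default it asks whichever DNS server the
    workstation is configured to use.
    """
    try:
        return True, resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)
```

Calling check_name_resolution with an external host name that you know exists is a reasonable first test. If it fails while the workstation can still reach an Internet host by IP address, the DNS server's default gateway or forwarder settings are more likely at fault than the NAT router itself.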
Assuming that the DNS server is configured correctly, the problem most likely has something to do with the way that your workstations are configured. Some of the more common problems that I have seen involve IP address range conflicts. For example, if the NAT router uses an IP address from one range and the workstations use an IP address from a different range, and there is no router between them, then the workstations will be unable to communicate with the NAT router.
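The two workstation checks discussed so far (the default gateway must match the NAT router's private address, and both must sit in the same address range) can be expressed as a short script. This is only a sketch: check_workstation and the sample addresses are invented for the example, and you would supply the values reported by the workstation and the router.

```python
import ipaddress

def check_workstation(ip, netmask, gateway, nat_private_ip):
    """Return a list of configuration problems (empty if none were found).

    Hypothetical helper that compares one workstation's settings with the
    NAT router's private (internal) address.
    """
    problems = []
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    if ipaddress.ip_address(nat_private_ip) not in network:
        problems.append(
            f"NAT router {nat_private_ip} is outside the workstation's range {network}"
        )
    if ipaddress.ip_address(gateway) != ipaddress.ip_address(nat_private_ip):
        problems.append(
            f"default gateway {gateway} does not match NAT router {nat_private_ip}"
        )
    return problems

# Workstation on 10.0.0.0/24 while the router sits on 192.168.1.0/24.
for problem in check_workstation("10.0.0.25", "255.255.255.0",
                                 "192.168.1.1", "192.168.1.1"):
    print(problem)
# prints NAT router 192.168.1.1 is outside the workstation's range 10.0.0.0/24
```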
Many NAT routers have built in DHCP servers. If you do not disable the built in DHCP server, and you have another DHCP server on your network, then it is very possible that overlapping IP addresses could be assigned, or that IP addresses from two different ranges could be assigned. If you already have a DHCP server on your network, then I recommend disabling any DHCP capabilities in your NAT router.
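If you are not sure whether two DHCP servers are stepping on each other, you can compare their configured scopes. The sketch below assumes that you have read each scope's first address, last address, and subnet mask from the two servers' configuration screens; compare_dhcp_scopes and the sample scopes are hypothetical.

```python
import ipaddress

def compare_dhcp_scopes(scope_a, scope_b):
    """Compare two scopes given as (first_address, last_address, netmask).

    Hypothetical helper that returns a list of warnings (empty if none).
    """
    warnings = []
    a_first, a_last = (int(ipaddress.ip_address(x)) for x in scope_a[:2])
    b_first, b_last = (int(ipaddress.ip_address(x)) for x in scope_b[:2])
    # Two address ranges intersect when each one starts before the other ends.
    if a_first <= b_last and b_first <= a_last:
        warnings.append("scopes overlap: the same address could be leased twice")
    net_a = ipaddress.ip_network(f"{scope_a[0]}/{scope_a[2]}", strict=False)
    net_b = ipaddress.ip_network(f"{scope_b[0]}/{scope_b[2]}", strict=False)
    if net_a != net_b:
        warnings.append(f"scopes are in different ranges: {net_a} and {net_b}")
    return warnings

router_scope = ("192.168.1.100", "192.168.1.199", "255.255.255.0")
server_scope = ("192.168.1.150", "192.168.1.250", "255.255.255.0")
print(compare_dhcp_scopes(router_scope, server_scope))
# prints ['scopes overlap: the same address could be leased twice']
```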
The last thing that I recommend checking is to make sure that your workstations are not running any type of proxy client. Some older, software based NAT devices required a proxy client. These clients often worked by redirecting traffic over port 8080. These types of clients will cause communications with most modern NAT routers to fail.
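A quick way to see whether a workstation still has proxy settings in place is to list them. The sketch below relies on Python's standard urllib.request.getproxies function, which reports the proxy settings that Python can see (environment variables, plus system settings on Windows and macOS); it will not detect every kind of proxy client software. The function name find_proxy_settings and the choice to flag port 8080 are assumptions made for this example.

```python
from urllib.parse import urlsplit
from urllib.request import getproxies

def find_proxy_settings(proxies=None, suspect_port=8080):
    """Return (scheme, proxy_url, uses_suspect_port) for each proxy found.

    Hypothetical helper. Pass a dict to inspect specific settings, or
    nothing to inspect the settings visible to Python on this machine.
    """
    if proxies is None:
        proxies = getproxies()
    findings = []
    for scheme, url in sorted(proxies.items()):
        if scheme == "no":  # the bypass list is not itself a proxy
            continue
        # Add "//" when the scheme is missing so the host:port part parses.
        port = urlsplit(url if "://" in url else f"//{url}").port
        findings.append((scheme, url, port == suspect_port))
    return findings

print(find_proxy_settings({"http": "http://127.0.0.1:8080"}))
# prints [('http', 'http://127.0.0.1:8080', True)]
```

An empty list from find_proxy_settings() with no arguments suggests that no proxy is configured at this level, although a dedicated proxy client may still need to be uninstalled separately.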