Get IT Done: Dissecting and diagnosing TCP/IP routing problems

Find out how TCP/IP sends information across your network

Most of us take for granted the complexities of the Internet and even our own intranets when we start up a Web browser and browse the Web. However, for packets to flow from your computer to the server, the local computer and its nearest neighbor routers rely on a variety of mechanisms that you should know about. By understanding the process by which a computer discovers routes, you can make better decisions about how to architect your network and how to troubleshoot any routing problems that may arise.

TCP/IP basics
Almost everyone who has been exposed to TCP/IP knows that there are three pieces of information that are mandatory in a networked TCP/IP environment: IP address, subnet mask, and default gateway. The function of the IP address is clear; it is a unique address that refers to the machine just like a street address refers to a house.

There is, however, a lot more confusion about how the subnet mask and the default gateway are used. The subnet mask, put simply, determines whether the destination host for a packet is local or remote. The mask is logically ANDed with the IP address of the local machine and, separately, with the destination IP address. If the two results are the same, the destination is local. If not, it is remote. A logical AND works on a bit-by-bit basis and returns a one only when both of the bits being ANDed are ones; otherwise it returns a zero. ANDing an IP address with a subnet mask therefore leaves only the network portion of the address, and comparing network portions tells you whether the host is local.
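To make the ANDing concrete, here is a minimal Python sketch of the comparison described above. The host address 10.254.1.16 appears later in this article; the 255.255.255.0 mask and the function names are assumptions chosen for illustration.

```python
import ipaddress

def network_part(ip: str, mask: str) -> str:
    """AND an IPv4 address with a subnet mask, bit by bit."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_bits & mask_bits))

def is_local(local_ip: str, dest_ip: str, mask: str) -> bool:
    """The destination is local when both addresses share a network portion."""
    return network_part(local_ip, mask) == network_part(dest_ip, mask)

MASK = "255.255.255.0"  # assumed mask for this example
print(network_part("10.254.1.16", MASK))               # 10.254.1.0
print(is_local("10.254.1.16", "10.254.1.247", MASK))   # True
print(is_local("10.254.1.16", "216.37.52.229", MASK))  # False
```

A real TCP/IP stack performs the same comparison on the raw 32-bit values; the string conversions here only keep the output readable.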

If the address is local, TCP/IP uses the address resolution protocol (ARP) to determine the physical address or media access control (MAC) address. Ultimately, communication on a physical network is done by identifying the hardware address for which the packet is intended. It is for this reason that TCP/IP must broadcast to determine the physical address for an IP address. Typically, this represents only a small percentage of the number of packets on the network, because once the address is discovered, it is cached by the local machine.

If, on the other hand, the address is not local, the computer uses a local routing table to determine where it should send the packet. The default gateway is simply a special default entry in the routing table that is used whenever the computer does not have a specific entry in its routing table.

Routing table basics
Every computer builds its own routing table when it boots up. The table is used to determine how to send a packet from its source to its destination. Above, when I mentioned that the subnet mask is logically ANDed to determine whether the address is local or not, I was referring to a small part of the process in which the computer consults the routing table to determine what to do with its packets.

Each routing table contains the entries needed to hand a packet destined for the local network to ARP so that the IP address can be resolved. The same routing table pushes a packet bound for a remote network toward a router connected to the local subnet.

In the simplest form, the routing table contains entries for:
  • Every local adapter
  • The networks attached to every local adapter
  • Default gateways
  • A local loopback address
  • A multicast address

In more complicated environments, the routing table would also contain entries for the networks that have routers connected to the local network.

The local adapter entries direct packets that are destined for a network that is locally attached to the computer. The loopback address entry sends packets back to an internal interface in the computer for processing. The multicast address, although rarely used, routes packets in such a way that they can be sent to multiple destinations simultaneously.

The routing table is evaluated using three criteria. First, the length of the subnet mask is considered. The more specific the entry in the routing table (that is, the longer its subnet mask), the more likely it is to be used. This is necessary to allow you to have routes to specific locations alongside default destinations for traffic that has no specific route. Routes to a specific destination have a long subnet mask associated with them. Traffic without a specific route uses the default gateway entry, which has a subnet mask of zero length (0.0.0.0).

The second criterion is the metric associated with the route. This helps determine the cost of the route. It is used to provide standby routes in the event of a primary route loss. In other words, it is used primarily to trigger dial-up backup routes when the main line is cut. In most networks, metrics are not used on PCs. They exist only in the routing tables of the core routers.
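The first two criteria can be sketched in a few lines of Python. This is only an illustration of the selection logic, not how any particular operating system implements it; the Route type, the table contents, and the backup route through 10.55.1.9 are all hypothetical.

```python
import ipaddress
from typing import NamedTuple, Optional, Sequence

class Route(NamedTuple):
    network: ipaddress.IPv4Network
    gateway: str   # next hop, or "on-link" for a directly attached network
    metric: int

def pick_route(table: Sequence[Route], dest: str) -> Optional[Route]:
    """Pick the matching route with the longest mask, then the lowest metric."""
    addr = ipaddress.IPv4Address(dest)
    candidates = [r for r in table if addr in r.network]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.network.prefixlen, -r.metric))

# Hypothetical routing table for a host on the 10.55.1.0 network.
TABLE = [
    Route(ipaddress.IPv4Network("0.0.0.0/0"), "10.55.1.1", 1),       # default gateway
    Route(ipaddress.IPv4Network("10.55.1.0/24"), "on-link", 1),      # local network
    Route(ipaddress.IPv4Network("10.254.1.0/24"), "10.55.1.2", 1),   # primary route
    Route(ipaddress.IPv4Network("10.254.1.0/24"), "10.55.1.9", 10),  # standby route
]

for dest in ("10.55.1.3", "10.254.1.13", "216.37.52.229"):
    print(dest, "->", pick_route(TABLE, dest).gateway)
```

The default entry matches every destination because its mask has zero length, so it is chosen only when nothing more specific matches. The standby route is used only if the primary entry is removed from the table.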

The final criterion on a Windows computer is a random order in which entries of equal subnet mask length and metric are tested. One entry starts at the top of this list and is not bumped from its spot until Windows tries to send a packet to its gateway and fails. From that point, the next entry is used until it, too, cannot be reached. This randomness applies only when there are two routes with equal priority. This rarely occurs unless there are two default gateways, which can happen if you are on a local area network and also dial up to another network: your local area network has a default gateway, as does the dial-up connection.

The building of a routing table
Routing tables are built from local interfaces, static routes, routing protocols, and ICMP redirect messages. The local interfaces are automatically added when they are activated. Static routes are those routes that have been added to the routing table manually. They are added by using the ROUTE command.

Routing protocols are used by routers to communicate with one another and learn a complete set of routes. They are typically not used on computers, although several versions of Windows servers offer some of the basic routing protocols. These protocols are not installed by default, but they can be added, and they can automatically modify the routing table.

The final way that routing tables are updated is by Internet Control Message Protocol (ICMP) redirect messages. A router sends this message back when it receives a packet but knows that it is not the best route to the final destination. These messages cause the computer to add the information about the new router and the route to its routing table. These messages are the reason why a network can have two different routers connected to the same network, leading to different places, with only one default gateway configured.

The making of a redirect message
Redirect messages are sent back to a computer when a router detects that it is receiving a packet from an interface where the best route would send it back out that same interface. Let us say a router has a local interface with an address of 10.55.1.1 (255.255.255.0), and it has a route in its routing table that sends 10.254.1.0 (255.255.255.0) to 10.55.1.2 for further routing. When it receives a packet from a computer on 10.55.1.3 destined for 10.254.1.13, it responds by indicating that 10.55.1.2 is the best route to the destination. The computer adds an entry to its routing table indicating that it should use the router on 10.55.1.2 to reach the host.

In effect, these ICMP redirect messages allow the client computers to be configured with only a single default router, when, in fact, there are several routers on the local network that the computer may have to communicate with in order to reach both internal and external hosts.
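The router's decision in the example above can be sketched as follows. This is a simplified model under stated assumptions: the router has a single interface network and a single extra route, and the names LOCAL_NET, ROUTES, and redirect_for are made up for this illustration.

```python
import ipaddress

# The router's own interface is 10.55.1.1 on this network.
LOCAL_NET = ipaddress.ip_network("10.55.1.0/24")
# The router's one extra route: 10.254.1.0 is reached through 10.55.1.2.
ROUTES = {
    ipaddress.ip_network("10.254.1.0/24"): ipaddress.ip_address("10.55.1.2"),
}

def redirect_for(src: str, dest: str):
    """Return the better gateway to report to src, or None if no redirect applies."""
    src_addr = ipaddress.ip_address(src)
    dest_addr = ipaddress.ip_address(dest)
    for network, next_hop in ROUTES.items():
        if dest_addr in network:
            # The packet arrived from LOCAL_NET and would leave on LOCAL_NET again,
            # so the sender could have used next_hop directly.
            if src_addr in LOCAL_NET and next_hop in LOCAL_NET:
                return str(next_hop)
    return None

# The computer at 10.55.1.3 records the redirect as a route to that host.
host_routes = {}
better_gateway = redirect_for("10.55.1.3", "10.254.1.13")
if better_gateway is not None:
    host_routes["10.254.1.13"] = better_gateway
print(host_routes)  # {'10.254.1.13': '10.55.1.2'}
```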

Route and repeat
One of the fundamentals of IP routing is that each device gets the packet closer to the destination. Each router knows a small amount about the IP addresses that are in use on the Internet. These routers route the packet to the best of their ability. The hope is that the destination is closer after the route than before. The process is repeated as the packet is transmitted from router to router until it reaches its destination.

However, this isn’t always the case. It is possible for routers to route a packet back and forth between two neighbors. This case, called a routing loop, causes the packet to be bounced back and forth until a special field in the packet, called time to live (TTL), reaches zero.

TTL is decremented by each router before it routes the packet on. This technique prevents packets from routing back and forth forever. When the time to live reaches zero, the packet is discarded and a response is sent to the originating computer indicating that the time to live has expired. This is the message that PING will show you when a routing loop exists.
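A small simulation shows why TTL guarantees that a looping packet eventually dies. The router names and the next-hop maps below are hypothetical; the sketch assumes every router on the path has an entry in the map.

```python
def forward(ttl: int, next_hop_of: dict, first_router: str, dest: str):
    """Follow next hops until the destination is reached or TTL hits zero."""
    router = first_router
    path = []
    while True:
        path.append(router)
        ttl -= 1  # each router decrements TTL before routing the packet on
        if ttl == 0:
            return "TTL expired", path
        next_hop = next_hop_of[router]
        if next_hop == dest:
            return "delivered", path
        router = next_hop

healthy = {"A": "B", "B": "server"}
looping = {"A": "B", "B": "A"}  # A and B each believe the other is closer

print(forward(8, healthy, "A", "server"))  # ('delivered', ['A', 'B'])
print(forward(8, looping, "A", "server"))
# ('TTL expired', ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'])
```

With a starting TTL of 8, the looping packet visits exactly eight routers before it is dropped, no matter how the loop is arranged.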

Address resolution protocol
Thus far, I’ve been talking about how packets are routed from one router to another until they reach their final destination. However, you should have a basic understanding of how packets are transmitted on the local network before I explore how to troubleshoot problems.

As I mentioned above, ARP is responsible for associating IP addresses with hardware (MAC) addresses. A transmission on a local network can be directed to a single machine or to all machines on the network, and every transmission uses a hardware address to identify its destination. A special condition exists whereby if a packet is transmitted with all bits in the destination hardware address set, every machine in the network receives a copy and processes it. This is a broadcast.

The hardware address is technically named a MAC address because the address operates at the media access control layer of the protocol stack. IP addresses live in the network layer of the OSI network protocol model. MAC addresses in an Ethernet environment are six bytes (48 bits) long. They are unique because each vendor is assigned a prefix that is three bytes (24 bits) long. Each vendor is then responsible for keeping hardware IDs with that prefix unique.
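As a quick illustration of that layout, the following Python sketch splits a six-byte address into its vendor prefix and vendor-assigned part, and checks for the all-bits-set broadcast address. The sample address is the default gateway's from the ARP table shown later in this article; the function names are my own.

```python
def split_mac(mac: str):
    """Split a MAC address into (vendor prefix, vendor-assigned part)."""
    octets = mac.replace(":", "-").split("-")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a six-byte MAC address: {mac!r}")
    raw = bytes(int(o, 16) for o in octets)
    prefix = "-".join(f"{b:02x}" for b in raw[:3])  # first three bytes: vendor
    device = "-".join(f"{b:02x}" for b in raw[3:])  # last three bytes: device
    return prefix, device

def is_broadcast(mac: str) -> bool:
    """A broadcast address has all 48 bits set."""
    prefix, device = split_mac(mac)
    return prefix == device == "ff-ff-ff"

print(split_mac("00-10-5a-07-84-23"))     # ('00-10-5a', '07-84-23')
print(is_broadcast("ff-ff-ff-ff-ff-ff"))  # True
```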

The process of resolving a hardware address from an IP address isn’t complicated, but it does involve a broadcast packet. First, the computer looks in the routing table and determines that the address is a local address. From there, it transmits a broadcast packet from the appropriate interface. The packet contains the hardware address of the current system and the IP address that is being sought. The system that has the IP address in question responds by sending a packet containing its own hardware address back to the originating computer.

Only the first solicitation packet is broadcast; all of the remaining packets are sent directly between the two computers that are communicating. This is important because switches are a common part of network infrastructure today. Switches forward packets only to the ports whose computers need to see them. This is in contrast to a hub, which sends all packets to all ports. Because switches send only the necessary packets to each port, they can improve performance on a network by allowing the total traffic to exceed the bandwidth of any one port. Switches must, however, transmit broadcast packets to every port. When there are a large number of broadcast packets on a network, the value of network switches is reduced.

Once ARP has resolved an IP address, the result is added to the local ARP table. The ARP table is simply a list of IP addresses and their associated hardware addresses. ARP tables are populated primarily through the discovery process discussed above, but static entries can also be added.

One odd thing about ARP is that it is used even when the packet’s final destination isn’t local. This is because the hardware address of the default gateway must be located. So even if none of the packets are local, ARP will have to be used at least once.
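The caching behavior and the gateway lookup can be modeled together in a short Python sketch. Nothing here touches a real network: the wire dictionary stands in for the machines that would answer a broadcast, and the class name, the 255.255.255.0 mask, and the helper function are assumptions made for illustration.

```python
import ipaddress

class ArpCache:
    """Toy ARP table: broadcast once per address, then answer from the cache."""

    def __init__(self, wire: dict):
        self.wire = wire      # stand-in for hosts that answer ARP broadcasts
        self.table = {}       # IP address -> hardware address
        self.broadcasts = 0

    def resolve(self, ip: str) -> str:
        if ip not in self.table:
            self.broadcasts += 1              # only a cache miss costs a broadcast
            self.table[ip] = self.wire[ip]
        return self.table[ip]

LOCAL_NET = ipaddress.ip_network("10.254.1.0/24")  # assumed mask
DEFAULT_GATEWAY = "10.254.1.254"

def next_hop_mac(cache: ArpCache, dest: str) -> str:
    """Resolve the destination itself if local, otherwise the default gateway."""
    is_local = ipaddress.ip_address(dest) in LOCAL_NET
    return cache.resolve(dest if is_local else DEFAULT_GATEWAY)

cache = ArpCache({
    "10.254.1.247": "00-01-03-d0-b4-8f",
    "10.254.1.254": "00-10-5a-07-84-23",
})
print(next_hop_mac(cache, "216.37.52.229"))  # 00-10-5a-07-84-23 (the gateway)
print(next_hop_mac(cache, "216.37.52.229"))  # same answer, served from the cache
print(cache.broadcasts)                      # 1
```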

Seeing your ARP table
If you want to see what’s happening behind the scenes, you can look at your ARP table by typing:
ARP -a

at the command line. You’ll see a response similar to:
Interface: 10.254.1.16 on Interface 0x1000004
  Internet Address      Physical Address      Type
  10.254.1.247          00-01-03-d0-b4-8f     dynamic
  10.254.1.254          00-10-5a-07-84-23     dynamic

This shows the machines on the local network that ARP has found and, thus, the hardware addresses that have been resolved. In this case, 10.254.1.247 is a domain controller and 10.254.1.254 is the default gateway on the local network. As you can see, even the default gateway’s address gets resolved.
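If you want to work with this output in a script, a short parser is enough. The Python sketch below reads the sample shown above from a string rather than running the command; the regular expression assumes the column layout of that sample, and other systems may format the table differently.

```python
import re

SAMPLE = """Interface: 10.254.1.16 on Interface 0x1000004
  Internet Address      Physical Address      Type
  10.254.1.247          00-01-03-d0-b4-8f     dynamic
  10.254.1.254          00-10-5a-07-84-23     dynamic
"""

# One table row: dotted IP address, dash-separated MAC address, entry type.
ENTRY = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\s+(\w+)\s*$"
)

def parse_arp_table(text: str) -> dict:
    """Map each IP address to its (hardware address, entry type)."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:  # header lines do not match and are skipped
            ip, mac, kind = match.groups()
            entries[ip] = (mac.lower(), kind)
    return entries

for ip, (mac, kind) in parse_arp_table(SAMPLE).items():
    print(ip, mac, kind)
```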

Troubleshooting your routing
There are two basic tools used in the troubleshooting of IP networks. The first tool, which is perhaps the most often used TCP/IP network-testing tool, is PING. It’s joined by TRACEROUTE, a more informative tool that can help you diagnose the path a packet takes to its destination. On Windows operating systems this is called TRACERT.

PING
The PING command, in its simplest form, uses only one parameter. That parameter is the IP address to be pinged. PING will return one of only a few responses. The first possible response is the number of milliseconds that it took for the PING command to send a packet to the remote machine and for a response to be returned. If PING received a response, then there are no problems with connectivity to the remote device.

The second possible response is No Response, which Windows reports as Request timed out. This message is generated when the PING command didn’t receive a response to its request. The most likely cause is that the device is offline, or that a device between you and the destination, such as a firewall, will not pass along ICMP messages. Both TRACEROUTE and PING use ICMP messages to do their work, so when ICMP is blocked, neither of the two tools that you typically have at your disposal for resolving TCP/IP problems will function. If the device is local, you should verify its connectivity to the network. If the device is remote, you’ll have to investigate what devices are between you and the destination and try to diagnose the problem from the device that isn’t allowing ICMP messages to be transmitted.

The third possible response from PING is Destination Host Unreachable. In this case, you either have not specified a valid default gateway, or one of the routers along the path to the destination has lost its connection. This response tells you that the route that should lead to the destination is not working. This is most typically found when the only connectivity to the site is down. If you receive this message, you should follow up by using the TRACEROUTE command to determine which router believes the destination is unreachable.

The fourth possible response from PING is Time To Live Expired. This message typically indicates a routing loop where one router sends a packet to its peer and then the peer router sends it back. This generally indicates a routing table problem. You’ll need to use the TRACEROUTE command to locate the routers that have the problem.

There are other possible responses from PING, such as Hardware Failure. This can occur when you disconnect the network cable during the PING process. However, most of the other messages that can be generated by PING are messages that are not normally associated with the troubleshooting process.
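The follow-up steps for the three failure responses can be collected into a small lookup, sketched here in Python. The labels are the ones used in this article; the exact wording PING prints varies by operating system, so treat the matching strings and the function name as assumptions.

```python
# Article's label for each failure response -> suggested next step.
NEXT_STEPS = {
    "no response": (
        "Verify the device is online, then look for a firewall or other "
        "device that will not pass ICMP messages."
    ),
    "destination host unreachable": (
        "Check the default gateway, then run TRACEROUTE to find the router "
        "that believes the destination is unreachable."
    ),
    "time to live expired": (
        "Suspect a routing loop and run TRACEROUTE to locate the routers involved."
    ),
}

def next_step(ping_message: str) -> str:
    """Return the suggested follow-up for a PING response line."""
    text = ping_message.lower()
    for label, advice in NEXT_STEPS.items():
        if label in text:
            return advice
    return "Not one of the three failure responses described above."

print(next_step("Destination Host Unreachable"))
print(next_step("Time To Live Expired"))
```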

TRACEROUTE
PING is a great tool, but it gives a rather limited set of information. TRACEROUTE, on the other hand, can return the complete path that the packet takes on its way to the final destination. The basic execution of the TRACEROUTE command is simply the command name followed by the IP address to trace to. In the case of Windows, the command is called TRACERT; in all flavors of UNIX, it is TRACEROUTE. The command will return output similar to the following:
 
Tracing route to penguin.datacenterdaily.com [216.37.52.229]
over a maximum of 30 hops:
 
  1   <10 ms    10 ms    10 ms  WEBMASTER [10.254.1.254]
  2   <10 ms   <10 ms    10 ms  adsl-68-23-14-174.dsl.lgtpmi.ameritech.net [68.23.14.174]
  3    20 ms    40 ms    20 ms  adsl-68-23-14-1.dsl.lgtpmi.ameritech.net [68.23.14.1]
  4    20 ms    30 ms    40 ms  dist1-vlan50.ipltin.ameritech.net [67.36.128.226]
  5    10 ms    20 ms    20 ms  bb1-fa2-1-0.ipltin.ameritech.net [67.36.128.115]
  6    30 ms    30 ms    20 ms  sl-gw22-chi-2-0.sprintlink.net [144.228.153.125]
  7    30 ms    20 ms    30 ms  144.232.10.9
  8    30 ms    50 ms    40 ms  sl-st21-chi-14-1.sprintlink.net [144.232.20.86]
  9    30 ms    30 ms    40 ms  204.255.174.153
 10    40 ms    30 ms    40 ms  0.so-3-1-0.XL2.CHI2.ALTER.NET [152.63.71.97]
 11    50 ms    40 ms    30 ms  0.so-7-0-0.XR2.CHI2.ALTER.NET [152.63.67.134]
 12   121 ms    60 ms    60 ms  192.at-6-2-0.CL2.IND6.ALTER.NET [152.63.66.217]
 13    40 ms    60 ms    50 ms  190.ATM7-0.GW5.IND1.ALTER.NET [152.63.68.245]
 14    60 ms    50 ms    50 ms  onecall-POS-core-gw1.customer.alter.net [63.122.162.214]
 15    50 ms    60 ms    50 ms  Obelisk-2-Cedar-Oc3c.Onecall.net [216.37.0.114]
 16    60 ms    50 ms    60 ms  JayQualls-55-60.OneCall.Net [216.37.55.60]
 17    40 ms    50 ms   121 ms  GM-Colo-52-229.OneCall.Net [216.37.52.229]
 
Trace complete.
 

This shows the complete path that is taken for a packet from my private network attached to Ameritech DSL to a system located in One Call Internet’s colocation facility. If the utility had returned a list of alternating routers, then you could identify that one or the other of those routers had a configuration problem. Alternatively, you may receive a message indicating Destination Unreachable. This message indicates that the router's path to the destination has been severed.
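Spotting a list of alternating routers is easy to automate once you have the hop addresses in order. The Python sketch below is one simple way to do it; the healthy list uses the first hops of the trace above, while the looping list is invented for illustration.

```python
def find_loop(hops):
    """Return the pair of routers that alternate (A, B, A, B), or None."""
    for i in range(len(hops) - 3):
        a, b = hops[i], hops[i + 1]
        if a != b and hops[i + 2] == a and hops[i + 3] == b:
            return (a, b)
    return None

healthy = ["10.254.1.254", "68.23.14.174", "68.23.14.1", "67.36.128.226"]
# Invented example: the second and third routers keep handing the packet back.
looping = ["10.254.1.254", "68.23.14.174", "68.23.14.1",
           "68.23.14.174", "68.23.14.1", "68.23.14.174"]

print(find_loop(healthy))  # None
print(find_loop(looping))  # ('68.23.14.174', '68.23.14.1')
```

The pair returned names the two routers whose routing tables should be checked first.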

It’s not that complicated
When troubleshooting TCP/IP problems, keep in mind that it’s critical that you get the IP address, subnet mask, and default gateway correct. Once those few parameters are set correctly, the TCP/IP infrastructure can begin to help your system reach every other connected computer. The PING and TRACEROUTE commands are key to diagnosing your network problems.