A few weeks ago, I wrote an article about choosing a network troubleshooting methodology, in which I discussed three approaches to network troubleshooting. TechRepublic members posted a variety of comments in that article's discussion; a chief concern was that the article needed some real-world examples. So this week, I'm offering some real-life applications of these methodologies.
Before we get started, let's briefly review the three methods I discussed before:
- Bottom-up: Start at the bottom of the OSI model, and work your way up.
- Top-down: Start at the top of the OSI model, and work your way down.
- Divide and conquer: Start at whichever layer of the OSI model makes the most sense, and work your way either up or down.
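To make the difference between the three methods concrete, here is a minimal Python sketch of the order in which each one visits the OSI layers. The function name `layer_order` and its parameters are made up for this illustration; they don't come from any tool.

```python
# OSI layers, numbered from the bottom (1) to the top (7).
OSI_LAYERS = {
    1: "Physical",
    2: "Data link",
    3: "Network",
    4: "Transport",
    5: "Session",
    6: "Presentation",
    7: "Application",
}


def layer_order(method, start=None, direction="up"):
    """Return the OSI layer numbers in the order a method visits them."""
    if method == "bottom-up":
        return list(range(1, 8))
    if method == "top-down":
        return list(range(7, 0, -1))
    if method == "divide-and-conquer":
        # Divide and conquer needs a judgment call: where to start,
        # and which way to go from there.
        if start not in OSI_LAYERS:
            raise ValueError("divide-and-conquer needs a starting layer 1-7")
        if direction == "up":
            return list(range(start, 8))
        if direction == "down":
            return list(range(start, 0, -1))
        raise ValueError("direction must be 'up' or 'down'")
    raise ValueError(f"unknown method: {method}")


# Start at the network layer and work down toward the cable.
for number in layer_order("divide-and-conquer", start=3, direction="down"):
    print(number, OSI_LAYERS[number])
```

The last loop prints layers 3, 2, and 1 in that order, which is the path the divide-and-conquer example later in this article ends up taking.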
It's great to understand the theoretical concepts of such methodologies, but how do you use them on a real network to troubleshoot a real problem? Let's look at examples for using the bottom-up approach and the divide-and-conquer method. (Because the top-down method is basically the bottom-up approach in reverse, we'll only look at an example for the bottom-up approach.)
Bottom-up
Situation: A user calls from a remote site and says that his PC is down. The applications he's using require network access.
Getting started: First, determine if the user has Layer 1 (i.e., the physical layer) connectivity. For example, you could ask him to verify that the Ethernet cable connects to the device and to the wall port. However, with most users, it's often easier to go to a managed switch and look for a link light than it is to explain what an Ethernet cable is.
In the best-case scenario, you have a managed switch—as well as excellent network documentation. Therefore, you know that the user connects to wall jack port 12, and you know that wall port 12 connects to switch port 11 in the wiring closet. (Note: If you don't have any of this information, you'll need to get it from your user.)
Option 1: Next, Telnet to the Cisco switch and use the show ip interface brief command. Listing A provides an example of the output.
From this output, we can determine that port FastEthernet0/11 is down. Because this indicates a Layer 1 issue, ask the user to trace his Ethernet cable from the NIC on the PC all the way to the switch port.
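If you check many ports this way, you could script the same reading of the output. The following is a rough sketch that assumes you have already captured the text of show ip interface brief into a string. The sample text is invented for illustration (only the FastEthernet0/11 and FastEthernet0/14 port names come from this article), and the function names are hypothetical.

```python
# Illustrative sample only; the Vlan1 address is a made-up value.
SAMPLE_BRIEF = """\
Interface              IP-Address      OK? Method Status                Protocol
Vlan1                  10.1.1.2        YES NVRAM  up                    up
FastEthernet0/11       unassigned      YES unset  down                  down
FastEthernet0/14       unassigned      YES unset  up                    up
"""


def parse_interface_brief(output):
    """Parse 'show ip interface brief' text into a list of dicts."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and prompt/command lines.
        if len(fields) < 6 or fields[0] == "Interface":
            continue
        rows.append({
            "interface": fields[0],
            "ip_address": fields[1],
            # Status can be two words ("administratively down"),
            # so take everything between Method and Protocol.
            "status": " ".join(fields[4:-1]),
            "protocol": fields[-1],
        })
    return rows


def down_interfaces(output):
    """Return the names of interfaces whose status or protocol is not up."""
    return [
        row["interface"]
        for row in parse_interface_brief(output)
        if row["status"] != "up" or row["protocol"] != "up"
    ]


print(down_interfaces(SAMPLE_BRIEF))  # ['FastEthernet0/11']
```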
Option 2: But what if this isn't the problem? What if your user is connecting via switch port 14? That connection is up, so it obviously has Ethernet connectivity. What's the next step? Use the show interface fastethernet 0/14 command on the switch. Listing B provides an example of the output.
From this output, we can tell that while the connection is up, the interface is logging a lot of errors, which indicates a Layer 1 issue such as a faulty cable.
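"A lot of errors" is easier to judge as a ratio. Here is a small sketch that pulls the input packet and error counters out of captured show interface text. The sample counters are invented for illustration, and the function name is hypothetical; what counts as too many errors is left to you.

```python
import re

# Illustrative sample only; these counter values are made up.
SAMPLE_SHOW_INTERFACE = """\
FastEthernet0/14 is up, line protocol is up
     20000 packets input, 1530000 bytes, 0 no buffer
     500 input errors, 480 CRC, 20 frame, 0 overrun, 0 ignored
"""


def input_error_summary(output):
    """Extract input packets, input errors, and CRC errors from the text."""
    packets = re.search(r"(\d+) packets input", output)
    errors = re.search(r"(\d+) input errors, (\d+) CRC", output)
    if not packets or not errors:
        return None  # the counters were not found in this text
    packets_in = int(packets.group(1))
    input_errors = int(errors.group(1))
    return {
        "packets_in": packets_in,
        "input_errors": input_errors,
        "crc_errors": int(errors.group(2)),
        # Avoid dividing by zero on a port that has seen no traffic.
        "error_ratio": input_errors / packets_in if packets_in else 0.0,
    }


summary = input_error_summary(SAMPLE_SHOW_INTERFACE)
print(summary["input_errors"], summary["crc_errors"])  # 500 480
```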
Option 3: So, what if the port in question is up and there are no errors? Your next step is to verify Layer 2. Here's an example:
Switch# show mac address-table interface fastEthernet 0/14
          Mac Address Table
-------------------------------------------
Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
   1    00c0.b768.5409    DYNAMIC     Fa0/14
Total Mac Addresses for this criterion: 1
Switch#
If this information matches the MAC on the PC, next verify that there's no extraneous configuration on the switch port. Here's an example:
Switch# show run interface fa0/14
Building configuration...
Current configuration : 82 bytes
interface FastEthernet0/14
 switchport mode access
 spanning-tree portfast
end
Switch#
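One practical snag when comparing the switch's MAC address table with the PC: Cisco switches print the address in dotted groups (00c0.b768.5409), while Windows shows it with hyphens. A short sketch like the following, with hypothetical function names, normalizes both forms before comparing them.

```python
import re


def normalize_mac(mac):
    """Strip separators and lowercase a 48-bit MAC address."""
    digits = re.sub(r"[.:\-\s]", "", mac)
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits.lower()


def same_mac(a, b):
    """Return True if two MAC strings name the same address."""
    return normalize_mac(a) == normalize_mac(b)


# Switch format versus the hyphenated format Windows displays.
print(same_mac("00c0.b768.5409", "00-C0-B7-68-54-09"))  # True
```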
While there are other possible Layer 2 issues, things are looking pretty good for this level. Next, progress to Layer 3. Check Layer 3 on the PC by using the IPCONFIG /ALL command. Listing C offers an example.
From this, we can determine that the PC does have an IP address. However, is it the right IP address? This PC obtained an IP address via DHCP in the 10.80.x.x range. But the PC is on the 10.1.x.x subnet.
And so, we've finally found the problem: The DHCP server handed out an IP address that doesn't belong to the right subnet. This most likely happened because the PC was moved from one subnet to another and then requested its old IP address.
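The "is it the right IP address?" check can be expressed in a couple of lines with Python's standard ipaddress module. This sketch assumes a /16 mask for the 10.1.x.x subnet, and the host addresses are invented for illustration; substitute the real values from your own addressing plan.

```python
import ipaddress


def on_expected_subnet(address, subnet):
    """Return True if the address falls inside the given subnet."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(subnet)


# Hypothetical values: a lease from the old 10.80.x.x range,
# checked against an assumed 10.1.0.0/16 subnet.
print(on_expected_subnet("10.80.2.37", "10.1.0.0/16"))  # False
print(on_expected_subnet("10.1.4.20", "10.1.0.0/16"))   # True
```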
You can likely solve this problem by reserving the IP address to a fake MAC address on the DHCP server. Use the IPCONFIG /RELEASE and then IPCONFIG /RENEW commands, and the PC will likely obtain the proper IP address for this subnet—and all the network applications will work.
Divide and conquer
Situation: A user calls and says that every application works for her except Internet Explorer Web browsing. When she tries to access a Web site, she receives this error: "Page cannot be displayed—cannot find server or DNS error."
Getting started: Because this is a problem with an application, you might think you should use the top-down method and start at the application layer of the OSI model. However, there are a variety of issues that could be the culprit.
Using the divide-and-conquer method, let's determine what we know so far. The user said that everything works except Internet Explorer. What she didn't specify—and probably didn't even think about—is whether we're talking about a local network or remote network.
Option 1: Since the error specifically mentions DNS, you might lean toward a DNS problem. Since other applications are working, maybe there's a local DNS server that's allowing the local LAN applications to work.
To test this theory, we can use the nslookup command to determine if DNS is working. Here's an example:
C:\> nslookup www.techrepublic.com
Server:  dns.TechRepublic.com
Address:  10.2.1.26
Non-authoritative answer:
Name:    c10-sha-redirect-lb.cnet.com
Address:  188.8.131.52
Aliases:  www.techrepublic.com
This shows us that DNS is indeed working, so we need to keep looking.
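If you'd rather script this check than read nslookup output, a rough equivalent is to ask the operating system's resolver directly. The sketch below uses Python's socket.gethostbyname; the function name dns_resolves and the injectable resolver argument are assumptions added here so the logic can be exercised without a live network.

```python
import socket


def dns_resolves(hostname, resolver=socket.gethostbyname):
    """Return the resolved IPv4 address, or None if the lookup fails."""
    try:
        return resolver(hostname)
    except OSError:
        # socket.gaierror (name resolution failure) is a subclass of OSError.
        return None


# A stand-in resolver for demonstration; 192.0.2.10 is a documentation address.
def fake_resolver(name):
    return "192.0.2.10"


print(dns_resolves("www.techrepublic.com", resolver=fake_resolver))  # 192.0.2.10
```

Run on the user's PC without the resolver argument, a None result would point toward DNS, while a returned address would tell you, as nslookup did here, to keep looking elsewhere.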
Option 2: Is the Web site the user's trying to access on a local or remote subnet? Let's say it's an external subnet, like an Internet Web site.
With this knowledge, let's investigate Layer 3, the network layer, since we know that there is some connectivity and other applications are working. Let's say we've used the ipconfig command to determine that the default gateway is 10.80.2.1. Now, let's ping the default gateway. Here's an example:
C:\> ping 10.80.2.1
Pinging 10.80.2.1 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Ping statistics for 10.80.2.1:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
From this, we can determine that we have a Layer 3 problem. The default gateway is down or otherwise unreachable.
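When you collect ping results from several machines, it can help to reduce each one to a single loss figure. This sketch reads the statistics line from the ping output shown above. It assumes English-language Windows ping output, and the function name is hypothetical.

```python
import re

# The ping output from the example above.
SAMPLE_PING = """\
Pinging 10.80.2.1 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Ping statistics for 10.80.2.1:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
"""


def ping_loss_percent(output):
    """Return the packet loss percentage, or None if no statistics line exists."""
    match = re.search(r"Lost = \d+ \((\d+)% loss\)", output)
    return int(match.group(1)) if match else None


print(ping_loss_percent(SAMPLE_PING))  # 100
```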
If you're at the central data center, Telnet to the user's default gateway at the remote location. After logging in, use the show ip interface brief command. Listing D offers an example.
From this output, we can determine that we have a disconnected network cable, the one that connects the router to the local switch. While we thought we had a Layer 3 issue, we actually had a Layer 1 problem instead.
In my opinion, the divide-and-conquer method requires more knowledge of networking and troubleshooting in general. However, it can also yield faster results. Using a troubleshooting methodology is similar to using an access list—once you've found a match, there's no need to proceed any further.
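The access list comparison translates naturally into code: run the checks in order and stop at the first one that fails. Here is a minimal sketch; the check names and their hard-coded results are placeholders that mirror the divide-and-conquer example, not real probes.

```python
def first_failure(checks):
    """Run (name, check) pairs in order; return the first failing name."""
    for name, check in checks:
        if not check():
            return name  # first match: no need to proceed any further
    return None


# Placeholder checks with fixed results, standing in for real tests.
checks = [
    ("DNS lookup", lambda: True),
    ("Ping default gateway", lambda: False),
    ("Ping remote Web server", lambda: False),
]

print(first_failure(checks))  # Ping default gateway
```

Note that the third check never runs, just as a router stops evaluating an access list after the first matching entry.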
David Davis has worked in the IT industry for 12 years and holds several certifications, including CCIE, MCSE+I, CISSP, CCNA, CCDA, and CCNP. He currently manages a group of systems/network administrators for a privately owned retail company and performs networking/systems consulting on a part-time basis.