
Build Your Skills: Learning more performance tools for troubleshooting

Look at the Network Interface performance object and protocol counters to troubleshoot server problems.


In the Daily Drill Down “Using performance-monitoring tools to troubleshoot network problems,” I explained that although Performance Monitor’s primary purpose is to keep an eye on how your server is performing, you can also use it to identify some types of network problems. In this Drill Down, I’ll continue the discussion by showing you some more Performance Monitor counters you can use for this purpose.

Before you begin
Before I explain the remaining counters, I want to point out that this article is the second in a two-part series. “Using performance-monitoring tools to troubleshoot network problems” discusses how to use Performance Monitor. It also explains the difference between a performance object and a counter. If you’re unfamiliar with these terms or with how to use Performance Monitor, I recommend that you check out my previous Daily Drill Down before you read this one.

The Network Interface performance object
In my previous Drill Down, I discussed how to use the Redirector performance object to determine how well network packets are flowing through the Windows 2000 network redirector. Let’s suppose you test the redirector and find that no traffic is flowing through it. What does that mean?

One possible cause is that the network interface card (NIC) could be malfunctioning. Such a malfunction might be caused by something as simple as a loose or unplugged data cable or by something as catastrophic as a blown network card. In either case, the Network Interface performance object will help you narrow things down by testing the performance of the NIC.

As you look at the Network Interface performance object, you’ll notice that many of the counters have identical names to the counters that are associated with the Redirector performance object. These counters behave exactly the same except for the fact that they measure network traffic at the NIC level rather than at the network redirector level.

If everything looks all right at the network interface level but no traffic is flowing through the redirector, the problem is probably related to Windows configuration. It could also be caused by missing system files. If, on the other hand, you’re unable to track any traffic flowing through the network interface, it’s likely that the problem is hardware related.
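The reasoning above can be written down as a small decision function. This is only a sketch: the function name, its parameters, and the idea of feeding it two throughput readings that you copied out of Performance Monitor by hand are all assumptions made for illustration, not part of any Windows API.

```python
def localize_fault(nic_bytes_per_sec, redirector_bytes_per_sec):
    """Guess where a fault lies from two throughput readings (hypothetical helper)."""
    if nic_bytes_per_sec <= 0:
        # Nothing is moving at the card itself: suspect the NIC or the cabling.
        return "hardware"
    if redirector_bytes_per_sec <= 0:
        # The card is moving data, but the redirector sees none of it.
        return "Windows configuration or system files"
    return "no fault indicated by these counters"


# Readings are made-up values standing in for what you might note down.
print(localize_fault(nic_bytes_per_sec=0.0, redirector_bytes_per_sec=0.0))
print(localize_fault(nic_bytes_per_sec=41250.0, redirector_bytes_per_sec=0.0))
```

Treat the result as a starting point for where to look first rather than as a diagnosis.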

To test the network interface, begin by looking at Bytes Sent/Sec and Bytes Received/Sec. The Bytes Sent/Sec counter will test your ability to send data across the network. As you look at this counter, try doing something that typically generates network traffic, such as opening a file that resides on a network server. The idea is to make sure that your byte count corresponds to your actions.
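One way to make “the byte count corresponds to your actions” concrete is to compare readings taken while the machine is idle with readings taken while you open a network file. The sketch below assumes you have written down a few Bytes Sent/Sec values for each period. The function name and the `min_ratio` threshold of 2.0 are arbitrary choices for illustration, not recommended values.

```python
from statistics import mean


def traffic_tracks_activity(idle_samples, active_samples, min_ratio=2.0):
    """Return True if traffic during activity is clearly above the idle level.

    Both arguments are lists of Bytes Sent/Sec readings (hypothetical input).
    """
    if not idle_samples or not active_samples:
        raise ValueError("need at least one sample for each period")
    idle_avg = mean(idle_samples)
    active_avg = mean(active_samples)
    if active_avg <= 0:
        # No bytes were sent even while generating traffic.
        return False
    # Use a floor of 1 byte/sec so a perfectly silent idle period still works.
    return active_avg >= max(idle_avg, 1.0) * min_ratio


# Made-up readings: a quiet idle period, then opening a file on a server.
print(traffic_tracks_activity([120, 90, 110], [48000, 52000, 50500]))
```

A card that sends a heavy, constant stream regardless of what you do will fail this check too, because its active readings won’t stand out from its idle ones.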

One big tip-off to a malfunctioning network card is that sometimes bad cards will flood the network with a constant stream of data. When this happens, your hub or switch may block the port that the PC is attached to in order to prevent that PC from congesting the rest of the network. If your hub doesn’t have this feature, you may find that your network is running very slowly because it’s become congested with random traffic generated by the ailing network card. Of course, not all malfunctioning network cards run amuck like this, but it does happen and it’s a symptom to look for.

When you test the Bytes Received/Sec counter, you’re looking for any traffic at all. If traffic shows up, it proves that the card is at least capable of receiving traffic. If no traffic is showing up, however, there’s a definite problem. Many times, a consistent Bytes Received/Sec counter reading of 0 points to a cable problem. If a cable has been severed, disconnected from the hub, disconnected from the server, or shorted out, it will be impossible for packets to reach the PC. Of course, depending on the network, you may also be able to detect a cabling problem by looking at the Bytes Sent/Sec counter, but using the Bytes Received/Sec counter is much more reliable.

Before I move on to another performance object, there are a few more counters you should check in the Network Interface area. One such group of counters is the Packets Outbound Discarded and Packets Received Discarded counters. These counters count legitimate packets that contained no detectable errors and would normally have been sent or received without a problem, but that the system discarded anyway. If you happen to get a couple of these every now and then, it’s no big deal. However, if you’re having packets discarded on a regular basis, it points to a serious problem. Many times, the packets are being discarded because the system is running low on buffer space, and discarding network packets is the only way the system can cope.
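To tell “a couple every now and then” apart from “on a regular basis,” you can look at how often a discard counter grows between readings. The sketch below assumes you have a list of successive readings of a cumulative discard count. The function name and the 50 percent cutoff are assumptions chosen for illustration only.

```python
def discard_pattern(cumulative_counts, regular_fraction=0.5):
    """Classify how often a cumulative discard counter grew between readings."""
    deltas = [later - earlier
              for earlier, later in zip(cumulative_counts, cumulative_counts[1:])]
    if not deltas:
        return "insufficient data"
    growing = sum(1 for delta in deltas if delta > 0)
    if growing == 0:
        return "none"
    if growing / len(deltas) >= regular_fraction:
        # Discards in most intervals: worth investigating buffer space.
        return "regular"
    return "occasional"


# Made-up readings taken at equal intervals.
print(discard_pattern([0, 0, 0, 1, 1, 1, 1, 1]))
print(discard_pattern([0, 3, 7, 7, 12, 20]))
```

The same check works for either direction, as long as each list holds readings from a single counter.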

Another set of counters to check out is the Packets Outbound Errors and Packets Received Errors counters. These counters count the number of packets flowing in and out of the system that are basically gibberish. It’s normal to occasionally get an error or two, but again, if you’re getting errors consistently, it points to a big problem.

If the Packets Outbound Errors counter is showing an excessive number of errors, it’s likely caused by a problem with your network card. If your system is getting a lot of Packets Received Errors, the problem could also be a bad network card, but other factors could be causing the errors as well. For example, a PC somewhere else on the network could be generating a lot of bad packets while trying to communicate with the server. Likewise, a bad network cable could result in signal degradation and cause network errors.
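The mapping from error counters to suspects can be summarized in a few lines. This is a sketch of the reasoning in this section only. The function name is hypothetical, and deciding what counts as “excessive” is left to you.

```python
def candidate_causes(outbound_errors_excessive, received_errors_excessive):
    """List the suspects suggested by which error counter is excessive."""
    causes = []
    if outbound_errors_excessive:
        causes.append("local network card")
    if received_errors_excessive:
        for cause in ("local network card",
                      "another PC sending bad packets",
                      "bad or unsuitable cabling"):
            if cause not in causes:  # avoid listing the card twice
                causes.append(cause)
    return causes


print(candidate_causes(outbound_errors_excessive=False,
                       received_errors_excessive=True))
```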

Last summer, I encountered a network in which there were an excessive number of network errors. Upon closer inspection, I found that whoever set up the network was trying to use CAT 3 cable for communications that should have been carried across CAT 5 cable. On another network, communications that would have been fine for CAT 3 cable were being carried across low-grade telephone wire. In both cases, the result was an excessive number of network errors. I’ve also seen network errors caused when an RJ-45 connector was loose. In the case of coaxial-based communications, if a piece of the mesh touches the core, it can cause network errors.

Protocol tests
There’s at least one more major component that needs to be examined—the protocol. As you may know, a protocol is almost like a spoken language such as English, Spanish, or German. The protocol defines the rules of communications across the network.

When a PC places a packet onto the network, the packet is constructed based on rules defined by the protocol. These rules tell the PC in what order to put the individual bytes. For example, the protocol tells the PC how many bytes make up the packet’s header and how many bytes compose the actual message being sent.
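To see what “rules defined by the protocol” means in practice, here is a minimal sketch that uses IPv4 as the example protocol. In IPv4, the first byte of the header says how long the header is, and the third and fourth bytes give the total length of the packet, so a receiver can tell the header from the message. The function name and the sample packet are made up for illustration.

```python
import struct


def split_ipv4_packet(packet):
    """Split a raw IPv4 packet into (header, payload) using its own length fields."""
    if len(packet) < 20:
        raise ValueError("too short to hold an IPv4 header")
    # First 4 bytes: version/header length, type of service, total length.
    version_ihl, _tos, total_length = struct.unpack("!BBH", packet[:4])
    version = version_ihl >> 4
    header_len = (version_ihl & 0x0F) * 4  # the field counts 32-bit words
    if (version != 4 or header_len < 20
            or total_length < header_len or total_length > len(packet)):
        raise ValueError("malformed IPv4 header")
    return packet[:header_len], packet[header_len:total_length]


# A hand-built 20-byte header followed by a 5-byte message (checksum left as 0).
SAMPLE_PACKET = bytes([0x45, 0x00, 0x00, 0x19, 0x00, 0x01, 0x00, 0x00,
                       0x80, 0x06, 0x00, 0x00, 192, 168, 0, 1,
                       192, 168, 0, 2]) + b"hello"

header, payload = split_ipv4_packet(SAMPLE_PACKET)
print(len(header), payload)
```

The sketch doesn’t verify the checksum or handle IP options beyond skipping over them. It only shows how the length fields let a receiver find where the header ends and the message begins.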

When the packet arrives at the destination PC, that PC’s network card will receive the packet no matter what it consists of. However, the redirector must then use the rules of the protocol to dissect the packet and see if it’s intended for that PC. If the packet is intended for the PC that’s received it, the redirector must use the protocol to figure out what part of the packet contains the message and extract the message. Obviously, this explanation is grossly oversimplified, but my point is that without a protocol, there would be no communications between the network card and the network redirector.

The actual method that you use to check a protocol will vary depending on what protocol or protocols you’re running. Since TCP/IP is so popular these days, I’ll show you a few TCP/IP-related counters that you can use to see how well TCP/IP is working. I’ve actually seen several cases in which communications were breaking down as a result of a TCP/IP failure. In each case, I was able to remove TCP/IP from the machine and then reinstall it, and communications resumed. One possible reason for this is that TCP/IP is actually a suite of protocols rather than a single protocol. Therefore, it’s composed of a variety of different files. If some of these files were to be damaged or missing, it could negatively impact communications.

The majority of the TCP/IP-related counters are found under the IP performance object. One of my personal favorite counters to look at is the Datagrams Received Header Errors counter. This counter keeps track of the number of packets that arrive with a bad packet header. Keep in mind, though, that a bad header doesn’t necessarily mean that a packet is corrupt, although it can indicate a corrupt packet. A bad header can indicate corruption because of a checksum mismatch, but it can also be caused by an IP version mismatch, or because a packet has exceeded its Time To Live (TTL).
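The three header problems mentioned above can each be checked from the raw bytes of an IPv4 header. The following is a minimal sketch, not how Windows implements the counter. The function names are hypothetical, and the checks are limited to the version field, the TTL byte, and the standard one’s-complement header checksum.

```python
import struct


def ipv4_header_checksum(header):
    """Compute the IPv4 header checksum with the checksum field treated as zero."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    words = struct.unpack("!%dH" % (len(zeroed) // 2), zeroed)
    total = sum(words)
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_problems(header):
    """Return a list of the problems found in a raw IPv4 header."""
    if len(header) < 20 or len(header) % 2:
        return ["bad header length"]
    problems = []
    if header[0] >> 4 != 4:
        problems.append("version mismatch")
    if header[8] == 0:  # byte 8 holds the Time To Live
        problems.append("time to live expired")
    stored = struct.unpack("!H", header[10:12])[0]
    if stored != ipv4_header_checksum(header):
        problems.append("checksum mismatch")
    return problems


# A commonly cited example header whose correct checksum is 0xb861.
good = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
damaged = good[:1] + b"\x01" + good[2:]  # flip one byte without fixing the checksum
print(header_problems(good), header_problems(damaged))
```

A header that fails only the TTL check may have arrived perfectly intact, which is why a header error doesn’t always mean corruption.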

Another helpful counter to look at is the Datagrams Received Unknown Protocol counter. This counter tracks the number of datagrams that the server receives successfully but then discards because they call for a protocol that the server doesn’t know or support. For example, if an IP datagram arrived carrying data for a transport protocol that isn’t installed on the server, this counter would record it.

Yet another counter to look at is the Fragment Reassembly Failures counter. As you may know, TCP/IP packets have a maximum size. If a message exceeds the packet’s maximum size, the message must be fragmented, or divided into multiple packets. Each packet contains a sequence number so that the recipient knows what order to reassemble the message in. If not all of the packets are received within a given time period, the reassembly process will fail and generate an error.
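Here is a minimal sketch of reassembly with a time limit. It assumes each fragment is tagged with its arrival time and its position in the original message. Real IP marks the position with a byte offset rather than a simple sequence number, so the sketch uses offsets. The function name, the tuple layout, and the deadline value are assumptions for illustration.

```python
def reassemble(fragments, total_length, deadline):
    """Rebuild a message from (arrival_time, offset, data) fragments.

    Returns the message, or None if reassembly fails because a piece
    is missing, arrived after the deadline, or doesn't fit.
    """
    buffer = bytearray(total_length)
    covered = [False] * total_length
    for arrival_time, offset, data in fragments:
        if arrival_time > deadline:
            continue  # too late: treated the same as a lost fragment
        end = offset + len(data)
        if offset < 0 or end > total_length:
            return None
        buffer[offset:end] = data
        covered[offset:end] = [True] * len(data)
    return bytes(buffer) if all(covered) else None


# Fragments may arrive out of order; the offsets put them back in place.
pieces = [(0.1, 0, b"Performa"), (0.3, 16, b"tor"), (0.2, 8, b"nce Moni")]
print(reassemble(pieces, total_length=19, deadline=1.0))
```

Each `None` result corresponds to the kind of event that a reassembly failure counter would record.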

The reassembly counter keeps track of how often reassembly fails. This counter can be a good indication of how often packets are being lost or delayed in transit. Keep in mind, though, that the count doesn’t necessarily match the number of fragments that were discarded, because some reassembly algorithms lose track of the number of fragments by combining them as they arrive. Therefore, this counter is usually a good indicator of where the problem may lie, but don’t rely solely on it.

Conclusion
When things go wrong, you may not always know what’s causing the problem. Once you know the counters to watch, though, you’ll find that Performance Monitor can be a great tool to help you troubleshoot problems.
