Troubleshooting Errors/Problems on Cisco WAN Interfaces

MWBOB
I'm a Telecom/Transport technician for a major utility, responsible for maintaining connectivity throughout the network. We are having serious issues at two sites. In both cases, the transport equipment (digital microwave) is running clean, with no errors, but the Cisco 2621 router attached via a WIC T1 card is seeing input errors, CRCs, and frame errors. The cause of this type of problem is typically some sort of configuration mismatch, but I'm not trained on the TCP/IP side of the network. I have a few questions I would appreciate help with:
-When a "show interface" command is displayed, what time frame is being covered? (15 minutes, 1 hour, 1 day, etc)
-The interface shows "frame errors". What constitutes a "frame error" in the TCP/IP world? Is it time related? Packet related?
-I have been told that errors displayed on the WIC are actually errors on Layer 2 or higher in the OSI stack. Is this true?
-The errors shown on the WIC don't seem to be severe enough to be causing a critical slowdown of data throughput. Any suggestions on how to analyze this?
Thanks in advance for any insight you guys can provide. -Bob
    CG IT

    1. "show interface" shows you what's going on at the time you issue the command. To find out what's going on in real time, use the debug interface X.X command.
    2. Frames live at the data link layer. A frame error can mean that the frame headers the router expects to find aren't what is there, or are missing, and that is just one of many types of errors found at layer 2.
    3. Yes and no: data is encapsulated into a packet at layer 3, then into a frame at layer 2, and the frame goes out as bits at layer 1. These frames and packets are akin to envelopes with the data as the contents inside.

    Visit Wikipedia for an overview of layer 2:
    http://en.wikipedia.org/wiki/Layer_2

    Slow traffic can be caused by many different issues. Frame errors can cause a slowdown, collisions can cause a substantial slowdown, and convergence can cause slowdowns too. It's up to the network engineer to analyze and determine just what is causing it.

    MWBOB

    1. If the "show interface" cmd shows 37 input errors, 7 CRCs and 29 frame errors (and this is not a real time view) it is critical to know what time period is being examined. The longer the period is, the less critical the errors will be to overall performance. I will have the network guys show me the results of a "debug interface" command.
    2. If I understand you correctly, then frame errors could be the result of the router not receiving data as it expects to see it. That could mean corrupted data, timing issues, or practically any other problem.
    As a matter of fact, I am reading a book on basic TCP/IP.
    3. Very clear explanation. Thanks for that. The book takes about three pages to say the same thing, just not as clearly.
    4. Both routers in question each have (2) T1 WAN inputs. At one site usage is exceeding available BW quite often. Can this be helped temporarily by changing buffer sizes?
    Both of these sites are small and our transport equipment is the equivalent of a telco demarc.
    5. Glad you mentioned this. The network guys tell me that the (2) T1s dropping off our microwave to this router are bonded, meaning the router effectively "sees" 3Mb of bandwidth. I am assuming that this combining function is handled at the data link layer by the routers on both ends of the circuit. Is that correct?
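
As a rough check on the "3Mb" figure: a T1 carries 24 DS0 channels of 64 kb/s each, so two bonded T1s come to a little over 3 Mb/s of payload. A minimal Python sketch of the arithmetic follows. The function name is made up for illustration, and it assumes all 24 channels on each T1 are in use and ignores any overhead added by the bonding method itself.

```python
# Standard T1 figures: 24 DS0 timeslots at 64 kb/s each, plus 8 kb/s of framing.
DS0_BPS = 64_000
CHANNELS_PER_T1 = 24
FRAMING_BPS = 8_000


def bonded_payload_bps(t1_count: int, channels: int = CHANNELS_PER_T1) -> int:
    """Usable payload in bits/sec for a bundle of T1s (hypothetical helper)."""
    return t1_count * channels * DS0_BPS


if __name__ == "__main__":
    one_t1 = bonded_payload_bps(1)
    print(f"one T1 payload:   {one_t1} b/s")                # 1536000
    print(f"one T1 line rate: {one_t1 + FRAMING_BPS} b/s")  # 1544000
    print(f"two T1s bonded:   {bonded_payload_bps(2)} b/s") # 3072000
```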

    Thanks a lot for your input. I have to come up with some ideas by tomorrow morning and this has helped. -Bob

    seanferd

    If the packets are dropped due to bandwidth being exceeded, this would explain a lot of the errors, maybe all of them. Expected packets are not being received, so no way to piece the data back together, therefore error conditions exist.

    Short answer solution: Stop exceeding bandwidth. I don't know the mechanics of these things, but if the send and receive buffers are increased and/or timeout settings are raised, the system won't lose packets or assume packets are lost when they are late.

    CG IT

    If a router doesn't know what to do with a packet, it drops it. Data is given sequence numbers (strictly speaking by TCP at layer 4, not by the layer 3 packet itself) so that, after the frame and packet wrappers are stripped away at the receiving end, the payload can be put back in the right order. If reassembly at the receiving end turns up a lost packet, the receiving side has to ask for it and the sending side has to resend it. This can cause a slowdown. Lost packets do happen. Packets don't necessarily follow a single route: routing protocols such as OSPF find the shortest path and other routing protocols use other metrics, but the gist is that the sending side simply sends out the packets and hopes the receiving end gets them all. If not, the sending end relies upon the receiving end to ask for the missed packets. If there are a lot of packet or frame errors, this can cause a network slowdown.
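
To make the resend idea concrete, here is a minimal Python sketch of a receiver noticing gaps in a run of sequence numbers. It is an illustration only: real TCP numbers bytes rather than whole packets and works with cumulative acknowledgements, and the function name is made up for this example.

```python
def missing_sequence_numbers(received, first, last):
    """Return the sequence numbers in [first, last] that never arrived.

    `received` may be out of order and may contain duplicates.
    """
    seen = set(received)
    return [seq for seq in range(first, last + 1) if seq not in seen]


if __name__ == "__main__":
    # Packets 1..8 were sent; 4 and 7 were lost and 3 arrived twice.
    arrived = [1, 2, 3, 3, 6, 5, 8]
    print(missing_sequence_numbers(arrived, 1, 8))  # [4, 7]
```

Every number in the returned list is something the sender has to transmit again, which is where the extra delay comes from on a link that is already full.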

    But slowdowns on a WAN interface can come from almost anything, from a single user sending and receiving large amounts of data, such as video, to latency at the next-hop router or even routers farther out in the inter-network. Inter-network routers don't always use the same routing protocols between them. Some might use distance vector routing protocols, some not. During off hours, I would run a debug on the WAN T1 interface, capture some of that information, and analyze it to see what's going on. You could also use Wireshark to capture packets and see what's being sent out and received, and whether anything funky is going on with the packets that causes excessive errors.

    Personally, I would look at what's being sent out and received on the WAN link. If you've got some users streaming videos, that's going to chew up your available bandwidth. I'd check your routing protocols to see whether one of them is very chatty. How are your routers sending and receiving route updates? If you send out route updates and receive them on your WAN link, you might consider making the external side the passive side so that your router isn't getting route updates from other routers on the internet. If your router is always having to process route updates from other internet routers, that can reduce available bandwidth... Why have them when you only need to send packets not destined for the local network to the next hop and let it worry about them?

    I'd find out your actual utilization of your available bandwidth and what traffic uses the most of it. It could simply be that the network and WAN link as designed aren't sufficient....
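
One rough way to put a number on utilization is to take the rate lines from "show interface" output and compare them with the link capacity. The Python sketch below assumes the usual "5 minute input rate N bits/sec" wording (the interval differs if the load interval has been changed on the interface), and the sample figures are made up rather than taken from the routers in this thread.

```python
import re

T1_PAYLOAD_BPS = 1_536_000  # 24 x 64 kb/s

# Assumed wording of the rate lines in "show interface" output.
RATE_RE = re.compile(r"5 minute (input|output) rate (\d+) bits/sec")


def utilization(show_output: str, link_bps: int) -> dict:
    """Map 'input'/'output' to percent of link_bps, from the 5-minute rate lines."""
    return {
        direction: 100.0 * int(bps) / link_bps
        for direction, bps in RATE_RE.findall(show_output)
    }


if __name__ == "__main__":
    # Made-up sample, not output from the routers in this thread.
    sample = """
      5 minute input rate 2918400 bits/sec, 410 packets/sec
      5 minute output rate 768000 bits/sec, 150 packets/sec
    """
    for direction, pct in utilization(sample, 2 * T1_PAYLOAD_BPS).items():
        print(f"{direction}: {pct:.1f}% of bonded capacity")
```

A link that sits near 100% in either direction for long stretches is more likely a capacity problem than an error problem.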

    Without seeing some of the debug output or packet captures from both the inside and outside of the WAN link, there's not much to say except generalizations...

    Side note: if you think of Ethernet and TCP/IP as the US Postal Service [or really any postal service], you will "get" what frames, packets, and networking do.

    NetMan1958

    To answer one of your questions, the time frame covered by the "show interface" command is the cumulative time since someone manually cleared the counters with the "clear counters serial0/0/0" command (replace serial0/0/0 with the interface you are interested in) or since the router was reloaded. To calculate the number of errors per time period, use that command to clear the counters and note the current time. Then, after a suitable period, run the "show interface" command. Let's say you waited 5 minutes after clearing the counters before running "show interface"; then the data shown is for the last 5 minutes.

    Also, see this article and pay particular attention to the sections on clocking:
    http://www.cisco.com/en/US/docs/internetworking/troubleshooting/guide/tr1915.html
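
The clear-then-measure approach can be turned into a rate with a few lines of Python. This is a sketch under assumptions: the error line is assumed to look like typical serial "show interface" output, the counts are the ones Bob quoted earlier in the thread, the 5-minute window is the example above, and both function names are made up for illustration.

```python
import re

# Assumed wording of the error line in "show interface" output.
ERR_RE = re.compile(r"(\d+) input errors, (\d+) CRC, (\d+) frame")


def parse_errors(show_output: str) -> dict:
    """Pull the input error, CRC and frame counters out of the output text."""
    m = ERR_RE.search(show_output)
    if m is None:
        raise ValueError("no input-error line found")
    input_errors, crc, frame = (int(x) for x in m.groups())
    return {"input_errors": input_errors, "crc": crc, "frame": frame}


def errors_per_hour(counters: dict, seconds_since_clear: float) -> dict:
    """Scale raw counts to an hourly rate, given the time since the counters were cleared."""
    return {name: count * 3600.0 / seconds_since_clear for name, count in counters.items()}


if __name__ == "__main__":
    line = "     37 input errors, 7 CRC, 29 frame, 0 overrun, 0 ignored, 0 abort"
    counters = parse_errors(line)
    for name, rate in errors_per_hour(counters, 5 * 60).items():
        print(f"{name}: {rate:.0f} per hour")
```

The same 37 input errors mean very different things over 5 minutes and over several weeks of uptime, which is why the time since the last clear matters.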
