Troubleshooting in a Cisco switched environment: The design phase

Troubleshooting a Cisco switched environment can begin as early as the design phase. Read on as Robert McIntire helps you prevent loops and spanning tree problems.

In the first part of our series on bridging loops, “Bridging loops and spanning tree: Troubleshooting in a Cisco switched environment,” we discussed emergency procedures for remediating a loop or spanning tree malfunction on your network. In the second, “Spanning trees and bridging loops: Troubleshooting or troublemaking?,” we continued our discussion with troubleshooting techniques. In this Daily Feature, we’ll be discussing preventive measures that can be taken in the design phase to avoid a loop or spanning tree problem.

The command-line examples in this article use the CatOS syntax found on Cisco Catalyst 4000, 5000, and 6000 series switches.

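The first issue to consider in your design is the location of the root bridge. Left to its own devices, spanning tree will elect a root for you: the switch with the lowest bridge ID. When every switch keeps its default priority, that comes down to the lowest MAC address, which may well belong to an old access-layer switch at the edge of the network. I don’t recommend leaving it to chance. Instead, designate a centrally located switch as your root bridge by running the set spantree root command on it, as in the following example: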
Switch> (enable) set spantree root

You can also designate a backup root that takes over if the root switch fails. To do so, run the same command on the backup switch with the secondary keyword:
Switch> (enable) set spantree root secondary
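
To see why the default election can pick an unlucky root, here is a minimal Python sketch of the IEEE 802.1D rule: the bridge with the lowest bridge ID wins, comparing priority first (default 32768) and MAC address second. The switch names and MAC addresses are invented, and the priorities 8192 and 16384 are assumed stand-ins for what the root and secondary commands configure; check your CatOS documentation for the exact values on your release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int  # 802.1D default is 32768
    mac: str       # colon-separated hex, e.g. "00:d0:bc:aa:bb:cc"

    def bridge_id(self) -> tuple:
        # Lower priority wins; the MAC address only breaks ties.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges: list) -> Bridge:
    """Return the bridge with the lowest bridge ID (the 802.1D root)."""
    return min(bridges, key=Bridge.bridge_id)


# Hypothetical switches; 8192/16384 are ASSUMED root/secondary priorities.
access = Bridge("old-access-switch", 32768, "00:00:0c:01:02:03")
core = Bridge("core-1", 8192, "00:d0:bc:aa:bb:cc")
backup = Bridge("core-2", 16384, "00:d0:bc:aa:bb:cd")

print(elect_root([access, core, backup]).name)  # core-1
print(elect_root([access, backup]).name)        # core-2, after core-1 fails

# With every switch left at the default priority, the lowest MAC wins.
defaults = [Bridge(b.name, 32768, b.mac) for b in (access, core, backup)]
print(elect_root(defaults).name)                # old-access-switch
```

The last case is the scenario the root and secondary commands are meant to prevent: without a tuned priority, an edge switch with an old, low MAC address becomes the center of your spanning tree.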

Another thing you can do is activate logging on your switches. This proactive measure lets you search the logs for errors if a problem with STP does occur. If you run a full-featured network management package, you can also set up alerts so the system notifies you of any spanning tree errors or events as they happen. That way, you can keep an eye on STP and resolve issues before they become full-blown problems. Keep in mind that logging requires a certain amount of disk space and possibly a dedicated syslog system, depending on how many devices you’re logging. If capacity is an issue, you can log only the devices that host blocking ports.

Also consider what happens to logging when a loop occurs. Generally speaking, network access becomes erratic, so if you’re logging to a syslog server, chances are good that logging will be interrupted during the loop event. For this reason, you may want to log STP locally on the switch as well, so that current STP event messages are available during troubleshooting. In CatOS, the set logging level command controls the severity level recorded for each logging facility; raising the level for spanning tree events above the default makes the switch record more of them locally.
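
Once messages are being collected, searching them is easier with a filter. Cisco system messages carry a %FACILITY-SEVERITY-MNEMONIC: tag, with severity running from 0 (emergencies) to 7 (debugging). The Python sketch below pulls out spanning tree messages at or above a chosen severity. The sample lines are hypothetical (the EXAMPLE mnemonics are made up), and it assumes your messages use the SPANTREE facility name; confirm the exact text against your own switch's log.

```python
import re

# Matches the "%FACILITY-SEVERITY-MNEMONIC:" tag in a Cisco log line.
TAG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):"
)


def spantree_events(lines, max_severity=7):
    """Yield (severity, mnemonic, line) for spanning tree messages.

    Lower numbers are more serious, so max_severity=4 keeps only
    warnings (4) and anything worse (0-3).
    """
    for line in lines:
        m = TAG.search(line)
        if m and m.group("facility") == "SPANTREE":
            severity = int(m.group("severity"))
            if severity <= max_severity:
                yield severity, m.group("mnemonic"), line.strip()


# Hypothetical log lines for illustration only.
sample = [
    "%SYS-5-MOD_OK:Module 1 is online",
    "%SPANTREE-6-EXAMPLEINFO: hypothetical informational message",
    "%SPANTREE-2-EXAMPLECRIT: hypothetical critical message",
]
for severity, mnemonic, line in spantree_events(sample, max_severity=4):
    print(severity, mnemonic)  # 2 EXAMPLECRIT
```

A full-featured management package does the same job with alerting built in; a script like this is mainly useful for sifting through a locally captured buffer after the fact.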

To speed STP convergence when changes occur, you’ll want to use the UplinkFast and BackboneFast enhancements provided by Cisco. UplinkFast is enabled on the access layer switches to speed convergence when one of the redundant uplinks from that layer to the distribution layer fails. BackboneFast speeds convergence after an indirect link failure elsewhere in the network: instead of waiting for the max age timer to expire, a switch that detects the failure can age out the stale root information immediately. The key thing to remember is that BackboneFast must be enabled on all switches to function properly. Following are command examples:
Switch> (enable) set spantree uplinkfast enable
Switch> (enable) set spantree backbonefast enable
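
The payoff of BackboneFast is easy to work out from the default 802.1D timers (max age 20 seconds, forward delay 15 seconds). A blocked port has to pass through the listening and learning states, one forward delay each, before it forwards; after an indirect failure, plain STP first waits for max age as well. The short Python calculation below assumes the default timers are unchanged. UplinkFast is not modeled, because it moves an alternate uplink straight to forwarding rather than shortening these timers.

```python
# Default IEEE 802.1D timers, in seconds (assumed unchanged).
MAX_AGE = 20
FORWARD_DELAY = 15


def blocked_to_forwarding(wait_for_max_age: bool) -> int:
    """Worst-case seconds before a blocked port starts forwarding.

    Listening and learning each last one forward delay. If stale root
    information must first age out, max age is added on top.
    """
    delay = 2 * FORWARD_DELAY
    if wait_for_max_age:
        delay += MAX_AGE
    return delay


# Indirect link failure with plain STP: wait for max age, then listen/learn.
print(blocked_to_forwarding(wait_for_max_age=True))   # 50
# Same failure with BackboneFast: the max age wait is skipped.
print(blocked_to_forwarding(wait_for_max_age=False))  # 30
```

Cutting 50 seconds to 30 is the difference BackboneFast makes in the worst case, which is why it is worth enabling everywhere.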

Another way to make the design more robust is to limit redundant links. Redundant links are, generally speaking, a good thing. However, if you get carried away with redundancy, you simply create more work for STP when a topology change occurs. Cisco suggests that no more than two links are necessary between any two network devices. The more links you have, the more ports are blocked, and the more potential there is for problems when the spanning tree is recalculated.
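
The link count matters because a spanning tree over N switches keeps exactly N - 1 links forwarding; every other link ends up with a blocked port. The Python sketch below applies that rule. It assumes a single connected topology running one spanning tree instance (one VLAN) and no EtherChannel bundles, which STP treats as a single logical link. The example topology of two distribution switches and ten access switches is hypothetical.

```python
def blocked_links(switches: int, links: int) -> int:
    """Number of links that spanning tree must block.

    Assumes a connected topology with one spanning tree instance. The tree
    keeps switches - 1 links forwarding; every extra link is blocked.
    """
    if links < switches - 1:
        # Too few links to connect every switch, so the formula does not apply.
        raise ValueError("topology cannot be connected")
    return links - (switches - 1)


print(blocked_links(2, 2))            # 1: two parallel links between two switches

# Hypothetical: 2 distribution + 10 access switches, one inter-distribution link.
print(blocked_links(12, 10 * 2 + 1))  # 10: each access switch has one uplink per distribution switch
print(blocked_links(12, 10 * 4 + 1))  # 30: two uplinks to each distribution switch
```

Doubling the uplinks triples the number of blocked ports in this example, and each of those ports is another one that has to change state when the tree is recalculated.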

One of the final considerations in any design process is testing. Testing failover after you design and configure STP can reveal weak spots in the design. One simple method is to start a continuous ping across a redundant, active link. Then disconnect the active link at Layer 1 (unplug the cable) and watch the STP failover process in real time. You can monitor STP activity by watching the logging messages at the console of the switch hosting the active and blocked redundant links; the messages of interest are the ones pertaining to bridge ports. Again, it’s a good idea to do this from the console rather than over the network, because you may cut off your own connection. The single most important factor here is convergence time. How long does it take STP to detect the active link failure and activate the redundant link? And what delay will end users experience? Ideally, convergence completes before network clients time out.
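
The continuous ping gives you a direct estimate of that delay: the longest run of lost replies, multiplied by the ping interval. The Python sketch below assumes you have recorded each ping as a success or failure, one per second; the capture and the 45-second client timeout are hypothetical values chosen for illustration.

```python
def longest_outage(replies: list, interval: float = 1.0) -> float:
    """Longest run of lost pings, converted to seconds.

    `replies` holds one entry per ping in order: True if a reply arrived.
    Accuracy is limited to the ping interval.
    """
    longest = current = 0
    for ok in replies:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest * interval


# Hypothetical capture: one ping per second, cable pulled after the 3rd reply.
replies = [True] * 3 + [False] * 30 + [True] * 5
outage = longest_outage(replies)
client_timeout = 45.0  # assumed application timeout, in seconds
print(outage, outage < client_timeout)  # 30.0 True
```

Comparing the measured outage against the timeouts of the applications your users actually run tells you whether the failover is invisible to them or merely survivable.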

I do recommend using spanning tree. If your network design includes Layer 2 redundancy, running STP is not optional, because without it the redundant links form bridging loops. But if your network topology changes often, you could experience excessive disruptions while STP converges and recalculates the network layout. For this reason, you may want to limit who can make changes and who has physical access to your network infrastructure.

Remember that STP does add a certain amount of traffic overhead in maintaining a loop-free network. Even so, if you don’t have good physical control over your network topology, wiring closets, and patch panels, STP becomes a very attractive choice. After all, who wants to deal with a bridging loop caused by a user or IT staffer who innocently misconnected two switches? And don’t forget to monitor spanning tree regularly for anomalous activity. By following these simple tips, you can implement network redundancy without the penalty of bridging loops.
