Many problems can affect the performance of a network, and a wide array of tools is available to administrators to diagnose issues. But when you find serious problems occurring under minimal bandwidth usage, you might have to turn to alternative troubleshooting methods and do some extra investigative work.
That’s the situation that challenged the IT department at Indiana University Southeast (IUS), a small university located in New Albany, IN. It found that serious network problems began to occur when only 6 to 8 Mbps of bandwidth was being used on a pipeline that should not show any appreciable degradation of performance until usage reaches 45 Mbps. The problems seemed to occur at random times, but when they did, users could not log on to the university network.
This was a big problem, and Network Administrator John Petrysian was determined to get to the bottom of it. But he had to wait until the summer session ended and fewer people were on campus before his staff could run a load test to try to reproduce the problems.
In conjunction with its standard network monitoring tools, the IUS IT staff turned to another piece of software to help solve the problem: KaZaA. The testing yielded some intriguing results, and KaZaA did indeed help them track down the possible source of their network woes.
An overview of the issue
As I discussed in an earlier article, IUS is part of the Indiana University (IU) system and is working to better integrate itself and its network with its parent university and with the IU system as a whole. Part of that effort means that network authentication is not performed locally but on servers in Bloomington, located over 100 miles away from IUS. Thus, the link between IUS and the authentication servers must be 100 percent reliable; otherwise, users can't log on.
The IUS IT department recently discovered that under certain circumstances, users were having problems logging in even when network monitoring tools revealed that minimal bandwidth was being utilized. IUS has a 45-Mbps connection to the authentication servers, but logon issues seemed to arise as the number of connections increased, pushing bandwidth usage into the 6- to 8-Mbps range.
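To put those figures in perspective, the short Python sketch below works out what share of the link was in use when logons began to fail. The 45-Mbps capacity and the 6- to 8-Mbps range come from the description above; the function name is only illustrative.

```python
LINK_CAPACITY_MBPS = 45.0  # link speed reported for the IUS connection


def utilization_percent(used_mbps: float,
                        capacity_mbps: float = LINK_CAPACITY_MBPS) -> float:
    """Return link utilization as a percentage of capacity."""
    if capacity_mbps <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used_mbps / capacity_mbps


# The range in which the logon failures were observed.
for used in (6.0, 8.0):
    pct = utilization_percent(used)
    print(f"{used:.0f} Mbps of {LINK_CAPACITY_MBPS:.0f} Mbps = {pct:.1f}% utilization")
```

In other words, the failures appeared with the link at well under one-fifth of its capacity, which is why raw bandwidth alone could not explain them.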
The prime suspect in the case was the use of peer-to-peer networking software (particularly KaZaA) on the IUS network, but that alone would not explain why the problems occurred at that bandwidth usage level. So Wes Rose, coordinator of technical systems, scheduled a test to find out what was causing the problem and why it was occurring. The IT department decided the best way to track down the cause was to actually run KaZaA themselves and see what happened.
The plan was simple: While Rose monitored the network, members of the IT staff would run various instances of KaZaA to force the number of network connections into the range where the logon failures began.
KaZaA was installed and set up on several machines in the IT department. The IT staff configured shared folders and, to encourage user connections, added dummy downloads to the shares with inviting filenames. It’s not enough, after all, just to have KaZaA running; you also have to have something people want or you have to be downloading files yourself.
The honeypot of sorts in this case was sweetened by dummy files with names like “MIBII,” “EpisodeII,” and, well, other things that users might be interested in downloading. The dummy files, by the way, were just that: they contained nothing but junk.
And the plan seemed to work. Users began connecting and downloading the files, pushing the number of active connections and bandwidth up accordingly. Rose monitored the activity and waited for calls about network problems to come in. A Bloomington-based utility provided statistics on network usage updated at five-minute intervals. This was the primary source of information about bandwidth utilization.
It took about an hour for saturation to reach the point at which logging on to the network became problematic. Just as before, the problem occurred when bandwidth usage hit the 6-Mbps mark. At one point, one machine running KaZaA had over 2,500 connections.
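How the staff counted connections on each machine is not described. One rough way to get a similar number, sketched here in Python, is to count the established TCP connections in the text printed by `netstat -an`. The sample output, addresses, and ports below are made up for illustration, and the column layout of `netstat` differs between operating systems, so treat the pattern as a starting point rather than a finished tool.

```python
import re

# Matches a TCP line whose last column is ESTABLISHED, for example:
#   tcp   0   0 10.0.0.5:1214   203.0.113.7:51234   ESTABLISHED
TCP_ESTABLISHED_RE = re.compile(r"^\s*tcp\S*\s.*\bESTABLISHED\s*$", re.IGNORECASE)


def count_established(netstat_output: str) -> int:
    """Count the lines of netstat output that show an established TCP connection."""
    return sum(
        1 for line in netstat_output.splitlines()
        if TCP_ESTABLISHED_RE.match(line)
    )


# Made-up sample output; real data would come from running `netstat -an`.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:1214           203.0.113.7:51234       ESTABLISHED
tcp        0      0 10.0.0.5:1214           198.51.100.9:40022      ESTABLISHED
tcp        0      0 10.0.0.5:1214           198.51.100.12:40100     TIME_WAIT
tcp        0      0 0.0.0.0:1214            0.0.0.0:*               LISTEN
udp        0      0 0.0.0.0:1214            0.0.0.0:*
"""

print(count_established(SAMPLE))  # prints 2
```

Logging this count alongside the bandwidth figures during a load test makes it easier to see whether failures track the number of connections rather than the volume of traffic.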
When the network became throttled, Rose began using ping to identify where the bottleneck was occurring. While he worked to diagnose the problem, he also spoke with a technician at the Bloomington campus who was pinging the IUS network from the other end of the line.
While Rose’s pings to the Bloomington campus showed problems, pings from Bloomington to IUS indicated no issues of any kind.
Rose Telnetted into an HP switch between the IUS LAN and the Cisco router that connected IUS to the servers in Bloomington and began trying to ping the router. The pings either timed out or took an inordinately long time. Conversely, the Bloomington technician, pinging from the Cisco router to the HP switch, reported no problems. Only outgoing pings were being throttled.
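A comparison like this can be made less subjective by reducing each ping run to its packet loss and average round-trip time. The Python sketch below is one way to do that, not the procedure the IUS staff used. It assumes the summary format printed by the Linux and macOS `ping` commands, and the loss and latency thresholds are arbitrary values chosen for illustration.

```python
import re
import subprocess
from typing import NamedTuple, Optional


class PingSummary(NamedTuple):
    loss_percent: float
    avg_rtt_ms: Optional[float]  # None when no replies came back


LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")
# The summary line looks like "... min/avg/max/mdev = 1.0/1.2/1.5/0.2 ms";
# the second number after "=" is the average.
RTT_RE = re.compile(r"= [\d.]+/([\d.]+)/")


def parse_ping_summary(output: str) -> PingSummary:
    """Extract packet loss and average round-trip time from ping output."""
    loss = LOSS_RE.search(output)
    if loss is None:
        raise ValueError("no packet-loss line found in ping output")
    rtt = RTT_RE.search(output)
    return PingSummary(float(loss.group(1)),
                       float(rtt.group(1)) if rtt else None)


def ping(host: str, count: int = 5) -> PingSummary:
    """Run the system ping command (-c sets the request count on Linux/macOS)."""
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True, check=False)
    return parse_ping_summary(result.stdout)


def looks_throttled(summary: PingSummary,
                    max_loss_percent: float = 5.0,    # assumed threshold
                    max_rtt_ms: float = 200.0) -> bool:  # assumed threshold
    """Flag a run with notable loss, no replies, or very slow replies."""
    return (summary.loss_percent > max_loss_percent
            or summary.avg_rtt_ms is None
            or summary.avg_rtt_ms > max_rtt_ms)


# Made-up summaries standing in for the two directions of the test.
OUTBOUND = "5 packets transmitted, 1 received, 80% packet loss, time 4080ms\n" \
           "rtt min/avg/max/mdev = 950.100/950.100/950.100/0.000 ms\n"
INBOUND = "5 packets transmitted, 5 received, 0% packet loss, time 4005ms\n" \
          "rtt min/avg/max/mdev = 1.012/1.234/1.501/0.170 ms\n"

for label, text in (("outbound", OUTBOUND), ("inbound", INBOUND)):
    summary = parse_ping_summary(text)
    print(label, summary, "throttled" if looks_throttled(summary) else "ok")
```

Against a live device, `ping("192.0.2.1")` would replace the sample strings; that address is a placeholder from the documentation range, not one of the IUS devices. Running the same check from both ends of a link shows at a glance whether a problem is one-directional, as it was here.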
In spite of the relatively low bandwidth used on the network, the bottleneck at the HP switch seemed to be causing the logon issues.
The KaZaA test appears to have been a success. By monitoring the network while others were running KaZaA, Rose was able to narrow the issue down to a specific area. Unfortunately, the test itself could not reveal what is happening inside the switch to cause the bandwidth and logon problems. The IT department will have to investigate further to find out what is really going on with the switch, but in the meantime, it has established two facts in the case:
- KaZaA use on the network can and does cause unacceptable bandwidth problems.
- The HP switch is currently a bottleneck on their system.
The next step in the process will be to remove the switch from the equation and see if network performance improves. While KaZaA use may be a matter of concern and an issue that will also have to be addressed, a bigger concern is how 6 Mbps of bandwidth utilization on a 45-Mbps connection can prevent outgoing traffic from making it through the router and keep users at the IUS campus from logging on to domain controllers in Bloomington.
Although the investigation is not yet closed, the IUS staff was able to load-test its network using a rather offbeat method, one that ultimately helped it identify the point at which the bandwidth issues were occurring.
If you’re experiencing bandwidth issues and want to do some load testing, consider alternatives to the standard tools you’ve been using. You might be surprised what you can discover by simply trying to simulate different types of network traffic and monitoring its effects on your infrastructure.