In the email I received, the summary for this article said "KaZaA use was crippling its network even without hogging bandwidth", which is as misleading as it can get. The problem had nothing to do with Kazaa use; it was a bad switch. Can the TechRepublic editors please drink a couple more cups of coffee before preparing these mails?
Did you read the whole article? If it's a bad switch how come the problem occurs only when KaZaA is in use and at the mentioned bandwidths?
Before the IT staff conducted their test, they knew one thing and one thing only: When students started firing up KaZaA on the university network, serious problems began to occur. The purpose of the test was to recreate the situation in order to identify the possible source of the problem. In this case, signs pointed to the switch, but they don't know definitively at this point that the problem is the switch itself. Their tests just gave them something to focus on for additional troubleshooting.
Hello!!?
They never proved it ONLY happens when Kazaa is running. They proved it does happen when Kazaa is running. They also proved that it does happen when network load reaches 6 Mbps or so. The two are not necessarily married. They didn't employ any other methods of generating traffic loads and sessions so they did not isolate the problem to Kazaa.
A junior admin performing basic troubleshooting and analysis should have isolated the problem to the switch in less than 10 minutes.
Maybe japhillipson knows everything there is to know, but not everyone does. The point of the test they carried out, and indeed the point of the article, is that you should perhaps consider other methods of troubleshooting, not just rely on the tools all these 'professionals' use. I think it shows initiative on behalf of the university staff, which is what I believe TechRepublic was trying to show.
Arrogance is not a very attractive feature.....
I am not here to advertise, however after seeing a presentation by Extreme Networks (www.extremenetworks.com) I was very impressed with their solution to this particular problem. The switches they provide are programmable, allowing the administrator to allot network resources based on any number of variables. I imagine that there are other companies out there offering similar products, but Extreme is the only one I have actually seen demonstrated.
In my experience Extreme means Extremely underspecified power supplies, they seem to fail with alarming regularity.
I should also point out that Enterasys switches can do similar things, whilst the Allied Telesyn Rapier L3 switch can have a stateful firewall enabled.
I've encountered KaZaA on my network and killed it dead with the firewall.
It's really hard to find out which element is the bandwidth hog. It could have been the switch at their service provider or a bug in Kazaa. Why not the router before the switch?
What the test basically did was narrow down the problem to a probable cause. All indications from the various tests performed pointed to this particular switch. They don't know definitively that the switch is the problem, but they know enough now to perform specific tests with the switch itself to determine if that is the case.
Granted the problem was _recognized_ after P2P was running...
Granted the P2P is an invalid thing to do on someone else's dime...
But the point of the article seems to be that the team was creative in tracking down where the bottleneck was occurring... i.e., the network switch...
(Would the university have been awarded another 10 points for using Cisco and avoiding the problem? Stay tuned...)
I doubt it... They were awarded points in the previous article for switching to AD (implying bogus advantages over NDS), which caused the extent of the problems, instead of staying with NDS, which would have weathered the network problems better...
Perhaps there were items left out of the article, but as stated, I don't see how they narrowed it to the switch. From what is described, it could simply be a duplex mismatch between the switch and the router.
I didn't see any place in the article referencing packet analysis. A well placed 'sniffer' would have narrowed down EXACTLY what was going on.
It's scary to read some of these replies! 
I said the same thing you did; read my reply dated 8/02/02.
There is an option within Kazaa which allows you not to be a main node; the node option uses your bandwidth for other users to search the Kazaa network. The article mentioned 2,500 users connected at one point, which would explain the slowing down of resources. If only one user was connected and was utilizing 6 Mbps, this would have little effect on other users' ability to log on. 2,500 users all searching for MP3s and porn might, however, have something to do with the slowdown.
It may be the NUMBER of connections rather than the total bandwidth that's causing the problem.
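A rough back-of-the-envelope sketch of that point, using only figures quoted in this thread (a 45 Mbps link, roughly 6 Mbps of load, about 2,500 simultaneous connections). The even split of traffic across connections is an assumption made purely for illustration:

```python
# Figures quoted in this thread; the even split across connections is an assumption.
LINK_CAPACITY_BPS = 45_000_000   # 45 Mbps link mentioned in the discussion
OBSERVED_LOAD_BPS = 6_000_000    # roughly 6 Mbps when the trouble starts
CONNECTIONS = 2_500              # simultaneous connections reported


def link_utilization(load_bps, capacity_bps):
    """Fraction of the link capacity in use."""
    return load_bps / capacity_bps


def per_connection_bps(load_bps, connections):
    """Average throughput per connection, assuming an even split."""
    return load_bps / connections


utilization = link_utilization(OBSERVED_LOAD_BPS, LINK_CAPACITY_BPS)
average = per_connection_bps(OBSERVED_LOAD_BPS, CONNECTIONS)
print(f"Link utilization: {utilization:.1%}")
print(f"Average per connection: {average:.0f} bit/s")
```

On these numbers the link is far from saturated and each connection carries very little traffic, which is consistent with the idea that session count, not raw bandwidth, is the thing to look at.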
I don't see how they isolated the problem to a switch. The problem is with slow/failed logins when P2P is running. Questions: what protocol does Kazaa use? I checked their web site and found no info. If Kazaa sets up a session with random unprivileged ports and sucks up all the sessions in the firewall, then nothing will pass until the Kazaa sessions are terminated. If the server CPU utilization skyrockets because too many sessions are open, then there's a place to look. If Kazaa creates multiple bogus login sessions for the same user and the server runs out of licenses, then add licenses or limit the sessions.
The article never explained why the university allowed Kazaa in the first place, especially when they know it's crippling their network. Why would anyone allow peer networking?
If they know Kazaa is killing them, then remove the knife. Disable unused and undesirable protocols, use policy manager to restrict users from installing software, and monitor the network for users who knowingly violate the policy and suspend their accounts.
Of course they will kill Kazaa, that's not the point; the point is that Kazaa revealed a flaw in the network and evidently Kazaa *is* their load tester.
A clue is provided by the large number of simultaneous connections. Switches do not aggregate addresses and can become overloaded a lot more easily than can a router. I consider it unwise to use a switch on a WAN connection; I am reminded of my days at NCTS when we had thousands of Novell servers advertising themselves; IPX network numbers cannot be aggregated so every single server gets its own entry in routing tables. A layer-2 switch won't care how many TCP sessions are happening, but a layer-3 switch WILL care about such things and can be overloaded if the connection table is not only large, but highly diverse (i.e., public internet connections to the farm of Kazaa servers). Presumably a layer-3 switch can aggregate addresses the same as a router, but if they are diverse it might overload.
It is a problem for the university; the same problem will occur if many web servers are in use at the same time -- 2,500 or so web connections from diverse sources fill up the bridging tables of a layer 3 switch.
I would not totally ignore the relevance of the traffic volume to the T3 itself (I presume that 45 megabits is a T3). You may have a defective multiplexer that manifests itself only when you occupy bit positions in the multiplexed stream that are not occupied until you hit a submultiple of the total bandwidth. A T1, for instance, is a multiplexing of 24 channels, but the way it is done is designed to aid streaming of all channels; it is not just a round-robin byte from each channel times 24. It is quite complex but hardware failures will only manifest when your traffic reaches sufficient density to occupy the bad channel.
So, IUS IT used Kazaa to instigate network load. Big deal!
I think that the real story is that they were so inept that they were not able to track down the problem during production time and fix it right away. After all, it sounds like some standard network troubleshooting was all that it took to find the probable bottleneck.
Is this the best academia has to offer?
What kind of troubleshooting was that? Using one P2P application to recreate the problem was a half-done solution. What if everyone starts using IM software, or FTP, or anything else that pushes the bandwidth up to 6 to 8 Mbps? Oh boy, they think it might be the switch... duh? Is this really newsworthy?
No matter what they use to generate traffic on the network, as long as the problem is there, try to isolate it layer by layer.
From the originator location:
1. Local Ethernet switch (Access - Distribution - Core)
2. Router to router
3. Router to switch
From the access end:
1. Do the same
2. Check the physical bandwidth. Is it 6 or 45?
3. Check the router interface and comm. media type.
4. Router interface cable controller
5. Check the duplex settings (router and switch)
Several lower end switches list a max MAC address count of 4096. With 2500 connections to 1 client plus all the other connections could the switch MAC address table be full????
MAC addresses are only used on the local network (LAN), e.g., router to node.
Once you talk to anyone outside your Network, then you use TCP/IP via the router.
The article is not totally bad. In my case, I would have figured all of that out a long time ago. Universities don't pay the highest salaries, so I imagine they have a high staff turnover. Not to mention the lack of experience, and the constant use of the network by users.
Under those conditions I can understand why it took so long to narrow down the problem. I think it was good practice and good education for young network engineers: always try to recreate a problem to know how best to deal with it.
Using one Network Operating System (NOS) over another does not hide the fact that a problem exists with the switch, be it a hardware fault or a configuration error (assuming the switch is manageable).
The troubleshooting activities carried out by the staff at this university serve as a good diversion from the daily activities of changing passwords and cleaning mice.
When staff members get involved in projects like these they usually learn a lot, and a lot is discovered about the limitations of some hardware. In fact, I'm now part of a class action lawsuit against HP for faulty hardware, all because someone took the time to test the product and compare their results with others.
It is very possible that the routing table or address list is full. Where I work we have a Cisco 25xx, and when we tested file sharing on it the routing table caused a significant slowdown despite minimal bandwidth usage.
In this case, that's probably not the source of the problem since most of the incoming connections would have come through a router, thus bear only one MAC address for the reply...
But I have seen this cause very baffling situations on a moderately large LAN. We had just moved our five Netware / cc:Mail servers to a very expensive (at the time) switch. Previously they had been directly on thicknet backbone and we had very high collision levels with approximately 1,500 users. The new switches did seem to dramatically improve things, but then we started getting calls from users who could not log on or who had been connected and had their connection dropped. The calls would start about 9:00 am, be fairly heavy until 1130 am, then drop down until 1:30 pm, and resume until about 3:30 pm. The basic workday was 7:30 am til 4:00 pm with half hour lunch for most people, but some were on slightly staggered schedules. After the "engineer" who had researched the switch when the purchase decision was made had spent four days troubleshooting the problem, one of the old "techs" was glancing through the switch documentation and noticed that the routing table was 1,024 entries max.... duh! Back to the thicknet until arrangements were made to trade the switch for one with larger routing table capabilities!
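The story above can be put into a toy model: roughly 1,500 users behind a switch whose table holds 1,024 entries. This is only a sketch under a stated assumption, namely that once the table is full, new addresses are simply not learned. Real switches behave differently when full (some age out or evict entries, some flood), and the function name `simulate_table` and the host names are made up for illustration:

```python
def simulate_table(capacity, hosts):
    """Learn addresses into a fixed-size table.

    Assumption: once the table is full, new addresses are not learned.
    Real switches differ (some age out or evict old entries).
    Returns (learned, not_learned) as two sets.
    """
    learned = set()
    not_learned = set()
    for host in hosts:
        if host in learned:
            continue                      # already has an entry
        if len(learned) < capacity:
            learned.add(host)             # room left, learn it
        else:
            not_learned.add(host)         # table full, no entry
    return learned, not_learned


hosts = [f"host-{n:04d}" for n in range(1500)]   # about 1,500 users
learned, missed = simulate_table(1024, hosts)    # 1,024-entry table
print(len(learned), "learned;", len(missed), "never fit in the table")
```

In this model nothing goes wrong until enough distinct users are active at once, which would match trouble that appears only during the busy parts of the day.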
I can't help wondering if the problem scales with the number of connections rather than having anything to do with the bandwidth in use!
Perhaps the writer should have waited until the university had completed its testing to write this article.
This says nothing...it might be the switch...it might be Kazaa...it might be...blah..blah.
Grade: Incomplete.
To be accurate, only Mac OS X is affected, because it is based on BSD Unix. All earlier versions of Mac OS were not (they were entirely proprietary and did not use Sun RPC/XDR).
The title of my reply says it all.
It should be obvious to use the application that users are having a problem with. Can someone say DUHHHHHHH!
They won't find the problem until they find someone who really knows how to troubleshoot, and that someone should have a sniffer.
That's all....
Having read the article again, I still can't see how this is helpful to the "Netadmin community". The article states that "Network Monitoring tools showed 6-8Mb of bandwidth being used", but what were these tools? If it is incorrectly configured or misread, a sniffer can miss loads of traffic. Also, has anyone considered that switches often "hide" traffic from sniffers due to their architecture? Even port mirroring doesn't always reveal exactly what's going on.
As stated in the article, and from my own experience, P2P file sharing applications such as Limewire, Kazaa, Morpheus, iMesh, BearShare and others are the nemesis of network administrators. They cause havoc on corporate and university networks: traffic throughput and available bandwidth decrease, which causes lag and, in some cases, leaves users unable to log on to the network, as mentioned in the above article. One way to kill applications such as these is to create a protocol definition on your firewall (in this case Microsoft ISA Server) that maps the port number and protocol of these apps to a packet filter. You can set the packet filter to deny outbound connections as well as inbound ones. Some firewalls let the network administrator allow only certain protocols (HTTP, FTP, NNTP, etc.). This might be a good way to stop users from accessing these apps. Denying downloads of exe and zip files in a university environment would have a poor effect on students as well as teachers; this applies to the corporate environment too.
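The "allow only certain protocols" idea can be sketched as a default-deny port allowlist. This is a toy model, not ISA Server configuration: the names `ALLOWED_TCP_PORTS` and `is_allowed` are made up for illustration, and port 1214 (commonly associated with Kazaa) is used only as an example of a port that is not on the list:

```python
# Default-deny outbound policy: only the listed destination ports pass.
ALLOWED_TCP_PORTS = {
    21: "FTP (control)",
    80: "HTTP",
    119: "NNTP",
}


def is_allowed(dest_port, allowed=ALLOWED_TCP_PORTS):
    """Return True if an outbound TCP connection to dest_port passes."""
    return dest_port in allowed


# 1214 is the port commonly associated with Kazaa; treat it as illustrative.
for port in (80, 21, 1214):
    verdict = "allow" if is_allowed(port) else "deny"
    print(port, verdict)
```

One limit of this approach is visible in the model itself: a filter that looks only at port numbers will pass any client that is able to use one of the allowed ports.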
I can't believe the responses I'm reading to this article! So many of you making "half-baked" solutions to the problem. You missed the whole point of the article - it is just another way to investigate a problem they were experiencing. Lighten up- YOU are NOT the Network GOD - or they would have already contacted you!
I have had experience with HP2424M and 2524M switches and have found out in conversations with the HP technicians that there are some issues with older firmware when both the workstation (or router) and the HP's port are set to auto mode / speed selection. To boil it down, the HP switch will try to default to what HP says is the standard in an auto-to-auto connection: the HP will choose 100 Mbps / Half Duplex while the NICs may very well go to 100 Mbps / Full Duplex. This creates CRC / framing errors in the switch which don't seem to appear until large, continuous file transfers occur. In some cases the condition on one port will cause errors for all of the other ports on the same switch.
The advice from HP is to update the firmware in the switches, update the NIC drivers on the workstations, and if possible, set both the NICs and the ports on the switch to 100 Mbps Full or another matching, fixed (not auto) configuration. We've done so for a couple of sites and it has helped a great deal. In some cases simply setting the workstation to a fixed speed and mode may be enough.
Note also that the errors don't necessarily show up when using the HP TopTools monitoring tool which is free and comes with each HP managed switch. Using telnet and connecting to the switch, then reviewing the event log reveals multiple, rapid connect disconnects on the troublesome ports due to the confusion over the auto settings.
Worth taking a look at it anyway!
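For anyone who wants to automate the event-log check described above, here is a minimal sketch that flags ports with many link-state changes in a short time. The `(timestamp, port)` tuple layout, the function name `flapping_ports` and the thresholds are all hypothetical; a real switch event log would have to be parsed into this shape first:

```python
from collections import defaultdict


def flapping_ports(events, window_seconds=60, min_changes=4):
    """Return ports with at least min_changes link-state changes
    inside any window of window_seconds.

    events: iterable of (timestamp_seconds, port) tuples, one per link
    up/down event. This layout is hypothetical, not a real log format.
    """
    by_port = defaultdict(list)
    for timestamp, port in events:
        by_port[port].append(timestamp)

    flapping = set()
    for port, times in by_port.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window until it spans at most window_seconds.
            while times[end] - times[start] > window_seconds:
                start += 1
            if end - start + 1 >= min_changes:
                flapping.add(port)
                break
    return flapping


sample = [(0, "A1"), (5, "A1"), (9, "A1"), (14, "A1"), (0, "B2"), (300, "B2")]
print(flapping_ports(sample))
```

A port that shows up here is a candidate for the fixed speed and duplex settings mentioned above; the thresholds would need tuning for a real network.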
Having read the article and all the responses, I tend to think that some of you out there forget the purpose of this forum: advice, not bickering. There are several good and viable solutions out there, but I tend to think that some of you guys feel rather slighted when there is a solution better (or worse) than yours. The point of the article is that they made a novel attempt at what seems to be a common problem. P2P file sharing is perhaps the biggest bandwidth hog known. Removing the P2P option would be a bad idea, but limiting it would perhaps be a better option.