Entire Network Down - 100% Network Utilization - Please Help!

By link470 ·
Alright guys. So as you can imagine, network goes down, I'm hoping to get this resolved soon. Here's what happened.

It's 12:30 P.M. [Do YOU know what your network's doing?]. Everything is running great. 3 main switches in the server room, 48-port managed switches [Dell PowerConnect 3348s], and all is well. Everything functions normally, all servers are online, all desktops are happily happenin'. I go out to receive a shipment of 50 new machines and start piling them outside my office. Next thing I know, I have lots of requests saying the entire network is down and nobody can access anything. I quickly head back to the server room wondering if a UPS went down, if a server restarted, if a switch turned off, anything. But what do I see? Absolutely nothing out of the ordinary. Everything is functioning great.

But it's not. I can't get on the internet. I can't ping ANY computers, I can't remote desktop into the servers, the RDCs I DO have up with servers all fail, and everything is extremely slow. So I call the school board office. They head over with their handy $18,000 Fluke meter. They plug it into one of our switches, it measures our network, and it quickly throws back at us 100% network utilization. The guy from the board office goes WHOA!!! I've never EVER seen it that high before.

So we try swapping the first switch in the stack on the suspicion that it may be bad. We put in a 3448, Dell's next model of the 48-port 10/100 PowerConnect switch, and take out the 3348. We use patch cables to link them together in a chain setup and see if that works.

In the end, the switch swap [lol] did nothing. I still can't ping any machine in the school or get out of the network. I checked our main router, and it's functioning normally. I restarted the servers and they all appear to be functioning normally. So I think to myself: what would cause 100% network utilization? I noticed that ONE ping got through, but only 1 out of 4 pings got a reply. So I knew the infrastructure itself was probably OK, but I had the assumption that something was looping back.

So off I go around the network, documenting every single wall jack and port in the school [took 7 hours] and checking EVERY switch we have for any type of loopback that might be possible, like a jack plugged into a switch and another port on that switch plugged into another jack. I also shut down every machine and every network printer I came across so the network would essentially have nothing to broadcast [except for powered NICs, but since the machines aren't on, there's less of a chance of anything malicious running on them]. Nothing in any of the labs was like that. All the jacks had either a direct connection to a PC, or a connection to a switch that contained only other connections to PCs, not back to a wall jack.
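For anyone facing the same 7-hour audit: once the cabling is written down as device pairs, a few lines of code can flag a loop automatically. This is just a hypothetical sketch (the device names are made up), using union-find to spot any cable that connects two already-connected devices:

```python
# Hypothetical loop audit: after documenting every cable as a
# (device_a, device_b) pair, union-find flags any cable that
# closes a loop between devices that are already connected.
def find_loops(cables):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    loops = []
    for a, b in cables:
        ra, rb = find(a), find(b)
        if ra == rb:
            loops.append((a, b))   # this cable creates a loop
        else:
            parent[ra] = rb        # otherwise merge the two segments
    return loops

# Made-up example: a wall jack wired back into another wall jack.
cables = [("jack-101", "lab-switch"), ("lab-switch", "pc-1"),
          ("lab-switch", "jack-102"), ("jack-102", "jack-101")]
print(find_loops(cables))  # -> [('jack-102', 'jack-101')]
```

Of course this only catches loops you can see in the documentation; a loop hiding inside an undocumented spare switch (as it turned out here) still needs eyes on the hardware.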

So here I am guys. It's the weekend, Friday night, school's not in for the weekend, so I've got 2 days to work and go in for some overtime. Anyone have any suggestions of what to try next? Thanks a ton, you guys. I really appreciate this community and hope we can get this resolved!

Network Notes:
-Windows NT-based network running Windows Server 2003 servers and 250 XP Professional client stations.
-6 Windows Server 2003 servers total.
-3 main switches in the server room, 7 others around the school. All checked thoroughly and restarted.

Take care and have a good weekend. We have multiple IPs at the school, so since I can't access the internet inside the school because of the extreme lag, if I go in I'll plug into the main external switch connected to the modem and grab an IP for my laptop so I can check in on answers.

Thanks guys!


All Answers


Not sure if this is the issue

by Jacky Howe In reply to Entire Network Down - 100 ...

but it sure sounds like it. Check out this: Identifying a Broadcast Storm (500/c8bandut.htm)


This might help you...

Please post back if you have any more problems or questions.



by Churdoo In reply to Entire Network Down - 100 ...

If you don't know how to read or manipulate the management tools of your managed switches, you can still physically isolate parts of your network to narrow down the source of the problem.

You can physically unplug all uplinks from the main switch and check to see if the utilization within the main switch returns to normal. At the same time, check the utilization at the disconnected uplinks and see where the high utilization remains.

By successively isolating each switch and checking its utilization, you can quickly isolate the switch or switches closest to the source of the problem. At the same time, you can quickly return unaffected segments to production status while you work on affected segment(s).

At a given switch, you can unplug individual ports until utilization at the switch drops, or, in the event of a virus outbreak where multiple machines may be infected, unplug all switchports and reconnect them one at a time until utilization jumps. Note that depending on the switch and many other factors, it can take 30-60 seconds or more for the host on a given switchport to reconnect and resume what it was doing before the disconnect.

You get the picture.
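The reconnect-one-at-a-time search above can be sketched in a few lines. This is only an illustration: the "utilization check" is simulated by a callback, where in real life that step is you reading the Fluke meter or the switch's port counters:

```python
# Sketch of the successive-isolation procedure described above.
# is_storming is a stand-in for "check utilization": it takes the set
# of currently connected ports and reports whether the storm is back.

def find_storm_source(ports, is_storming):
    """Reconnect ports one at a time; return the first port whose
    reconnection makes utilization jump (the likely storm source)."""
    connected = set()
    for port in ports:               # start with everything unplugged
        connected.add(port)          # reconnect one port...
        if is_storming(connected):   # ...then re-check utilization
            return port
    return None                      # storm source isn't on this switch

# Toy example: pretend port 14 is the one looped back into the switch.
storm_port = 14
ports = range(1, 49)                 # a 48-port switch
print(find_storm_source(ports, lambda c: storm_port in c))  # -> 14
```

The same idea works one level up: treat each uplinked switch as a "port" of the main switch, which is exactly the bisection Churdoo describes.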


Good ideas

by Grey Hat Geek In reply to Entire Network Down - 100 ...

The broadcast storm and isolation ideas are excellent starting points. Here is a good article on broadcast storms...

Do you have any traffic logs that may show a spike in network traffic?


Thanks! Fixed It.

by link470 In reply to Entire Network Down - 100 ...

Thank you all for your replies. Much appreciated! I ended up taking a laptop into work, ran Wireshark, and found a TON of packets, like, in the 100,000 range almost instantly. I ended up separating our switch stacks, isolated it to one switch, and that switch was looped into another switch...twice. Everything is back up and running after disconnecting just one of those cables.

What's strange is, I think it's been like that for quite a while and nothing ever happened before. I may be wrong, but does this sound possible? As of now, the entire network is up and running again, and I thank you all so much for your support and quick suggestions and replies. I'm just chillin' at home now, very happy, but still wondering if it's possible that there could have been a delay and that the broadcast storm didn't catch on till later. The setup was that the main switch [switch 1 of the 3 switches connected together via gigabit uplinks] was plugged into a spare 4th switch down below that the previous tech had used because there weren't enough places to plug things in [the patch panel had more ports than the switches in that room could support]. Only 4 things were plugged into it: 2 were from patch panel locations connecting wall jacks around the school, and 2 were the redundant connections plugged into switch 1; after removing 1 of those, everything worked again.

Any ideas if it's possible for a delay to happen, with it not really getting to this point until now? Any idea what triggered it so suddenly to become problematic?

Either way, it's up and running. Thanks a ton!
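For what it's worth, the triage Wireshark makes easy here boils down to: tally broadcast frames by source MAC, because in a storm the same few frames circle the loop endlessly and a handful of sources dominate the capture. A toy sketch with made-up MACs and a synthetic capture:

```python
from collections import Counter

# Hypothetical storm triage: count broadcast frames per source MAC.
# In a loop-driven storm, one or two sources dwarf everything else.
BROADCAST = "ff:ff:ff:ff:ff:ff"

def top_broadcast_talkers(frames, n=3):
    """frames: iterable of (src_mac, dst_mac) pairs from a capture."""
    counts = Counter(src for src, dst in frames if dst == BROADCAST)
    return counts.most_common(n)

# Synthetic capture: a single ARP request amplified 100,000 times by
# the loop, plus a little normal background chatter (MACs invented).
frames = [("00:11:22:33:44:55", BROADCAST)] * 100_000
frames += [("aa:bb:cc:dd:ee:01", BROADCAST)] * 3
print(top_broadcast_talkers(frames))
```

In Wireshark itself the equivalent move is filtering on the broadcast destination and sorting the conversation statistics, no scripting needed.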


I've never used two cables to connect a switch

by Jacky Howe In reply to Thanks! Fixed It.

I know not to now.

Really glad to see that your problem is sorted. :)


Now what you need to do is list the connections.

Write down which cable goes to which port/switch/patch panel and router/switch, and put this into a diagram so if it does happen again you will have an instant map of the connections. Also mark up the cables that go to each port (if possible), either by number or letter, so that you can identify them on the diagram. A sort of backup plan, but on paper. Nice to know everything is working well.

Please post back if you have any more problems or questions.


STP may have been originally set up

by Michael Kassner Contributor In reply to Thanks! Fixed It.

Not sure if STP could have been the reason, but Spanning Tree Protocol setups often use multiple trunks between switches to balance the load and provide redundancy. That may have been the original intent, and the configuration was altered or corrupted, which then created the loop. This Cisco article explains STP if you are interested.
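To illustrate the core of what STP computes (a toy model with invented switch names, not the school's actual topology, and ignoring real STP's bridge priorities and path costs): build a spanning tree of the switch graph and treat every link left out of the tree as blocked. That is how a redundant second cable to a spare switch gets neutralized instead of forming a loop:

```python
from collections import deque

# Toy spanning-tree sketch: given parallel links between switches,
# a BFS tree from the root uses one link per new switch; every link
# the tree doesn't use is the one STP would put in blocking state.
def blocked_links(links, root):
    """links: (link_id, switch_a, switch_b) triples.
    Returns the set of link ids left out of the spanning tree."""
    adj = {}
    for lid, a, b in links:
        adj.setdefault(a, []).append((lid, b))
        adj.setdefault(b, []).append((lid, a))
    seen, used = {root}, set()
    q = deque([root])
    while q:
        node = q.popleft()
        for lid, nxt in adj.get(node, []):
            if nxt not in seen:       # first link to reach a switch wins
                seen.add(nxt)
                used.add(lid)
                q.append(nxt)
    return {lid for lid, _, _ in links} - used

# Invented topology: switch 1 uplinked to the spare switch TWICE.
links = [("uplink-a", "sw1", "sw2"), ("uplink-b", "sw2", "sw3"),
         ("spare-1", "sw1", "spare"), ("spare-2", "sw1", "spare")]
print(blocked_links(links, "sw1"))  # -> {'spare-2'}
```

With STP running, "spare-2" would sit blocked until "spare-1" failed; with STP disabled or misconfigured, both links forward and every broadcast circulates forever.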


STP would prevent loops, obviously

by robo_dev In reply to STP may have been origina ...

I've seen loops happen at sites where STP was not being used. STP adds some delay to connection time, so some people disable it.

The biggest risk is when users plug in small Ethernet switches that have 'auto uplink' on every port.

Switches that have the old-fashioned uplink button are not a risk.

It's also possible to create LAN loops with WLAN bridge devices.
