Intermittent loss of connection within small LAN

By dave-morton ·
I administer a small network for a cab company, and have been battling an internal connectivity problem for the last several days. It seems that 2 of the 5 computers on the LAN intermittently "drop" connection without losing their IP addresses. If I disable the network connection on the "dropped" machine, and then immediately re-enable it, it works for anywhere from 5 minutes to a couple of hours, then "drops" again. All LAN IP addresses are configured manually. I've posted an image of the network map to illustrate how the LAN is arranged.

The local network has 2 computers that connect to the internet through the DSL router, one for ICS, and the other has an external web server (neither one has connectivity problems).

One computer on the LAN has an internal web server that handles our dispatch program. All computers on the network need to be able to access this box.

The dispatch client has to access the dispatch server, but can't have internet access. It's blocked from the internet through the simple expedient of not having an external DNS server assignment.

Now, for the meat of the problem. I've checked and re-checked the configurations for both the dispatch server and the dispatch client, and find no problems. Both have the most recent drivers for the network cards, and both have new cables. I've swapped in a known good hub for the LAN, and have made sure that there are no mis-configured hosts files. As outlined previously, disabling and enabling the NIC on the offending machine will restore connectivity for a while. I've checked each machine for an overheating problem, just in case, and find no trouble there. All machines are running XP Pro.

I can't shake the feeling that I'm missing something basic and simple, but I'm at a loss as to what that might be. Any suggestions?

Well, a couple of dozen thoughts:

by robo_dev In reply to Intermittent loss of conn ...

From what you describe, the common elements between two computers that drop are:

1) The power environment
2) The RF (radio frequency) environment
3) Your ethernet switch/hub and its power supply.

When you disable/reset the NIC in the PC, you're forcing the ethernet switch to clear/refresh its memory. You're also clearing the NetBIOS cache in the local PC.

Does resetting the ethernet switch also fix the problem?

What about unplugging and re-plugging the failed PC's ethernet cable?

Does it work to simply click on 'repair' or do you need to actually enable and disable the nic on the failed PC?

Did you swap the power supply when you swapped the ethernet switch?

Any high-power radio transmitters near that switch?

What kind of hub/switch are you using?

I would assume that the PC hosting the external web server does not have routing enabled?

What sort of firewall are you running? Is there a simple NAT firewall as part of the DSL connection?

Even though you're using static IP addresses, are there any DHCP servers on the network? ICS typically works as a DHCP server.

Is there any personal firewall software on each PC?

When a PC is in the failed state, try the following:

1) is the link light on at both the switch and the PC NIC?

2) if you ping something, does it give simply timeouts, or destination unreachable errors?
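To make the timeout-versus-unreachable check less of an eyeball exercise, the output of a long-running ping could be saved to a file and tallied. The following is only a rough sketch: it assumes the English wording of the Windows ping messages, and the IP address in the sample is made up for illustration.

```python
from collections import Counter

def classify_ping_line(line):
    """Classify one line of Windows ping output (English wording assumed)."""
    text = line.strip().lower()
    # A normal echo reply; some Windows versions also prefix
    # "unreachable" messages with "Reply from", so exclude those.
    if text.startswith("reply from") and "unreachable" not in text:
        return "reply"
    if "destination host unreachable" in text or "destination net unreachable" in text:
        return "unreachable"
    if "request timed out" in text:
        return "timeout"
    return "other"

# Hypothetical sample lines; 192.168.0.5 is not taken from the thread.
sample = [
    "Reply from 192.168.0.5: bytes=32 time<1ms TTL=128",
    "Request timed out.",
    "Destination host unreachable.",
]
print(Counter(classify_ping_line(line) for line in sample))
```

Roughly speaking, a run of timeouts means the request went out and nothing came back, while "unreachable" means some device (possibly the PC itself) had no route to the target, so the two point at different places to look.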

My first guess:

The switch/hub is suffering from power interference or RF interference, or is faulty.

Second Guess: A failing NIC (or other hardware issue) is sending garbage to the ethernet switch, causing it to do weird things.

The third thought is that your web server has been compromised (hacked) and somebody is then trying so aggressively to hack the internal machines that it's causing issues.

Your DSL router/modem/firewall most likely can do some logging, and that would help in the last case.

Answers to a couple dozen thoughts:

by dave-morton In reply to Well, a couple of dozen ...

[computer name][IP address] -t", I'll get maybe 1 in 10-15 good responses. Never do I get more than 1 in a row, however. Since all the computers are configured directly in each computer's hosts file, the correct IP address is obtained each time.

Additionally, if I isolate any 2 of the computers in the network, where at least one of them is a machine that "drops", I still get the same problem. Unfortunately, I can't separate the computers from the radio, as the only place to set them up and use them at all just happens to be the "radio room". Still, I don't think that the radio is the source of the difficulties (as noted above).
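The "1 good reply in 10-15, never two in a row" pattern can be put into numbers so that later tests (different cable, different port, different link speed) can be compared against it. This is just a minimal sketch; the function name and the True/False input format are assumptions made for illustration, not anything from a real tool.

```python
def summarize_pings(results):
    """Summarize ping outcomes; each item is True for a reply, False for a miss."""
    longest = current = 0
    for ok in results:
        # Extend the current run of good replies, or reset it on a miss.
        current = current + 1 if ok else 0
        longest = max(longest, current)
    return {
        "sent": len(results),
        "good": sum(1 for ok in results if ok),
        "longest_good_run": longest,
    }

# Hypothetical run of 15 pings with two isolated good replies.
observed = [False] * 9 + [True] + [False] * 4 + [True]
print(summarize_pings(observed))
```

A longest good run that never exceeds 1 is worth recording as-is, since it is a fairly distinctive symptom to compare against after each change.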

Phew...

by robo_dev In reply to Answers to a couple dozen ...

I'm impressed by the scope part. But I would not rule out an RF problem, myself.

And if you've been poking around with a scope, I suppose you've looked for things like ground loops, or transient noise on the power line.

While those wall-wart power supplies tend to isolate the device fairly well, they tend to fail in unpredictable ways.

Do you have good surge suppression and grounding on everything?

Inside the ethernet switch is a lookup table in memory that correlates mac addresses to switch ports. If that table gets hosed up, from a screwed up workstation or some other problem, then packets will not flow.
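To picture what that lookup table does, here is a toy model of a learning switch. It is a simplified sketch of the general technique, not how any particular D-Link (or other) switch is implemented, and the MAC labels and port numbers are made up.

```python
class LearningSwitch:
    """Toy model of an ethernet switch's MAC-address-to-port table."""

    def __init__(self):
        self.mac_table = {}  # maps source MAC -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the sender's port, then decide where the frame goes."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            return "flood"   # unknown destination: send out every other port
        if out_port == in_port:
            return "drop"    # destination is on the port the frame came from
        return out_port

sw = LearningSwitch()
print(sw.receive(1, "client-mac", "server-mac"))  # server not learned yet
print(sw.receive(2, "server-mac", "client-mac"))  # server learned on port 2

# Simulate a corrupted entry: the table now points at the wrong port.
sw.mac_table["server-mac"] = 4
print(sw.receive(1, "client-mac", "server-mac"))  # frame goes to the wrong port

# As soon as the server transmits again, the entry is relearned.
sw.receive(2, "server-mac", "client-mac")
print(sw.receive(1, "client-mac", "server-mac"))
```

In this model a bad entry only heals when the affected machine sends a frame, which is one possible reason that disabling and re-enabling a NIC appears to fix things for a while.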

You are correct in that you have done the easy stuff.

Couple of thoughts:

10Mbps is more forgiving than 100Mbps, and the resonant frequency is different from an RF standpoint, since the clock speed is slower. So I would try changing the most troublesome workstations to 10Mbps, or heck, put everything at 10 as a test.

Also, sometimes devices can mis-negotiate the duplex settings of ethernet....while the switch may be using 100/Full duplex, the client may be stuck at 100/Half duplex. This results in so many CRC errors that communication can halt.

Make 100% sure that there are no network loops. Most of the newer ethernet switches allow 'auto crossover' on every port, so loops are possible.
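One way to check for loops on paper is to write down every cable between switches and hubs and see whether any of them closes a circle. The sketch below does that with a small union-find; the device names are hypothetical. It treats every listed device as something that forwards frames, so only list switches, hubs, and any PC that has bridging turned on, otherwise a PC with two NICs will show up as a false alarm.

```python
def has_loop(links):
    """Return True if the list of (device_a, device_b) cables contains a cycle."""
    parent = {}

    def find(x):
        # Find the representative of x's group, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected: this cable closes a loop
        parent[root_a] = root_b
    return False

# Hypothetical cabling: two cables between the same switch and hub form a loop.
print(has_loop([("switch", "hub"), ("hub", "switch")]))
print(has_loop([("switch", "hub-a"), ("switch", "hub-b")]))
```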

One protocol analyzer tool that's totally free is Wireshark (formerly called Ethereal; same basic product). Get a laptop and an old 10/100 hub, and plug the hub between the most troublesome workstation and the switch so the laptop can see that workstation's traffic. A protocol analyzer would be able to give some big clues.

Good ideas...

by dave-morton In reply to Phew...

lol, I'm gratified by your remark about the scope. I was an electronics tech long before I got into IT, so it was actually one of my first diagnostic tests. I'll double check the grounding, and try dropping everything to 10 Mbps and see how that flies. I've already verified that everything has the same negotiation rates, and all are set to full duplex, so I think I'm good there. And I'll try to see if the D-Link switch has a reset feature, but I don't think it does.

I've got a friend who has an old laptop, so I'll give Wireshark a try and see what I find. I'll post results as I find them.

This problem is certainly going to go into my "Book of Odd Problems".

And by the way, thank you so much for taking the time to help me address this issue. You've earned yourself a "Gold Star" :)

Glad to help

by robo_dev In reply to Good ideas...

If RF is the problem, here are some tips

LAN Connection Loss

by Chris Cook In reply to Intermittent loss of conn ...

I don't know if this will help you out at all as it sounds as though you have most likely already checked for it. I have occasionally had similar errors that were caused by power management options. I'm sure you know where to check, but just for shigrins I'll type it out anyway.

Device Power: Look at Power Management tab for NIC under device manager. Make sure the "Allow the computer to turn off this device to save power." option is unchecked.

Power Options: Also under Power Options within Control Panel, make sure your NIC has no power saving options assigned for it.

Again, I don't know if this will help you as you have probably checked it already, but I figured I would throw it out there just in case.

Results of certain actions:

by dave-morton In reply to LAN Connection Loss

Here we go, from the top.

The power leading into the APC is rather dirty, but the output is really good. Grounds seem to be slightly dirty, but well within tolerance.

Media type change to 10 Mbps/full duplex:
Completely lost connection to everything. Setting the entire network this way closes ALL connections down. Changed it back to Auto-sense.

Power saving options for NIC:
One had a power tab, and I made sure to un-select power saving mode. The second NIC has no power option, and in that computer's power management settings, there's no provision for power saving for the NIC, so I'll see if any of my spare NICs has a power save option, just to be sure it's not a "hidden" feature.

Overall results:
Unchanged. Unless you take into account that I'm 12.5% more bald than before.

I do appreciate the suggestions, though. It's just that much more that I can cross off the list of things to try.

To robo_dev:
The laptop/Wireshark tests will have to wait a few days till I can procure a laptop. In the meantime, is there any reason why I couldn't install it on a known good box in the network, and run it from there? I've never even heard of the app, let alone used it.

Also, I completely missed this last part, till now:
"The third thought is that your web server has been compromised (hacked) and somebody is then trying so aggressively to hack the internal machines that it's causing issues."
Since I've got all incoming port routing from the DSL going to my development box, I don't see how this can occur. The biggest security risk to the application is from outside access, so I made especially certain that you can't get to the internal web server from outside the LAN. Besides, I'm in the code for this app every day, and would notice something along those lines almost immediately. Or, at least I would HOPE to. :)