Windows XP Internetworking problems

ScarF
Hi guys,

I have a very peculiar but interesting problem with two of my workstations in a 45-NIC LAN. I tried to find a resolution for this problem, but nothing helped. I really hope to find an answer here.

The configuration:

- 39 workstations, all MS Windows XP Pro SP3
- 6 servers - mixed: MS SBS 2003 R2, MS 2003 SRV R2, Linux (Debian and RedHat)
- 2 unmanaged 3COM switches with 24 ports each
- 1 Internet connection used for Internet access and one VPN tunnel with a second branch for data transfer and replication
- The Internet connection is ADSL using a CISCO router 8xx series and a SecureComputing SG-560 firewall
- All the workstations receive their IP configuration dynamically (via reservations) from the DHCP service configured on the MS SBS server; the IPs are in a private (non-routable) range

The problem:

Two workstations lose the Internet connection but not the gateway, while LAN data transfer works excellently. As a symptom, pinging a NIC on the LAN (the gateway) returns <1 ms response times without interruption; pinging a host on the Internet (www.google.com) works fine but, after a random time, fails for some minutes, returning "Time Out" (not "Destination Host Unreachable"); then it resumes by itself, returning good response times. I need to emphasize that the gateway is present in the configuration (ipconfig) all the time.
The problem is really annoying since it disrupts Internet access for these workstations.
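
For anyone trying to pin down how long and how often outages like this last, a saved ping log can be summarised with a few lines of Python. This is only a sketch: it assumes the log came from the Windows `ping -t` command redirected to a file, and that successful probes look like "Reply from ... TTL=..." while failures look like "Request timed out." The function name `timeout_runs` is made up for this example.

```python
def timeout_runs(lines):
    """Return (first_probe_index, run_length) for each run of consecutive timeouts.

    Probe indexes count only lines that are ping results (replies or timeouts),
    starting at 0; header and statistics lines are ignored.
    """
    runs = []
    probe = 0      # index of the next ping result
    start = None   # probe index where the current timeout run began
    for line in lines:
        text = line.strip()
        if text.startswith("Reply from") and "TTL=" in text:
            # A successful reply closes any open timeout run.
            if start is not None:
                runs.append((start, probe - start))
                start = None
            probe += 1
        elif text.startswith("Request timed out"):
            if start is None:
                start = probe
            probe += 1
    # Close a run that is still open at the end of the log.
    if start is not None:
        runs.append((start, probe - start))
    return runs
```

Since Windows ping sends roughly one probe per second by default, a run length of about 150 would correspond to an outage of two to three minutes.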

What I tried:
- connected the workstations directly to the SG-560 firewall on its own 4-port switch
- changed the IPs
- reinstalled the NIC's driver
- replaced the NIC using a new IP; the NIC was a different brand so it used another driver
- reset the TCP/IP stack (using the netsh command)
- reset the Winsock stack (using the netsh command)
- cleared the ARP cache
- power cycled all the networking equipment

So far, nothing helped.

NOTE: the workstations don't differ in any way from the other workstations in the LAN, in either configuration or updates. All the workstations are IBM or HP. The workstations with problems are one IBM and one HP.

Should any of you have any ideas, please let me know. I really appreciate any help since I reached the bottom of the sack.

Thank you very much.

    Connect the computers in question directly to the Internet and see if the connection still drops. If not, then your switch might be at fault.

    Please post back if you have any more problems or questions.

    ScarF

    Thank you for your answer. I have already tried this by connecting the workstations directly to the firewall's switch. So, I bypassed the LAN switches but it didn't help.
    The SG-560 firewall has a built-in 4-port switch. Normally, I use a single port to connect the LAN to the Internet.
    On the other hand, on the LAN, the workstations are connected to different switches.
    The problem, as I see it now, is at the OS level. The troubleshooting I tried changed everything at the physical level, but nothing at the OS level except for resetting the TCP/IP and Winsock stacks.

    Kenone

    When you connected the workstations directly to the firewall did you bring the workstations to the firewall or just move wires around in the closet?
    Have you done a thorough scan for malware/viruses?

    ScarF

    Q: When you connected the workstations directly to the firewall did you bring the workstations to the firewall or just move wires around in the closet?

    A: Since the workstations are relatively close to the networking rack, I ran a long patch cord directly from each workstation to the firewall.

    Q: Have you done a thorough scan for malware/viruses?

    A: Yes. I will not detail this, but I used several antivirus/antimalware apps as well as a couple of system apps. The workstations are free of malware.

    Dumphrey

    Cisco 8xx? IIRC they are recommended for 25ish users; you may be overloading the connections, so the router is dropping them.

    The fact that they can reach the 800 but not go past it makes me think it's the router, either a firewall rule or an ACL. Have you tried assigning a new IP to either machine to check if it works with a new IP? A faulty switch port or cable could also be causing malformed packets, which the router would drop on inspection (depending on your setup, outgoing traffic may not be getting filtered).

    Is the service loss consistent, or does it happen only when others are on the LAN as well?

    And since you are using the SG as a firewall, are firewall services on the 800 turned off?

    Any logs on the SG?

    ScarF

    The 800 is used as an ADSL router only. The 800 is connected directly to the Internet, then comes the SG firewall, which is the LAN gateway, then the LAN itself. The 800 is the SG's gateway. The SG uses NAT for the LAN.
    The logs are on and I log everything on both devices. None of the logs shows anything particularly wrong with the packets coming from these two workstations.
    The service loss is random and is not related to the presence of other workstations. There is no difference between these two workstations and the others with regard to Internet usage. They are all connected to the Internet without any restrictions on the SG or Windows firewalls.
    The firewall on the 800 is off, indeed.

    And, btw, could it be a Windows Firewall problem? As a symptom, it is as if the firewall refuses any communication outside the LAN for some minutes. When it comes to the way Windows Firewall works (as opposed to its configuration), I am in deep fog.

    Dumphrey

    of some type. And it's ONLY these two specific machines? (If it's only these two machines, I would look for MAC filtering or IP filtering on the SG, and/or replace cables and move switch ports.)

    Turn off the XP firewall and see if that fixes it. Easy enough to do as a test.

    As a curiosity, what firewall features does the SG offer over using the 800's OS firewall set?

    Do either of the machine logs mention TCP/IP connections maxed? Shouldn't happen with only 30 or so machines, but it's possible.

    ScarF

    Dumphrey, thank you very much for your interest in this problem.
    In the meantime, I have tested the Windows Firewall by stopping the service on both workstations, but with no result. They still report time outs when pinging an Internet host. This happens randomly, for periods of 2-3 minutes. Then they resume normal communication.
    Pinging from another workstation gives me normal response times. Not even longer ones.
    The SG appliance doesn't have MAC filtering. Regarding a problem with IP filtering, I tried to troubleshoot it by changing the IP. I even changed the NICs and IPs, without result.
    To answer your curiosity, the SG doesn't offer much more than the Cisco 800, except that the Cisco is under the ISP's control, being used as a "DSL modem". However, I can see the logs on the Cisco because it is configured (by the ISP) to send the log to a remote workstation.
    And, for the final question, there is no maxed connection reported by either of the devices.
    Hm. The more I look into this problem, the weirder it looks. Interestingly enough, the problem started suddenly on one workstation at first, then on the other one month later, without any visible cause. I looked at all the Windows updates and other events. As I said before, all the workstations are at the same OS level, but only these two have problems. They are good workstations, and there are more workstations exactly the same on the LAN, working without problems.

    Kenone

    Not enough IP addresses or a DNS conflict. IP shouldn't be an issue with that few machines though, and he's checked the DNS settings.

    ScarF

    Finding that more workstations started to have problems with the Internet, I focused my investigation at the router level. I found that the SG-560 appliance has moments when it doesn't respond to ping or serve the web management console for 2-3 minutes. However, other workstations on the LAN were able to browse the Internet during these periods. Quite weird.
    In short, I suspected that the problems might come from the VPN tunnel. So, I modified the MTU value for the IPSec VPN tunnel from "undefined" to 1300.
    Now, everything works fine.
    Thank you everybody for all the help.
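
For readers who want to check a path against an MTU such as the 1300 above, here is a rough sketch of the arithmetic. It assumes plain IPv4 with a 20-byte header (no options) and the 8-byte ICMP echo header, and it builds a Windows ping command using the `-f` (Don't Fragment) and `-l` (payload size) switches. The function names and the host name in the usage note are hypothetical, and the thread does not say why the MTU change helped, so treat this only as a way to probe for fragmentation issues.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented packet of size mtu."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError("MTU too small for an IPv4 ICMP echo")
    return payload


def df_ping_command(host: str, mtu: int) -> str:
    """Build a Windows ping command that sends a Don't Fragment probe sized for mtu."""
    return f"ping -f -l {max_ping_payload(mtu)} {host}"
```

For example, `df_ping_command("branch-host.example", 1300)` produces a command with a 1272-byte payload; if that probe gets replies while a larger payload does not, something on the path cannot carry larger unfragmented packets.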
