SonicWALL routers and dropped ARP packets

Mark Pimperton describes how more secure handling of ARP packets by a new router caused a baffling loss of both Internet connections after 15 minutes.

We recently upgraded our router from a Zyxel Zywall 35 to a SonicWALL NSA 240. (We have two Internet connections and our venerable Zyxel was unable to cope with rising demand. Every so often the CPU would hit 100% and then we'd lose connectivity on both connections.)

During configuration, testing and initial deployment of the SonicWALL all seemed well. It was only when we went live that things unraveled. Web browsing was very slow - a real disappointment for Day 1! After a while we figured we had a DNS problem because all our nslookups, pings and tracerts to external sites were failing. We played around with DNS settings on the SonicWALL, but we knew they shouldn't have been relevant because DNS requests from users are handled by our DNS server. (The SonicWALL uses its own DNS settings to resolve names in reports, for example, but ordinary Web browsing requests should be handled by the DNS server.)

Security services

The router included bundled subscriptions to SonicWALL security services (e.g. content filtering) but our intention was to operate with all those switched off in the first instance in case of performance problems. I checked and found one of them still switched on in one of the zones. I switched it off and - bingo! Our DNS and browsing all came to life again.

Unfortunately it all broke again a few minutes later.

To make matters worse, I then realised our Exchange server wasn't sending any email out. Opening the Exchange Queue Viewer showed a stack of undelivered messages with - guess what - DNS failures.

I searched discussion forums and took some comfort from apparently not being the only one, but the thread I found didn't offer me a solution. We went back to checking our settings, including NAT Policies. There was one we weren't sure about so we disabled it. Everything started to work again, and our email was flowing once more. 15 minutes later, it all broke again. Eventually we realised that making any setting change on the SonicWALL - enabling or disabling a rule or a policy - would fix it for about 15 minutes.

Curiouser and curiouser, as they say.

I logged a support case with SonicWALL and also posted on the Spiceworks community. Responses from the community led me to think we'd cracked it and that it was caused by packet splitting when spilling over from one WAN to the other. Unfortunately that proved to be a dead end as well.

We could tell it was something to do with having two WAN connections because when we ran on only one (which was our faster one), everything was fine. It was when we reconnected the secondary connection that it would start to fail.

ARP packets

We tried a few other changes - like deleting a route policy that forced all HTTPS traffic to use WAN1, regardless of load balancing settings - to no avail.

Finally SonicWALL support came up with the goods. Their knowledge base article describes our problem exactly, and it's something our old Zyxel was blissfully unaware of. Evidently our secondary ISP sends ARP (Address Resolution Protocol) requests to check which of our static IP addresses are in use. The SonicWALL sees these requests as coming from an unknown subnet and promptly drops them, because it regards them as a security risk. After a while (about 15 minutes in our case), the ISP's ARP cache no longer has any record of how to reach us, so it doesn't know where to send packets we should receive. Result: no connectivity for that ISP.
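To make that mechanism concrete, here is a minimal Python sketch of the two ideas involved: a subnet check on the sender of an ARP request, and an ARP cache entry that ages out. It is only a model of the behaviour described above, not SonicWALL's actual logic. The addresses come from the reserved documentation ranges and are hypothetical, as are the function names, and the 15-minute timeout is simply what we observed rather than a documented value.

```python
import ipaddress

# Hypothetical addressing: our WAN2 interface sits in 203.0.113.0/29,
# while the ISP's equipment sends ARP requests from 198.51.100.1.
WAN2_INTERFACE = ipaddress.ip_interface("203.0.113.2/29")

# Assumed ARP cache lifetime on the ISP side (about 15 minutes in our case).
ISP_ARP_CACHE_TIMEOUT = 15 * 60


def arp_request_accepted(source_ip, interface=WAN2_INTERFACE):
    """Accept an ARP request only if its sender is on the interface's own subnet."""
    return ipaddress.ip_address(source_ip) in interface.network


def isp_still_knows_us(seconds_since_last_answer, timeout=ISP_ARP_CACHE_TIMEOUT):
    """The ISP's cache entry survives only while our answers keep arriving."""
    return seconds_since_last_answer < timeout


print(arp_request_accepted("203.0.113.1"))   # True: sender is on our subnet
print(arp_request_accepted("198.51.100.1"))  # False: unknown subnet, dropped
print(isp_still_knows_us(16 * 60))           # False: the entry has aged out
```

In this model, a request that is dropped is never answered, so the time since the last answer keeps growing until the ISP's entry expires.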

Because of the load balancing between our two connections, whenever the primary connection reached the preset threshold, the SonicWALL would stop using it for new connections and try to use the secondary connection - which was broken. Hence we lost both connections, and it was just like the bad old days with the Zyxel. Only more frequent.
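The spillover behaviour can be sketched in a few lines of Python. This is an illustration under assumptions, not the SonicWALL's real algorithm: the `Wan` class, the `return_path_ok` flag and the 80% threshold are all hypothetical. The point is that the selection logic only looks at load, so it will happily hand new connections to a link whose return path is broken.

```python
from dataclasses import dataclass


@dataclass
class Wan:
    name: str
    load_pct: float        # current utilisation of the link
    return_path_ok: bool   # can the ISP still deliver replies to us?


def pick_wan(primary, secondary, threshold_pct=80.0):
    """Spillover: stay on the primary until it reaches the threshold."""
    return primary if primary.load_pct < threshold_pct else secondary


def new_connection_succeeds(primary, secondary, threshold_pct=80.0):
    """A new connection works only if the chosen link can carry replies."""
    return pick_wan(primary, secondary, threshold_pct).return_path_ok


wan2 = Wan("WAN2", load_pct=5.0, return_path_ok=False)  # ISP has forgotten us

# Quiet primary: everything looks fine.
print(new_connection_succeeds(Wan("WAN1", 50.0, True), wan2))  # True
# Busy primary: new connections spill over to the broken secondary.
print(new_connection_succeeds(Wan("WAN1", 90.0, True), wan2))  # False
```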

The SonicWALL article describes three steps to diagnosing and fixing this problem:

  1. Using a hidden option to send "gratuitous ARP requests" from the router to restore connectivity. (This seems to be what we were effectively doing when we made setting changes, though we didn't realise it.)
  2. Using Packet Capture to see the incoming ARP requests being dropped. Just like the article shows, I could see the relevant IP address and the packets being rejected.
  3. Adding a static route to tell the SonicWALL that requests from this IP address are acceptable.
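Why the static route in step 3 helps can be modelled roughly as follows. This Python sketch assumes the router accepts an ARP request when the sender falls inside either the interface's own subnet or a network it has a static route for. That is my reading of the article rather than documented SonicWALL behaviour, and the addresses (from the reserved documentation ranges) and the function name are hypothetical.

```python
import ipaddress


def arp_source_allowed(source_ip, interface_network, static_routes=()):
    """Allow the sender if it is in the interface subnet or a routed network."""
    src = ipaddress.ip_address(source_ip)
    known = [ipaddress.ip_network(interface_network)]
    known += [ipaddress.ip_network(route) for route in static_routes]
    return any(src in net for net in known)


WAN2_NET = "203.0.113.0/29"      # hypothetical subnet of our static IPs
ISP_ARP_SOURCE = "198.51.100.1"  # hypothetical sender of the ISP's ARP requests

# Before the fix: the sender is unknown, so the request is dropped.
print(arp_source_allowed(ISP_ARP_SOURCE, WAN2_NET))  # False

# After adding a host route for that one address: the request is accepted.
routes = ["198.51.100.1/32"]
print(arp_source_allowed(ISP_ARP_SOURCE, WAN2_NET, routes))  # True

# If the ISP starts sending from a different address, the route no longer matches.
print(arp_source_allowed("198.51.100.9", WAN2_NET, routes))  # False
```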

Finally we could load-balance, browse and send email without problems. The article does warn that if the ISP ever changes the source IP address for the ARP packets we'll hit the same problem - but this time we'll be prepared and can just change the static route.

Summary

This was easy to fix once we found the relevant article but I did begin to wonder if I'd bought a bad router! I'm no expert on networking but I've learnt that ARP requests are important and that normally you'd only see them on your internal LAN. The SonicWALL drops incoming ARP requests from an address on a subnet it doesn't recognise, and if those requests are from your ISP your connection will break.

About

Mark Pimperton BSc PhD has worked for a small UK electronics manufacturer for over 20 years in areas as diverse as engineering, technical sales, publications, and marketing. He's been involved in IT since 1999, when he project-managed implementation ...

9 comments
jbreitwieser

Mark, great piece, and glad you found the resolution for the issue in our knowledge base (KB) library (http://www.sonicwall.com/us/support/2213.html) helpful. The KB is probably one of our most accessed websites and an amazing resource.

On another note - you are right that ARP can be confusing. There is some great background info on this issue and the feature you are referring to. It's an extra we added as a courtesy for customers using specific ISP hardware that behaves differently from others; they used to experience this issue with some ISP gear from Verizon (FIOS). Please note that our product is a strict firewall and not another router. Here is a thread in the Spiceworks community that references your issue: http://community.spiceworks.com/topic/115863-fios-business

When you log in here (http://sonicdts.eng.sonicwall.com/update_bug.asp?jobid=82297), you get access to the DTS we created to fix the issue with a workaround on our end for those affected customers, because we want to make sure our products work well for them. You will see the comments from the developer about this not being a SonicWALL issue. The solution we provided in the KB article is simply aimed at resolving this issue for our customers with specific ISP appliances.

Please feel free to let me know if you would like to discuss further and we'll engage a Senior Support Engineer to reach out to you and take care of this. Thanks again for the great piece!

Jock Breitwieser, Director Global Public Relations, Dell | SonicWALL, office +1 408 800 JOCK (5625)

jf555

Same problem. I solved it by setting up load balancing and using DynDNS.org as the DNS server for both ISPs. No problems since; everything works as designed.

jfuller05

We run SonicWALL at my work. Our old SonicWALL finally died (PSU), so we had to purchase a new one (http://www.sonicwall.com/us/products/TZ_210.html). In between the SonicWALLs, we used a DSL router for the day and a half we had to wait for our new device. Anyway, I had a problem with the install. I followed the directions step by step, which resulted in slow LAN and WAN speeds. The network was sluggish to say the least. I left it alone for about 15 minutes thinking the sluggishness was due to the network adapting, but even after the long wait performance was the same. Working with the SonicWALL knowledge base and my fellow techs didn't resolve the issue. One thought came to mind: change the LAN network ID from 192.168.10 (the NID we used previously on the old SonicWALL) to 192.168.1. Surprisingly, that did work. I'm not sure why, but after the NID change, network performance was excellent.

mark1408

First, it's nice to know that someone at SonicWALL takes notice of a blog like this. I also appreciate the link to the discussion on the problems in the USA with a different ISP. Not quite sure why you pointed me at your DTS site but I of course agree that it's an ISP rather than a SonicWALL issue.

mark1408

Like @pgit says, this sounds like a different problem. It's not so much about who you use for DNS as whether your ISP sends those ARP requests and if so, where from.

pgit

Was the ARP pinging coming from the ISP's DNS servers? If not, I don't see how changing to DynDNS, OpenDNS, Google etc. would help... The packets would still get dropped and the ARP cache purged.

pgit

When you set the internal network address space, scripts should handle altering all related rules. It sounds like changing the NID affected only that ID. BTW I have found that a lot of the hardware of this sort, routers with a lot of firmware (especially home-built systems like DD-WRT), needs to be restarted after making changes like subnet, even when no restart is (supposedly) needed. In fact, more than once I have had to make the same change and restart several times to get it to stick.

I also learned the hard way (with a much older SonicWALL, a good while ago) that sometimes making a simple change in settings actually screws up the firmware, so the configuration looks like it should work but doesn't. In these cases there's no way to troubleshoot; the settings are 'lying' to you. A reset to factory settings almost always cleared that up. In fact the only routers I've seen bricked were running DD-WRT. Point being, I don't hesitate to go for the reset button early on in troubleshooting these things. It has saved me a ton of headaches over the years. Don't rule out just wiping the settings and starting over from scratch, especially if you've triple-checked all the settings and everything looks as it should, but it's just not working.

BTW when I get the call I put in a Smoothwall, running on older but more robust computer hardware that can usually be had for free. It has tons of overhead in every department: load balancing, filtering, DNS cache etc. Put Smoothwall on a P-4 with 512 MB RAM and a few gigabit NICs and you're on your way to the triple crown. One big advantage is that the hardware is "user serviceable", as they say; you normally don't brick an entire computer. It's a great way to utilize that stack of 40 GB hard drives on the shelf, too.

jfuller05

We rebooted the SonicWALL device a few times, but that didn't fix the issue. I guess changing the network ID solved the issue for the reason you gave. I don't like that the settings can 'lie' to me. :\ Before you reset to factory settings, do you make a backup of the current SonicWALL settings, reset to factory, then restore your backup? Does that still fix the issue? Or will restoring the settings resurrect the problem?

pgit

I don't know why backup->reset->restore didn't work, but it didn't. In every case I had to just write down the settings and go back through setup and redo everything manually.