While I have been programming most of my life, I have not spent many years of my career as a 40-hour-a-week, code-writing developer; it's just how things worked out. One of the great things about this is that I learned a set of skills that many developers do not have. For example, all of the time I have spent doing systems administration has given me great insight into how to solve problems with applications even when you can't see the code directly or attach a debugger to a process. At one job, I often had to work out an application's internal logic so I could recommend fixes to the development staff, with nothing to go on other than the HTML output and the SQL statements it was running.
One skill I rely on constantly in my day-to-day work is using network monitoring and packet captures to find issues with Web applications. I am going to show you some of the utilities and tricks I use to approach this kind of problem.
What is packet capturing?
Packet capturing is a technique where you use an application that records some or all of the network traffic on your machine. A good packet capture application will break the traffic down by source or destination, by port or protocol, and ideally by the system process that the data is going to or coming from. An even better one will understand various protocols and decode the traffic into an appropriate view for you along the way.
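To make "breaking the traffic down" concrete, here is a minimal Python sketch of the first thing a capture tool does with each packet: decode the IPv4 header to get the source, destination, and protocol, plus the port numbers for TCP or UDP. It assumes you already have the raw bytes of a single IPv4 packet with no Ethernet framing, and it ignores IPv6 and checksum validation; `summarize_ipv4` and `PacketSummary` are hypothetical names, not part of any capture tool's API.

```python
import socket
import struct
from typing import NamedTuple, Optional

# IANA protocol numbers for the transports labeled by name here.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


class PacketSummary(NamedTuple):
    src: str
    dst: str
    protocol: str
    src_port: Optional[int]
    dst_port: Optional[int]


def summarize_ipv4(packet: bytes) -> PacketSummary:
    """Decode addresses, protocol, and (for TCP/UDP) ports from a raw IPv4 packet."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4  # IHL field counts 32-bit words
    proto = packet[9]
    src = socket.inet_ntoa(packet[12:16])
    dst = socket.inet_ntoa(packet[16:20])

    src_port = dst_port = None
    if proto in (6, 17) and len(packet) >= header_len + 4:
        # TCP and UDP headers both begin with 16-bit source and destination ports.
        src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])

    return PacketSummary(src, dst, PROTOCOL_NAMES.get(proto, str(proto)), src_port, dst_port)


if __name__ == "__main__":
    # A hand-built 20-byte IPv4 header (TCP, 192.168.1.10 -> 10.0.0.5)
    # followed by the first 4 bytes of a TCP header.
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 24, 0, 0, 64, 6, 0,
                         socket.inet_aton("192.168.1.10"), socket.inet_aton("10.0.0.5"))
    print(summarize_ipv4(header + struct.pack("!HH", 51234, 443)))
    # PacketSummary(src='192.168.1.10', dst='10.0.0.5', protocol='TCP', src_port=51234, dst_port=443)
```

Real tools do the same thing one layer at a time (Ethernet, then IP, then TCP/UDP, then the application protocol), which is what makes per-protocol views possible.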
What it gets us
Packet captures can be used to glean all sorts of useful information from an application as it runs. A capture shows you every network call occurring under the hood of an application; for instance, I often use one to understand authentication issues. Another use is to solve problems where a library throws an exception about network access but does not provide nearly enough information to go on. Since I started doing this a few years ago, I have found myself reaching for packet capture to solve all sorts of problems I never thought I would use it for.
My tool of choice is the Microsoft Network Monitor application (I reviewed Microsoft Network Monitor 3.3 for TechRepublic in August 2009). There are other good applications out there (Wireshark comes to mind), but I have gotten used to Microsoft Network Monitor, and it does a great job for me.
Microsoft Network Monitor
Fiddler is a similar tool that I also find quite helpful, although it is a bit different from a true network monitor. Instead of sitting on top of the NIC, it acts as a proxy server and records whatever passes through it. This gives it an extremely in-depth look at some things (particularly SSL traffic, which a plain packet capture cannot decrypt), and it is optimized for viewing HTTP traffic. At the same time, it is much more limited in scope (it won't show you your DNS lookups, for example).
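Because Fiddler works as a proxy, your application's traffic only shows up if the application actually sends its requests through it. Many clients pick up the system proxy setting Fiddler registers while it is capturing, but when one does not, you can point it there explicitly. The Python sketch below uses only the standard library's `urllib.request`; the address `127.0.0.1:8888` is an assumption (it is Fiddler's usual default listening port), so check the port in your own Fiddler settings.

```python
import urllib.request

# Assumption: Fiddler (or another debugging proxy) listens here; adjust as needed.
DEBUG_PROXY = "http://127.0.0.1:8888"


def build_proxied_opener(proxy_url: str = DEBUG_PROXY) -> urllib.request.OpenerDirector:
    """Return an opener that routes both HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Usage (requires the proxy to be running):
#   opener = build_proxied_opener()
#   with opener.open("http://example.com/") as response:
#       print(response.status)
```

For HTTPS, Fiddler can only show you decrypted content if its HTTPS decryption option is turned on and the client trusts Fiddler's root certificate; otherwise you will see the tunnel being opened but not what goes through it.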
The troubleshooting process
I am not going to go in depth into how to use the tools; each person will find a tool they like, and the tools are fairly simple to use. The techniques, however, are the same whichever tool you choose.
You usually do not want to start the capture until just before the problem you are trying to fix comes up. These applications generate a ton of data, and it can be difficult to sort through, so you want to collect as little as possible. Once the problem has occurred, I stop the capture, again to minimize the data. Next, I use the filters to show only traffic from my application, and then I apply another filter to show only traffic to the other machine. That narrows the view to exactly the traffic I am looking for.
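In practice you express these two filters in your tool's own filter language, and the syntax differs from tool to tool. As a tool-neutral illustration of the logic only, here is a Python sketch over a list of captured frames; `CapturedFrame`, its fields, and `filter_frames` are hypothetical and do not correspond to any particular tool's data model.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CapturedFrame:
    """Hypothetical, simplified view of one captured frame."""
    process: str       # process that sent or received the frame
    local_addr: str
    remote_addr: str
    summary: str       # the tool's one-line description of the frame


def filter_frames(frames: Iterable[CapturedFrame], process: str,
                  remote_addr: str) -> List[CapturedFrame]:
    # Stage 1: keep only traffic belonging to the application under test
    # (case-insensitive, since Windows process names vary in case).
    from_app = (f for f in frames if f.process.lower() == process.lower())
    # Stage 2: of that, keep only traffic exchanged with the other machine.
    return [f for f in from_app if f.remote_addr == remote_addr]
```

Filtering on the process first removes the background chatter from every other program on the machine; filtering on the remote address second leaves just the conversation you care about.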
If I do not see any traffic going where I expect it to, that usually indicates a DNS issue that needs to be addressed. Drop to a command line, try pinging the machine name, and verify that the name resolves to the address you expect. Too many times I have gone nuts over a "bug" only to find that someone had put in the wrong DNS entry.
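If you would rather script the check, a minimal Python sketch follows. It asks the operating system's resolver (via `socket.getaddrinfo`, which is typically the same path your application uses, usually including the hosts file) which addresses a name maps to, and compares them with the address you expect. The function names are hypothetical, and `expected` is whatever address you believe the name should have.

```python
import socket
from typing import List


def resolve(hostname: str, port: int = 80) -> List[str]:
    """Return the distinct addresses the system resolver returns for hostname."""
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in socket.getaddrinfo(
            hostname, port, proto=socket.IPPROTO_TCP):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def check_dns(hostname: str, expected: str) -> bool:
    """Report what hostname resolves to and whether that includes `expected`."""
    try:
        found = resolve(hostname)
    except socket.gaierror as exc:
        print(f"{hostname}: lookup failed ({exc})")
        return False
    ok = expected in found
    verdict = "as expected" if ok else "NOT what you expected"
    print(f"{hostname} -> {', '.join(found)} ({verdict})")
    return ok


# Usage: check_dns("intranet-app", "10.0.0.5")  # hypothetical host and address
```

A lookup failure and a lookup that succeeds with the wrong address are both DNS problems, but they point at different fixes (a missing record versus a wrong one), which is why the sketch reports them separately.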
If DNS resolves correctly but you aren't getting any response back, it's probably a firewall issue. Use a telnet client to connect to the same machine and port as your application, and see what happens. If the connection times out, something is blocking it, and you'll need to engage the system administrator responsible for the destination machine to get it sorted out.
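The telnet test can also be scripted, which is handy on machines without a telnet client installed. This minimal Python sketch attempts a plain TCP connection with a timeout and reports which outcome occurred: connected, timed out (consistent with a firewall silently dropping packets), or refused (the host answered, but nothing accepted the connection on that port). `probe_port` is a hypothetical name, and the 5-second default timeout is an arbitrary choice.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a TCP connection, like `telnet host port`, and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"        # something accepted the connection
    except socket.timeout:
        return "timed out"       # no answer at all: often a firewall dropping packets
    except ConnectionRefusedError:
        return "refused"         # host reachable, but nothing listening (or an active reject)
    except OSError as exc:
        return f"error: {exc}"   # e.g. no route to host, name resolution failure


# Usage: print(probe_port("10.0.0.5", 443))  # hypothetical host and port
```

The distinction matters for who you call: a timeout usually means a firewall somewhere along the path, while a refusal usually means the service itself is down or listening on a different port.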
If the traffic is going through just fine but something is going wrong at the application level, your network monitor is going to be invaluable. When you filter properly, you'll see exactly the traffic you are looking for. When you select that traffic, a good tool will decode it according to its protocol and let you see specific information. For example, Microsoft Network Monitor detects SOAP traffic and gives me a good view into the XML being sent and into SOAP-specific details such as authentication information, which is clearly a huge help when working with Web services. You will need a good knowledge of the underlying protocols and specifications (I like rfc-editor.org for this) to figure things out, but you'll be able to see directly any error codes that your library or framework might be hiding. For example, an SMTP library might just say, "Could not send email," while the server's reply in the packet capture will say "User does not exist" or "Rejected by spam filter," which will lead you to the proper fix.
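As an illustration of reading those raw error codes, here is a minimal Python sketch that parses an SMTP server reply as it appears in a capture. It follows the reply format in RFC 5321: every line starts with a three-digit code, a hyphen right after the code marks a continuation line, and the first digit tells you whether the command succeeded (2), needs more input (3), failed temporarily (4), or failed permanently (5). The function name and sample replies are hypothetical.

```python
from typing import List, Tuple

# Meaning of the first digit of an SMTP reply code (RFC 5321).
SEVERITY = {
    "2": "positive completion",
    "3": "positive intermediate",
    "4": "transient negative (retrying may succeed)",
    "5": "permanent negative (something must be fixed first)",
}


def parse_smtp_reply(lines: List[str]) -> Tuple[int, str, str]:
    """Parse one (possibly multiline) SMTP reply copied out of a capture.

    Every line carries the same 3-digit code; '-' after the code marks a
    continuation line, and ' ' (or nothing) marks the final line.
    """
    code = lines[0][:3]
    text_parts = []
    for line in lines:
        if line[:3] != code:
            raise ValueError(f"inconsistent reply code in {line!r}")
        text_parts.append(line[4:].rstrip("\r\n"))
        if line[3:4] != "-":
            break  # final line of this reply
    return int(code), SEVERITY.get(code[0], "unknown"), " ".join(text_parts)


# Usage:
#   parse_smtp_reply(["550 User does not exist\r\n"])
#   returns (550, 'permanent negative (something must be fixed first)', 'User does not exist')
```

The numeric code is what tells you how to react: a 4xx reply is worth retrying later, while a 5xx reply means retrying the same message will keep failing until something (the address, the content, the sender) changes.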
When you are trying to debug a networked application, there is really no substitute for packet capturing. It works so well for me that, in these situations, I habitually try it right after looking up any error codes. I find that getting my eyes on the problem at a low level is faster than trawling search engines and hoping that, out of the million results for a generic error message, one of them addresses my problem. With a little practice, I think this technique will pay off for you too.
J.Ja
Disclosure of Justin's industry affiliations: Justin James has a contract with Spiceworks to write product buying guides; he has a contract with OpenAmplify, which is owned by Hapax, to write a series of blogs, tutorials, and articles; and he has a contract with OutSystems to write articles, sample code, etc.
Justin James is the Lead Architect for Conigent.