Over the years, I’ve worked on a lot of Exchange servers. In doing so, I’ve found that one of the most difficult problems to troubleshoot is when a server that appears to be configured correctly fails to deliver messages. In this Daily Drill Down, I’ll explain some of the possible causes of this problem and how to fix it.
Gathering information
It’s no big secret that Exchange is complicated. It can be tough to troubleshoot message delivery problems in a single-server environment, but the difficulty is compounded when multiple servers are involved. Therefore, before you start the troubleshooting process, you need to have a really good handle on the symptoms. It isn’t good enough to simply know that the messages aren’t getting to their destination; you need to know something about the messages themselves. For example, here are some questions that you need to ask yourself before getting started:
- Who’s sending the messages? Is it just one person’s messages that aren’t being delivered, or is it everyone’s? If several people’s messages aren’t being delivered, are all of those people on the same server or in the same routing group?
- Where are the messages going? Find out where the messages that aren’t being delivered are going. Are they all destined for the same server or routing group? Are they all going to the Internet? Or, are the messages going to a variety of places?
- What do all of the failed messages have in common? What you’re really looking for is the smoking gun that points to the cause of the failure. You can find that smoking gun by looking for something that all of the failed messages have in common. If the problem appears to be completely random, check to see if the failed messages all have attachments. I’ve seen situations in which a setting gets changed, and messages with attachments fail to get delivered.
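The questions above amount to looking for attributes that every failed message shares. Here is a minimal sketch of that idea in Python, assuming you have jotted down a few facts about each failed message by hand. The field names (sender_server, destination, has_attachment) and the server names are hypothetical and are not anything Exchange produces for you.

```python
from collections import Counter


def common_traits(failed_messages):
    """Return the attribute/value pairs shared by every failed message."""
    if not failed_messages:
        return {}
    counts = Counter()
    for msg in failed_messages:
        for key, value in msg.items():
            counts[(key, value)] += 1
    total = len(failed_messages)
    # Keep only the pairs that appear in all of the failed messages.
    return {key: value for (key, value), n in counts.items() if n == total}


# Hypothetical notes about three failed messages.
failed = [
    {"sender_server": "EX01", "destination": "internet", "has_attachment": True},
    {"sender_server": "EX02", "destination": "internet", "has_attachment": True},
    {"sender_server": "EX01", "destination": "internet", "has_attachment": False},
]

print(common_traits(failed))
```

In this made-up sample, the only thing the failures share is the destination, which would point you toward Internet mail rather than toward one server or toward attachments.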
An individual server failure
If you notice that all of the failed messages seem to be coming from or going to one individual server, then you probably don’t have a routing problem. Of course, the exception would be a situation in which this one server is in its own routing group.
If an individual server appears to be failing, the first thing you should check is the server’s Message Transfer Agent (MTA) queues. Make sure the server’s Microsoft Exchange MTA Stacks service is running. (While you’re at it, ensure that the other services are running, as well.) The MTA is often the culprit. If the MTA is running, then try stopping and restarting the service. Many times, doing so will flush the MTA queues and make message delivery begin to function again.
If message delivery still fails after stopping and restarting the Exchange MTA Stacks service, you need to determine whether you have a routing problem or some other type of problem. The easiest way of testing for this is to create two test accounts on the ailing server. Use the first test account to send a message to an Internet user (not an Exchange user’s SMTP address but an unrelated Internet e-mail account such as someone’s Hotmail account). Next, send a test message to an Exchange user on a different server. Finally, send a test message to the other test account that you created.
The message that went to the other test account should always be delivered, because there’s no external routing involved. If this message fails to be delivered, you’ve got a more serious problem that’s beyond the scope of this article. If both the Internet message and the message destined to another server fail to be delivered, then you’ve got a total routing breakdown on that server. If one type of message or the other makes it through, your MTA is working, and the problem lies elsewhere.
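The reasoning behind the three test messages can be written down as a small decision function. This is only a sketch of the logic described above; the function name and the wording of the returned strings are my own, and you would fill in the three results by hand after checking whether each test message arrived.

```python
def diagnose(local_ok, other_server_ok, internet_ok):
    """Interpret the results of the three test messages.

    local_ok:        message to the second test account on the same server
    other_server_ok: message to an Exchange user on a different server
    internet_ok:     message to an unrelated Internet e-mail account
    """
    if not local_ok:
        # Same-server delivery involves no external routing, so a failure
        # here points to something more serious than a routing problem.
        return "local delivery failed: more serious problem"
    if not other_server_ok and not internet_ok:
        return "total routing breakdown on this server"
    # At least one kind of message left the server, so the MTA is working.
    return "MTA is working: problem lies elsewhere"


print(diagnose(local_ok=True, other_server_ok=False, internet_ok=False))
```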
If neither type of message is being delivered, check your system resources. I’ve seen Exchange servers become so bogged down with user activity that messages just sit in the MTA queue waiting to be serviced. By the time the server can send the message, the message has already timed out. Another possible cause, when mail between mailboxes on the same server works but mail leaving the server doesn’t, is a bad network card. Try pinging another Exchange server within your routing group to make sure that it’s accessible. You might also have someone who uses another server send a message to the test account that you’ve created to see if the problem only exists with outbound messages or if inbound mail is affected, as well.
Internet mail failures
If you’ve determined that the mail delivery failures are limited strictly to Internet (SMTP) mail, then that’s actually good news, because it means that your internal routing structure is probably okay. Knowing that the problem is Internet related is also good, because it narrows down the possible causes of the problem.
To solve this type of problem, you’ve got to remember the way that Exchange handles Internet mail traffic. Usually, organizations will have one server per site that’s responsible for handling Internet mail traffic. Of course in smaller, or more budget-strapped, organizations there may only be one such server in the entire organization, which really makes it easy to find the possible problem.
The first step in fixing the problem is to locate the server that’s responsible for managing Internet mail traffic. Before I show you how to fix the problem though, I should mention that I’m assuming that Internet mail has worked before and that you’re not setting it up for the first time. There are many things that can go wrong with Internet mail on an initial setup that are also beyond the scope of this article.
Once you’ve located the server that’s responsible for Internet traffic, you should verify Internet connectivity. To do so, pick an external e-mail server that you have access to, such as Hotmail. Now, ping the external mail server by opening a command prompt and typing ping www.hotmail.com. If the ping returns, then Internet connectivity from the server is good. Keep in mind, though, that Microsoft Proxy Server 2.0 won’t allow clients to ping through the proxy server. Therefore, if the ping fails, you might try to open Internet Explorer and go directly to the site before writing the problem off as an Internet connectivity problem.
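If you run this check often, the ping-then-fall-back sequence can be scripted. The sketch below is one possible approach in Python, not part of Exchange. It assumes the system ping command is on the path, and it substitutes a plain TCP connection to port 80 for opening the site in Internet Explorer. That substitution is an assumption: a direct socket connection does not use browser proxy settings, so behind a proxy it may fail even when the browser works. The function names are hypothetical.

```python
import platform
import socket
import subprocess


def ping_host(host):
    """Send one echo request with the system ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def tcp_reachable(host, port=80, timeout=5.0):
    """Try to open a TCP connection to the host's web port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_internet(host, ping=ping_host, tcp=tcp_reachable):
    """Ping first; if that fails, try web access before giving up."""
    if ping(host):
        return "ping succeeded"
    if tcp(host):
        return "ping failed, but web access works"
    return "no connectivity"
```

Calling check_internet("www.hotmail.com") on the server runs both probes in order. The ping and tcp parameters exist only so that the probes can be swapped out, for example when trying the logic without network access.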
Once you’ve verified that Internet access is available from the server in question, create a test account on the server and try to send a message to your external Internet mail account. If the message makes it through, have other users on the server try to send a message. If their messages make it through, you have some sort of internal routing problem. If none of their messages make it through, there’s a problem with the server’s SMTP connector.
You can look for potential problems by opening the Exchange System Manager and navigating through the console tree to your organization | Administrative Groups | your group | Connectors | your SMTP connector. Then, right-click the connector and select Properties from the resulting context menu. When you do, you’ll see the connector’s properties sheet.
As you browse through the connector’s properties sheet, there are a few things that you can look for. First, make sure the local bridgehead has a value. In many organizations, the local bridgehead is the server that contains the SMTP connector, but this isn’t always the case. The SMTP connector must also have at least one associated address space or connected routing group, so it’s a good idea to verify these values, as well.
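The checks on the properties sheet can be summarized as a short checklist function. This is a sketch only: it works on values you copy from the properties sheet by hand into a dictionary, and the key names (local_bridgeheads, address_spaces, connected_routing_groups) and the sample values are hypothetical labels of mine, not names from any Exchange interface.

```python
def check_smtp_connector(connector):
    """Return a list of problems found in the copied connector settings."""
    problems = []
    if not connector.get("local_bridgeheads"):
        problems.append("no local bridgehead is set")
    has_address_space = bool(connector.get("address_spaces"))
    has_routing_group = bool(connector.get("connected_routing_groups"))
    # The connector needs at least one of the two.
    if not (has_address_space or has_routing_group):
        problems.append("no address space or connected routing group")
    return problems


# Hypothetical values copied from a connector's properties sheet.
connector = {
    "local_bridgeheads": ["EX01"],
    "address_spaces": [],
    "connected_routing_groups": [],
}

print(check_smtp_connector(connector))
```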
Routing failures
If your main problem is that message flow doesn’t work between servers within your organization, then the problem is probably related to your internal routing structure. To make sure, go to one of the servers and use a test account to send a message to a test account on another server. If the message fails to arrive at its destination, then open a command prompt window on the server that you sent the message from and ping the destination server. This will verify that the physical network is functional, since ping is a TCP/IP command rather than a part of Exchange.
If the ping fails, you probably have a problem with the physical network, such as a bad network card or a failed router, rather than a problem with Exchange. You can try browsing My Network Places and accessing the destination server to be sure.
If the low-level communications work, then the problem is with Exchange. The problem could either be at the routing group level or at the routing group connector level. Remember that routing group connectors are very important. Regardless of how many routing groups you’ve defined, the routing groups won’t be able to communicate with each other through Exchange unless you’ve provided them with the appropriate routing group connectors. Therefore, you should verify that all of your routing group connectors still exist and that they are configured correctly.
As you look over your routing group connectors, there are a few things to keep in mind. Routing group connectors are logical connectors between two servers, one in each routing group. These servers act as bridgehead servers that move messages in and out of the routing group. Therefore, as you look at your routing group connectors, make sure the servers that are defined still exist, are currently online, and are capable of routing the typical volume of traffic that the routing group generates.
Routing group connectors also require a permanent, stable, high-bandwidth connection between the two bridgehead servers. That’s one of the reasons why I had you do the communications test earlier. If the routing group connectors appear to be configured correctly, try deleting them and re-creating them. I have seen routing group connectors fail for no apparent reason. Often in these situations, deleting and re-creating the connector fixes the problem. Just be sure to write down the configuration information from the old connector before you delete it. When you create the new connector, you’ll also need to give the organization time to replicate the change to all of the Exchange servers before you test it.
At one of the places that I used to work, we had dozens of Exchange servers that serviced tens of thousands of users. In this environment, routing failures were common. To help quickly diagnose the problem, a coworker devised a test. Each server had a mailbox called SERVERTEST. The idea behind this mailbox was that we created a group that consisted of the SERVERTEST mailbox on each server in the organization. From anywhere, we could send a test message to this group. Each mailbox was configured with a rule that would tell the servers to reply to a message with a confirmation note. Thus, we could send a single message and get a reply back from every server. If we didn’t get a reply within a few minutes, we knew that a server had a problem.
Another interesting thing about this test was that we knew which servers were separated by which routers. Therefore, if a group of servers failed to respond to our test, then we could see if network traffic to those servers all passed through a common router, hub, or network segment. This made it easy to locate the source of the problem.
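The way we read the results of the SERVERTEST message can be sketched in a few lines of Python. This is an illustration of the idea, not the tooling we actually used. The server names, router names, and the paths table are made up, and subtracting the routers that working servers also depend on is my own refinement of the "common router" check described above.

```python
def silent_servers(expected, replied):
    """Servers that should have replied to the test message but did not."""
    return sorted(set(expected) - set(replied))


def suspect_hops(silent, replied, paths):
    """Devices on the path to every silent server and to no replying server."""
    hop_sets = [set(paths[s]) for s in silent if s in paths]
    if not hop_sets:
        return set()
    shared = set.intersection(*hop_sets)
    healthy = set()
    for server in replied:
        healthy.update(paths.get(server, []))
    return shared - healthy


# Hypothetical record of which routers sit between us and each server.
paths = {
    "EX01": ["R1"],
    "EX02": ["R1"],
    "EX03": ["R1", "R2"],
    "EX04": ["R1", "R2", "R3"],
}
replied = ["EX01", "EX02"]

silent = silent_servers(paths.keys(), replied)
print(silent)
print(suspect_hops(silent, replied, paths))
```

In this made-up example, EX03 and EX04 stay silent, and the only device that both of them depend on but the working servers do not is R2.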
Conclusion
One of the more complex types of Exchange problems to troubleshoot is that of messages not making it to their destination. I’ve explained some possible causes for such a problem and hope that this information will help you diagnose and fix problems with your Exchange servers.