Avoid the most common culprits for single points of failure on small to midsize networks

Derek Schauland shares a recent hiccup on his network and how it spurred him to revisit areas where there could be single points of failure. Here are some common culprits to address in your business continuity planning.

In my organization -- a small office -- we use Active Directory. Until recently, the environment consisted of one local and one remote site with one Domain Controller (DC) each, providing service for about 65 users total and serving up everything from file and print services for both sites to e-mail. The remote location has about five users, and everyone is close by, so the single domain controller works quite well there.

Here at the corporate office, the remaining population of about 60 users is connected to a single domain controller. This setup faithfully plugged along, handling authentication and all the directory services we could ask for -- until this week.

One morning recently when I arrived at the office, there were several users ready to let me know that they didn't have access to the services they needed.

My initial investigation revealed that DNS was nonfunctional. The DC itself was also very sluggish and seemed like it might need a restart. At first, I restarted it only to see whether the kinks would go away and let me dig into the issue further, but when the system came back up, everything picked up and Active Directory traffic flowed again. People were able to log in, and drive mappings started working again. Because Active Directory depends so heavily on DNS, when DNS went down, everything else went with it.
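A quick name-resolution check is one way to confirm this kind of DNS failure before restarting anything. The following is a minimal sketch in Python, not the procedure used in this incident; the function names are hypothetical, and it only tests whether names resolve through the local resolver, which is a narrower check than a full Active Directory DNS diagnosis.

```python
import socket
from typing import Callable, Dict, Iterable, List


def resolve(hostname: str) -> List[str]:
    """Return the addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry's sockaddr tuple starts with the address string.
    return sorted({info[4][0] for info in infos})


def check_names(
    hostnames: Iterable[str],
    resolver: Callable[[str], List[str]] = resolve,
) -> Dict[str, bool]:
    """Map each hostname to True if it resolves to at least one address."""
    results = {}
    for name in hostnames:
        try:
            results[name] = bool(resolver(name))
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            results[name] = False
    return results
```

Calling check_names with the names of the domain and its domain controllers (for example, a placeholder such as "dc1.corp.example") returns a dictionary that shows at a glance which lookups fail. The resolver argument exists so the check can be exercised without touching the network.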

Because we are a small shop by most standards, I knew the single point of failure was there, but it didn't really seem like it could be a major problem. After all, we have a domain controller at the remote site, and that should be quite sufficient. Well, this would be true if the link leaving the corporate office were faster, but trying to send replication traffic and additional login requests over the WAN would have been a nightmare.

The restart got everything back online as quickly as possible, but I wasn't comfortable knowing that under any heavy load, the issue could easily come back and take the organization offline. At first I thought about ordering a new server to get another DC set up, but even though servers are cheap, they're not free and don't materialize upon request, so I started to take stock of some of the other servers we have running in our environment.

One of these boxes used to run all kinds of things for the Web, but we moved those sites out to a host in the cloud to speed up access to them. Doing this left a server with a good amount of horsepower and not much work to do, making it a perfect candidate for our next DC.

Better performance with more infrastructure

Now that Active Directory runs on two domain controllers at our main site and both of them host the integrated DNS zone for our organization, the likelihood of a complete outage has diminished. Authentication for all users at the main site has also improved, as has access to resources here and on the Internet.
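As noted in the comments on this post, each DC's DNS client settings point to the DC itself first and then to the other DC. A minimal sketch of that ordering in Python follows; the function name and the addresses are hypothetical, and whether self-first is the right order for a given environment is a judgment call this sketch does not settle.

```python
from typing import Dict, List


def dns_client_order(dc_addresses: Dict[str, str]) -> Dict[str, List[str]]:
    """For each DC, list its own address first, then every other DC."""
    order = {}
    for name, address in dc_addresses.items():
        peers = [a for n, a in dc_addresses.items() if n != name]
        order[name] = [address] + peers
    return order


# Hypothetical addresses for the two main-site domain controllers.
plan = dns_client_order({"server1": "10.0.0.10", "server2": "10.0.0.11"})
for dc, servers in plan.items():
    print(dc, "->", " | ".join(servers))
```

The point of listing both servers on each DC is that name resolution still has somewhere to go if one DC's DNS service stops responding.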

Outside of AD, I use Desktop Authority from ScriptLogic to manage the user environment, providing a one-stop place for printer and drive management and things of that nature. Since I was adding another DC to the directory, I also installed the Desktop Authority services there so that everything that typically processes during logon would still run when users logged on.

In addition to getting another DNS server/DC running on the network, I also added the Global Catalog role to the new DC. This should allow all aspects of AD to keep functioning if one of the DCs here goes down.

Network areas that need particular attention

In many Windows environments, Active Directory plays a starring role, and configuration missteps or a failure to plan for enough resources can bring things crashing to a halt. But there are other areas, even on a small or midsize network, that can become single points of failure if you aren't careful. Here are a few to watch out for:

Network Switches: Depending on the user count in an organization, keeping spare switches online might not be feasible; however, it is a good idea to keep a couple of spare switches on hand in case one fails.

Tape Drives: Backup and recovery is fundamental in the IT world; without a good (and regularly tested) backup, the data in an environment is only as good as the weakest link. In my organization, I have two tape drives. We are small enough that one tape covers all the backup jobs, but if one drive goes down, I can still restore from a previous backup after a catastrophic event.

Network Interface Cards (NICs): Most servers today ship with multiple NICs, which provides both improved connectivity when both are in use and failover if one of the cards in a server (or other box) fails.

Internet Connections: As dependent as society is on the Internet, having redundant connections, depending on the size of an organization and its business model, may be a key component in preventing a single point of failure. Smaller businesses outside of the technology industry may not be able to justify the cost of keeping a connection with two providers active, but it couldn't hurt to have a contact at multiple providers and to discuss what you would need to get up and running if your main provider were down.
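One simple way to act on a list like this is to keep an inventory of what backs each role and flag anything with no spare. The sketch below is an illustration in Python under assumed data: the role names, unit names, and the threshold of two are all hypothetical and would need to reflect your own environment.

```python
from typing import Dict, List


def single_points_of_failure(
    inventory: Dict[str, List[str]], minimum: int = 2
) -> List[str]:
    """Return the roles backed by fewer than `minimum` units, sorted by name."""
    return sorted(
        role for role, units in inventory.items() if len(units) < minimum
    )


# Hypothetical inventory for a small office.
inventory = {
    "domain controller (main site)": ["dc1", "dc2"],
    "tape drive": ["tape1", "tape2"],
    "core switch": ["switch1"],
    "internet connection": ["isp-a"],
}

for role in single_points_of_failure(inventory):
    print("No redundancy for:", role)
```

A count is only a starting point. Two units that share a power supply or an upstream link can still fail together, so treat the output as a prompt for a closer look.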

The list I provided here is not all-inclusive, but for most organizations these are things that should be considered in planning for the worst. Planning for redundancy will always seem like overkill to some people when things are working normally, but not planning for components to fail will surely result in those same people looking to you when there's unexpected downtime.

Lessons learned

This ordeal was a major one for our organization, even though it was cleaned up and corrected fairly quickly. I am glad I caught this when I did, but I will admit I wish I had gone the route of the additional domain controller prior to the outage. Doing so would likely have prevented this issue. Working in a one-man IT shop makes some of the tasks that need to get accomplished more difficult or likely to be postponed while you're putting out other fires. But the consequences of not planning for every contingency will always be worse than making the time to address single points of failure on your network.

About

Derek Schauland has been tinkering with Windows systems since 1997. He has supported Windows NT 4, worked phone support for an ISP, and is currently the IT Manager for a manufacturing company in Wisconsin.

12 comments
jakesty

Personally I'd use a simple virtual server like MS Virtual Server 2005 and install it on either one of your available systems, or it can even live on an XP computer. This makes it portable. We have about 150 users and have 2 virtual machines running as DCs w/DNS and one physical machine as a DC. The network performance is never a problem. Remember only one GC per site. So go ahead and activate it on your other location if you'd like. As far as tape drives and tapes, the cost is fairly high for this solution, and USB/eSATA drives look to be a faster and cheaper setup. Even just running a .bkf from NTBackup works just fine. Another thing I found is that even a good switch has overhead. If you need to move a lot of data between servers, install gigabit NICs and direct connect the two systems. Your bottleneck will be the 7200 rpm drives, so you might consider upgrading to faster drives to saturate the pipe.

christopher.smith

I work for a major carrier supporting indirect channel sales. This article is of value to many of the consultants I work with who sell into this space. I would like to suggest a follow-up to this article that touches upon a couple of points mentioned (or not mentioned): 1) The organization in the article has 2 sites. How are they connected? Would a private network (via MPLS technology or private line) aid or add complexity? 2) In adding a secondary Internet connection as recommended, what is the best approach? To fully utilize a multi-homed environment requires BGP to be deployed. This adds a lot of complexity, as well as cost / router horsepower, for the SMB network administrator to manage.

lguillot

nice, simple solution, thanks

pcplodpc

These are some essential tips for SMBs, especially regarding a secondary domain controller. One additional issue I would mention, based on bitter experience, is that you may well find that a backup tape will not read on a replacement drive even if it is an identical make/model. I have my own theory that this is more likely with a helical drive, e.g. DAT, than with a linear one such as DLT. Oh, and stay away from library mechanisms unless you have a masochistic streak.

casternj

How did you set up the DNS servers on the DCs' network cards? server1: 1st: server1, 2nd: server2; server2: 1st: server2, 2nd: server1?

critch

With eBay, you can usually afford to pick up a cheap, used switch or NICs to sit on the shelf... As far as tape drives, I aim for external drives and try to have 2 of the same on my servers. If one croaks or the server driving it croaks, you have a fallback to buy you time to fix the issue more permanently. Might be a real PITA, but at least your A is covered...

bandman

I'm nearly to the point that I just blame DNS by default for every odd problem on my network. It's caused so many issues for me that everything from slow connections to certificate errors to...well, everything else causes me to look suspiciously at it.

Derek Schauland

My DNS is set up so that each server points to itself first and then to the other, e.g. server1: server1 | server2

Derek Schauland

Thanks for the feedback everyone. It seems that the things that cause headaches make the best posts to help others. Too bad it takes experiencing these things to get the story right.

rkuhn040172

I'd rather see more articles written like this than the typical OS flame war articles.