Many computer engineers will boast of their vast knowledge of operating systems, but I am the first to admit I know absolutely nothing about Linux. To be completely honest, I still struggle with the pronunciation. Is it |Line ix| or |Lin ix|? I guess I am just a hardcore Microsoft/Novell guy. However, after last week’s disaster, I have a newfound respect for this penguin-based operating system.
The network background
The network at my company is Novell 5.0 using NDS for NT. We have a single domain, an aging PDC, and a couple of BDCs. There are several stand-alone servers including a Microsoft Exchange 5.5 server.
At this time, upper management has decided not to implement Microsoft Windows 2000. However, I am working on the IT director in hopes of swaying this decision.
Here’s the symptom
At the close of business, I heard an announcement that our development team was bringing down the Costguard server, which hosts a billing application used in my office. Because this application is very configurable, the developers often change its design, which requires a reboot. I thought nothing of the announcement and headed for the parking lot. My wife and I were having a celebration, and the steaks were on the grill!
But as you can guess, it never fails. As soon as you expect to enjoy a relaxing evening with a good steak, a storm blows in and destroys the peace. As my cell phone rang, I knew there must be a major problem. The IT director only calls me after hours when he is stumped. Note that this is a good thing.
He described the problem briefly. In a nutshell, the Costguard server was unable to connect to the domain, and all users were dead in the water. Knowing the history of our PDC, I asked him to reboot the server to make sure the PDC was “happy.” You know: a reboot a day keeps Jake away! I had a funny feeling the night had only begun, so I choked down my steak and awaited the inevitable.
Pondering the possible problems
As I made the 35-minute trek into work (I’m a worrywart), I pondered the different possibilities that could cause problems with the server. First, the server was running SQL, and we had just had a consultant in to tweak the equipment and database. Could they have done something to mess up the computer?
Next, the aging PDC was running on squirrel power, and I expected this beast to die any minute. Did the SAM go south? Finally, I had been investigating another problem with NT rights with some groups. Apparently, the old network engineer had “altered” rights with groups. I had only made one change, but could I have created the problem?
The real problem
As I walked in the door, my boss informed me that even after the reboot, users were unable to connect to the domain. Immediately, I turned my attention to the PDC. Keep in mind that this problem was approaching the three-hour mark, and the IT director was obviously frustrated. Returning to the basics, I decided to look at the Event Log on the PDC. Novel idea, huh?
Immediately, I noticed the following error: A PDC is already running on this domain. Let me digress a little. Our department consists of the director, a PC tech, and me. None of the IT team had built a PDC, or for that matter an NT server, in the past four months. What could be acting as a domain controller?
Let’s get ready to Samba!
A week before this problem, our company had brought on board two co-ops. One of the co-ops was assigned to our Cisco Networking group, where he was tasked with “playing” with Samba.
Samba, in case you didn’t know, is an open-source software suite that provides file and print services to SMB/CIFS (Common Internet File System) clients. In redneck techie terms, this puppy lets a Linux box share files and printers in a Windows world.
To make a long story short, the masquerading domain controller was the Linux box. Samba reads a configuration file, smb.conf, that specifies how the software behaves. One of the lines sets the box up to act as a master domain controller. When the Linux box came alive earlier that day, a master browser election was held. I’m sure Windows put up a fight, but Linux came out on top. Users could not log on to the network because they were authenticating against the Linux box. Linux means no SAM. No SAM means access denied.
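For anyone who wants to check a Samba box before it touches an NT domain, here is a minimal sketch of the kind of audit that might have saved our evening. It assumes the settings live in a lowercase `[global]` section and that the culprits are the `domain master`, `domain logons`, and `preferred master` options, which are real smb.conf parameters. I never recorded which line bit us, so treat that list as a guess. The function name and the sample configuration are made up for illustration.

```python
import configparser

# Options that let a Samba host compete with, or pose as, a Windows
# domain controller. This list is an assumption, not a complete audit.
RISKY_VALUES = {
    "domain master": {"yes", "true", "1"},
    "domain logons": {"yes", "true", "1"},
    "preferred master": {"yes", "true", "1"},
}


def risky_samba_settings(smb_conf_text):
    """Return a sorted list of [global] options that look risky on an NT domain."""
    # interpolation=None keeps Samba variables such as %m from being
    # treated as configparser placeholders.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(smb_conf_text)
    if not parser.has_section("global"):
        return []
    found = []
    for option, bad_values in RISKY_VALUES.items():
        value = parser.get("global", option, fallback="").strip().lower()
        if value in bad_values:
            found.append(option)
    return sorted(found)


# Hypothetical configuration, not the one from our co-op's machine.
SAMPLE = """
[global]
workgroup = EXAMPLE
log file = /var/log/samba/%m.log
domain master = yes
preferred master = yes
domain logons = no
"""

if __name__ == "__main__":
    print(risky_samba_settings(SAMPLE))
```

Run against the sample, the sketch reports `domain master` and `preferred master`. It only reads the text it is given and does not know Samba's built-in defaults, so an option left out of the file is not flagged.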
Not out of the woods yet!
After the Linux discovery, we immediately removed the box from the network. My boss and I decided to bring down the network completely, reboot, and go home!
To our surprise, when the servers came up, we were still unable to see users in User Manager For Domains. Specifically, when launching User Manager For Domains from the BDC, we received this error:
"Could not find domain controller for this domain.
Do you want to select another domain to administer?"
We were already frustrated, and our last bit of troubleshooting energy was exhausted. My boss began building a new PDC while another tech and I recapped the night’s events. We suspected that the Linux box had somehow goofed up the PDC’s SAM. I had an idea to go into the NDS manager and force the master replica of the SAM down to the PDC.
As I was working on launching the Novell Utility, I thought about services. I aborted the NDS idea and took a look at them. Sure enough, the NDS service, NDS_Server.NTPDC.CN.TREE, had been disabled. Apparently, when the server was unable to become the Primary Domain Controller, NT disabled the NDS service. I started the service, and the network was alive with activity!
Lessons I learned
- Watch those co-ops!
Most co-ops know just enough to be dangerous. In our situation, we gave the student technician the dynamite (the Linux box) and the match (Samba). No supervision coupled with little knowledge resulted in a major disaster.
- Return to the troubleshooting basics!
Basic troubleshooting is key. Remember to check the Event Log often! Also, look at those services! And yes, I promise to practice what I preach.
- Take a deep breath!
I am very excitable when challenged by a technical problem. When frustrations are high, make a point to force a five-minute break. This is very important because a frustrated engineer’s focus is blurred. Case in point—the error received by the BDC concerning domain administration should have been a clear sign that the communications between the PDC and BDC were severed. My technical vision was blurred because I was tired and frustrated. I needed a quick breather in order to regroup.
- Learn about Linux!
I guess I could travel down two roads here. Road one would be simple: I can ban all Linux boxes from my network. However, the future dictates Linux as a cost-effective networking solution. So, road two suggests that I learn about Linux. This old Microsoft/Novell dog will be learning new tricks! I saw how quickly the penguin took over. The power of Linux motivates the engineer inside. I must master the penguin and unleash the power!