Troubleshooting woes: Some tips all IT pros should remember

Donovan Colbert describes a recent frustrating issue on a project that reinforced some of the tips IT pros should remember when troubleshooting technical problems.
Recently I was building a Citrix XenServer test lab to create a Citrix XenApp farm. During the build process, I ran into difficulty several times with the order and steps necessary to make the systems available via RDP instead of the Citrix XenCenter console. Setting up a Citrix XenApp server as a VM in XenServer isn't the easiest or clearest build process. You've got a lot of little details you've got to get right -- and the default builds of W2k8 and W2k3 require some additional initial steps before they'll open up for RDP connections. This is all compounded by the fact that you've got to get the Xentools installed correctly and get the right .NET Framework installed prior to installing Presentation Server on your Windows base build.

Additionally, I'm not on the most robust test server for this build (a Dell PowerEdge T110 quad-core tower with 4GB of RAM). I'm also introducing myself to the setup, configuration and administration of XenApp with this lab. My lead Citrix engineer is assisting me when I hit snags, but he has his regular job to keep him busy. Altogether, that has made it a relatively difficult lab for me to get running. This lab is built off our main LAN -- and it replicates the naming policy of our production network, so I have to be careful to ensure that it remains isolated. In order to get online to download updates and components, I am using our public guest network -- and I've got the host machines and VMs configured with non-routable IP addresses. I've changed the IP range a couple of times as I've built the machines.

Because of all of this complexity, getting these machines up and running has been a start-and-stop process with several sets of frustrations along the way. We've kind of cheated and bent the rules at a couple of points; for example, we did not create a SQL Server instance to house the database because of the limited hardware resources on the XenServer host, and we created the original set of XenApp VMs as a workgroup, without a domain controller. We encountered strange behavior with this model. Among the problems was getting the machines into Terminal Server mode and reachable from my lab console where I am running XenCenter (a Lenovo laptop). I missed steps during the configuration and initial Windows update process and the firewall was left enabled on one VM. The other had a similar issue. After all was said and done, we still had problems getting one of the Citrix servers registered correctly in the farm, and my engineer told me I should just create a domain rather than a workgroup so I would be working with something more similar to a production model. Rather than build a server up from scratch, I borrowed a VM image from another engineer's lab that was already set up as a DC.

After importing the image I tried to log in and found that this engineer uses a different password than our normal test-lab password. I'm okay with that in principle, but I had to wait until I could talk to that engineer to get the correct password. He was out for a couple of days, then very busy, so my project got put on the back burner. When I returned to it and had him change the password so I could get into the machine, I found that it also failed to connect via RDP once I had it all set up. I could only access the machine from XenCenter.

A second set of eyes is often all it takes

Frustrated, I went into My Computer properties to enable Remote Desktop, but it was already enabled. Normally my next step would be to go into security or services and ensure that the Windows firewall was disabled, but my intuition told me that this wasn't the problem in this case. Puzzled, I went to another engineer who works daily with AD and W2kX domain controllers and asked him to come fix my problem for me. I figured I was just rusty from too much time behind a desk and not enough time working hands-on every day as a forward engineer (which is undoubtedly part of the problem). He came in, sat down, and of course, his first step was to do exactly what I had just tried. "Get out of my way and let someone who knows what they're doing fix this probl- er... wait, Remote Desktop is activated."

He asked, "What IP address is this machine assigned?" and opened a command prompt to run ipconfig - and that is when it hit me.

My previous issues and challenges had put me in a mindset where I was approaching the problem with a particular bias about what the solution was. I was applying previous experience specific to this particular lab and not attacking this issue as a separate, unique challenge. I was mistaking correlation for causation.

This was a great example that reaffirms some of the concepts and skills I constantly stress with my team and that all successful IT engineers should constantly revisit.

  1. Do not mistake correlation for causation.
  2. Do not assume that past experience is any indication for resolving current issues.
  3. Seek out another set of eyes when you're experiencing issues.
  4. Be humble in understanding that asking for help is not the sign of an inexperienced engineer; it is the sign of an experienced one.

It turned out that the DC image I had imported still had an IP address from a different subnet assigned to it. It was a simple and logical mistake that was easy to overlook. Having another engineer come in, look at the box and ask a simple question instantly gave me the solution. I felt silly and a little stupid that I had overlooked such an obvious mistake - but I would have felt far worse if I had sat trying to troubleshoot the issue by myself for any length of time. My mindset was oriented toward chasing down the wrong set of causes based on my previous experiences - and that is a very common mistake for IT employees to make when troubleshooting. It doesn't matter if you're at the help desk or in enterprise engineering; everyone who works in IT eventually encounters a situation like this. I've been on both sides, too. I've been the engineer barking up the wrong tree, and I've been the engineer who pointed a co-worker in the direction that helped resolve their trouble.
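A quick way to rule this kind of problem in or out is to compare the addresses on both ends of the failed connection before digging into services or firewalls. The sketch below uses Python's standard `ipaddress` module; the two addresses and the `same_subnet` helper name are hypothetical examples, not the values from this lab.

```python
import ipaddress


def same_subnet(host_a: str, host_b: str) -> bool:
    """Return True when two interface addresses in CIDR notation share a network."""
    a = ipaddress.ip_interface(host_a)
    b = ipaddress.ip_interface(host_b)
    return a.network == b.network


# Hypothetical addresses: a lab console and an imported VM image that
# kept the address it was given in someone else's lab.
console = "192.168.10.25/24"
imported_dc = "192.168.20.5/24"

if not same_subnet(console, imported_dc):
    print("Different subnets: fix the IP configuration before anything else.")
```

With no router between the two networks, hosts in this state cannot reach each other, no matter how Remote Desktop or the firewall is configured.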

Sometimes it is the soft skills that are most important to being successful in this industry. We spend a lot of time focusing on the technical knowledge and training. A comprehensive knowledge of a platform isn't going to help you avoid spinning your wheels over an issue like this, though. Having both the hard and soft skills is the difference between being stuck in a position and being a star performer who is recognized in your organization and among your peers.

Do you have a similar experience or perspective to share? Let us hear your examples in the forum.

About

Donovan Colbert has over 16 years of experience in the IT Industry. He's worked in help-desk, enterprise software support, systems administration and engineering, IT management, and is a regular contributor for TechRepublic. Currently, his profession...

18 comments
RMSx32767

#3 is too true. Been there, done that, on both sides; been the eyes, needed the eyes.

BALTHOR

I agree, don't worry about feeling silly or stupid is the solution. Sometimes it happens so fast, especially with IP addresses and Mdc codes. Here I'll show you my fix for the problem:

/**
 * blk_rq_count_integrity_sg - Count number of integrity scatterlist elements
 * @q: request queue
 * @bio: bio with integrity metadata attached

I know it's not much but it blows the doors off of the gathering crowd. Keep going and let your light shine.

dcolbert

TechRepublic is not letting me add my posts right now. Nope... Weird. It just won't let me add a *particular* post. What is up with that? That happens sometimes. I just can't get it to publish a particular response. I clear my cache, exit my browser and log back in. I check for language filters... I haven't written a bad word intentionally or unintentionally. I don't get it.

TheSwabbie

I've been in IT since before it was even called IT. Starting in the mid-1970s in the US Navy as a communications center supervisor to where I am now in IT management for a technology company. I've had more challenges than I can possibly recount.. sometimes I was a shining star in restoring something.. others (more often) I was stuck up to my neck. I learned a VALUABLE lesson - STICK to KNOWN GOOD TROUBLESHOOTING PROCEDURES.. and do NOT be tempted to SKIP STEPS! When you take troubleshooting steps logically and in order you WILL solve the issue. Step off the path and you "fight the dragon". I've found that I can easily distinguish when I'm bogging down and subject to being "myopic" on something.. it's when I start to get frustrated and think "this doesn't make sense". Those are the warning bells to STEP BACK.. or relegate yourself to more fruitless hours of troubleshooting that will inevitably have you banging your head against a wall. You feel SO stupid for missing something SO blatantly obvious and repeat over and over in your head like a mantra "DON'T FORGET HOW YOU SOLVED THIS!" Yes brother.. we've ALL been at that "dock" many times looking for a ship that never seems to arrive. Good article.

pauljosephwalsh

As a recently certified IT professional, I took a position with a Help Desk team and immediately began to realize that the pristine, predictable world of the IT training labs was not even close to reality. The whole world doesn't have the latest version of MS Server, and the latest workstation OS, with the latest version of Office, etc. The real world includes so many variations and mixtures of technologies, that the recent school graduate can be quickly overcome. As suggested above, the best solution, I found, was to admit that I don't know everything, get my ego out of the way and realize perhaps my comprehension is based upon some false perceptions, and ask for help. You learn more from your colleagues because unlike the school labs, they have faced the real world of IT (particularly in the organization where you work) and found solutions.

Timbo Zimbabwe

"Recently I was building a Citrix Xenserver test lab to create a Citrix XenApp farm." We've found NUMEROUS issues with Xen.

RealInIT

You know, sometimes you get so focused you develop 'tunnel vision', happens all the time in all industries not just IT. One of the things that helps is having a broad base of troubleshooting experience. This is my third career change, and the lessons learned elsewhere have been very helpful now. My 2c worth.

sboverie

Another set of eyes helps with a lot of problems if you find yourself stuck in a rut. Another thing to do is to take a break; this gets you out of the situation long enough to detense. I think what is more important is to not give up: call for tech support, call for a coworker to help, look at a similar system that is working and read the freaking manual. Also, get more of your senses involved: listen for unusual noises, smell for weird odors, feel for vibrations or tapping and look for something that doesn't look right. I look at symptoms and eliminate subsystems to reduce the complexity of the problem. Experience is useful, but if you are working on something new then don't apply experience further than it helps.

jev.case-24297005939114168965253281161338

Good advice. I find myself too often trying to relate new problems to past experience. That can be helpful in some cases but it can be damaging in others, such as the one you described above. It's easy to do that, unfortunately; sometimes I do it because I am too lazy to figure out the problem. Instead, as you say, we should classify the problem and note what is unique about it instead of just throwing our past experience at it. Many times unique problems will have a unique solution. I also like the part about being humble; hubris can often cloud people's judgement.

dswope79

Too many IT "pros" have the mindset of a know it all and simply refuse to use the resources available to them by simply asking for help. I am sure we have all worked around this type of IT people. I have been lucky to have worked around a lot of helpful pros that have massive amounts of experience on their hands and like to teach and help when needed. I will always be the first to ask for an opinion or help before making a critical move in a production environment. I will also freely admit that while I might not have an answer for a question immediately, I am confident I can find the answer. People need to learn and understand that IT is not about knowing everything, in my opinion a sound IT professional is not one who thinks he/she knows everything, it's one who is humble and has the ability to utilize resources to find resolutions to problems.

mondals

Many of us in IT fail to ask simple questions fearing that someone will take it as lack of knowledge. Yet, I would respect an engineer who says 'I don't know' as opposed to an engineer who does not say that at all. There is no way one would know everything about any particular subject. It is simply too vast. I have a CCIE certification in Route and Switch but it makes me humble on how much I don't know about Networking. We all need to be humble not macho, that's all.

dcolbert

Company culture has a huge impact on how you proceed with issues. This is another one of those soft-skills. Understanding that even if there is an ideal way to achieve your end-goal, the culture and attitude of your organization has to be worked around. A lot of IT professionals throw fits and storm around upset when company attitudes and policies that are wrong-headed make it difficult to work best practices to arrive at a solution. I run into (and struggle with) this all the time. I'm very driven to achieve the most efficient, most reliable, and most secure solutions by applying industry best practices. When political and environmental obstacles interfere with that process - sometimes I'm not the best at responding to that kind of frustration. This is not an uncommon fault among my peers. I think I.T. tends to attract the kind of people who get very passionate and have trouble accepting that sometimes the BEST way to do something is going to be ignored in favor of some other course because of unreasonable business considerations and sensibilities. I honestly deal with that kind of stress at *least* on a monthly basis. It is something I need to work on, personally. Good call.

TheSwabbie

I can tell you right now, forget 90% of what you learned. Most of the problems you'll experience will be simple.. too often fresh helpdesk/desktop support techs look for the most complicated answer.. Always remember KISS - "Keep it Simple, Stupid". A vast majority of your issues will be loose cords or "One D 10 T" errors (IDIOT). I used to tell all my helpdesk & desktop people that MOST problems were between the "keyboard and the chair". Listen to the "old guys/gals".. they can save you a LIFETIME of aggravation with their insight & tips on things. Welcome to IT!

dcolbert

The response below is the response that wouldn't post. I entered "test" with the word "test" in the body... saved, then edited and put in my response. This is somehow totally appropriate to this thread. :) I love it when I make a system do what I want, even when it doesn't want to for some silly reason.

dcolbert

We had to move our entire environment from a data room in Folsom to another in the main data-center. We shut down all the servers, still in their racks, and had them all moved from one building to another building with 3 buildings in between. The racks were too big for the cage where the new DC was located, and so the movers had to TIP them to get them in the doorway. We relocated all servers to their aisles in a hot-aisle/cold-aisle configuration. Mine wouldn't come back up. It wouldn't recognize the external drive array that it was hooked into. We played with the high availability application for 6 hours trying to get it to recognize the down DAE, and it simply wouldn't see it. Finally, in exasperation, we called the vendor's hardware support. He came out, went behind the machine, unplugged the cable and checked the pins. Bent SCSI pin. Replaced the cable, and we were up in a matter of minutes. Case. In. Point. :)

dcolbert

I have to admit, when I was writing this, it occurred to me that part of the value of a veteran I.T. engineer is his past experience with a broad range of issues. This experience allows him to intuitively narrow the field for what might be causing the issue. I use a divide and conquer approach to troubleshooting in a lot of cases. I figure out where the problem might be and trouble-shoot in the middle. In an ideal case, this divides the potential cause and you've halved the possible causes. Wash-rinse-repeat until the issue is isolated and identified. It doesn't always work like this - but I'd say more often than not, this is a solid approach to most I.T. challenges you are faced with. But in any case, you've got to apply what you know, what you've experienced. You shouldn't throw out a hunch based on what you've encountered in the past. But you also shouldn't become so fixated on it that you get caught in a mindset where you're not seeing the forest for the trees. One of my good friends once told me about troubleshooting an issue at home with NT 4 refusing to join his network. He did all his troubleshooting with due-diligence, and the problem persisted. I can't remember all the details anymore, but we discussed it at length back in the day and I was stumped by what he described too. Eventually, he just blew out the OS and re-installed. After he had rebuilt the entire machine, he realized he had never blown out and reinstalled the TCP/IP stack. He was almost certain - and I agree, that this would have resolved the issue without a complete system rebuild and all the hassle that entailed. Lesson learned - but lesson learned the hard way. That kind of example happens every day, all over the world, costing I.T. organizations and individuals countless man-hours. It is unavoidable - and executive management doesn't understand that. 
Human error happens in making decisions like that - and frequently after you've been hitting a brick wall all day long with a deadline looming, you make a single rash decision that isn't the *best*. It gets the job done, but it is terribly inefficient. You've got to accept that those things happen, and will continue to happen, regardless of your best efforts to try to avoid them. But taking that extra breath and having that conversation with another set of eyes BEFORE you take the irrevocable step can frequently help avoid all that extra work.
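The divide-and-conquer approach described in this comment can be sketched as a binary search over an ordered chain of components. This is only an illustration under two assumptions: every stage before the fault checks out and every stage from the fault onward does not, and at least one stage fails. The stage names, the `first_failing` helper and the `works_through` callback are all hypothetical; in practice each check is a manual test such as a ping or a port probe.

```python
from typing import Callable, Sequence

# Hypothetical chain between a console and a remote desktop session,
# ordered from the lowest layer to the highest.
STAGES = ["NIC link", "IP configuration", "routing", "firewall", "RDP service"]


def first_failing(stages: Sequence[str], works_through: Callable[[int], bool]) -> int:
    """Return the index of the first failing stage.

    works_through(i) must return True when everything up to and including
    stage i checks out. Assumes at least one stage fails.
    """
    lo, hi = 0, len(stages) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if works_through(mid):
            lo = mid + 1  # fault lies after the midpoint
        else:
            hi = mid      # fault is at or before the midpoint
    return lo


# Simulated fault at "IP configuration" (index 1): only stage 0 checks out.
fault = first_failing(STAGES, lambda i: i < 1)
print(STAGES[fault])  # prints "IP configuration"
```

Each check halves the remaining candidates, so five stages need at most three checks. The sketch does not capture the point made above about hunches: past experience is what tells you where to draw the stages in the first place.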

TheSwabbie

Keep it Simple, Stupid.. LOL.. It's almost always - without fail.. the SIMPLE things that drive $30 an hour techs to the looney farm. LOL