Some years back, I was involved in a sizable network redesign project: all 3,000-plus seats at corporate headquarters were slated to move to a new corporate standard called the Standard Office Solution (SOS). Everybody across the country would use the same applications, the same desktop hardware, and the same network architecture. The goal was to reduce costs across the board, simplify support, and establish a common development environment for all future desktop and networking applications.
Our group had a tight timeline: an all-new CAT5 backbone, a new network topology, an entirely new server farm, and new desktops and training for 3,500 people, all in 42 months. The standards and implementation instructions were to come from the SOS Planning Committee, a central authority responsible for integrating all the NOS and desktop systems and, most critically, the specialized desktop and network applications already in use by the corporate staffs.
In theory, we were starting from a clean slate, and we expected our problems to lie in transitioning users to the new environment and integrating their old apps into the new order. First priority went to the considerable mechanical work: pulling some 30 miles of CAT5 through an antiquated building, rewiring more than 200 wiring closets, and refitting the server rooms for the new NOS while keeping the four existing older server rooms running.
As the new desktops went in, at 10 a day, each user's data was migrated from the old servers to the new data structure on the new servers. Data in older application formats was converted to the new formats either during the migration or immediately afterward, with help from IT staff. We planned to have every user trained within two weeks of receiving new equipment, and as each desktop was installed and connected to the backbone, its user received a new password and a new customer service number.
We met all objectives and adhered to the timeline rigorously. The new SOS was in place on the date specified and in full working order. And if that were all there was to it, this story would be over.
First sign of problems
There were some problems, most of them anticipated, in moving the HQ staffs to the new hardware and software, and especially in moving users from their old local support people (my IT staff) to the new centralized call center. Folding the old IT support staffs into the new centralized IT organization brought considerable teething problems. The users had some difficulty adapting to the new order, but we expected all of these problems to fade. And they did.
What did not fade, and in fact grew more prominent the further we went into the SOS rollout, was a rising rate of network outages, server crashes, and server space reallocations. As the new servers went into service, allocating space among them became more and more problematic, with several staffs sharing parts of one server and other staffs having to split their data across multiple servers. Because traffic loads varied greatly between staffs and between the servers they shared, load balancing became a daily problem.
These problems, on top of the normal issues of a full-system switchover, soon overwhelmed the existing staffs. Service calls increased across the board as users, frustrated by their lack of access, called the help desk daily, even hourly, for updates. We appealed to the SOS Planning Committee for help, but they insisted they could support only "normal" SOS installations, not the hybrid one we had built at corporate HQ. We would soon learn that other large offices in the corporation, those with 1,000 or more seats, were facing the same problems with no apparent solutions.
Some factors quickly became apparent to us. As the SOS Planning Committee had said, we were indeed a hybrid installation. We had to be. Because the typical corporate location had 200 or fewer seats, the SOS had been designed for installations of that size, covering network administration, the transition of IT staffs, billing, and all other technical and administrative issues. We had been forced to scale those specs up, build a larger network, and design a much more complicated administration infrastructure, and so had all the other large offices.
The SOS had been designed with corporate-wide server integration as its end goal: eventually, every server would be accessible to every authorized workstation through a centralized administration system that relied on local authorization. That architecture was necessarily still in development; this was before VPNs and Web-based WANs were in common use. As it turned out, we had jumped the gun on large-scale server integration at corporate HQ and, to our chagrin, the SOS Planning Committee members were watching what we did on this issue as a guideline for future integration. We would be forced to disappoint them.
The light . . . is a train
Searching for a solution, our site manager ordered an extensive review of where we had deviated significantly from the SOS specs. My own team, responsible for desktop integration and installation, had few deviations; we simply installed more seats. The networking team, however, had needed to install not merely more wire but a vastly more complicated switching and routing architecture built from SOS-specified components, which, like every other part of the SOS, were intended for offices of no more than 200 seats. The tools we were required to use were not robust enough for a headquarters at least 15 times that size.
But it was the server team that proved to be the actual source of the continuing support problems—not because they had deviated from the standard, but because they had not deviated from the original specification at all.
Every server they installed was the exact model and size specified in the original SOS spec. They had dutifully followed the mandatory corporate standard and installed nothing but SOS-required equipment. The result was nearly a hundred servers: some only marginally in use, some regularly overloaded, and some that worked well until the month-end close, when the workload overflowed their drives and caused load-balancing and routing errors affecting half the LAN.
Short of replacing every server with a larger one, or perhaps even installing a separate midsize mainframe for each department at headquarters, there was no permanent solution. The switches and routers were replaced with beefier ones, and Web traffic was moved to an entirely separate network, but even with dedicated servers reassigned to each staff, the administrative load on the headquarters LAN remained high and caused outages well above the average for any other SOS site.
I left the organization shortly thereafter and did not see the final solution. But, a couple of years later, headquarters moved to a newer and larger building, where management could doubtless apply the lessons they had learned about forced scalability.
By rigidly adhering to corporate specifications never intended for a site the size of headquarters, our team had caused its own problems. Because we never compared the known and projected volumes of network and server traffic under the preexisting wiring and server plan against what the new corporate standard could carry, our team never understood the real magnitude of the administration problems it would encounter.
Ultimately, the real lesson here is about having confidence in your own and your staff's knowledge of local conditions and user needs. We needed to be able to go back to the SOS Planning Committee and say, "This spec won't work for us, and here's why. Here's our suggested alternative. Will you support us?" Had we done that, even if the bureaucratic battle had taken a year, we could have avoided four years of substandard service to our customers and the loss of their confidence.
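In hindsight, even a crude pre-rollout check would have flagged the mismatch. The Python sketch below is illustrative only, not anything our team or the SOS committee actually used. It compares site size against the standard's 200-seat design ceiling and flags spec-mandated servers whose projected month-end load would exceed a chosen fraction of their capacity, or whose average load would leave them mostly idle. Only the 200-seat ceiling and the 3,000-seat headquarters figure come from this story; the server names, capacities, loads, and the 80%/20% thresholds are hypothetical assumptions.

```python
from dataclasses import dataclass

# Design ceiling stated in the SOS spec: offices of no more than 200 seats.
SPEC_MAX_SEATS = 200


@dataclass
class ServerPlan:
    """One spec-mandated server and the load projected onto it (all figures hypothetical)."""
    name: str
    capacity_gb: float   # usable disk capacity of the mandated server model
    avg_load_gb: float   # typical data volume of the staffs assigned to it
    peak_load_gb: float  # projected volume at month-end closing


def scale_factor(site_seats: int, spec_max_seats: int = SPEC_MAX_SEATS) -> float:
    """How many times larger the site is than the standard was designed for."""
    return site_seats / spec_max_seats


def overloaded_at_peak(servers: list[ServerPlan], headroom: float = 0.8) -> list[str]:
    """Names of servers whose projected peak exceeds `headroom` of capacity (assumed 80%)."""
    return [s.name for s in servers if s.peak_load_gb > headroom * s.capacity_gb]


def underused(servers: list[ServerPlan], floor: float = 0.2) -> list[str]:
    """Names of servers whose average load stays below `floor` of capacity (assumed 20%)."""
    return [s.name for s in servers if s.avg_load_gb < floor * s.capacity_gb]


# Hypothetical plan: three identical spec-sized servers with very different workloads.
plan = [
    ServerPlan("finance-01", capacity_gb=36, avg_load_gb=20, peak_load_gb=40),
    ServerPlan("legal-01", capacity_gb=36, avg_load_gb=4, peak_load_gb=5),
    ServerPlan("hr-01", capacity_gb=36, avg_load_gb=15, peak_load_gb=22),
]

if __name__ == "__main__":
    print(f"Site is {scale_factor(3000):.1f}x the spec's design size")
    print("Over 80% of capacity at peak:", overloaded_at_peak(plan))
    print("Under 20% of capacity on average:", underused(plan))
```

The point is not the arithmetic, which is trivial, but the habit: putting projected local volumes next to the standard's assumptions before committing to it would have shown both the overloaded month-end servers and the idle ones we later found in production.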
There's no substitute for your experience with your staff and customers. Even the best technical plan your technicians can develop is going to need your feedback and advice in order to succeed. You may have been given the best weapon humankind has ever devised, but if you don't check to make sure the ammunition is in it, then it's of no use.