Doing More with Less: NCR avoids server bottlenecks, wasted capacity with TeamQuest tools

Learn how one capacity-management tool saved NCR about $300,000.


As global IT planning and procurement manager at NCR Corporation, Paul Armstrong decides how to allocate processing and server capacity for applications ranging from human resources management to the analysis of vast stores of historical data.

One of his greatest concerns is keeping the dispatch program for field engineers up and running for service calls. It’s an especially important business application because NCR provides service not only for its own equipment and applications, but also for those of third parties. Bottlenecks at the Dayton, OH, corporate data center can slow service activity to a crawl around the world, whether for NCR’s point-of-sale signature capture devices or a third party’s satellite dishes.

“If someone is not able to make a call, what we risk is a customer, and that can be really big bucks,” said Armstrong, a 25-year NCR veteran.

To head off problems that could cost the company customers, Armstrong began investigating analysis and optimization tools in the late 1990s. As a more tangible payoff, the company stood to save money by avoiding the need to purchase new processing and server capacity as its needs grew.

Managing the data center
The massive size of the data center makes any opportunities for savings significant. The Dayton center has roughly 900 servers and 100 terabytes of storage run by about 140 employees. It houses an enterprise production environment and runs an off-site recovery center, which also is home to the development environment. For added security, the sites back up to each other. Among the most important applications are the service-dispatch system and the company’s ERP system, which runs on Oracle.

The servers primarily run Sun Solaris, although some older servers that run an in-house UNIX-based operating system are still being migrated to Solaris. Armstrong says that most disk storage is from EMC and LSI.

With heavy demands for processing and storage, Armstrong was shopping for software to help improve operations in two ways:
  • Performance management
  • Capacity management

In performance management, he was especially concerned with keeping the enterprise production applications running well. To do that effectively, system administrators would need to review reports and statistics daily to spot trends and predict and avoid bottlenecks.

In his capacity management efforts, Armstrong wanted to build a team to drive the procurement process. The team would review every request for additional storage and processing capacity and either reallocate existing resources “on the fly” or, if absolutely unavoidable, request additional capital expenditures to handle the growth.
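The team’s basic question for each request, whether existing servers can absorb it, can be sketched as a first-fit check. This is a hypothetical illustration rather than NCR’s actual procedure: the `Server` record, the capacity units, and the 70 percent utilization ceiling are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity: float  # hypothetical capacity units the server can deliver
    used: float      # current peak demand, in the same units


def place_request(servers, demand, max_utilization=0.7):
    """First-fit placement: return the name of the first server that can
    absorb `demand` while staying at or below `max_utilization` of its
    capacity, updating that server's `used` figure. Return None if no
    server fits, meaning the request would need new hardware."""
    for s in servers:
        if s.used + demand <= s.capacity * max_utilization:
            s.used += demand
            return s.name
    return None
```

First-fit is the simplest placement rule; a real review would also weigh memory, storage, and I/O, not a single capacity number.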

Armstrong realized that without good analytic tools, NCR would probably end up wasting money on unneeded capacity.

“Let’s face it—you never wanted to be short, so you guessed high,” he said about estimating storage needs.

That philosophy may be a common cause of waste in IT departments. In an analysis of how well IT resources are used, Gartner estimated that many Web server and storage environments run at peak utilization levels as low as 30 to 40 percent, and that consolidation projects can raise peak utilization to around 70 percent.
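The arithmetic behind that estimate is simple: if the same total peak load is spread across servers run at a higher target utilization, fewer servers are needed. The sketch below assumes identical servers and treats per-server peaks as additive, which overstates the load when peaks do not coincide; the function name and figures are illustrative.

```python
def servers_after_consolidation(n_servers, current_peak_pct, target_peak_pct):
    """Estimate how many identical servers can carry the same total peak
    load if each runs at target_peak_pct instead of current_peak_pct.

    Percentages are integers so the arithmetic stays exact.
    """
    # Total load expressed as "percent of one server's capacity".
    total_load = n_servers * current_peak_pct
    # Integer ceiling division: a partial server still counts as one.
    return -(-total_load // target_peak_pct)
```

For example, 10 servers peaking at 35 percent carry 350 percent of one server’s capacity, so at a 70 percent target about 5 servers would suffice.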

Sizing up the products
In the late 1990s, Armstrong and his team evaluated several analysis and optimization products. Among the first was BEST/1 from BGS Systems, a company BMC later acquired. At the time, the product still had strong ties to IBM and was just beginning to enter the open systems market. Armstrong was impressed by its “phenomenal algorithms” but couldn’t overcome concerns about cost and about support for the open systems environment. He also tried tools such as HP OpenView MeasureWare and products from OpenVision (which merged with Veritas), but at the time he found them too rudimentary.

Then, at the suggestion of Sun representatives, Armstrong contacted TeamQuest for a demonstration. He was impressed by the software’s “what-if” capabilities, which could show the impact of placing additional demands on the system. For example, TeamQuest could show how a change would affect CPU utilization (see Figure A) and memory capacity.

“We had a good base of information,” Armstrong said, “and we made decisions within a week.”

Figure A
TeamQuest displays CPU usage.
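The article does not describe how TeamQuest builds its what-if projections. A common textbook approach, shown here only as an assumption-laden sketch, treats a server as a single M/M/1 queue, where mean response time is R = S / (1 - U) for service time S and utilization U. The function name and numbers are hypothetical.

```python
def what_if(utilization, service_time, growth):
    """Project CPU utilization and mean response time if the workload grows
    by `growth` (0.25 means 25 percent more requests), using the M/M/1
    formula R = S / (1 - U). This is a textbook approximation, not
    TeamQuest's model."""
    new_u = utilization * (1 + growth)
    if new_u >= 1:
        # Saturated: arrivals outpace service and the queue grows without bound.
        return new_u, float("inf")
    return new_u, service_time / (1 - new_u)
```

With U at 0.5, S at 0.1 second, and 20 percent growth, utilization rises to 0.6 and response time to about 0.25 second. The nonlinearity is the point of a what-if tool: near saturation, a small increase in load causes a large jump in response time.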


TeamQuest’s support also helped make the sale. During the two-week demo, TeamQuest revised the software based on NCR’s feedback. Armstrong decided to buy the product and install it on all new Sun and NT servers; the department also installs it as it retrofits older servers.

Armstrong said new releases of the software are easy to install and require little downtime. Also, the department rarely has to change any of its scripts after installing a new release.

Armstrong declined to say how much NCR invested in TeamQuest, although he said that the system quickly paid for itself. TeamQuest’s Rebecca Kauten said that the pricing structure varies.

“A software license in the Windows/Linux/UNIX environment is less than $1,000 for a single server, and fees scale according to the number of servers installed,” she said.

Saving costs—and customers
As Armstrong expected, one of the strengths of the system has come in keeping the field service operations running. The field engineers use wireless terminals to access all customer information, such as updates on the problem, the dispatch calls, priority of the customer, even directions to the site. If the application is down, the work can grind to a halt.

One day, there was a significant slowdown in the performance of the field service application. There were no obvious reasons for the slowdown, such as changes to the code or databases in the last 24 hours, and the equipment was up and running. TeamQuest’s analysis revealed the cause.

“The performance group found an I/O string that was eating the processor alive,” Armstrong said. “TeamQuest enabled them to spot it right away and recommended that they reload the database.”

Once they did and the keys were rebuilt, the system was up and running again, saving hours of downtime for the field engineers—and, possibly, saving customers for NCR’s service business.

Although Armstrong finds it difficult to quantify the return from resolving performance problems like this one, he can quantify some of the savings on the capacity side when departments request IT resources.

“Prior to having this product on site, we’d throw more servers at it, we’d throw more hardware at it, and we’d throw it out there with either a bigger processor, if available, or multiple processors,” Armstrong said.

He estimates that by finding ways to run new applications on existing equipment, the company has avoided the need to buy at least five servers at $60,000 each, for a total of $300,000. That figure doesn’t include the overhead costs of adding a server, including the need for floor space in the data center, the technical issues involved in spreading applications over multiple servers, and the added complexity of backing up and, if needed, restoring the system while keeping it in sync.

“You start adding all this up, and the cost of a decision like this becomes phenomenal,” Armstrong said. “Instead, we make a decision based on the architecture we have today and spend no money, other than the time it takes for a system administrator to look at the issue and make a recommendation and resolution.”