Lost revenue and customer impatience with unplanned downtime can push organizations toward a continuously available architecture. However, such systems are expensive and require a highly trained staff to keep them running. This article explains some of the specific costs of implementing a highly available system, as well as the costs of not having one.

Continuous availability costs
Costs for highly available systems generally fall into two categories: hardware and IT talent. On the hardware side, massively parallel replicated machines are costly (at least five times the cost of a nonparallel server). So is the army of database administrators and architects needed to maintain the hardware and the database management system (DBMS). Although the software components are readily available, installing, setting up, and testing a continuous availability system takes considerable human effort, which makes it both expensive and time-consuming.

When using a massively parallel processing (MPP) server, many Oracle users set up Oracle9i Real Application Clusters (RAC) and Transparent Application Failover (TAF). Introduced in Oracle9i, RAC takes advantage of an improved Cache Fusion architecture that reduces I/O demands and increases scalability. (In future articles, I will examine the benefits and drawbacks of RAC and TAF compared with Oracle Parallel Server.) Installing, configuring, and testing these products can take hundreds of hours. Although staffing needs may drop once installation is complete, retaining top database talent and keeping expensive equipment running can be a challenge in lean economic times.

As shown in Figure A, RAC with TAF can provide rapid application failover, but at the price of extra staff hours for configuration, setup, and monitoring.

Figure A
RAC with TAF is fast but expensive.

Next, let’s review some industry metrics to help development managers decide whether the investment in a highly available system is worthwhile.

Justifying the cost
Obviously, the monetary cost of implementing a highly available architecture is high. However, when it is weighed against the cost of downtime, even a significant expense can be justified.

As shown in Figure B, the cost of unplanned downtime can be significant for all business segments. For example, credit card companies could lose hundreds of thousands of dollars per minute.
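Per-minute losses are easier to reason about once an availability target is translated into expected downtime. The Python sketch below is a minimal illustration, not a figure from this article: it assumes a 365-day year and a hypothetical loss of $100,000 per minute, converts an availability percentage into annual downtime minutes, and multiplies by that loss. The function names are invented for this example.

```python
# Sketch: translate an availability percentage into yearly downtime and cost.
# Assumptions: 365-day year (leap years ignored) and a hypothetical,
# constant dollar loss per minute of unplanned downtime.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def annual_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def annual_downtime_cost(availability_pct: float, loss_per_minute: float) -> float:
    """Yearly downtime cost if every outage minute loses the same amount."""
    return annual_downtime_minutes(availability_pct) * loss_per_minute


if __name__ == "__main__":
    loss = 100_000  # hypothetical dollars lost per minute of outage
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = annual_downtime_minutes(pct)
        cost = annual_downtime_cost(pct, loss)
        print(f"{pct}% available -> {minutes:,.1f} min/yr, ${cost:,.0f}/yr")
```

Each additional "nine" cuts expected downtime by a factor of ten, so at a loss rate in the hundreds of thousands of dollars per minute, even small availability gains translate into large sums.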

Figure B
Financial impact of downtime

For e-commerce engines and Web sites, downtime is measured not just in lost revenue and worker productivity but also in lost customer goodwill. This intangible cost can easily translate into millions of dollars as frustrated customers stop visiting.

Manufacturing systems also incur very high downtime costs, but these are direct costs. They typically stem from interrupted production, which means lost sales, and from wages paid to factory workers who cannot do their jobs while the line is down.

In practice, companies must weigh the cost of downtime against the investment a highly available system requires.
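As a rough illustration of that comparison, the following Python sketch weighs the downtime a highly available system is expected to avoid against its yearly cost. Every input, including the hypothetical `Scenario` class and its field names, is an assumption for illustration only; substitute your own estimates of outage hours, hourly losses, and hardware, software, and staffing costs.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical inputs; every figure is an assumption to replace with your own."""
    outage_hours_per_year: float     # expected unplanned downtime without HA
    ha_outage_hours_per_year: float  # expected unplanned downtime with HA
    loss_per_hour: float             # revenue, productivity, and goodwill lost per hour
    ha_annual_cost: float            # yearly hardware, software, and DBA/architect cost

    def avoided_loss(self) -> float:
        """Downtime cost the HA system is expected to avoid each year."""
        saved_hours = max(self.outage_hours_per_year - self.ha_outage_hours_per_year, 0.0)
        return saved_hours * self.loss_per_hour

    def net_benefit(self) -> float:
        """Positive means HA pays for itself under these assumptions."""
        return self.avoided_loss() - self.ha_annual_cost

    def break_even_loss_per_hour(self) -> float:
        """Hourly outage cost at which avoided losses exactly cover the HA spend."""
        saved_hours = self.outage_hours_per_year - self.ha_outage_hours_per_year
        if saved_hours <= 0:
            return float("inf")  # HA avoids no downtime, so it can never break even
        return self.ha_annual_cost / saved_hours


if __name__ == "__main__":
    # Hypothetical figures: 20 h/yr of outages today, 1 h/yr with HA,
    # $250,000 lost per hour, $2,000,000/yr for HA hardware and staff.
    s = Scenario(20, 1, 250_000, 2_000_000)
    print(f"Avoided loss: ${s.avoided_loss():,.0f}")
    print(f"Net benefit:  ${s.net_benefit():,.0f}")
    print(f"Break-even:   ${s.break_even_loss_per_hour():,.0f} per hour")
```

With these sample inputs, 19 avoided outage hours at $250,000 per hour ($4.75 million) exceed the $2 million annual spend, and the break-even point is roughly $105,000 of loss per outage hour. The model is deliberately simple: it treats every outage hour as equally costly and ignores intangible losses that are hard to price.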

Probably worth the expense
Sure, a high availability system costs money: there is hardware, software, and people to pay for. However, when you consider the price of downtime (not all of which is a direct cost), you may find it is worth every penny.
