Over the years many of my clients have come to me with an
annoying problem: the need to constantly buy ever-increasing amounts of storage.
Every time they built a system based on previous growth numbers it turned out
disastrously wrong. Storage systems have a nasty tendency to grow in punctuated
equilibriums—following a normalized growth curve, then exploding at odd moments.
Tracking these sudden explosions in storage needs back to their source allows
us to more accurately estimate need, thereby smoothing out future capacity
planning efforts.

Sources of storage needs

When I went over my compiled notes about these projects I
realized something. My raw notes contained vast amounts of information about
storage utilization, but we had compiled it all into a single growth percentage
over time. This is, unfortunately, a classic error in data analysis: applying
an aggregate number to a real-time sample. Sure, over time the two will match,
but specific instances may well differ radically.
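
To see how an aggregate number can hide a spike, here is a minimal Python sketch. The yearly sizes are invented for illustration and do not come from any client's data; the function names are my own.

```python
def yearly_growth(sizes):
    """Year-over-year growth rates for a list of end-of-year sizes."""
    return [(after - before) / before for before, after in zip(sizes, sizes[1:])]


def aggregate_growth(sizes):
    """Compound annual growth rate across the whole series."""
    years = len(sizes) - 1
    return (sizes[-1] / sizes[0]) ** (1 / years) - 1


# Hypothetical end-of-year storage sizes in TB (illustrative only).
sizes_tb = [100, 110, 121, 242, 266]

print("per-year growth:", [round(g, 2) for g in yearly_growth(sizes_tb)])
print("aggregate growth:", round(aggregate_growth(sizes_tb), 2))
```

In this made-up series the aggregate rate comes out near 28 percent per year, yet three of the four years grew by about 10 percent and one year doubled. Planning on the aggregate figure would overbuy in the quiet years and still fall short in the year that mattered.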

It is in the places where the two differed that my clients
ran into trouble. For example, one client estimated 20 percent growth per year
for four years based on the last decade of data. Unfortunately, two years into
his plan, his business users demanded a 100 percent increase in existing
storage. When he checked the short-term rather than the long-term trends, he
discovered that the demand fell within reasonable predictions, but he could not meet
the requirement in time. I worked with him to deploy additional storage as
quickly as possible while gathering data on what happened.
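
The arithmetic behind that shortfall can be sketched in a few lines of Python. The 100 TB starting point is a hypothetical figure, and I am assuming the 20 percent compounds annually and that the 100 percent increase applies to the capacity planned for the end of year two; the source figures are only the growth rate, the four-year horizon and the doubling.

```python
def planned_capacity(base, rate, years):
    """Capacity after compounding `rate` annually for `years` years."""
    return base * (1 + rate) ** years


BASE_TB = 100.0  # hypothetical starting capacity
RATE = 0.20      # 20 percent per year, as in the client's plan

year_two_plan = planned_capacity(BASE_TB, RATE, 2)
year_four_plan = planned_capacity(BASE_TB, RATE, 4)
# Assumption: "100 percent increase" doubles the year-two planned capacity.
year_two_demand = 2 * year_two_plan

print(f"planned at year 2:  {year_two_plan:.1f} TB")
print(f"planned at year 4:  {year_four_plan:.1f} TB")
print(f"demanded at year 2: {year_two_demand:.1f} TB")
```

Under these assumptions the year-two demand (288 TB) is not only double what the plan provided at that point (144 TB), it also exceeds what the plan would have delivered by the end of year four (about 207 TB).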

In this specific case, my client did not account for his
company’s established six-year cycle of shifting in and out of ERP solutions. Every
three to four years for the last thirty, they had installed a new system to run
the company. The storage team inherited responsibility for maintaining the
development channels for the “old system” while running simultaneous development
environments for the “new system,” effectively doubling their need
for storage each time the cycle restarted.

Digging around in my files, I found similar events in most
of my clients’ storage growth cycles. Normal data growth (e.g., files, data and
e-mail) followed relatively linear growth patterns, barring outlying users. However,
most companies had regular intervals of extremely high growth driven by
business-cycle-dependent projects.

These projects included but were not limited to:

  • Installation of new systems (e.g., e-mail
    systems, new archiving systems) requiring simultaneous operation of multiple “live”
    data sets for a period of time.
  • Creation of new development methods. Each time
    the development team split or created a new methodology, they created
    additional need for duplicate data storage. The more conscientious the
    developers, the more storage they needed. This storage need came from both
    their desire to retain historical development data and also the need for
    multiple versions of “live” data for development/testing purposes.
  • Periodic increases in data gathering and
    retrieval needs based on upcoming audits and industry examinations. Any client
    whose activities fell under a regulatory body could expect to see a sudden
    increase in the data gathered to meet regulatory requirements between three
    months and one year before the event.

Additionally, many companies suffered from periodic, but
relatively unpredictable, bloats in active database sizes. In every environment
with active database development, the databases occasionally bloated due to
coding errors. With one exception, none of these errors fit a
predictable pattern. However, they did have a predictable bloat effect: over 50
percent growth in less than 24 hours. In about half of the cases, the database
was not restored to its original, non-bloated state.
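
Because the bloat signature is so consistent, it is easy to watch for. The following is a minimal sketch in Python, assuming you already collect timestamped database sizes somewhere; the function name, the sample data and the choice to treat exactly 24 hours as inside the window are all my own assumptions.

```python
from datetime import datetime, timedelta


def find_bloat_events(samples, threshold=0.5, window=timedelta(hours=24)):
    """Return (start, end, growth) tuples where size grew by more than
    `threshold` (0.5 = 50 percent) between two samples within `window`."""
    samples = sorted(samples)
    events = []
    for j, (t_end, size_end) in enumerate(samples):
        for t_start, size_start in samples[:j]:
            # Skip pairs that are too far apart or have no usable baseline.
            if t_end - t_start > window or size_start <= 0:
                continue
            growth = (size_end - size_start) / size_start
            if growth > threshold:
                events.append((t_start, t_end, growth))
                break  # report only the earliest qualifying start
    return events


# Hypothetical size samples in GB, taken every 12 hours (illustrative only).
samples = [
    (datetime(2024, 1, 1, 0), 100.0),
    (datetime(2024, 1, 1, 12), 110.0),
    (datetime(2024, 1, 2, 0), 160.0),
    (datetime(2024, 1, 2, 12), 162.0),
]

for start, end, growth in find_bloat_events(samples):
    print(f"{start} -> {end}: grew {growth:.0%}")
```

With these invented samples, the jump from 100 GB to 160 GB within a day is flagged, while the slower drift afterward is not. Catching the event early matters because, as noted above, about half of the bloated databases never returned to their original size.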

Estimation equations

Where does this idea lead us, other than to say you need at
least a decade of historical data to correctly estimate your needs based on cycles
rather than aggregate data? What if we don’t have access to such information or
don’t have the time to build it?

Fortunately, a few rules of thumb emerge from the data I
have in front of me: quadruple each development channel, triple each mail
system, and assume that all non-IT databases will bloat at a rate of 100
percent per year, whether they do or not. Non-database storage tends not to
spike unless and until an acquisition occurs (either a purchase or being purchased), at
which time it will either plummet as executives purge files or explode as
people create documentation.
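
These rules of thumb translate directly into a small calculator. The sketch below is in Python; the function name and sample sizes are hypothetical, and I am assuming the 100 percent yearly bloat compounds from year to year, which the rule itself does not spell out.

```python
def rule_of_thumb_capacity(dev_channels_tb, mail_systems_tb, non_it_dbs_tb, years=1):
    """Apply the rules of thumb to lists of current sizes (in TB)."""
    development = sum(4 * size for size in dev_channels_tb)  # quadruple each channel
    mail = sum(3 * size for size in mail_systems_tb)         # triple each mail system
    # Assumption: 100 percent growth per year, compounding annually.
    databases = sum(size * 2 ** years for size in non_it_dbs_tb)
    return {
        "development": development,
        "mail": mail,
        "non_it_databases": databases,
        "total": development + mail + databases,
    }


# Hypothetical environment: two development channels, one mail system,
# two non-IT databases, planned over two years.
estimate = rule_of_thumb_capacity([10, 5], [2], [1, 0.5], years=2)
for category, size_tb in estimate.items():
    print(f"{category}: {size_tb} TB")
```

Non-database file storage is left out on purpose, since it tends to track its historical trend until an acquisition changes the picture.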

The real key lies in managing the growth of the development
channels. Depending on your company’s development methodology and archive
requirements, you could end up managing six versions of your live data
(development, QA, and live for two simultaneous systems) for anywhere between
one and three years. Each channel requires storage capacity of at least twice its
current size; it will run more comfortably at a maximum of around 25 percent
utilization (four times its current size), which leaves room for rapid copies or restores.
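
The development-channel math can be sketched as follows in Python. The function name and the 10 TB live data set are hypothetical, and I am assuming every channel holds a full-size copy of the live data, which will overstate the need wherever development or QA works from a subset.

```python
def channel_capacity(live_tb, environments=3, systems=2, max_utilization=0.25):
    """Capacity needed to hold every copy of the live data at or below
    `max_utilization` (0.25 = 25 percent full)."""
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be between 0 and 1")
    copies = environments * systems  # e.g. dev, QA and live for two systems
    return copies * live_tb / max_utilization


LIVE_TB = 10.0  # hypothetical size of one live data set

print("copies of live data:", 3 * 2)
print("minimum (50% full):    ", channel_capacity(LIVE_TB, max_utilization=0.5), "TB")
print("comfortable (25% full):", channel_capacity(LIVE_TB), "TB")
```

For a 10 TB live data set, six full copies need 120 TB at the bare minimum and 240 TB at the more comfortable 25 percent utilization, which is why the development channels dominate the plan.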

In most shops, the rest of the storage (e.g., e-mail,
smaller databases and files) presents us with far less trouble. E-mail grows
relatively predictably; files do the same. The smaller databases making up the
shadow ERP generally try to stay under the IT manager’s radar, so you can count
on the “developers” to control them. If they don’t, a little friendly
reminder to run the appropriate database control tools can earn the IT team a
bit of extra political capital for use in another situation.

Working the punctuations into the plan

The good thing about regular punctuations is that you do not
need to address them immediately. Once you have a grip on when they will occur,
you can build their needs into an extended two-to-four-year expansion plan
rather than making sudden purchases. When doing a consolidation, build the
extra capacity into next year’s budget—this defers the cost while showing the
decision makers that you have a clear plan for the future.

If you have recently installed new archive/back-up/storage
solutions, going back and looking for the punctuations gives you valuable
insight into the future. When the inevitable “sudden” demand comes in for new
storage, you can coolly lay out your plan to meet the need using existing or
readily available resources. This saves you headaches and allows you to
continue your move from reactive to proactive infrastructure management.