Over the years many of my clients have come to me with an annoying problem: the need to constantly buy ever-increasing amounts of storage. Every time they built a system based on previous growth numbers, the estimate turned out disastrously wrong. Storage systems have a nasty tendency to grow in punctuated equilibriums: they follow a steady growth curve, then explode at odd moments. Tracing these sudden explosions in storage needs back to their source allows us to estimate need more accurately, thereby smoothing out future capacity planning efforts.
Sources of storage needs
When I went over my compiled notes about these projects I realized something. My raw notes contained vast amounts of information about storage utilization, but we had compiled it all into a single growth percentage over time. This is, unfortunately, a classic error in data analysis: applying an aggregate number to a real-time sample. Sure, over time the two will match, but specific instances may well differ radically.
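To make the aggregate-versus-sample distinction concrete, here is a minimal Python sketch. The yearly totals are hypothetical numbers chosen for illustration, not figures from my client notes, and the "twice the aggregate rate" threshold is an arbitrary assumption.

```python
# Hypothetical yearly storage totals in TB (illustrative only).
yearly_tb = [100, 110, 121, 240, 260, 285]


def aggregate_growth_rate(samples):
    """Compound annual growth rate across the whole series."""
    periods = len(samples) - 1
    return (samples[-1] / samples[0]) ** (1 / periods) - 1


def per_period_growth(samples):
    """Growth rate between each consecutive pair of samples."""
    return [later / earlier - 1 for earlier, later in zip(samples, samples[1:])]


aggregate = aggregate_growth_rate(yearly_tb)
yearly = per_period_growth(yearly_tb)

print(f"aggregate: {aggregate:.1%} per year")
for year, rate in enumerate(yearly, start=1):
    # Assumed threshold: flag any year growing at more than twice the aggregate rate.
    flag = "  <-- punctuation" if rate > 2 * aggregate else ""
    print(f"year {year}: {rate:.1%}{flag}")
```

The single aggregate figure hides the one year in which storage nearly doubled, which is exactly the year a plan built on the aggregate would fail.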
It is in the places where the two differed that my clients ran into trouble. For example, one client estimated 20 percent growth per year for four years based on the last decade of data. Unfortunately, two years into his plan, his business users demanded a 100 percent increase in existing storage. When he checked the short-term rather than the long-term trends, he discovered that the demand fell within reasonable predictions, but he could not meet the requirement in time. I worked with him to deploy additional storage as quickly as possible while gathering data on what had happened.
In this specific case, my client did not account for his company's established six-year cycle of shifting in and out of ERP solutions. Every three to four years for the last thirty, they had installed a new system to run the company. The storage team inherited responsibility to maintain the development channels for the "old system" and run simultaneous development environments for the "new system," effectively doubling their need for storage every time the cycle started backing up.
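The arithmetic behind this client's shortfall can be sketched in a few lines of Python. Capacity is normalized to 1.0 at the start of the plan, and I am assuming the doubling applied to whatever was in use at the two-year mark; both are simplifications for illustration.

```python
def planned_capacity(base, annual_growth, years):
    """Capacity a flat compound-growth plan provides after `years`."""
    return base * (1 + annual_growth) ** years


base = 1.0     # starting capacity, normalized (assumption)
growth = 0.20  # the 20 percent per year estimate

plan_year2 = planned_capacity(base, growth, 2)
plan_year4 = planned_capacity(base, growth, 4)

# Assumption: the 100 percent increase doubles what is in use at year two.
demand_year2 = plan_year2 * 2

print(f"planned at year 2:  {plan_year2:.2f}x")
print(f"planned at year 4:  {plan_year4:.2f}x")
print(f"demanded at year 2: {demand_year2:.2f}x")
print(f"shortfall at year 2: {demand_year2 - plan_year2:.2f}x")
```

Under these assumptions, the year-two demand exceeds even what the plan would have delivered at the end of year four.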
Digging around in my files, I found similar events in most of my clients' storage growth cycles. Normal data growth (e.g., files, data and e-mail) followed relatively linear growth patterns, barring outlying users. However, most companies had regular intervals of extremely high growth driven by projects tied to the business cycle.
These projects included but were not limited to:
- Installation of new systems (e.g., e-mail systems, new archiving systems) requiring simultaneous operation of multiple "live" data sets for a period of time.
- Creation of new development methods. Each time the development team split or created a new methodology, they created additional need for duplicate data storage. The more conscientious the developers, the more storage they needed. This need came both from their desire to retain historical development data and from the requirement for multiple versions of "live" data for development and testing.
- Periodic increases in data gathering and retrieval needs based on upcoming audits and industry examinations. Any client whose activities fell under a regulatory body could expect to see a sudden increase in the data gathered to meet regulatory requirements between three months and one year before the event.
Additionally, many companies suffered from periodic, but relatively unpredictable, bloats in active database sizes. In every environment with active database development, the databases occasionally bloated due to coding errors. With one exception, none of these errors fit a predictable pattern. However, they did have a predictable bloat effect: over 50 percent growth in less than 24 hours. In about half of the cases, the database was never restored to its original, non-bloated state.
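Because the bloat signature is so consistent, it is easy to watch for. The following Python sketch scans a series of database size samples for growth of more than 50 percent inside a 24-hour window. The sample data, the `find_bloat_events` helper, and the choice to treat samples exactly 24 hours apart as inside the window are all assumptions for illustration.

```python
from datetime import datetime, timedelta

BLOAT_RATIO = 1.5                  # "over 50 percent growth"
BLOAT_WINDOW = timedelta(hours=24)  # window treated as inclusive (assumption)


def find_bloat_events(samples):
    """Return (start, end, ratio) tuples for growth above BLOAT_RATIO
    between two samples taken no more than BLOAT_WINDOW apart.

    samples: list of (datetime, size_gb) tuples sorted by time.
    """
    events = []
    for i, (t0, s0) in enumerate(samples):
        for t1, s1 in samples[i + 1:]:
            if t1 - t0 > BLOAT_WINDOW:
                break  # later samples are outside the window
            if s0 > 0 and s1 / s0 > BLOAT_RATIO:
                events.append((t0, t1, s1 / s0))
                break  # report the first qualifying jump from this start
    return events


# Hypothetical daily size samples in GB.
start = datetime(2024, 1, 1)
sizes_gb = [200, 202, 204, 330, 332]
samples = [(start + timedelta(days=i), size) for i, size in enumerate(sizes_gb)]

events = find_bloat_events(samples)
for t0, t1, ratio in events:
    print(f"bloat between {t0:%Y-%m-%d} and {t1:%Y-%m-%d}: {ratio - 1:.0%} growth")
```

A check like this does not predict the coding error, but it does tell you within a day that one has happened, while a restore is still practical.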
Where does this idea lead us, other than to say you need at least a decade of historical data to correctly estimate your needs based on cycles rather than aggregate data? What if we don't have access to such information or don't have the time to build it?
Fortunately, a few rules of thumb emerge from the data I have in front of me: quadruple each development channel, triple each mail system, and assume that all non-IT databases will bloat at a rate of 100 percent per year, whether they do or not. Non-database storage tends not to spike unless and until an acquisition occurs (either a purchase or being purchased), at which time it will either plummet as executives purge files or explode as people create documentation.
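These rules of thumb translate directly into a rough estimator. The Python sketch below is one possible reading: the `StoragePool` type, the category names, and the treatment of "100 percent per year" as compounding (doubling every year) are my assumptions, and the pool sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoragePool:
    name: str
    kind: str          # "dev_channel", "mail", "non_it_db", or "other"
    current_tb: float


# Flat multipliers from the rules of thumb.
MULTIPLIERS = {"dev_channel": 4.0, "mail": 3.0}
# Assumption: 100 percent per year is applied as compound doubling.
NON_IT_DB_ANNUAL_GROWTH = 1.0


def planned_tb(pool, years):
    """Capacity to plan for a pool over a horizon of `years`."""
    if pool.kind in MULTIPLIERS:
        return pool.current_tb * MULTIPLIERS[pool.kind]
    if pool.kind == "non_it_db":
        return pool.current_tb * (1 + NON_IT_DB_ANNUAL_GROWTH) ** years
    # No rule of thumb applies; plan this pool from its own trend.
    return pool.current_tb


# Hypothetical inventory.
pools = [
    StoragePool("erp-dev", "dev_channel", 10.0),
    StoragePool("mail", "mail", 5.0),
    StoragePool("sales-db", "non_it_db", 2.0),
]

for pool in pools:
    print(f"{pool.name}: {pool.current_tb} TB now, plan for {planned_tb(pool, 2)} TB")
```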
The real key lies in managing the growth of the development channels. Depending on your company's development methodology and archive requirements, you could end up managing six versions of your live data (development, QA, and live for two simultaneous systems) for anywhere between one and three years. Each channel requires at least twice its current size in storage capacity; it will be more comfortable running at around 25 percent maximum utilization in case you need to perform rapid copies or restores.
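A quick sizing sketch in Python shows how fast the six channels add up. It assumes every channel holds a full copy of a hypothetical 10 TB live data set; real channels will rarely be this uniform.

```python
def provisioned_capacity(data_tb, max_utilization):
    """Capacity needed so `data_tb` occupies at most `max_utilization` of it."""
    return data_tb / max_utilization


live_tb = 10.0  # hypothetical size of one full copy of the live data

# Development, QA, and live for two simultaneous systems.
channels = [
    f"{system}-{stage}"
    for system in ("old", "new")
    for stage in ("development", "qa", "live")
]

# "At least twice its current size" means at most 50 percent utilization.
minimum_total = sum(provisioned_capacity(live_tb, 0.50) for _ in channels)
# The more comfortable target is 25 percent maximum utilization.
comfortable_total = sum(provisioned_capacity(live_tb, 0.25) for _ in channels)

print(f"channels: {len(channels)}")
print(f"minimum capacity:     {minimum_total:.0f} TB")
print(f"comfortable capacity: {comfortable_total:.0f} TB")
```

Note that the 25 percent utilization target works out to four times the data size, which lines up with the "quadruple each development channel" rule of thumb.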
In most shops, the rest of the storage (e.g., e-mail, smaller databases and files) presents us with far less trouble. E-mail grows relatively predictably; files do the same. The smaller databases making up the shadow ERP generally try to stay under the IT manager's radar, so you can count on the "developers" to control them. If they don't, a friendly reminder to run the appropriate database control tools can earn the IT team a bit of extra political capital for use in another situation.
Working the punctuations into the plan
The good thing about regular punctuations is that you do not need to address them immediately. Once you have a grip on when they will occur, you can build their needs into an extended two-to-four-year expansion plan rather than making sudden purchases. When doing a consolidation, build the extra capacity into next year's budget—this defers the cost while showing the decision makers that you have a clear plan for the future.
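As a rough illustration of folding known punctuations into a multi-year plan, the Python sketch below combines a steady baseline growth rate with scheduled one-time demands. The `expansion_plan` helper, the 10 percent baseline, and the punctuation sizes are hypothetical, and the model assumes that extra capacity, once needed, stays in use and grows with everything else.

```python
def expansion_plan(current_tb, baseline_growth, punctuations, years):
    """Return a list of (year, required_tb, purchase_tb) tuples.

    punctuations: dict mapping year number to extra TB needed that year.
    """
    plan = []
    capacity = current_tb
    required = current_tb
    for year in range(1, years + 1):
        # Baseline growth, plus any punctuation scheduled for this year.
        required = required * (1 + baseline_growth) + punctuations.get(year, 0.0)
        purchase = max(0.0, required - capacity)
        capacity += purchase
        plan.append((year, required, purchase))
    return plan


# Hypothetical: 100 TB today, an ERP changeover in year 2, an audit in year 4.
plan = expansion_plan(100.0, 0.10, {2: 100.0, 4: 30.0}, 4)
for year, required, purchase in plan:
    print(f"year {year}: need {required:.1f} TB, buy {purchase:.1f} TB")
```

Laid out this way, the year-two purchase shows up in the budget conversation well before the demand arrives.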
If you have recently installed new archive/back-up/storage solutions, going back and looking for the punctuations gives you valuable insight into the future. When the inevitable "sudden" demand comes in for new storage, you can coolly lay out your plan to meet the need using existing or readily available resources. This saves you headaches and allows you to continue your move from reactive to proactive infrastructure management.