The data warehouse is cousin to the conventional RDBMS, but the family resemblance is slight at best. If you’ve never built a data warehouse before, you’re in for a very different experience, from the setting of goals and objectives to pounding out a design, from creating data structures to writing analytics to the most ill-defined user interviews you’ve ever conducted. In short, if you try to build this warehouse the old-fashioned way, you’re going to go way over budget, and your warehouse might not be standing when you’re done.
There’s a long list of things not to do in managing a data warehouse project (see "The seven deadly sins of data warehouse implementation development"), but there are also a number of positive, proactive steps that can increase your chances of a smooth implementation. Resolve to be open to new ideas, and seek creative inspiration in radically modifying your tried-and-true practices to fit this new way of thinking.
1. Assign a full-time project manager, or do it yourself full-time
It's common, and often unavoidable, that project managers ride herd on several projects at once. The economics of IT resources make this a fact of life. When it comes to building a data warehouse, however, don’t even think about it. You are entering a domain unlike anything else you and your crew have worked on. Everything about it—analysis, design, programming, testing, modifications, maintenance—will be new. You, or whoever you assign as project manager, will have a much better shot at success if allowed to get into that “new” mode and stay there.
2. Consider "hand-off" project management
Because the phases of a data warehouse build are so very different, you do yourself no disservice by handing off to another project manager when a phase is complete, provided you adhere to Step One above. Why is it reasonable to do this? First, any phase of a data warehouse implementation can be exhausting from a project management standpoint. Second, from the deployment of physical storage to implementing extract-transform-load (ETL) procedures, from designing and developing schemas to OLAP, the phases of a warehouse build are markedly different from one another. Each could use not only a fresh hand, management-wise, but also a fresh creative perspective. Handing off management won’t necessarily hurt, and it may even help.
3. Institute "user-mining" interviews
This is important enough to be an article in itself. You must understand, going into the design process, that your potential warehouse users aren’t going to be able to clearly articulate what it is they want the warehouse to do for them. They’re going to have to explore and discover it as they go, and so will your development team in conducting interviews. Make your interviews open-ended, with lots of note-taking, and have your development-team interviewers focus more on the consequences of processes than the processes themselves.
Since you’re conducting these interviews in order to get some idea of what data to store and how to store it efficiently, you need to come up with new ways to look at data (in partnership with your users), not new ways to process it. You’re trying to find potential information that can be gleaned, not from transactional data itself, but from the information behind it: the rise and fall of numbers over time, for example. Don’t chase answers in these interviews. Let answers come to you.
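To make "the information behind" transactional data a little more concrete, here is a minimal Python sketch. It assumes a hypothetical list of order rows (date and amount); the data and the names `orders`, `monthly_totals`, and `month_over_month` are illustrative only and do not come from any particular system. It rolls individual transactions up into monthly totals and then reports how each month changed from the one before.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactional rows: (order date, order amount).
orders = [
    (date(2023, 1, 5), 120.0),
    (date(2023, 1, 20), 80.0),
    (date(2023, 2, 3), 150.0),
    (date(2023, 2, 17), 100.0),
    (date(2023, 3, 9), 200.0),
]


def monthly_totals(rows):
    """Sum amounts per (year, month), returned in chronological order."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))


def month_over_month(totals):
    """Return the change in total from each month to the next."""
    months = list(totals)
    return {cur: totals[cur] - totals[prev] for prev, cur in zip(months, months[1:])}


totals = monthly_totals(orders)
changes = month_over_month(totals)

for month, delta in changes.items():
    print(month, f"{delta:+.2f}")
```

No single order in this data says anything about a trend; the rise in February and the fall in March only appear once the rows are viewed over time, which is the kind of question an open-ended interview may surface.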
4. Assign leads as technology/information repositories
These don’t need to be full-time assignments, but because the phases of a data warehouse implementation differ so greatly, you’re going to need people out there assuring continuity. There are three important areas: architecture, technology, and business. Assign an architecture lead to ensure that the generally agreed-upon architecture of the data warehouse, from the physical level on up, is maintained throughout the project. A technology lead should be appointed, because your developers and key users will all be using tools they’ve never used before—someone needs to oversee the deployment and consistent use of these tools.
Finally, the business needs that will be met through use of the warehouse must be carefully observed and documented, to spur continued development. Since the analytics and metrics to be derived from the process are developed over time, by users who will not necessarily communicate well with one another, a business lead must watch this development, encourage its continuation, and nurture it into progressing to higher levels.
5. Resign yourself to many iterations
A data warehouse will never, ever be right the first time. Why? You don’t know what you’re really looking for until you see it. Or, to say it more precisely, the ultimate users of the system won’t know what they’re really going to use it for until they’ve used it for a while. As contrary as that may seem to all that you’ve sworn by throughout your career, it really is the way to go: business intelligence is an infant science, and different for every company.
You’ll have to fish around for the right data in the right format, and things will change often. BI is very “personal,” unique to your environment, your market, and your partnerships. What does this mean? First of all, it means you need to lock your database administrator in a room somewhere and break the news that the data warehouse data structures are going to change and change and change, as will the ETL procedures. There is no way around this. Make your peace with it now, and save both yourself and the DBA a lot of stress.
6. Put considerable front-end resources into data source analysis
You’re going to be stepping in it again and again as you wade through oceans of old data, in old databases, on old magnetic tape, from remote sources. Much of it will be dirty. Much of it will be hard to get to. You’re going to be doing a lot of this, and you’re going to be devising ETL procedures to seek out and retrieve information like this forevermore. You do yourself and the project a great service by establishing a method of doing this right the first time. Have your development people put in the extra time to explore old data thoroughly, characterize “dirty” data issues realistically, and design and implement robust extraction and transformation procedures exhaustively. The ETL portion of a data warehouse can consume as much as 80 percent of your total project resources! Make sure you spend wisely.
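As one illustration of what characterizing dirty data can look like in practice, the following Python sketch counts a few common problems in a batch of legacy rows before any ETL procedure is designed around them. Everything here is an assumption made for the example: the field names (`customer_id`, `signup_date`, `state`), the sample rows, and the choice of checks. A real source analysis would cover many more rules.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows pulled from a legacy source; field names are assumptions.
legacy_rows = [
    {"customer_id": "1001", "signup_date": "2001-03-15", "state": "KY"},
    {"customer_id": "", "signup_date": "03/15/2001", "state": "ky"},
    {"customer_id": "1003", "signup_date": "", "state": "Kentucky"},
    {"customer_id": "1001", "signup_date": "2001-04-01", "state": "KY"},
]


def is_ymd_date(text):
    """Return True if the text parses as a year-month-day date."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def profile(rows, key_field, date_field):
    """Count missing keys, duplicate keys, and missing or unparseable dates."""
    issues = Counter()
    seen_keys = set()
    for row in rows:
        key = row.get(key_field, "").strip()
        if not key:
            issues["missing_key"] += 1
        elif key in seen_keys:
            issues["duplicate_key"] += 1
        else:
            seen_keys.add(key)

        value = row.get(date_field, "").strip()
        if not value:
            issues["missing_date"] += 1
        elif not is_ymd_date(value):
            issues["non_ymd_date"] += 1
    return dict(issues)


report = profile(legacy_rows, "customer_id", "signup_date")
print(report)
```

A report like this gives the team numbers to plan around instead of a general impression that the data is messy. Note that this sketch does not check the inconsistent `state` values at all, which is a reminder that each source tends to need its own set of rules.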
7. Make diplomacy a high priority
The hottest hell you’ll burn in during a warehouse implementation will be the people hell, not the technology or the development. You’re going to have senior management whining about completion dates and murky objectives. You’re going to have development people griping that everything takes too long and why can’t they do it the old way? You’re going to have users with wildly unrealistic expectations, who are used to systems that require mouse-clicking but not much intellectual investment on their part. And you’re going to grow weary, parsing out Needs from Wants at all levels. Commit from the outset to work very hard at communicating the realities, encouraging investment, and nurturing the development of new skills in your team and your users (and even your bosses).
Most of all, keep smiling. When all is said and done, you’ll have a resource in place that will do magic, and your grief will be long past. Eventually, your smile will be effortless.
Scott Robinson is a 20-year IT veteran with extensive experience in business intelligence and systems integration. An enterprise architect with a background in social psychology, he frequently consults and lectures on analytics, business intelligence and social informatics, primarily in the health care and HR industries.