Data Centers

Beginning and planning your data-warehousing project

Determine what a data warehouse will accomplish for your enterprise before implementation.

Congratulations! You've been chosen to spearhead the creation of your organization's first data warehouse. What luck! So where do you start?

Because management is breathing down your neck for a budget estimate, it's tempting to begin your project by looking at the hardware and software necessary to create the warehouse; but that would be the wrong thing to do. Instead, you should begin by asking yourself why you're putting a data warehouse together in the first place.

Why a data warehouse?
The data warehousing process is first and foremost about making operational data accessible and easily and efficiently queried so that management can get answers to business questions. So the first step in creating a data warehouse is to determine what information is needed most. What does management want to know about? What are the most important business questions?

To obtain this information, you'll have to perform quite a few interviews. But who will you target for your interviews, and what kinds of questions will you ask? The answer to who you will target is broken down in two major categories: those who have the questions and those who know the data. We'll first concentrate on those who have the questions.

Questions
These individuals are generally the management of the organization, although they don't have to be. The trick is to start as high up in the organization as you can in the various functional areas, and work your way down. Begin with whoever asked you to create the data warehouse and work from there. Find out why the person is asking you to create a data warehouse. Did the person come up with the idea, or did it come from someone in a higher position?

It shouldn't take you long to get a handle on where to start. Next, put together a list of questions you need answered. You might be tempted to start out asking, "So, what do you want in your data warehouse?" but that isn't the road you want to go down. Many of the people you talk to may not even have a clue what a data warehouse is, so asking what they want in it will not get you the desired information.

Instead, begin with a question such as, "What business questions have you not been able to get an answer to, or what questions have you found it very difficult to get an answer to?" Another question might be, "What kind of information is most important to you, and why?" The point of these questions is to understand what data is most significant to the organization.

Go to the keepers of the data
Once you've determined what information is important, you can go to the keepers of the data—those who know the data inside and out. This will usually be the organization's DBAs, but it's not limited to them. There will be departmental subject-matter experts who know their own systems very well. These individuals will be the ones to tell you whether the information needed to answer the questions you gathered from your first set of interviews is currently being collected, and what shape that data is in. In short, you're determining the quantity and quality of your data sources.

Obviously you can't get data that isn't publicly available, so for your first data warehouse, concentrating on those data sources that are easily accessible and are of high quality will increase the chances of your success. Now that you have the information you're looking for and know that it is accessible and accurate, the next step is to decide how to get your data to the warehouse and what form it will take.

Design the data warehouse
Now comes the design/architecture phase of the process. If you've prepared well in outlining your goals for the data warehouse and taken a close look at your data sources, then the design phase will be easier. Going into the design phase with fuzzy requirements and a less-than-complete understanding of your data sources will complicate the process and add risk to your project.

The design phase can be broken down into two tracks that can run parallel or sequentially, depending on the time pressures of the project:
  • Designing the data warehouse database, extraction and presentation layer
  • Addressing technical infrastructure, production control, testing and certification, end-user training, etc.

Both tracks are affected by the type of data warehouse you plan to create and the software you choose to implement your design.

Data warehouse types
  • Relational online analytical processing (ROLAP)
  • Multidimensional online analytical processing (MOLAP)
  • Hybrid online analytical processing (HOLAP)


The actual design of a warehouse is complex and goes far beyond the scope of this article. However, there are many resources available to get you started on the right foot. Part of your budget for the project should be to hire some expertise in the creation of data warehouses.

Seek outside help
While it's prudent to check your in-house resources for the necessary skill set, if this is your first warehousing project, it's probably a good idea to hire some outside consulting help. There are thousands of consultants available to provide assistance in this area.

All data warehouses aren't created the same. There is great variability due to the management requirements of your warehouse as well as the quality of your source data. Make sure the consultant(s) you choose address this uniqueness. Additionally, unless your data warehouse is very small, it will more than likely add significantly to the workload of your existing database team. Make sure your team is sized to assume the additional responsibility.

Finally, prepare to design and create the warehouse over again. Why? Because a data warehouse project is an iterative process. Your organization is not stagnant. In order for your data warehouse to remain relevant, it will have to change as your organization changes. These changes may be the result of different questions needing to be answered as your organization changes, or modifications required due to changes in your source data.

Editor's Picks