Data Centers

The seven deadly sins of data warehouse implementation development

Conventional OLTP techniques that worked in the past may turn out to be big mistakes.

Data warehouse implementation is a formidable undertaking. Most of the experience you bring to the task won’t fit a data warehouse’s unique requirements and challenges. There are several things you might ordinarily do that you should steer clear of when working on a data warehouse.

What follows is a list of things you can do to really screw things up. What you might find odd about this list is some of the items might not seem so bad. However, a data warehouse isn’t a transactional system; it conforms to no particular standard, does not implement a particular application, and is very organic in nature. In short, the data warehouse you are about to build is unique to the company that’s building it, and you may find yourself inventing new ways to do what it is that you do in order to make it happen. And the best way to find out what works is to maneuver around what won’t work. Here are some don’ts you should take seriously.

1. DON'T write code you can't modify quickly
The apps you’ll be accommodating will be analytical, not transactional. The users working with you to spec them out really don’t know exactly what they want—so you may go through several iterations before you get where you’re going. If you write well-constructed, flexible code, assuming it will change, your life will be easier. Write fly-by-wire code and you’ll regret it.

2. DON'T use a database access API that won't allow modifications
In the past, your database work has accommodated apps that accessed moderate amounts of data, for a potentially large pool of users. Now, you’re accommodating apps that will pull in huge amounts of data. You need to write code that will get the maximum amount of data with each read, and you’re not going to get that right the first time. So you need to choose tools that permit you to revise quickly and radically.

3. DON'T design anything that isn't extendable
Analytics aren’t really applications in the sense that Online Transaction Processing (OLTP) apps are. The point of analytics is to take large amounts of old data, find patterns in the data, and infer new information (intelligence) from the patterns. The code you write to access the underlying information may require stretches to include additional data, which will require joins. Don’t write code supporting analytics that fails to assume this will be the case!

4. DON'T insinuate anything between the data and the user unnecessarily
A warehouse needs to be exactly that. A user needs to be able to walk into the warehouse and pull information off the shelf. Because of the nature of business intelligence, analytics, and the metrics, your client's users are seeking to gauge performance. The user needs an environment that permits them to pick and choose data they wish to include in their analyses, whatever it might be. You can’t always accommodate this ideal, but you must do your best. Don’t add anything to their analytical apps that will make warehouse data access any harder!

5. DON'T take shortcuts on data cleanup or source analysis
The single biggest black hole you’ll encounter will be analyzing data sources for the Extract-Transform-Load mechanism, and the act of cleaning up data for loading. It is safe to assume the project manager will budget more than half of the total project resources for this phase alone. To be blunt, if you take shortcuts here, you will most certainly burn for it later. Don’t skimp on cleaning up dirty data, no matter how dull the work.

6. DON'T avoid granularity and partitioning issues
The two biggest data storage issues in warehouse design are settling transformed data at the proper level of granularity and partitioning data categorically. Why is this so important? Because the total warehouse volume shifts geometrically in response to granularity, and the efficiency of data access is directly proportional to the effectiveness of your partitioning. This is difficult grunt work, but it is critical. Don’t try to step around it.

7. DON'T try to work OLAP without asking business questions
Your client's users don’t really know what they want out of their apps until they see it. There will be lots of trial and error as they fish for the analytical result that will honestly deliver the performance metrics or forecasting that will make a difference in the way their department or the company does business. You don’t stand a chance of contributing to this process or catching their mistakes, if you don’t go beyond your role as IT accommodator and learn as much as possible about how their department (and, by extension, the company as a whole) functions. In conventional OLTP development, you can count on those around you to mind the business picture. In Online Analytical Processing (OLAP), everything is exploratory, and the people around you won’t necessarily catch mistakes that result from your misunderstanding. So, don’t assume you know more than you do. Ask the extra questions that will ensure you really do have a handle on the "business" in "business intelligence."

About

Scott Robinson is a 20-year IT veteran with extensive experience in business intelligence and systems integration. An enterprise architect with a background in social psychology, he frequently consults and lectures on analytics, business intelligence...

0 comments