Recognizing that most organizations are rich in data but poor in information, the central problem a business intelligence (BI) system seeks to solve is spinning the organization’s data “straw” into information gold. The acid test of success takes the form of a simple question: Are you getting the right information to the right people at the right time in the right way to advance the organization’s strategic objectives? This sounds like a cliché, but it’s important that you ask this question often while planning your system.
In my last article, I talked about the common architecture of the BI solution (as shown in Figure A below). You can see from this figure that the multidimensional cube is the heart of the system. Activities up to and including building the cube may be referred to as “back-room” activities, while activities concerned with distributing the cube’s information to end users may be termed “front-room” activities. In this article, I’ll touch on the back-room activities and issues of the BI solution. (Front-room activities will be the subject of a future article.) Keep in mind that these articles are intended as introductory since either of the topics could easily fill a volume.
|This is the conceptual architecture common to the business intelligence (BI) solution.|
Warehouses and marts
As a point of qualification, although I use the term “data warehouse,” the architectural concepts apply equally to “data marts,” although the latter does differ in focus and scope. Data warehouses tend to be all-inclusive in scope and aimed at the enterprise, and they can be very complex. Marts, on the other hand, are simpler and more narrowly focused on a specific business area. The effectiveness of your organization’s marketing campaigns is best gauged by a data mart, for example.
Keep in mind that when designed properly, individual data marts may be combined over time to create an enterprise data warehouse. The interim use of individual data marts is often recommended, however, because it gives you the ability to quickly bring critical information online in a less complicated manner than would be possible through the creation of a data warehouse.
Don’t miss the first two parts of this series:
- “The intelligent organization: An introduction to BI”: In this article, Pratte talks about what BI can do for an organization and offers some examples of how BI is used within the enterprise.
- “Use this architecture to structure your business intelligence solutions”: Pratte outlines the activities, application types, and data structures necessary to process an enterprise’s raw data into useful information.
The process of building a BI solution
At the conceptual level, the process of building the BI solution is quite simple and it begins with copying source data, often from multiple operational systems, into a single relational database, or a data warehouse. You might think of the data warehouse as containing a series of “snapshots,” taken at regular time intervals, which represent accurately the state of some business measure. It’s easy to see that by combining individual snapshots of dailysales, for example, you can quickly retrieve weekly or monthly views. (Note also that because the data warehouse is a historical record, its data is retrievable solely as read-only information.)
In the past, the back-room process ended with populating the relational data warehouse, and complex SQL queries were run directly against it. In modern implementations, however, you’ll want to convert the relational warehouse into the multidimensional cube.
This begs the obvious question: If the cube provides the same data as its underlying relational data warehouse, why is it needed? The cube provides several advantages, such as:
- Faster query response.
- A more intuitive query language. (SQL can become unwieldy when dealing with multidimensional structures.)
- Built-in security controls (which individuals have access to what data).
Cubes also permit flexibility as to how the bits are physically distributed across systems, enabling both storage and access to be finely tuned. Finally, on the subject of storage, cubes generally contain compressed data, allowing for efficient utilization of online storage.
Source data quality issues
It’s my bet that source data quality issues will be your biggest headache and these will have to be resolved before you can develop anything particularly useful or meaningful from the information. Here’s an example of what I mean: One of my clients wants to determine trends associated with the cost of cleaning specific pieces of industrial equipment. The source data contains only free-form text “description” fields as equipment identifiers. The problem is, standard equipment descriptors haven’t been used. I’ve encountered descriptors like "No. 32 Reactor" and "Reactor, Number 32," both referring to the same equipment. Clearly, this lack of a unique identifier creates a problem when you want to “slice” the data by individual pieces of equipment.
In this case, we can either go back and manually fix the descriptors (probably 600,000 records) or consider fixing only the data-entry system so the go-forward data is correct. Ultimately, the requirements of the business will determine how much pain we’re willing to endure and which approach we’ll eventually take.
The point here is that this sort of problem is more common than not—you’ll almost always have to do some form of creative data validation or transformation when copying data from source databases into the data warehouse. You can, however, enlist the help of extraction, transformation, and loading (ETL) tools, of which Microsoft’s Data Transformation Services (or DTS, which come as part of SQL 2000) is an excellent choice.
Keep your eye on the ball
Back-room activities revolve mostly around designing and populating the data warehouse. From there, it’s a fairly straightforward process using current tools and technologies to build the multidimensional cube.
By now, it should be clear that during the design of the data warehouse, you’ll want to constantly look back and forth across the big picture to keep from painting yourself in a corner.
Always keep one eye on the business issues—the problems you’re trying to solve—and the other on the realities of source data that, more than anything else, ultimately determine what you can and can’t do.