
Data marts deliver fast results, but proceed with caution

Unlike a data warehouse, which can cost millions and take years to implement, a data mart can produce results quickly and cheaply. But beware, because poorly conceived data marts could end up magnifying the information problems you were trying to fix.

An interesting chicken-and-egg argument over the relationship between the enterprise data warehouse and smaller data marts has been kicked around for some time now, dividing zealous business intelligence (BI) practitioners into two theoretical camps. The top-down camp argues that the data warehouse should be built first to ensure uniformity and compatibility across all data structures. The bottom-up camp holds that data marts should be implemented quickly for fast results and then combined later into an enterprise-wide data warehouse.

Both camps, it would appear, have logic and reason on their side. In this article, we’ll first define both data warehouses and data marts to better understand each side’s reasoning, and I’ll touch on a particular architecture that could bring these arguments to a close.

Defining data warehouses and data marts
BI systems consolidate an organization’s data into a single relational database, called a “data warehouse,” as a critical part of the company’s information delivery process. Our Business Intelligence Conceptual Architecture diagram (Figure A) shows the place of the data warehouse within the BI schema.

Figure A
Business Intelligence Conceptual Architecture

A data warehouse physically resides on an industrial-strength relational system, such as Microsoft SQL Server, Oracle, or DB2. Best practices require that you structure this data according to a “dimensional data model” (also known as a “Star Schema”) to facilitate flexible retrieval and to effectively conceptualize the information’s structure. (See my previous article, "Thinking dimensionally aids business intelligence design and use.")
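
To make the star schema idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample rows (dim_date, dim_product, fact_sales) are hypothetical and chosen only for illustration; a real warehouse would live on a server-class database. One central fact table holds the measures, and each dimension table describes one way of slicing them.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold INTEGER,
    revenue REAL);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (1, "2024-01-15", "2024-01"),
    (2, "2024-02-10", "2024-02"),
])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (10, "Widget", "Hardware"),
    (11, "Gadget", "Hardware"),
    (12, "Manual", "Books"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 10, 5, 50.0),
    (1, 12, 2, 30.0),
    (2, 10, 3, 30.0),
    (2, 11, 1, 25.0),
])

# Only these dimension attributes may be used for grouping.
GROUPABLE = {"month": "d.month", "category": "p.category"}

def revenue_by(attribute):
    """Total revenue grouped by one dimension attribute."""
    column = GROUPABLE[attribute]
    sql = f"""
        SELECT {column}, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY {column}
        ORDER BY {column}
    """
    return conn.execute(sql).fetchall()

print(revenue_by("month"))     # [('2024-01', 80.0), ('2024-02', 55.0)]
print(revenue_by("category"))  # [('Books', 30.0), ('Hardware', 105.0)]
```

The same fact rows answer both questions; only the dimension attribute in the GROUP BY changes. That is the "flexible retrieval" a dimensional model is meant to provide.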

A data warehouse possesses five key characteristics:
  • Data is consolidated in the sense that it may come from multiple operational (online transaction processing, or OLTP) databases.
  • Data is “certified” to be of high quality. Low-quality data is cleansed before it lands in the warehouse, so users have no reason to doubt its credibility.
  • Data is read-only. It can’t be changed by the actions of end users.
  • Data is historical and represents a series of “snapshots” depicting the state of your business at specific points in time. These snapshots consist of transformed source-data extracts taken at regular intervals (e.g., hourly, daily, or weekly).
  • Data warehouses are meant to be a “one-stop shop” for enterprise information. They are frequently huge, commonly in the multigigabyte range.
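
The consolidation, cleansing, and snapshot characteristics above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two source systems (store_pos, web_shop), the sku and on_hand fields, and the trivial cleansing rule. A production load process would be far more involved.

```python
from datetime import date

def take_snapshot(history, snapshot_date, sources):
    """Append one dated, cleansed copy of every source row to the history."""
    for source_name, rows in sources.items():
        for row in rows:
            history.append({
                "snapshot_date": snapshot_date,
                "source": source_name,
                # Simple cleansing step: normalize the product code.
                "sku": row["sku"].strip().upper(),
                "on_hand": row["on_hand"],
            })

def on_hand_as_of(history, snapshot_date, sku):
    """Consolidated stock level across all sources for one snapshot."""
    return sum(
        r["on_hand"]
        for r in history
        if r["snapshot_date"] == snapshot_date and r["sku"] == sku
    )

# Snapshots are only ever appended, never updated in place.
history = []
take_snapshot(history, date(2024, 1, 1), {
    "store_pos": [{"sku": " w-1 ", "on_hand": 40}],
    "web_shop": [{"sku": "W-1", "on_hand": 15}],
})
take_snapshot(history, date(2024, 1, 2), {
    "store_pos": [{"sku": "w-1", "on_hand": 35}],
    "web_shop": [{"sku": "W-1", "on_hand": 15}],
})

print(on_hand_as_of(history, date(2024, 1, 1), "W-1"))  # 55
print(on_hand_as_of(history, date(2024, 1, 2), "W-1"))  # 50
```

Because each load adds new rows instead of overwriting old ones, yesterday's state of the business stays available for comparison with today's.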

In contrast, data marts incorporate all of the above characteristics except the last. Marts are designed to meet the demands of a specific group of knowledge workers and have a comparatively narrow subject area: a single department, operating area, or perhaps a specific, nagging business pain. Correspondingly, the “Sales Department,” the “St. Louis Division,” or “Defects Analysis,” for example, could each be a prospective data mart subject area.

Incidentally, narrow in focus doesn’t necessarily mean small in size. Data marts may contain millions of records and require gigabytes of storage.

The challenge of the data warehouse
The (often exceptional) benefits of the data warehouse are well documented and well understood. The ability of a data warehouse to provide not only a one-stop shop for information but also instant and direct access to that information has contributed to the success of world-class companies like Wal-Mart and Dell Computer. These highly successful enterprises have placed the data warehouse at the core of operations, decision support, and strategic planning.

Despite the enormous potential benefit of a data warehouse, implementing one from scratch can cost millions of dollars and take years to complete. Many such projects are abandoned altogether owing to complexity-related issues and the problems associated with sustaining an organization’s economic and political will as time drags and frustrations mount.

The data mart promise?
In contrast, a data mart can be built quickly—in weeks or months—at a cost considerably less than that of a data warehouse. Accordingly, the overall risk is low as well.

Choosing to build a data mart instead of a warehouse also reduces the need to coordinate activities and maintain cooperation across functional and operating units for long periods. Also, low levels of sponsorship within the organization will suffice (managers, not vice presidents), and, generally speaking, economic/political will isn’t a huge issue. Clearly, with the mart, benefits may be realized quickly and business pains eased swiftly.

Stand-alone data marts—given that they are correctly designed—can be combined with other data marts to serve as building blocks for the enterprise data warehouse. In this way, the data mart is said to be a subset of the enterprise data warehouse. As you can imagine, this bottom-up approach has a number of advantages when compared to tackling a huge data warehouse project head-on.

It may seem evident, from what I’ve said thus far, that the best plan of attack is to build data marts, beginning with those easing your biggest business pains, and consolidate them over time. Organizations discover, however, that ad hoc data marts built outside the context of the data warehouse can lead to waste and inefficiency.

The issue is simply this: Data marts can be so easy, so compelling, that every department, division, or team is off and running with its own, which, without collaboration, erases all hope of ever putting them together. Should that occur, you’re right back in the situation you’ve attempted to avoid: data redundancy, islands of information, disparate systems and interfaces, arguments over the “right” numbers, etc.
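
A small sketch shows how arguments over the "right" numbers arise. The two marts below are hypothetical, as are the order records and the business rules; the point is only that each team's definition of revenue is reasonable on its own, yet the totals disagree.

```python
# Hypothetical order extract that both teams start from.
orders = [
    {"order_id": 1, "amount": 100.0, "returned": False},
    {"order_id": 2, "amount": 250.0, "returned": True},
    {"order_id": 3, "amount": 75.0, "returned": False},
]

def sales_mart_revenue(rows):
    # Assumed Sales mart rule: every booked order counts.
    return sum(r["amount"] for r in rows)

def finance_mart_revenue(rows):
    # Assumed Finance mart rule: returned orders are excluded.
    return sum(r["amount"] for r in rows if not r["returned"])

print(sales_mart_revenue(orders))    # 425.0
print(finance_mart_revenue(orders))  # 175.0
```

Neither mart is wrong by its own rules, which is exactly why the disagreement is hard to settle once the marts are already in use.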

The answer: Data warehouse bus architecture
Most organizations today are finding that the bottom-up approach to data warehousing best serves their needs. If your enterprise chooses to go this route, you’ll want to keep in mind while designing and implementing your data marts that they’ll eventually wind up as components of your enterprise data warehouse. This demands some very careful planning, lest you end up with a collection of information islands rather than the seamless and integrated information resource you sought in the first place.

One structured approach, called the Data Warehouse Bus Architecture, demands that your marts be constructed around a “master suite of conformed dimensions” and “standardized definitions of facts.” Data bus architecture allows data marts built anywhere in your enterprise to “plug into the bus” to receive the dimension and fact tables they need. This approach emphasizes a brief planning activity at the outset aimed at documenting the warehouse’s overall architecture. Your document should be specific enough that you won’t have trouble implementing separate data marts that adhere to the principles of the overall architecture. The greatest advantage of the data bus approach is that it keeps different groups within the enterprise from building incompatible marts of their own, which would destroy the benefits of integrating your enterprise data. Next week, I’ll talk about the data bus approach in detail.
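
As a rough illustration of what a conformed dimension buys you, the sketch below uses Python's built-in sqlite3 module with two hypothetical marts, Sales and Defects Analysis, that share one product dimension. All table names, columns, and rows are assumptions made for the example. Because both fact tables use the same product keys and the same category definitions, their results can be lined up side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension: defined once and shared by every mart on the bus.
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
-- Fact table owned by the hypothetical Sales mart.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold INTEGER);
-- Fact table owned by the hypothetical Defects Analysis mart.
CREATE TABLE fact_defects (
    product_key INTEGER REFERENCES dim_product(product_key),
    defect_count INTEGER);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Widget", "Hardware"),
    (2, "Gadget", "Hardware"),
    (3, "Manual", "Books"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", [
    (1, 100), (2, 50), (3, 40), (1, 100),
])
conn.executemany("INSERT INTO fact_defects VALUES (?, ?)", [
    (1, 4), (2, 1), (1, 1),
])

# Summarize each mart separately, then join the summaries on the
# shared dimension attribute.
DRILL_ACROSS = """
SELECT s.category, s.units_sold, COALESCE(d.defects, 0)
FROM (SELECT p.category AS category, SUM(f.units_sold) AS units_sold
      FROM fact_sales AS f
      JOIN dim_product AS p ON p.product_key = f.product_key
      GROUP BY p.category) AS s
LEFT JOIN (SELECT p.category AS category, SUM(f.defect_count) AS defects
           FROM fact_defects AS f
           JOIN dim_product AS p ON p.product_key = f.product_key
           GROUP BY p.category) AS d
  ON d.category = s.category
ORDER BY s.category
"""
rows = conn.execute(DRILL_ACROSS).fetchall()
print(rows)  # [('Books', 40, 0), ('Hardware', 250, 6)]
```

Had each mart kept its own product table with its own keys and category names, this join would have nothing reliable to match on, and the marts would remain islands.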
