In the first part of this data warehousing series, I noted the structure of database storage for a data warehouse is very different from that of a conventional online transaction processing system (OLTP). The reason? OLTPs have many users and typically handle distinct data objects in any given session, whereas online analytical processing systems (OLAP) have fewer users pulling in much larger amounts of data. Therefore, from a design standpoint, it’s important to create structures that put related data into physical proximity, so that it can be scooped up in a minimum number of reads.
The ideal structure for such storage is a star schema, realized as a multidimensional cube. In such a structure, tables are related not in the relational manner of an RDBMS, but by a central key table that defines the dimensions along which data will be gathered from logically-related tables. For this to work, data must be denormalized. The redundancy in such an arrangement is in the central table. A safe rule of thumb is to say that everything you would do in configuring an OLTP database is the opposite of what you want to do in configuring an OLAP database. Here’s how it would work in Microsoft SQL Server 2000.
Loading and retrieving with a MOLAP database
MOLAP means Multidimensional OLAP. That’s just another name for an analytical cube database. SQL Server 2000 has a suite of utilities called Analysis Services that facilitate the construction of an analytical cube database, tools for its management, and software for user access. This suite of tools was originally developed for OLAP Services in SQL Server 7.0, and can be used for data mining as well as OLAP.
Your first step is to access Analysis Manager. It’s under Start | Programs | Microsoft SQL Server| Analysis Services. You’ll be given a left pane showing any available resources, with details on the right pane (you will create an initial ODBC Data Source Name at the time you install Analysis Services). You administer it—i.e., configure drivers, etc. – from Control Panel | Administrative Tools | Data Sources. Under the System DSN tab, available databases are listed. You can add and remove data sources from this screen.
Data sources are the tables in your conventional transaction system (usually archived data, or online history tables) that hold the information that will migrate into your data warehouse. You’ll make use of these data sources to provide data in your warehouse initially, and you’ll use them again on a regular basis to add data to the warehouse as necessary.
I'm skipping over Extract-Transform-Load in this discussion; for an overview of this important phase, see my other TechRepublic article on data warehouse tactical chores.
Once you've added or removed the appropriate data sources, you are ready to create an OLAP database in Analysis Manager. Go back to the original left pane that lists the name of your analytical database server. Right-click on it, then select New Database to name and describe it. It is empty, of course, containing no data structures yet. You can now add data to the database, such as tables described above, in cube form: right-click on Cubes in the left pane, beneath the database just created. Then choose New Cube. A wizard and an editor will be offered. Choose the wizard and follow its lead. You’ll be guided through selecting a Data Source, including Data Link Properties (a list of Microsoft choices are offered) and a Connection to the database server.
The wizard will now let you define a "fact" table, or context array that specifies the depth and direction of your cube. Once you do that, you’re ready to define the cube’s dimensions.
This screen contains a Browse Data option that allows you to actually page through records at this point, if you like.
The wizard will now assist you in defining fact table columns, cube dimensions (via the New Dimension button, which launches the nested Dimension Wizard), and schema selection.
Dimension Wizard now assists you in the continued construction of your cube using those tables (and nested fields) you’ve specified. You then can designate key columns.
You can change any of these options later, using the Advanced Options and the Cube Editor.
Finally, the Dimension Wizard will give you a storage mode option. Choose Store as multidimensional OLAP (MOLAP). When you’re done, you’ll go back to the cube wizard and finish its construction. You want to use this for the Storage Design Wizard, which allows you to optimize the cube’s storage in the database for enhanced performance. You can then establish security for the cube through the Cube Roles item in the left-pane tree of Analysis Manager.
Reading it back
The power of SQL Server 2000’s OLAP implementation is that, in creating a data cube, you create a permanent access object configured to that cube. You can now browse the cube through the Browse Data option in Analysis Manager, and query the database for large batches of data. You can then drag-and-drop by dimension to build whatever body of data you need for analysis.
This represents your efficiency: to put data in and get data out using an optimized structure that creates the view of the data that is most useful for analysis. Tables are related with minimal fuss, data is stored sequentially and in physical proximity, and reads are minimized. This greater efficiency is a home run for both IT Consultants and their clients.
In the course of this discussion, I’ve neglected metadata and the repository, which are essential concepts in warehouse features. I’ll return to them in a future article.
If you’re working with a warehouse build atop an ERP system, like SAP R/3 or Oracle, you’ll find some differences in your build tools, but the concepts and structures available to you will be the same. Put careful thought into how they would take form in your environment, and you’re on your way to an efficient, responsive design.
Scott Robinson is a 20-year IT veteran with extensive experience in business intelligence and systems integration. An enterprise architect with a background in social psychology, he frequently consults and lectures on analytics, business intelligence and social informatics, primarily in the health care and HR industries.