Every day we create reams of data in customer relationship management applications, order entry applications, and warehouse management systems. We're drowning in a sea of data. However, even with all that data we don't have a large amount of information. We have the ones and zeros of the transactions, but we don't have the answers we need to simple questions like:
- "Why was March better than February?"
- "Where is the sales force having the most success?"
- "In what conditions does the sales team struggle with making sales?"
In the 1990's, the talk was about decision support systems and executive dashboards. The maturation of those concepts is realized in online analytical processing (OLAP). OLAP is designed to convert data into usable information by allowing the aggregation of data. This process allows you to answer questions like these—even when you don't know what characteristics may be important to the question.
What is OLAP?
To understand what OLAP is, you must first understand a few OLAP terms. OLAP works on facts, and facts are numbers. A fact could be a count of orders, the sum of the order amounts, or an average of order amounts.
OLAP organizes facts into dimensions which are ways that the facts can be broken down. For instance, total sales might be able to be broken up by geography. Similarly, total sales might also be broken down by time. Dimensions are also hierarchies of levels. For instance, a geography dimension might contain the levels of country, state, and county. Similarly, a time dimension might be broken down by year, month, and day.
In using OLAP tools, you typically arrange one or more dimensions along the rows as well as the columns. You then place one or more facts in the data portion of the grid. The result looks very similar to a cross-tab spreadsheet. In our example above, perhaps we would have geography along the rows and time along the columns. The end result would be something that looks like Figure A.
A final OLAP term that you should know is cubes. Cubes are collections of facts and dimensions. In practical terms, you'll probably have a separate cube for production data and sales data because there is not a one-to-one relationship between all the facts in sales information and the production information. Within a single cube every fact must have a one-to-one relationship.
Why is OLAP valuable?
OLAP is valuable because of its flexibility. Once the facts and dimensions are defined within the OLAP server, OLAP tools provide an easy way to analyze data by simply dragging and dropping dimensions and facts into the appropriate locations.
Anyone who's ever tried to develop a cross-tab report will appreciate the simplicity of being able to drag the dimensions and facts into position. If you've never developed a cross-tab report, please note that it is not an easy task. Typically, a substantial amount of time is spent trying to figure out how to make the data convert into the rows and columns. The problem is that every change to the report requires a great deal of effort to execute. In contrast, with OLAP it's as simple as dragging a new dimension in place and removing existing dimensions.
Making the cross-tab report easier is certainly valuable, but it is not an end unto itself. The desired end result is to help transform data into information. It just so happens that many people approach the process of understanding their data as the development of cross-tab reports.
Additional white paper resources
- What is OLAP?
- OLAP and OLAP Server Definitions
- OLAP Server and Relational OLAP
- DB2 Cube Views: Getting Started With MicroStrategy
OLAP is useful in helping to determine why the data appears the way it does. For instance, if the sales for North America are way up or way down for a given quarter, it's easy to expand the North American geography into the states to see which state or states may be responsible for the difference.
By progressively expanding portions of levels within a dimension, it is possible to drill-down into progressively more detail, but only the detail that is necessary to spot a trend or a problem.
OLAP tools also allow a breakdown structure of the data where you start with a single piece of data and you dissect it into a series of levels looking at the data for something interesting. For instance, let's say that we're looking at a total sales number for the month of June. Breaking down this number might first occur by country as a member of the geography dimension. In seeing a larger than expected number for North America, you might choose to break the number down into product lines. From the product lines breakdown you may notice that your gold plated widget sales were up substantially. You decide to look at the sales by state so you select the gold plated widget product line and expand the state level of the geography hierarchy. Each level in the breakdown tells you more details about what makes up the number you're interested in.
Types of OLAP
Like everything else in technology, there's not a single way to solve the technology problem. There are three basic approaches to OLAP servers: MOLAP, ROLAP, and HOLAP.
Multidimensional Online Analytical Processing (MOLAP) is the most standard approach to OLAP solutions. It uses a multidimensional database which directly stores the information contained in the various cubes. This is the best performing solution when using SQL Server Analysis services.
Relational Online Analytical Processing (ROLAP) provides the same solution but uses a relational database for storage of the data. This approach translates native OLAP queries, written in a language called multidimensional expressions (MDX) into the appropriate SQL statements. This is primarily done to prevent the need for another copy of the data. The data created directly by the online transaction processing (OLTP) applications are used. The primary disadvantage to this solution is that it does not, generally speaking, perform as well as a MOLAP database.
Hybrid Online Analytical Processing (HOLAP) is a hybrid approach to the solution where the aggregated totals are stored in a multidimensional database while the detail data is stored in the relational database. This is the balance between the data efficiency of the ROLAP model and the performance of the MOLAP model.
If you're developing applications, you ultimately do not care which model is being used because your code will work against any kind of OLAP model (unless you're creating the cubes yourself). It is only a concern when a large scale project is being rolled out and you must decide which solution will meet its unique needs.
A matter of perspective
OLAP solutions provide a new perspective on the reams of data that are being created, and provide ways to transform that data into information and useful answers to important questions.