Every day we create reams of data in customer relationship
management applications, order entry applications, and warehouse management
systems. We’re drowning in a sea of data. However, even with all that data we
don’t have a large amount of information. We have the ones and zeros of the
transactions, but we don’t have the answers we need to simple questions like:
- “Why
was March better than February?” - “Where
is the sales force having the most success?” - “In
what conditions does the sales team struggle with making sales?”
In the 1990’s, the talk was about decision support systems
and executive dashboards. The maturation of those concepts is realized in online
analytical processing (OLAP). OLAP is designed to convert data into usable
information by allowing the aggregation of data. This process allows you to
answer questions like these—even when you don’t know what characteristics may
be important to the question.
What is OLAP?
To understand what OLAP is, you must first understand a few
OLAP terms. OLAP works on facts, and facts are numbers. A fact could be a count
of orders, the sum of the order amounts, or an average of order amounts.
OLAP organizes facts into dimensions which are ways that the
facts can be broken down. For instance, total sales might be able to be broken
up by geography. Similarly, total sales might also be broken down by time.
Dimensions are also hierarchies of levels. For instance, a geography dimension
might contain the levels of country, state, and county. Similarly, a time
dimension might be broken down by year, month, and day.
In using OLAP tools, you typically arrange one or more
dimensions along the rows as well as the columns. You then place one or more
facts in the data portion of the grid. The result looks very similar to a
cross-tab spreadsheet. In our example above, perhaps we would have geography
along the rows and time along the columns. The end result would be something
that looks like Figure A.
Figure A |
![]() |
Cross-tab spreadsheet |
A final OLAP term that you should know is cubes. Cubes
are collections of facts and dimensions. In practical terms, you’ll probably
have a separate cube for production data and sales data because there is not a
one-to-one relationship between all the facts in sales information and the
production information. Within a single cube every fact must have a one-to-one
relationship.
Why is OLAP valuable?
OLAP is valuable because of its flexibility. Once the facts
and dimensions are defined within the OLAP server, OLAP tools provide an easy
way to analyze data by simply dragging and dropping dimensions and facts into
the appropriate locations.
Anyone who’s ever tried to develop a cross-tab report will
appreciate the simplicity of being able to drag the dimensions and facts into
position. If you’ve never developed a cross-tab report, please note that it is
not an easy task. Typically, a substantial amount of time is spent trying to
figure out how to make the data convert into the rows and columns. The problem
is that every change to the report requires a great deal of effort to execute.
In contrast, with OLAP it’s as simple as dragging a new dimension in place and
removing existing dimensions.
Making the cross-tab report easier is certainly valuable,
but it is not an end unto itself. The desired end result is to help transform
data into information. It just so happens that many people approach the process
of understanding their data as the development of cross-tab reports.
Additional white paper resources
- What is OLAP?
- OLAP and OLAP Server Definitions
- OLAP Server and Relational OLAP
- DB2 Cube Views: Getting Started With MicroStrategy
OLAP is useful in helping to determine why the data appears
the way it does. For instance, if the sales for North America are way up or way
down for a given quarter, it’s easy to expand the North American geography into
the states to see which state or states may be responsible for the difference.
By progressively expanding portions of levels within a
dimension, it is possible to drill-down into progressively more detail, but
only the detail that is necessary to spot a trend or a problem.
OLAP tools also allow a breakdown structure of the data
where you start with a single piece of data and you dissect it into a series of
levels looking at the data for something interesting. For instance, let’s say
that we’re looking at a total sales number for the month of June. Breaking down
this number might first occur by country as a member of the geography
dimension. In seeing a larger than expected number for North America, you might
choose to break the number down into product lines. From the product lines
breakdown you may notice that your gold plated widget sales were up
substantially. You decide to look at the sales by state so you select the gold
plated widget product line and expand the state level of the geography
hierarchy. Each level in the breakdown tells you more details about what makes
up the number you’re interested in.
Types of OLAP
Like everything else in technology, there’s not a single way
to solve the technology problem. There are three basic approaches to OLAP
servers: MOLAP, ROLAP, and HOLAP.
Multidimensional
Online Analytical Processing (MOLAP) is the most standard approach to OLAP
solutions. It uses a multidimensional database which directly stores the
information contained in the various cubes. This is the best performing
solution when using SQL Server Analysis services.
Relational Online
Analytical Processing (ROLAP) provides the same solution but uses a
relational database for storage of the data. This approach translates native
OLAP queries, written in a language called multidimensional expressions (MDX)
into the appropriate SQL statements. This is primarily done to prevent the need
for another copy of the data. The data created directly by the online
transaction processing (OLTP) applications are used. The primary disadvantage
to this solution is that it does not, generally speaking, perform as well as a
MOLAP database.
Hybrid Online
Analytical Processing (HOLAP) is a hybrid approach to the solution where
the aggregated totals are stored in a multidimensional database while the
detail data is stored in the relational database. This is the balance between
the data efficiency of the ROLAP model and the performance of the MOLAP model.
If you’re developing applications, you ultimately do not
care which model is being used because your code will work against any kind of
OLAP model (unless you’re creating the cubes yourself). It is only a concern
when a large scale project is being rolled out and you must decide which
solution will meet its unique needs.
A matter of perspective
OLAP solutions provide a new perspective on the reams of
data that are being created, and provide ways to transform that data into
information and useful answers to important questions.