Big Data

Oracle Tip: Understand the difference between star and snowflake schemas in OLAP

Discover the difference between star and snowflake schemas in online analytical processing (OLAP). It's important to learn about these differences during application design or migration, rather than later when schema changes are more costly.

This article originally appeared in the Oracle e-newsletter. Click here to subscribe automatically.

At the core of data warehouse applications and OLAP (online analytical processing) is a specialized schema that relaxes the rules of Third Normal Form RDBMS schemas in favor of faster analysis and processing of large amounts of data. It's important to understand the difference between these kinds of schemas during application design or migration, rather than later when schema changes are more costly.

Most database developers encounter Third Normal Form schemas before any other schemas. The EMP/DEPT examples use Third Normal Form schemas as the basic pattern. In this form of database normalization, an object or entity is related to another object or entity by a primary key and foreign key on an individual basis. (The foreign key DEPTNO in the EMP table defines the relationship between EMP and DEPT tables.) Here is an example of a Third Normal Form database schema:

create table brands
(
    id number primary key,
    name varchar2(30) not null
);

create table products
(
    id number primary key,
    brand_id number not null,
    name varchar2(30) not null,
    foreign key (brand_id) references brands(id)
);

create table sales
(
    product_id number,
    when date,
    how_many number,
    foreign key (product_id) references products(id)
);

Normalization isn't always a good thing when dealing with large amounts of data. While it's ideal for data updates, inserts, deletes, and integrity, it can slow processing. To speed up processing, you can denormalize data into a star schema. A star schema consists of a set of object tables called dimensions, which stores attributes about a specific object, such as product, customer, or time. All of these tables join together through a single join table of facts. This kind of schema can be more natural to nontechnical end users who are more familiar with dealing with logical entities rather than entities and relationships. Here is the example from above, reimplemented as a star schema:

create table product_dims
(
    id number primary key,
    name varchar2(30) not null,
    brand varchar2(30) not null
);

create table time_dims
(
    id number primary key,
    when date
);

create table sale_facts
(
    product_id number not null,
    time_id number not null,
    how_many number,
    foreign key (product_id) references product_dims(id),
    foreign key (time_id) references time_dims(id)
);

Star schemas do have some inherent problems. For instance, the central fact table can grow very large, with an upper limit of the product of the number of rows in each dimension table. Also, the dimension tables are no longer normalized, so they are larger and harder to maintain with lots of duplicate data.

In my example above, brand was denormalized and needs to be maintained separately. To go back toward normalization a bit, there is also a snowflake schema, which is a star schema with some of the features of a Third Normal Form data. Here is the same oversimplified example, but in a snowflake schema:

create table brands2
(
    id number primary key,
    name varchar2(30) not null
);

create table product_dims2
(
    id number primary key,
    brand_id number not null,
    name varchar2(30) not null,
    foreign key (brand_id) references brands2(id)
);

create table time_dims2
(
    id number primary key,
    when date
);

create table sale_facts2
(
    product_id number not null,
    time_id number not null,
    how_many number,
    foreign key (product_id) references product_dims2(id),
    foreign key (time_id) references time_dims2(id)
);

The problem with snowflake schemas is that they may start to show signs of the performance problems of Third Normal Form queries. I don't recommend using snowflake schemas with Oracle's cost-based optimizer.

Scott Stephens worked for Oracle for more than 13 years in technical support, e-commerce, marketing, and software development. For more of his Oracle tips, visit our Oracle Dev Tips Library.

Editor's Picks

Free Newsletters, In your Inbox