This article originally appeared in the Oracle e-newsletter. Click here to subscribe automatically.
At the core of data warehouse applications
and
OLAP (online analytical processing) is a specialized schema
that relaxes the rules of Third Normal Form RDBMS schemas in favor
of faster analysis and processing of large amounts of data. It’s
important to understand the difference between these kinds of
schemas during application design or migration, rather than later
when schema changes are more costly.
Most database developers encounter Third Normal
Form schemas before any other schemas. The EMP/DEPT examples use
Third Normal Form schemas as the basic pattern. In this form of
database normalization, an object or entity is related to another
object or entity by a primary key and foreign key on an individual
basis. (The foreign key DEPTNO in the EMP table defines the
relationship between EMP and DEPT tables.) Here is an example of a
Third Normal Form database schema:
create table brands
(
id number primary key,
name varchar2(30) not null
);
create table products
(
id number primary key,
brand_id number not null,
name varchar2(30) not null,
foreign key (brand_id) references
brands(id)
);
create table sales
(
product_id number,
when date,
how_many number,
foreign key (product_id) references
products(id)
);
Normalization isn’t always a good thing when
dealing with large amounts of data. While it’s ideal for data
updates, inserts, deletes, and integrity, it can slow processing.
To speed up processing, you can denormalize data into a star
schema. A star schema consists of a set of object tables called
dimensions, which stores attributes about a specific object, such
as product, customer, or time. All of these tables join together
through a single join table of facts. This kind of schema can be
more natural to nontechnical end users who are more familiar with
dealing with logical entities rather than entities and
relationships. Here is the example from above, reimplemented as a
star schema:
create table product_dims
(
id number primary key,
name varchar2(30) not null,
brand varchar2(30) not null
);
create table time_dims
(
id number primary key,
when date
);
create table sale_facts
(
product_id number not null,
time_id number not null,
how_many number,
foreign key (product_id) references
product_dims(id),
foreign key (time_id) references
time_dims(id)
);
Star schemas do have some inherent problems.
For instance, the central fact table can grow very large, with an
upper limit of the product of the number of rows in each dimension
table. Also, the dimension tables are no longer normalized, so they
are larger and harder to maintain with lots of duplicate data.
In my example above, brand was denormalized and
needs to be maintained separately. To go back toward normalization
a bit, there is also a snowflake schema, which is a star schema
with some of the features of a Third Normal Form data. Here is the
same oversimplified example, but in a snowflake schema:
create table brands2
(
id number primary key,
name varchar2(30) not null
);
create table product_dims2
(
id number primary key,
brand_id number not null,
name varchar2(30) not null,
foreign key (brand_id) references
brands2(id)
);
create table time_dims2
(
id number primary key,
when date
);
create table sale_facts2
(
product_id number not null,
time_id number not null,
how_many number,
foreign key (product_id) references
product_dims2(id),
foreign key (time_id) references
time_dims2(id)
);
The problem with snowflake schemas is that they
may start to show signs of the performance problems of Third Normal
Form queries. I don’t recommend using snowflake schemas with
Oracle’s cost-based optimizer.
Scott Stephens worked for Oracle for more than 13 years in technical support, e-commerce, marketing, and software development. For more of his Oracle tips, visit our Oracle Dev Tips Library.