Data Management

Relational databases: Achieving normalization

Even though your database design meets First Normal Form (1NF), you aren't finished yet. Find out how to apply the other normalization rules: 2NF, 3NF, and BCNF.

You'll probably spend more time normalizing your data than any other single task in the design stage. And the more data you have, the longer the process takes. With experience, you may find that the hardest part is meeting all the requirements for First Normal Form (1NF), because moving repeating values to another table often removes inappropriate dependencies.

So you might be tempted to stop after 1NF, but don't. Continue to normalize data through Second Normal Form (2NF), Third Normal Form (3NF), and even Boyce-Codd Normal Form (BCNF) if necessary. Doing so will find those dependent data elements that would otherwise slip through the cracks of your design and cause problems later. The time to discover problems is while you're still designing—not later when those problems are keeping your users from getting their work done and costing you money.

Part of a series
You’re in the middle of the Builder.com Relational Database Design series. If you’ve missed the first three installments, read them first:
Relational databases: The inspiration behind the theory
Relational databases: Using normal forms to create databases
Relational databases: Applying the First Normal Form


In the last installment of this series, we began with a single table and worked through making it conform to 1NF. This table wound up becoming four tables. Let's finish the normalizing process by applying 2NF, 3NF, and BCNF.

Picking up where we left off
The sample database, when completed, will store data about books—it's a simple purpose that will require a simple database. At this point, there are four tables, all normalized to 1NF (remember that key fields are indicated by asterisks):
 
Books: {*Title, *ISBN, Price}
Authors: {*FirstName, *LastName, *State, *ZIP}
Categories: {*Category, Description}
Publishers: {*Publisher}
 

Applying 2NF
To meet 2NF, the table must be normalized to 1NF, that is, there can be no multivalued items, there can be no repeating groups, each field must be atomic, and each table must contain a key. So far, the tables all seem to meet this requirement. The second requirement for 2NF is that all fields (formally referred to as attributes during the design stage) must depend on the primary key—and only the primary key. Right now, all our attributes appear to meet 2NF without doing anything else to them.

On the other hand, let's suppose that the Books table also stores a number of attributes that describe individuals who borrow your books. You might get away with including a lent date attribute in the Books table. However, including the rest of the data—the person's name, address, and so on, would violate 2NF, because the information about lending doesn't fully support or describe the actual book itself.

Applying 3NF
Once a table is normalized through 2NF, you can begin checking for 3NF violations. This form requires that all fields be mutually independent. Any field that depends on a nonkey field must be moved to another table. The easiest way to catch 3NF violations is to see whether changing the value of each attribute makes data held by other attributes suddenly invalid. This simple test won't catch all violations, but it's a good place to start.

The Authors table has a possible violation: If you change the State value, you might need to also update ZIP, and vice versa. For instance, if an author moves to a new state, you must change both values. You should avoid this type of dependency by moving the State attribute to a new table as follows:
 
Authors: {*FirstName, *LastName, *ZIP}
States: {*State}
 

As result of this change, each author will have a ZIP value, and some of these values may be repeated, but the States table will contain only one entry for each state. You might have to update the ZIP value if an author moves, but you could simply relate your record to a different state, accordingly. Or, you might have to enter a new state value, but you won't repeat state values.

At the moment, it looks like you're creating a lookup table. Later the tables will relate to one another via their primary and foreign key values, but until then, following the logic of this last decision can be difficult. Don't worry if the picture isn't as clear as you like; just concentrate on the rules for now.

Now you have five tables, all of which are normalized to 3NF:
 
Books: {*Title, *ISBN, Price}
Authors: {*FirstName, *LastName, *ZIP}
Categories: {*Category, Description}
Publishers: {*Publisher}
States: {*State}
 

Or, are they? One attribute seems to have slipped through the cracks—ZIP in the Authors table. I’ve already mentioned that ZIP values could be repeated. In such a simple application, leaving ZIP in the Authors table is probably acceptable; the database will probably run efficiently anyway. However, the table isn't fully normalized, so we’ll go ahead and move ZIP to a new table. After moving ZIP, we have six tables:
 
Books: {*Title, *ISBN, Price}
Authors: {*FirstName, *LastName}
ZIPCodes: {*ZIP}
Categories: {*Category, Description}
Publishers: {*Publisher}
States: {*State}
 

You don’t need to be normalized to be efficient
Not every table has to be fully normalized to be efficient. In other words, it's acceptable to sometimes denormalize a table if you find that doing so makes a database more efficient.

Applying BCNF
BCNF is really just a subrule of 3NF, and many developers consider it overkill and stop normalizing at 3NF. Some argue that enforcing it actually degrades performance. With today's powerful systems, performance probably isn't the serious consideration it once was unless you're trying to relate millions of records. So you don't always need to include BCNF. You'll have to weigh the gains in flexibility that a fully normalized database offers against any performance hit that normalizing to BCNF might incur.

BCNF states that there can be absolutely no chance that an attribute might depend on a nonkey field. Our tables seem to meet this requirement at this point, so let’s shake things up a bit by adding a City attribute to the States table:
 
States: {*City, *State}
 

Each city and state record will be unique and relate to a ZIP value. However, the state values now present a repeating group, since it stands to reason that there will be more than one city from each state in the database. The solution is to move the City attribute to its own table as follows:
 
Cities: {*City}
States: {*State}
 

Although the City and State attributes are an inappropriate dependency, it's really the repeating values that force the new Cities table. That’s something you’d normally catch while enforcing 1NF, but BCNF is a good place to catch errors that make it this far. Often, although a dependency issue is what catches your eye, the truth is the attribute should never have made it to BCNF.

After normalizing the single table initially created through BCNF, you have seven tables:
 
Books: {*Title, *ISBN, Price}
Authors: {*FirstName, *LastName}
ZIPCodes: {*ZIP}
Categories: {*Category, Description}
Publishers: {*Publisher}
States: {*State}
Cities: {*City}
 

Don't worry that the number of tables seems to have multiplied like rabbits. In fact, we're not finished yet. We may even add to the table count in the next installment of this series, where I’ll talk about primary and foreign key fields and how they establish relationships between multiple tables.

About

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

0 comments

Editor's Picks