Relational database theory may have been the salvation of storage system pioneers of the 60s and 70s, but it’s since become the bane of many database developers, precisely because modern database systems have become so good at hiding their relational underpinnings from the developer. A well-designed relational database is easy to use, flexible, and protects the validity of the data. A bad design on the other hand will still be quite functional, but will ultimately lead to frustration and possibly erroneous or missing data.

Developers have certain rules, known as normal forms, that they follow to create well-designed databases. Here, I’ll examine normal forms through the creation of a simple database for storing information about a collection of books.

Part two in a series

This is part two in a series dedicated to relational database design. If you missed it, you can check out part one, which explained the motivations behind the development of relational theory.

Determine the entities and elements
The first step to designing a database is to do your homework and determine the entities you’ll need. An entity is a conceptual collection of one type of data. It’s not unusual to start out with just one or two entities and grow the list as you normalize your data. For our sample database, it looks as though we’ll need just one entity, books.

After determining the list of entities you’ll need, you’ll next need to create a list of data elements (i.e., the information you’ll store) for each entity. There are a number of ways to collect this information, but the most effective is probably to rely on your users. Ask your users about their current routine, and ask to see any forms and reports they’re currently using to get their jobs done. For instance, an order form might list many of the data elements you need to accommodate in a purchasing application.

There are no paper forms or reports to work from for our book entity, but the following list of elements is a good start:
{Title, Author, ISBN, Price, Publisher, Category}

It’s important to note that the process of moving from entity to element that we used here doesn’t work in all situations. The entities you’ll require will not always be as clear as in our book collection example, so you may have to begin with a long list of data elements, which you’ll later divide by entity.

First steps of normalization
Once you have a list of entities (tables) and data elements (fields), you’re ready to put relational theory to work. The main thrust of the theory is normalization—the process of removing any repeating groups and redundant data and placing them into two or more related tables. It isn’t absolutely necessary that you have more than one table, but the chance is slim that your data is so simple that you end up with just one table.

You should carefully review your entity and element lists for data that occurs in several records and for inappropriate dependencies, and move the offending field to a different table. For instance, you might list several books by the same author, repeating that author’s name in your database. When you believe you’re going to see the same data value occur over and over, you should consider moving that field to another table.

Keep in mind that at this point, you’re only working with lists of potential tables. You should not be actually building those tables: Stick to pen and paper lists for now.

Introducing the normal forms
The process of normalizing a database is so well known that there are formal rules governing how a normalized database should be structured. There are seven of these rules, known as normal forms, in all, but the first four are adequate most of the time:

  • First Normal Form (1NF)—This rule has several requirements, including that there are no multivalued items or repeating groups; that each field is atomic, meaning each field must contain the smallest data element possible; and that the table contains a key.
  • Second Normal Form (2NF)—The table must be normalized to 1NF. All fields must refer to (or describe) the primary key value. If the primary key is based on more than one field, each nonkey field must depend on the complex key, not just one field within the key. Nonkey fields that don’t support the primary key should be moved to another table.
  • Third Normal Form (3NF)—The table must meet 1NF and 2NF requirements. All fields must be mutually independent. Any field that describes a nonkey field must be moved to another table.
  • Boyce-Codd Normal Form (BCNF)—There must be no possibility of a nonkey dependent field occurring. This rule is really a subrule of 3NF and supposedly catches dependencies that might otherwise sneak through the process. It’s rather abstract and can be difficult to apply at first.

While the above rules are precise, technical definitions, the rules of normalization can be simplified into the following:

  • Each field must be as small as possible.
  • Each field can contain only one data item.
  • Each record should be unique.
  • Watch for repeating entries.
  • Each field must fully support the primary key, and only the primary key.

What’s next?
Applying the normalization rules, particularly the multipart 1NF, can be a tricky process. As you’ll see in the next installment, where I’ll begin actually applying the normal forms to the sample database, you may wind up revisiting 1NF after you’ve performed the other normalization steps.