The database organization process called normalization is a series of steps to guide you through analyzing and designing relations in your database schema; the goal is to try to reduce redundant data and inconsistent functional dependencies.
This series of rules originally developed by Dr. E. F. Codd is comprised of what are known as "normal forms." These forms are built upon one another sequentially, beginning with more top-level data organization and moving towards more granular rules.
To demonstrate what these forms can do, let's look at an example containing fields that might be present in a spreadsheet for tracking car sales. I will analyze this data and start to apply the normalization steps to derive a table structure. Figure 1 is the sample spreadsheet structure I will use for the example.
|
First normal form
This is the initial step in the process. The goals of this rule are to eliminate any non-atomic fields in individual tables, identify a unique set of fields in each table with a Primary Key, and create a separate table for each set of related data. By "non-atomic," I am looking for any fields that contain multiple values, such as full name fields or any lists separated by commas.
By looking at the spreadsheet structure, I can change a couple of things. First, I see that the OwnerName field is only one field; I will want to keep track of the owner first name and last name in separate fields if available. Also, this table has no Primary Key. The TransactionNumber is unique, so I can assign it as the Primary Key for now. See Figure 2.
|
Second normal form
I will create separate tables for sets of values that are associated with repeating groups of data. Once created, I can relate these tables with foreign key constraints. After this form has been completed, each non-Primary Key field in the table will be an attribute that describes that Primary Key. Before this rule can be completed, the database design must fully conform to the first normal form. See Figure 3.
|
| ||||||||||||
|
Applying the second normal form to the table I created from applying the first normal form, I arrive with two tables. The AutoSales table contains information regarding the sales transaction and owner, while the AutoSalesDetail table contains information regarding the vehicles involved in the transaction. As you can see, I no longer have any repeating fields that are suffixed with numbers from the first normal form table.
Third normal form
This is the final form that I will apply to the example. The goal of this rule is to eliminate any fields from tables that are not fully functionally dependent on the Primary Key of that table. I must ensure that each field in the table depends directly on the Primary Key. See Figure 4.
|
|
|
|
Applying the third normal form, I have arrived at four tables (although theoretically I could have five). Because the owner names are not directly dependent upon the TransactionNumber from the Transactions table, I have moved them to their own table named AutoOwners. I have created a Primary Key on this table that relates back to the Transactions table.
I also know that a CarType does not relate specifically to the Primary Key in the TransactionDetail table, although the VIN number does. Because the CarType relates specifically to the VIN number, I have created a separate table named Automobiles with the VIN number as the Primary Key. Also, because the CarType field in this table may be considered to be a type of lookup value, I could have created a separate table to house this value, but for simplicity's sake I neglected to do so. (Isn't normalization fun?)
The trade-off with normalization
Do you always normalize your database design to meet the third normal form requirements? Strict proponents of this approach say it is important to do so because they think a database in the third normal form is a fundamentally and academically sound structure that minimizes nulls and redundancy and saves space.
In my experience, a database in the third normal form is not very practical. This technique begins to fail when you need to perform data retrieval operations in your database, which you will surely have to do. The problem with a fully normalized approach (most database developers consider normalization to the third form fully normalized) is that it creates a lot of tables, which will usually require more joins to satisfy your queries. This becomes cumbersome for developers and leads to an escalation in query execution times.
I feel that the best solution is a hybrid approach. I typically try to design the database to be somewhere around the third normal form specification. Once there, start looking at the queries that will run on the system and denormalize as necessary. This strategic denormalization will save you valuable CPU cycles when you're in a crunch. Of course, there is always a trade off when you want to increase performance. You will give away space (data redundancy) for time. It's rare that anything is free in computing.
Tim Chapman is a SQL Server database administrator who works for a bank in Louisville, KY, and has more than 7 years of IT experience. He is also Microsoft certified in SQL Server 2000 and SQL Server 2005. If you would like to contact Tim, please e-mail him at chapman.tim@gmail.com.






