Data Management

Relational databases: Defining relationships between database tables

Database normalization is the cornerstone of database theory. Once a database is normalized, relationships between the data in multiple tables must be established.

A hefty part of designing a relational database is dividing the data elements into related tables. Once you're ready to start working with the data, you rely on relationships between the tables to pull the data together in meaningful ways. For instance, order information is useless unless you know which customer placed a particular order. By now, you probably realize that you don't store customer and order information in the same table. Instead, you store order and customer data in two related tables and then use a relationship between the two tables to view each order and its corresponding customer information at the same time. If normalized tables are a relational database's foundation, then relationships are the cornerstone.

Relational database design series
You’re in the middle of the Builder.com relational database design series. The previous installments in this series are:
"Relational databases: The inspiration behind the theory"
"Relational databases: Using normal forms to create databases"
“Relational databases: Applying the First Normal Form”
“Relational databases: Achieving normalization”


Starting point
The following data is used for demonstration in this article. The process of normalizing the data through Boyce-Codd Normal Form (BCNF) produced seven related tables:
Books: {Title*, ISBN, Price}
Authors: {FirstName*, LastName*}
ZIPCodes: {ZIPCode*}
Categories: {Category*, Description}
Publishers: {Publisher*}
States: {State*}
Cities: {City*}

Now it's time to establish just how those tables are related to one another.

Relationship types
You share many relationships with members of your family. For instance, you and your mother are related. You have only one mother, but she may have several children. You and your siblings are related—you may have many brothers and sisters and, of course, they'll have many brothers and sisters as well. If you're married, both you and your spouse have a spouse—each other—but only one at a time. Database relationships are very similar in that they're associations between tables. There are three types of relationships:
  • One-to-one: Both tables can have only one record on either side of the relationship. Each primary key value relates to only one (or no) record in the related table. They're like spouses—you may or may not be married, but if you are, both you and your spouse have only one spouse. Most one-to-one relationships are forced by business rules and don't flow naturally from the data. In the absence of such a rule, you can usually combine both tables into one table without breaking any normalization rules.
  • One-to-many: The primary key table contains only one record that relates to none, one, or many records in the related table. This relationship is similar to the one between you and a parent. You have only one mother, but your mother may have several children.
  • Many-to-many: Each record in both tables can relate to any number of records (or no records) in the other table. For instance, if you have several siblings, so do your siblings (have many siblings). Many-to-many relationships require a third table, known as an associate or linking table, because relational systems can't directly accommodate the relationship.

Establishing relationships
By the time you get around to establishing relationships between the related tables, you'll probably be very familiar with the data. Consequently, associations are more obvious at this point than when you started. Your database system relies on matching values found in both tables to form relationships. When a match is found, the system pulls the data from both tables to create a virtual record. For instance, you might want to see all the books written by a particular author. In this case, the system would match values between the Books and the Authors tables. It's important to remember that most of the time, the resulting record is dynamic, which means any change made to the virtual record will usually work its way back to the underlying table.

Those matching values are the primary and foreign key values. (The relational model doesn't require that a relationship be based on a primary key. You can use any candidate key in the table, but using the primary key is the accepted standard.) You learned about primary keys in Part 2—a primary key uniquely identifies each record in a table. A foreign key is, simply put, one table's primary key in another table. As such, there's not much for you to do—simply add the primary key field to the related table, as a foreign key.

The only consideration is that a foreign key field must be of the same data type as the primary key. Some systems allow one exception to this rule and will allow a relationship between a number and an autonumbering field (such as AutoNumber in Access of Identity in SQL Server). In addition, foreign key values can be Null, although the recommendation is that you not leave a foreign key Null without a very specific reason to do so. Chances are you may never work with a database that requires this capability.

Return to your sample tables and start entering foreign keys as appropriate. (Continue to work with paper lists—it's still too early to actually create tables in your database system. It's much easier to correct a mistake on paper.) Remember, you're adding the primary key value to a related table. Simply recall the relationships between the entities, and the rest is easy:
  • Books are related to categories.
  • Books are related to publishers.
  • Books are related to authors.
  • Authors are related to ZIP codes.
  • ZIP codes are related to cities.
  • Cities are related to states.

This particular step isn't written in stone, and you may find it easier to add foreign keys during the normalization process. When you move a field to a new table, you'll probably add that new table's primary key to the original table as a foreign key. However, often the foreign keys change as you continue to normalize the remaining data. You may find it more efficient to do them all at once after all the tables are completely normalized.

Let's work through each of the tables one at a time, beginning with the Books table, which at this point has just three fields. Specifically, add the primary keys from the Authors, Categories, and Publishers tables to Books. When you've finished, the Books table has seven fields:
Books        
Title         (PK)
ISBN          (PK)
Price
FirstNameFK   (FK)       Authors.FirstName         many-to-many
LastNameFK    (FK)       Authors.LastName          many-to-many
CategoryFK    (FK)       Categories.Category        many-to-many
PublisherFK   (FK)       Publishers.Publisher       one-to-many

Remember, the primary key in the Authors table is a complex key based on both the first and last name fields. Therefore, you must add both fields to the Books table. Notice that the foreign key field names include a FK suffix. Adding the suffix improves readability and is self-documenting. You may find it easier to track foreign keys if you identify them this way in their name. It's all right if the primary and foreign keys don't have the same name.

There are three relationships present: Books to Authors, Books to Categories, and Books to Publishers. What might not be as apparent to you are the problems with two of these relationships:
  • Books to Authors: A book can have more than one author.
  • Books to Categories: A book can have more than one category.

These two relationships represent many-to-many relationships. Earlier, we told you that tables can't directly accommodate these relationships and require a third linking table. (The Books to Publishers relationship is a one-to-many and is fine just as it's currently stated.)

Both of the newly discovered many-to-many relationships will require a linking table that contains the primary key from each table, as a foreign key. The new linking tables are:
BooksAuthorsmmlink
TitleFK (FK) Books.Title one-to-many
ISBNFK (FK) Books.ISBN one-to-many
FirstNameFK (FK) Authors.FirstName one-to-many
LastNameFK (FK) Authors.LastName one-to-many
 
BooksCategoriesmmlink
TitleFK (FK) Books.Title one-to-many
ISBNFK (FK) Books.ISBN one-to-many
CategoryFK (FK) Categories.Category one-to-many

No changes to the Categories, Authors, or Publishers tables are necessary. However, you must remove the FirstNameFK, LastNameFK, and CategoryFK foreign keys from Books:
Books
Title (PK)
ISBN (PK)
Price
PublisherFK (FK) Publishers.Publisher one-to-many

Now, let's move on to the Authors table, which currently has two fields. Each author relates to a ZIP code value in the ZIPCodes table. However, each ZIP code may relate to more than one author. To accommodate this one-to-many relationship, enter the primary key from the ZIPCodes table into the Authors table as a foreign key:
Authors
FirstName (PK)
LastName (PK)
ZIPCodeFK (FK) ZIPCodes.ZIPCode one-to-many

At this point, you're ready to tackle the remaining address components. It's strange to see them separated into tables, but that's the result of properly normalizing the data through BCNF. Each ZIP code value will have one corresponding city and state value. Each city and state value will be entered just once in its corresponding table. The ZIPCodes and Cities tables require foreign key fields to accommodate the relationships:
ZIPCodes
ZIPCode (PK)
CityFK (FK) Cities.City one-to-many
 
Cities
City (PK)
StateFK (FK) States.State one-to-many
 
States
State (PK)

From one to nine
In the end, you have nine tables: Books, Authors, Categories, Publishers, ZIPCodes, Cities, States, BooksAuthorsmmlink, and BooksCategoriesmmlink. Figure A shows a graphical representation of the sample Books database, at the end of this process. It's hard to imagine that one simple table of data can divide into nine.

Figure A
The original table now requires nine tables.


As a result of the sample's simplicity, you might be wondering just how this relational business helps. It seems that you're still storing redundant data, just differently—in the form of foreign keys. That's because our tables have only a few fields right now. Try to imagine a table with a dozen fields. Granted, you still have to store that table's primary key as a foreign key value in a related table, but that probably constitutes one or two extra fields at most. Compare that to the alternative of adding all dozen entries in that table for every record.

About

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

1 comments
stntn
stntn

Susan,

Thank you for this helpful article.

In a one-to-many relationship in Access, it seems that normally you are supposed to match the primary key in the "one" table with the foreign key in the "many" table.  When does it make sense to do this?  Conversely, when does it make sense to match a field other than the primary key in the "one" table with the foreign key in the "many" table?

Thanks for any help you can provide.