Any discussion on primary keys is bound to stir up a debate. Some developers insist that a primary key’s only purpose is to uniquely identify records. Others believe that as a part of the relational model, its purpose includes relating records. However, it's difficult to stir up this same kind of passion when you discuss foreign keys, because that key’s purpose seems to be so clear. The commonly accepted definition of a foreign key is when one table’s primary key occurs in another table. But is this really accurate?
Here, I will refute the established definition of the foreign key as a primary key in a related table, and in doing so, challenge the primary key’s overall importance in the relational model.
Take a glance at the primary key debate
A previous article examined the issues related to defining and using primary keys in a database design.
Defining the keys
Although widely accepted, the above definition of a foreign key isn’t technically correct. A foreign key is a column or set of columns in one table that matches a candidate key in another table. Some developers interpret this to mean a foreign key is simply an indexed column(s) in one table transferred to a related table. But this completely removes the primary key from the definition.
To clear up any confusion, let’s define the other key terms before we continue:
- A superkey is any column or set of columns that uniquely identifies each record in a table. Not every superkey is a good candidate key.
- A candidate key is a superkey containing the minimum number of columns to uniquely identify each record in a table. Not every candidate key is a good primary key.
- A primary key is the candidate key used to uniquely identify each record in a table.
- An alternate key is a candidate key not chosen as the primary key.
After reviewing these key terms, you can most accurately define a foreign key as the minimum number of columns necessary to uniquely identify each record in one table that occurs in a related table. By definition, a foreign key can be a primary key, but it isn’t required.
Beyond the definition is purpose
The purpose of a foreign key seems clear. A foreign key’s purpose is to:
- Establish a relationship between tables.
- Enforce data integrity rules that protect your data.
- Enable automatic updates.
If there is no relationship with another table, there is no need for a foreign key. You can turn to the relational model for clarification, but you won’t find it. Although relational database theory requires that each normalized table include a key that uniquely identifies each record, the model doesn’t require that the key be a primary key—only that there is a key. Some believe that the rule implies that the required key be a primary key. Consequently, these people identify the candidate keys and choose a primary key from the candidates, even though the relational model allows you to use nonprimary key values as a foreign key.
Despite the foreign key’s real definition, you are still taught to choose the best candidate key, promote it to primary key, and use the primary key as the related table’s foreign key. Perhaps this is a result of a learned habit that is regurgitated without scrutiny. Consider the following:
- The purpose of a unique identifier is to establish a relationship that protects the validity of your data, but you don’t need a primary key to do so.
- It makes no difference which candidate key you choose as the foreign key. If the candidate key (or keys) uniquely identifies each record, the model and your system both work.
Thus, defining the foreign key seems to have rendered the primary key irrelevant, despite its position in the relational model. As long as the key preserves the data’s integrity, building relationships on a candidate key seems to be just as valid as the long-accepted practice of using a primary key as the foreign key in related tables.
Will the real key please stand up?
By now, most of you are probably balking, and I don’t blame you. Relational theory requires that every table have a key and, as such, you’ve been filling your tables with primary keys even when you don’t use them, just in case. All this theorizing about just what a foreign key is or isn’t begs the question: Why even use a primary key?
Under the microscope, the primary key no longer appears to be the vital component of the relational model because:
- Candidate keys uniquely identify records.
- The system correctly uses the candidate key to handle relationships whether or not you promote the candidate key to the level of primary key.
Regardless of how you define it, the foreign key is a powerful key. I might even go so far as to say the foreign key is more important than the primary key. After all, a primary key isn’t required to establish and use a relationship. In contrast, without a foreign key there is no relationship between data items. You can set primary keys in every table, but until you transfer those values to another table, there’s no relationship. Further proof of the foreign key’s superiority over the primary key is this: The foreign key is mandatory; the primary key isn’t.
The reality is that the relational system you use may influence which column you choose as the table’s key more than the relational model. Many developers totally ignore the data and use a system-generated sequential value as primary key. In this case, the system dictates the key, not the data or the developer. The sequential values create a meaningless candidate key and, as with a natural candidate key (a key based on data), you don’t have to define the key as a primary key. The model supports relationships, referential integrity, and updates regardless of whether you define the field as a primary key.
A different view
This analysis flies in the face of all you’ve been taught, and I admit that seriously considering the elimination of primary keys from your tables might make you a bit queasy. I'm not calling for a relational revolution. On the other hand, the industry seems to be leaving the relational world behind anyway. In another decade or two, the foreign key may be as irrelevant as the primary key seems to be now.