Developers consider many factors during the design phase—how to normalize the data, how to create an easy-to-use interface, and so on. Without a doubt, one of the developer's most important goals is to guarantee the accuracy of the data. Let's discuss how to use referential integrity to protect a database from incorrect and missing data.
Who needs it?
After tackling normalization, primary keys, and relationships, you might not think about the importance of referential integrity. Suppose you maintain a database, a critical component to business success, that stores product and order data. Now, let's suppose you use natural data for primary keys, and you need to update a particular customer's primary key value. Later, you receive an angry customer call because the invoice is wrong or because a promised delivery didn't appear. Or, perhaps the customer gets the order, but never receives an invoice. You might not find out about that last mistake until the system is audited—and even then, it may still go undetected.
What happened? When you changed the customer's primary key value, you neglected to update the foreign key values for that customer's related records. You created orphans—foreign key records that don't match a primary key value. In other words, there is no way for the system to match order details to the customer.
Without referential integrity, you can enter unmatched foreign key values. The relational model doesn't allow these orphan records. Even though your database system may allow you to create orphans, it is the developer's job to prevent them. The easiest way to prevent orphans is to implement referential integrity.
What is referential integrity?
The relational model enforces referential integrity. That means you must enable your system's referential integrity feature to meet relational rules. Often, it is as simple as just enabling referential integrity between tables, but you need to understand the consequences of doing so (or not doing so). Technically, integrity rules maintain relationships and, therefore, the data. There are three types of integrity:
- Entity—You must uniquely identify each record.
- Referential—Each foreign key value must have a matching primary key value in a related table (or be Null).
- Business—These are rules that are specific to the business and unrelated to relational database theory.
With referential integrity disabled, you can enter data at any time, as long as the data doesn't violate other table or field properties such as validation rules and data types. In contrast, referential integrity enforces rules that limit when you can modify, add, or delete data. With referential integrity enabled, you cannot enter a new foreign key value until a matching primary key value exists in the related table. You cannot change a primary key value if there is a matching foreign key value in a related table. And, you cannot delete a primary key value if there are matching foreign key values in a related table. To delete primary key records under these conditions would create orphans.
Deciding when to enable referential integrity
Enable referential integrity in every database unless you have a specific reason not to. There are certain conditions that must be met before you can do so:
- The relationship for which you are enforcing referential integrity must be based on a primary key or a unique index.
- You can't enforce referential integrity between tables in two different databases. The tables must be in the same database.
- The relationship must be based on fields of the same data type. (Some systems will honor a relationship between an autonumbering field and a number field.)
Referential integrity is one of those features you need to experience to fully grasp. Using a books database, use your system to enable referential integrity between the Books and Publishers tables. Figure A shows the dialog box and the Relationships window in Microsoft Access. (You'll need to review your documentation to learn how to enable referential integrity in your system).
|Enable referential integrity between the Books and Publishers tables.|
Once you've enabled referential integrity between the two tables, you'll find data entry more restricted:
- You can't enter new foreign key values in Books.PublisherFK until you first enter the new value in Publishers.Publisher as a primary key value.
- You can't change a value in Publishers.Publisher (the primary key) if a matching value exists in Books.PublisherFK (the foreign key).
- You can't delete a value from Publishers.Publisher (the primary key) if a matching value exists in Books.PublisherFK (the foreign key).
Let's see what happens when you try to enter, delete, and change data that violates referential integrity between these two tables. (My examples will use Microsoft Access). First, open the Books table and enter the record for Nee Nee's Truck, as shown in Figure B. Trying to save the record generates an error, because RabbitPress doesn't exist as a primary key value in the Publishers table. You must enter a record for RabbitPress in the Publishers table before referential integrity accepts RabbitPress as a foreign key value.
|The system rejects RabbitPress as a foreign key.|
Next, try to change a primary key value. Specifically, open the Publishers table, select the Apress record (or Que or Sybex), and try to delete the record. Doing so generates an error, as shown in Figure C. Your system won't allow you to delete the record, because the Books table contains matching foreign key values. You can delete the record for O'Reilly, because there are no records in the Books table using O'Reilly as a foreign key.
|You cannot delete the Apress record|
Try to change Apress to RabbitPress to generate the previous error again. Remember, you cannot change the primary key value either, as long as there are matching foreign key values existing in the related table. On the other hand, you can change O'Reilly to RabbitPress.
Enabling cascade options
With referential integrity enforced, you may encounter trouble if you must change or delete a primary key value. You can disable referential integrity while making changes, or you can enable a cascade option. (Not all systems support cascade options; check your system's documentation to be sure.)
Enabling a cascade update option allows you to change a primary key value where matching foreign key values exist. The system simply updates all matching foreign key values accordingly. Let's examine a quick Microsoft Access example. Turn on the cascade update option between the Books and Publishers tables, as shown in Figure D (check your system documentation for instructions on enabling this option).
|Enable the cascade update option.|
Next, open the Publishers table and try to change Apress to RabbitPress. This time, the system accepts the change, instead of returning an error as before (as in Figure C). However, more has occurred than you realize. Open the Books table and check the PublisherFK values. As you can see in Figure E, the system accepted RabbitPress and, subsequently, changed all matching foreign key values from Apress to RabbitPress. You can change a primary key value and not create orphans.
|The cascade update option changed all the matching foreign key values.|
If the system doesn't allow you to edit an autonumbering data type, and a primary key is based on such a field, the cascade update option is useless. If you cannot change the primary key value, there is no conflict to resolve.
You can't delete a primary key when matching foreign key values exist in a related table. Enabling a cascade delete option lets you delete a primary key value. To do so, the system deletes foreign key records to avoid orphans.
For this next example, you must enable the cascade delete option (as in Figure D). Next, open the Publishers table and try to delete the first record (Apress). The system will probably display a warning message similar to the one shown in Figure F.
|Watch for a warning message when you delete a primary key value.|
Finally, open the Books table to view the results of deleting Apress from the Publishers table. The matching foreign key—the entire record, not just the foreign key value—was deleted, as you can see in Figure G. Look closely, and you'll notice the SQL: Access to SQL Server record is missing.
|The cascade delete option deletes matching foreign key records.|
The ramifications of using either cascade option, especially the delete option, can be devastating if used incorrectly. To protect your data, do not enable either option permanently. Instead, turn on an option when needed and be sure to turn it off when you're done.
The end of the line
Relational database theory can be intimidating, but once you understand the rules, you'll find them easy to implement—especially with some experience. In the end, there's nothing that beats a good design. A properly designed relational database withstands time and growth. Conversely, a poorly designed database is difficult to use and modify and could even harbor erroneous data. Don't guess with your design. Go through the design process step-by-step.
Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.