Data Management

Intelligent keys aren't that dumb

Intelligent keys are controversial in database design, yet users often demand them. Learn more about intelligent keys, how and when to use them, and how to keep your users happy at the same time.

Most data analysts proclaim that intelligent keys should never be used in a database. There are good reasons for this. Unfortunately, database designers get confused when talking with users and customers and end up reinterpreting this rule to mean you can’t give users intelligent keys at all. In this article, I explore the intelligent key concept.

What is an intelligent key?
A key is a field (or column) used to identify a specific record in a database. An intelligent key is a key that also contains information about the record it is the key for. A fairly common one is an invoice number. It is a key because it uniquely distinguishes one invoice from another. Some are just meaningless numbers (the best kind), but many invoice numbers are intelligent. They include data such as a date or customer ZIP code.

Here is an example of an intelligent invoice number:


The first eight digits (20010816) are the date the order was received (YYYYMMDD). This is followed by the state code, in this case Colorado (CO). Next is a four-digit customer number (1743) followed by a code for how the order was paid (C for credit card). The numeral 1, signifying the first order for that customer on that day, follows this. The company assumed that a customer could have only nine orders in one day, which may be a valid assumption for a mail-order company. There are several other assumptions and data limitations in that key.

Every record in a database must have a primary key
A primary key makes every record in a database unique from every other record. Some designers use a certain field, such as "employee number" or "phone number." Others use combinations of fields, such as address (street + city + state) or name (title + first + MI + last + suffix). Neither of these options is very good. You may want to make them unique keys and let the database enforce their distinctiveness, but the main key—the primary key—should never contain data entered by users.

Anything entered by hand can be entered incorrectly
You should never change, update, or modify a primary key. Too many other pieces of the database depend on it. Good designers always use an irrelevant number for a primary key. Using anything that is entered by users means you must allow for the possibility of fixing errors in data entry. Updating a primary key correctly is time-consuming and dangerous. You could end up with unlinked records in your database.

Users always want an intelligent key in their databases
It's the first thing users request. Intelligent keys are what they have been using all along as their database. The forklift drivers in the warehouse can communicate their needs to the order takers in the office and the truck drivers on the loading docks. If there is a concern about a shipment going to New York or certain orders taken on a specific day, they use the invoice number to talk among themselves and solve the problem.

When users say the invoice number must be in the database, designers immediately launch into all the modeling reasons why the users can't have it. The users get angry and think the designers are out-of-touch geeks and won't have anything to do with them or "their" database.

Patient designers avoid this problem
Patient designers don't tell users the invoice number won't be in the system until the users feel comfortable that their important data is being collected. A designer should just ask what every single piece of the key means. As you work with the users, together you can split the data into the individual elements. Stick different pieces into different tables. A good database design will output whatever the users need. It’s one definition of good design. If the users need an invoice number to print on a piece of paper for a warehouse worker, then give it to them. However, the database ought to handle all the hidden data according to the rules of normalization.

On one system I worked on, the most important thing was a "Project." Rather than argue with the users about how that wasn't really an entity, we put in a Project entity. We even filled it up with important attributes. Then we split off the attributes into separate tables, created relationships, and looked at other aspects of the business. After a couple of days, the users were the ones who realized the Project entity was irrelevant; all their data was contained in other entities, and a “Project” was only a collection of records that didn't really need a designation.

Work with the users
If you hold your tongue the first time users ask what is for them a reasonable request, things will work a lot better in the end. You'll get the information you need, and the users will become comfortable with your knowledge. When you start showing how the pieces are split up to make data entry faster and more accurate, they'll appreciate your expertise. When you finally discard their precious intelligent key, they will recognize it as an intelligent decision.

Editor's Picks