Data Management

Five common database development mistakes

Here are some of the most common gaffes Justin James sees in database development, along with advice on what to do to not make them.

I have been spending a lot of time working in databases designed and developed by other people. Some of these databases have been good, while others have had real issues, which has me thinking about the most common mistakes I see in database development. Here are five common mistakes, and what to do to not make them.

1: Primary key abuse

The worst database development mistake is developers who have no idea how a primary key should be used. I will make this as plain as possible: A primary key value should have nothing to do with the data in the row. Do not use application data for primary keys. Do not use concatenations or calculations of application data for primary keys. Do not use values that have any meaning for primary keys. Primary keys should be sequentially or randomly generated by the database upon row insertion and never changed except under the most unusual of circumstances. Primary keys need to be system values that the system manages so that you can do things like move the data to another system with safety, or change the underlying data without mangling relationships.

2: Improper normalization

When it comes to organizing data, I see the same mistakes in database design as I see in object design: Some developers like to turn everything into a relationship, and other developers never turn anything into a relationship. There is a middle ground. Unless you are in a special scenario (like working with a data warehouse, which has its own unique needs), here are some guidelines for knowing when data should or should not be split out into a related table:

  • If the data will be shared among multiple rows and a change in one should affect all of the other rows, move it to another table. Customers of a catalog should be split off from their orders, for example.
  • If the data will be shared among multiple rows, but a change in one should not affect all of the other rows, keep it in the same table. Alternatively, use a data versioning scheme so the rows related to the data before the change keep their current values, and the changed data goes into a new row and the key is updated for the parent row.
  • If the data can be updated or used separately from the rest of the row, putting it into a separate table can make good logical sense and provide advantages for data integrity if you can lock on rows but not columns.

3: Stored procedure overuse

Stored procedures are useful, but for most modern applications (not all), using them as often as was done in the past is overkill. Modern ORMs and other techniques make stored procedures a lot less necessary, and are much more nimble in the development process. More importantly, stored procedures are a maintenance disaster. There is no good or easy way to audit what applications are definitively using them, which means that any major changes usually mean that you write another stored procedure instead of altering the existing one, creating problems knowing which one to use. The next time you consider writing a stored procedure, make sure it is truly justified.

4: Not having a primary key

For every developer who uses the wrong types of data for primary keys, there seems to be a developer who does not believe in them. I cannot tell you how many times I have seen a design where I have been told, "well, just join the tables on these five columns and it should be unique." And of course, not too long after deployment, you find out there is just this one special case where they are not unique. This situation is also a big contributor to the proliferation of stored procedures and primary views, as subsequent developers struggle to mask the database schema mistakes from the view of their applications.

5: Hard deletes

There are times when a hard delete of data makes sense, but in my experience, those times are much less common than the number of times you will be restoring the database to a separate server or trolling through transaction logs in order to retrieve data that was deleted by mistake, or to audit a problem with the application. I almost always use soft deletes (marking a row as inactive) in my applications instead; the ORM in OutSystems Agile Platform is smart enough to automatically filter on it by default. If I weren't using Agile Platform, I would edit my ORM entities to filter on the soft deletes.

J.Ja

Keep your engineering skills up to date by signing up for TechRepublic's free Software Engineer newsletter, delivered each Tuesday.

About

Justin James is the Lead Architect for Conigent.

Editor's Picks