Data Management

Five common database development mistakes

Here are some of the most common gaffes Justin James sees in database development, along with advice on what to do to not make them.

I have been spending a lot of time working in databases designed and developed by other people. Some of these databases have been good, while others have had real issues, which has me thinking about the most common mistakes I see in database development. Here are five common mistakes, and what to do to not make them.

1: Primary key abuse

The worst database development mistake is developers who have no idea how a primary key should be used. I will make this as plain as possible: A primary key value should have nothing to do with the data in the row. Do not use application data for primary keys. Do not use concatenations or calculations of application data for primary keys. Do not use values that have any meaning for primary keys. Primary keys should be sequentially or randomly generated by the database upon row insertion and never changed except under the most unusual of circumstances. Primary keys need to be system values that the system manages so that you can do things like move the data to another system with safety, or change the underlying data without mangling relationships.

2: Improper normalization

When it comes to organizing data, I see the same mistakes in database design as I see in object design: Some developers like to turn everything into a relationship, and other developers never turn anything into a relationship. There is a middle ground. Unless you are in a special scenario (like working with a data warehouse, which has its own unique needs), here are some guidelines for knowing when data should or should not be split out into a related table:

  • If the data will be shared among multiple rows and a change in one should affect all of the other rows, move it to another table. Customers of a catalog should be split off from their orders, for example.
  • If the data will be shared among multiple rows, but a change in one should not affect all of the other rows, keep it in the same table. Alternatively, use a data versioning scheme so the rows related to the data before the change keep their current values, and the changed data goes into a new row and the key is updated for the parent row.
  • If the data can be updated or used separately from the rest of the row, putting it into a separate table can make good logical sense and provide advantages for data integrity if you can lock on rows but not columns.

3: Stored procedure overuse

Stored procedures are useful, but for most modern applications (not all), using them as often as was done in the past is overkill. Modern ORMs and other techniques make stored procedures a lot less necessary, and are much more nimble in the development process. More importantly, stored procedures are a maintenance disaster. There is no good or easy way to audit what applications are definitively using them, which means that any major changes usually mean that you write another stored procedure instead of altering the existing one, creating problems knowing which one to use. The next time you consider writing a stored procedure, make sure it is truly justified.

4: Not having a primary key

For every developer who uses the wrong types of data for primary keys, there seems to be a developer who does not believe in them. I cannot tell you how many times I have seen a design where I have been told, "well, just join the tables on these five columns and it should be unique." And of course, not too long after deployment, you find out there is just this one special case where they are not unique. This situation is also a big contributor to the proliferation of stored procedures and primary views, as subsequent developers struggle to mask the database schema mistakes from the view of their applications.

5: Hard deletes

There are times when a hard delete of data makes sense, but in my experience, those times are much less common than the number of times you will be restoring the database to a separate server or trolling through transaction logs in order to retrieve data that was deleted by mistake, or to audit a problem with the application. I almost always use soft deletes (marking a row as inactive) in my applications instead; the ORM in OutSystems Agile Platform is smart enough to automatically filter on it by default. If I weren't using Agile Platform, I would edit my ORM entities to filter on the soft deletes.

J.Ja

Keep your engineering skills up to date by signing up for TechRepublic's free Software Engineer newsletter, delivered each Tuesday.

About

Justin James is the Lead Architect for Conigent.

41 comments
Jaqui
Jaqui

really? how about a situation where the object oriented approach is the wrong way to go? a friend has developed an application that is currently being used in a few medical clinics locally, methadone maintenance program only clinics. he has found that a procedural approach dramatically improves performance, and reduces the code complexity for the app. and due to original client requirement, the db engine this thing uses is access. :/ interestingly enough, the "triplicate" prescription requirements of methadone are lower in security focus than the medical records are. :/ methadone, like a few other medications, requires a "triplicate" prescription form, which is a carbon copy of a prescription, on file in the patient's chart and available for reference if any questions are asked about that patient's treatment, by the authorities with legal access to the records.

alvalongo
alvalongo

I disagree with point 3 "Stored procedure overuse". It seems that Justin doesn't use Oracle. Store procedure it's all about in Oracle; they easy develop APIs and TAPI so the logic of bussinnes and others duties on database are common to any application who wants to use the database.

pauldav221
pauldav221

I can only say that because so many development environments do not use competent, experienced database developers, or do not have enough of them to review the design and coding of databases, that I have had a wonderful, well-paid career fixing many database environments, for many companies over the years. It is a never-ending problem for companies and one that has assured that I have steady, challenging employment. ? Because I want non-SQL developers to be properly educated, I see this response to item 3 (Stored Procedure Overuse) as an opportunity to educate others about why stored procedures are widely used and strongly recommended instead of non-stored procedure SQL. 3: Stored procedure overuse – my refutation From the email link: "???Stored procedures are useful, but for most modern applications (not all), using them as often as was done in the past is overkill. Modern ORMs and other techniques make stored procedures a lot less necessary, and are much more nimble in the development process. More importantly, stored procedures are a maintenance disaster. There is no good or easy way to audit what applications are definitively using them, which means that any major changes usually mean that you write another stored procedure instead of altering the existing one, creating problems knowing which one to use. The next time you consider writing a stored procedure, make sure it is truly justified." This argument often surfaces from developers who do not understand the complexity of modern database engines. By gaining an understanding of the benefits of using stored procedures, I hope that readers find many more reasons to doubt future articles that question the use of procs. I'll quote the words used in the emailed article link: ??? "Modern ORMs and other techniques make stored procedures a lot less necessary, and are much more nimble in the development process." Seriously, the statement above is so wrong that it always astounds me when I see it repeated here and there by developers who really do not understand the purpose and power of modern database engines. I'll try to keep this refutation as short and direct as possible. Stored procedures are precompiled by modern databases. That allows the database engine to generate and cache execution plans for evaluation by an optimization engine. Optimization engines are incredibly sophisticated and get more powerful with each release. They evaluate access and allocation of data memory, index memory, temp storage memory, cpu usage, table selectivity statistics, and other important criteria in order to provide an execution plan that is at least "good enough". "Good enough" means that after generating hundreds or thousands of possible execution plans, it chooses one or two that it thinks will be the best. Precompiled, properly written stored procedures allow reuse of cached execution plans. Without stored procedures, the optimizer is simply not nearly as fast or as smart. If the optimizer has to waste time doing something that it doesn't need to do, that is inefficient and not smart. With development of modern databases and use of parallel processing, more speed and sophistication are critical. Parallel processing is being used in many more processes, "under the covers" in modern database engines. If multiple processors must generate an execution plan for dynamically generated database code, then that is more wasted effort. Another major purpose of stored procedures is to keep developers from rewriting the same code over and over. Using stored procedures, it is extremely easy and very fast to search for stored procedures that use any database object (table columns, functions, assemblies, etc), even if one is not a SQL programmer. At least it is in SQL Server 2005 and later. I do this all the time, especially because I'm relatively new to the company and I am always searching for appropriate code. As a contractor at various times in my career, it has been critical for me to be able to find and use, or compare, or change existing code. SQL, the language of non-assembly stored procedures, is "set-based", not row-based. Set-based processing is capable of vastly more powerful data manipulation than row-based, non-SQL processes. And before I move on, yes, there are also stored procedures that are written in other languages and packaged as DLL assemblies, with SQL proc wrappers. They exist because of the greater benefits some other languages provide when required to do intense manipulation of character data strings. They are not used for the vast majority of data manipulation, string or otherwise, for good reason. And yes, there are non-SQL databases that are optimized to handle vast quantities of unstructured text data. They are used for specialized situations in "big-data" shops like Amazon and Google. However, for most situations, for most companies, SQL procs are the preferred tool. Stored procedures are the place to put business rules pertaining to the data (except for simple, front-end, data validation), not middle-tier apps. Middle-tier apps should use stored procedures to manage processing. Middle-tier and front end processes should be used to provide feedback in interactive applications, to users. The database is all about making it easy and fast to intelligently access and reuse common processes without causing the server to waste time essentially redoing the same things with each similar request. And when business changes, and it always does, it is easy to find and change applicable code in the database, regardless of what applications call that code. Ideally, stored procedures should be managed in a source control environment. That adds even more value because then one does not need to look at individual servers to find or review code. And one can quickly compare versions and rollback code if necessary. Source control environments usually provide great search tools too. ???"More importantly, stored procedures are a maintenance disaster. There is no good or easy way to audit what applications are definitively using them, which means that any major changes usually mean that you write another stored procedure instead of altering the existing one, creating problems knowing which one to use." This is stunningly wrong. The phrase "maintenance disaster" when combined with "stored procedures" is very clearly indicative of a development environment that is composed of very bad programming, lack of source control, poor application development management, or all of the above. In no case, should one ever feel that stored procedures are a "maintenance disaster" in a professional development environment. If one does, truly, one or more things are seriously wrong and require fixing with more training, proper tools, and education. Next, I must point out that "???what applications are using them" hints at what the real problem may be in some environments. If a development environment is properly managed, one does not worry about which applications are using what stored procedures. If the logic in the stored procedures is valid and non-redundant, you have no worries, because whatever apps are using them are obviously using the correct logic and that is a major point. It simply does not matter what apps are using what stored procedures if the stored procedure code is correct. In fact, if one knows that all one needs to do is make sure the stored procedures are correct, one does not need to worry about finding and changing each and every app that refers to the stored procedures. If the stored procedures are not correct, you can change them in that one place and all the apps using it will now be correct, instead of having to change every application. The person who maintains an application, in my vast experience, has never had a problem identifying what stored procedures are being used by the app. A stored procedure is either identified in the code or it is not being used in that code. When stored procedures call other stored procedures, that can appear to be a problem until one realizes that one should not be correcting stored procedures app by app. One should search directly for logic in only the stored procedures and correct it there. Doing this, one does not need to hunt down what apps are using what stored procedures. If one must know this for some reason, it is easy enough to run a simple trace to capture what stored procedures are being called by what apps, but really, that is usually done only when a bad development environment is being fixed so that it is properly managed and is not necessary after fixing the development process. As far as auditing goes, whether one is talking about code audits or data audits, professional auditors practically cry when they must look outside the database for things that should be coded in stored procedures in the database, because they know that means that each programmer has the discretion of creating dozens or more of redundant and slightly variant versions of code meant to accomplish the same task. If developers are coding new stored procedures to do something that is already in an existing stored procedure, then that probably means they either don't understand the stored procedure code or they don't know how to find or refactor existing stored procedures to split out logic parts into multiple sub procedures that allow for wider code reuse. As with any non-SQL development, the same rule applies – do not change an interface to a stored procedure to avoid causing problems to applications that are using it. That does not mean that one can't refactor the code to allow use of part of the code, but not all of the code, when it makes sense to do so. There are many other reasons that justify using stored procedures that I don't have time to mention including security and use of parameter data types. Many other knowledgeable people have written about this subject. Unfortunately, many under-educated, non-SQL developers have written misinformation, so all I can say is "good luck" with that. It is just more redevelopment work for people like me to fix things later. Additionally, at least with SQL Server and the Microsoft tools stack, it is easy to allow object-oriented programmers to develop interfaces with the database using stored procedures in an object-oriented context using tools like "Link_to_SQL". I'm sure other environments must have these tools which make it easy for someone with an object-orientation background to make use of database objects that already exist.

wheres_my_stuff
wheres_my_stuff

I only don't feel comfortable with the questioning about stored procedure overuse. We understand that it is a good practice to code at database level to mediate access and ensure data integrity. We've been doing that extensively for years with large application and we have found no problems.

gemidune
gemidune

My opinions: Data architecture is a science. The study of Set Theory and Relational Calculus is not a trivial undertaking. Is it no more reasonable to expect to be an expert data architect without study and experience as it is to be an expert software developer. In my experience, the talents and skills of architecture and development rarely overlap. Overall, I appreciate the article, this subject receives too little attention. Item 1. Natural keys can in some cases be used, however, it is generally much more straightforward and certain to continue working if surrogate keys are employed. Item 2. Normalization is a science, not magic. Look up Functional Dependency, etc. Item 3. Stored Procedures: If the architect has established only one mechanism to create and modify a specific set of data, there is generally little confusion about operations on that data set. Goes along with the Principle of Least Privilege. Item 4. Yikes! Don't remind me how many jobs I've had fixing the no primary key issue! A prime example of how a great developer may not be a capable architect. Item 5: Deletes? Not if you will ever audit the system, or need historical data.

mac
mac

At first, what you say makes good sense, but, if you want to look up an individual's data, say an employee, would you say "What's your primary key?". I learned that primary keys were supposed to allow quick retrieval of data. I agree with the post about natural and artificial keys. There are times when one or the other are appropriate.

george
george

I have run into this far more often in production systems than I care to count. Why is this report taking un-Godly long to run; because the database was never index correctly. When you ask questions of your data please index the "where" portion when practical.

m_ruse
m_ruse

It is true that databases are smarter and the performance advantages of stored procedures are not always as clear cut as it used to be; however there is one very clear advantage to stored proceudres over embedded SQL and that is they can be changed without a major release of software. This makes it possible to resolve a performance issue or even add some functionality without having to release any "software". And there is a very easy way to find out what's using a stored procedure; it's called a header added to the stored procedure. Also stored procedures are much more likely to be re-used; embedded SQL will just be copied or worse re-written (often in a slightly different way) and used throughout an application. A change in functionality then requires multiple changes throughout an application and greatly increasing the risk of bugs appearing in the system.

just.a.guy
just.a.guy

I agree with all that you say. To me there seems to be 3 classes of databases (systems). 1) the real time operational systems These systems only store state data for a system at the current point n time. 2) Data warehouse systems, These system only hold old data and are used to mine for trends, etc.. 3) Effective dated system. Due to auditing/validation requirements a lot of our systems today fall into this category. These systems need to keep both "history" and "current" data. There are two basic class for these types of system. Either the history data is kept with the current data or the history data is kept in a separate set of tables/files (e.g. a "log" table/file - either application or DBMS provided). A design consideration is to keep the history data with the current data in the same table or in a separate set of tables. Keeping history and current data in physically separate elements (either application or DBMS system provided) results in the need for processes to inventory, track, verify, the two sets of data. to prove that the system data is valid. Keeping the two classes of data in the same table data eliminates the need to prove that external history data is valid for the current separate table data. The separate table/file method is the easiest to implement from a programming view point. Using this implementation moves the design outside of the database. Keeping the history and current data in the same physical table is a major change in the design of the environment. In this situation, a "row" is replaced by a "version set" of rows, of which only one is the current value. One method to achieve this "version set" is use a "change date", column/field. Each row in the version set has a different "change date" value. The date value can be used to order the individual rows in history order. At least one system that I know, has used this concept to its fullest extent. This system uses only the "change date" to identify its data. The change date value is allowed to range over all possible values (past, current, and future). It is/was used n a HR system, where the HR staff would schedule raises, etc. before they actually occur. Then when the future time becomes current, the raise automatically occurs. This idea carried out across an entire system multiplies the complexity of of the SQL statements, and results (in some situations involving mutiple table joins) in long execution times, and very large quantities of data to be processed for a relatively small result set. If "future" events can eliminated from the version set, then things become easier. By eliminating FUTURE events you need a mechanism to separate the history rows from the current rows within the version set. For this you suggest using a "delete" flag. In this environment, the design consideration then moves to the delete flag and its effect on the "primary key"; its use in indexes and relationships. The initial use of a soft "delete" flag, only used "two" value "deleted" or not "deleted". To continue to use this type of delete flag, requires that a new primary key be obtained for each new version of the row in a version set. You cannot have two rows in the same table with the same primary key. I suggest changing a "soft delete" flag to a "soft active" flag. The use of the "flag" is to identify the active row out of all of the rows in the version set. Each row in all of the versions sets that make up the table has the same value. And this same value applies to all of the tables that make up the application system. When version of the row becomes apart of the history, its active flag is set to a value other than the "active" value, and a new version is inserted with active status value The value of the previous active is set to different from all of the other values for the row in the version set. There are many ways to choose how these other non-active values are determined. (time, sequence number, etc..). The result of this design is that the primary key becomes a union (concatenation), of the original primary key and the "active" flag. Implementing this type of design becomes simple. The tables are accessed through a view that has a "where clause" contains a condition on active flag column for the unique "active value". Concerning stored procedures, use views, triggers and stored procedures to effectively implement the "effective dated" system so that the end user sees the view of the data as they need. The predominant view used in most systems will be the "current point in time" view. Use these tools to implement the effective data hiding. Hope this helps someone who has to deal with these types of design choices. Use this if it is helpful.

guillegr123
guillegr123

The worst thing I have seen in a database is fields and tables named in multiple language, depending on which country was developed the part of the database.

Realvdude
Realvdude

About Primary keys, it seems the hard rule out of the posts, is that if the natural key could possibly change, use an artifical key. Another design flaw I have seen with primary keys, is that the key includes every column that may be used for lookup(select,join,etc), instead of using seperate indexes.

Tony Hopkinson
Tony Hopkinson

Don't store dates (and times and formatted floating/fixd point values) as strings! Lost count of the number of times people have asked for help after making that decision and I have to spend the first hour explaining that "06/07/2012" is a string not a date never mind which one or the two possibilties it could be.

sergio.bobillier
sergio.bobillier

I don't know if my Databases teacher was terribly bad but I clearly remember her telling us that there were two types of primary keys: Artificial and Natural, artificial are like the ones you mention, auto increment values or random keys, however there are also natural keys and they are parts of the data that can be used as primary key because they are unique for sure. Here in my country for example every citizen has an ID number that is unique country-wide (something like the SSN in the US) so I often use it as primary key for tables involving people like Customers tables. Am I wrong?

Tony Hopkinson
Tony Hopkinson

forever. I however have used it, until the business got closed down. One of the reasons being they could not adapt their systems to the changing market conditions. Needless to say it was a massive oracle set up designed by people too clever for anyone's good who bogged off to pastures new before the wheels came off. Oracle doesn't make SP design work, competent developers make SP designs work.

Tony Hopkinson
Tony Hopkinson

You have to remember in software development terms an SP design is / was preferred over a bunch of people who confuse a table in a Db with a table in word. People who make the other four and many more mistakes on a regular basis and don't even recognise them as mistakes. A code first ORM compensates for that lack, it essentally replaces the SPs, and it does it in way that makes sense to the overall development process, when a database is simply Some where to store stuff when you application isn't running. A few years ago I would have agreed with you without reservation, in some areas of software development I still would, but for the bulk of it, since the newer ORMs, not true anymore.

Justin James
Justin James

1. Yes, stored procs are pre-compiled. Modern DBs also cache plans for frequently used queries. As a result, what we're now seeing is that stored procs have lost the speed advantage they used to enjoy. 2. Stored procs save repetition. Yes, this is true. So does *any* ORM, or in fact, any system in which you can create functions, libraries, etc. And they will usually be much more "discoverable" to programmers than stored procs too. Stored procs are MISERABLE to work with as a developer. 3. "In no case, should one ever feel that stored procedures are a "maintenance disaster" in a professional development environment. If one does, truly, one or more things are seriously wrong and require fixing with more training, proper tools, and education." I have yet to see an environment where the stored procs were handled anywhere close to as well as the rest of the software code, even when professional, experienced DBAs were involved. Take the following scenario: * Table A is made with rows X, Y, Z. X is the primary key. * Stored proc "GetFromA" is made that accepts a value for X, and returns X, Y, and Z. * Column W is added to Table A. * The DBA cannot change GetFromA, because who knows if a piece of code is depending on GetFromA returning three columns? The DBA is forced to make another version of GetFromA, because ANY CHANGES AT ALL to GetFromA will potentially break applications that depend upon GetFromA. You see the problem? A FUNDAMENTAL rule of development is that you can't change the "signature" of functions, and there is no 100% accurate way in the world to audit who the consumers of a stored procedure are. And the columns returned by a store proc are part of its "signature" to the developer. Now do you see the problem? 4. "The person who maintains an application, in my vast experience, has never had a problem identifying what stored procedures are being used by the app. A stored procedure is either identified in the code or it is not being used in that code." Your experience is lacking in a lot of the circumstances and situations that I have been in. Because in environments where folks are using languages like Ruby, Python, Perl, etc. it is nearly impossible to do what you describe. Once you get into a full boat environment where folks can access the database through more than a few bottlenecked technologies, stored procs go from bad to worse on this stuff. J.Ja

Tony Hopkinson
Tony Hopkinson

basically on the level of abstraction the more modern ones provide I believe. A well done SP design is a thing of beauty, but say switching DBMS can be problematic to say the least, add in NoSql dbs.. So I've got to say if you've found no problems, ORMs have their own as well, you haven't had to face the correct sort of challenge.

Tony Hopkinson
Tony Hopkinson

temporary one does not deserve the appelation great. Its chapter 1 page1 I'm pretty good on the db design front, but if it was something really heavy I would recommend getting someone with your skills in, and a top whack DBA to tune the installation. Thing is there are a lot of developers who's sql doesn't go past simple crud on one table (often not even that far, look how many suck the entire table on to the client so they can process it in X) who passed all their exams , got their paper and are ajudged competent by the industry, not by me though...

Tony Hopkinson
Tony Hopkinson

I mean you'd have to be trying, in fact succeeding to be a complete prick or a total idiot! Nothing stopping you having a natural key in the db to index and search on, just don't make it the Primary Key. Why is this such a difficult concept for people. They didn't call it primary key for laugh you know, secondary and tertiary etc, are quite acceptable

Jeff.sergeant
Jeff.sergeant

unless you're using a very broken DBMS, you should be able to search and index on non-keys

Tony Hopkinson
Tony Hopkinson

ORMs over stored procedures. SPs have some distinct advantages as do ORMs, but in general terms ORM is probably going to a better bet. Certainly the more modern ones, though they've a bad rep to overcome after some of the earlier sadly misguided efforts. I can assure you he isn't arguing for bits of SQL dotted all over the code base, makes his hair stand on end that.

andrew232006
andrew232006

I've actually had to deal with databases that allowed users to use both formats so there was no way to know what the date actually was.

Justin James
Justin James

I see that SO often, it kills me... J.Ja

alvalongo
alvalongo

And waht happens if this "ID number" was captured wrong? If it is a Primary Key, correct the data involves in some cases updates on every table that uses as foreign key. That why subrogate keys are good.

mattohare
mattohare

Even if it doesn't change, it may not be available. I ran a system that used SSN as a primary key for a training system. The idea was that HR would provide the SSN as part of the initial booking. Guess what, they didn't have SSNs for temporary staff. That meant the booking clerk would make up a number. Then, to make matters worse, someone actually showed up with the SSN made up for another student. Never assume these natural keys will work. There's always something to make it fail. I always said the only exception was a lookup of US states and Canadian provinces. Then they went and changed Quebec from PQ to QU. There were codes for members of the European Union, but they changed Finland from SF to FIN.

RudHud
RudHud

I don't know about your country, Sergio, but in the US Social Security Numbers can and do change, usually in response to identity theft. When you hit a quarter-million personnel records (which I have seen), the probability of this happening is 1. Social security numbers have significant privacy expectations. Even having them in your database opens you up to extra security audits (or, as an alternative, lawsuits from people who've been victimized by your lax security). If you are not cutting employee paychecks you probably should think twice about having them in your database at all. I expect the same thing is true in other countries as well. National ID numbers are a great doorway for identity theft.

Jeff.sergeant
Jeff.sergeant

Say they want a business account and a personal account. Completely separate from each other. If you're controlling the generation of primary keys yourself, you can overcome all of these issues and more,

mikewor
mikewor

I take issue with your lecturer as true natural keys are almost mythical. What are called natural keys (such as the SSN and ID number) are really artificial keys issued by some organisation over which you have no control. Do you still want to bet the house on them? In many instances these identifiers are forged/changed by criminals (the VIN - vehicle id number) is easily changed on a stolen vehicle, the ICCID on your stolen phone's SIM card is re-flashed along with the handset ID, forged identity papers are given to illegal immigrantswho then use them to set up bank accounts etc etc. So I agree 110% with Justin - NEVER use a key that has meaning.

Charles Bundy
Charles Bundy

for enumerating type of primary keys (theory) but in practice you should distance yourself from the underlying data when trying to reference it. That is really what [i]Artificial[/i] means. Justin and Tony point out [b]very[/b] good reasons for doing this. I suppose the use of [i]Natural[/i] primary keys was a space saving measure similar to two digit years but it's practical need has come and gone. P.S. good list Justin. Have a great weekend!

Tony Hopkinson
Tony Hopkinson

While it's unlikely that a number like that would be changed by the issuing authority. Two simple scenarios would cause you problems What if some one made a data entry error and put the wrong number in and it took some time to discover the error? There are ways to cope, but they are much more code and testing than using a surrogate primary key and a natural unique key. Even more likely someone comes along and says they want to put people in the system before they know what the ID number is... Surrogate keys can seem to be a bit unnecessary, when you have a natural or compound one, but they give you low cost options when things change, and change is guaranteed. That's not to say I haven't used a natural key myself a time or two, but I have been made to regret doing so a few times as well, and those times cost way more than the apparent extra complexity of a surrogate key.

Justin James
Justin James

In very rare cases, that number can change. If I were dealing with a small system (say, a company directory for a medium sized company) it might not be a problem. But for a large system (a large bank or insurance company), that is a risk. More importantly, that kind of data should never potentially be exposed. How many systems put the ID of the record in the URL or some other cacheable area? Do you really want something like: editRecord.aspx?recordId=777-111-2211 where "777-111-22-11" is that person's SSN or equivalent sitting in the browser history? That is absolutely deadly from a security perspective. J.Ja

Justin James
Justin James

... to make me go anywhere near the Oracle ecosystem ever again. I think the only way I'd do it is if I could command "Oracle consultant rates". J.Ja

mattohare
mattohare

I've lost count of all the different ways people have trouble looking me up by my name. All three have more popular alternate spellings. Then there's the appostrophe in my last name that may or may not be there. It may even confuse a system as being a single quote. Can I not just give you my ID number? I'll agree that primary key values should be assigned by the system. That doesn't mean that it can't be used later by humans to find a record.

RudHud
RudHud

Your primary key should never be visible to a user. You should never place it in a report. You should never ascribe any meaning to it -- for instance, you should never write code that assumes that Key # 5 was assigned after Key #4, even when the Sequence keyword is used to generate them (Problems in a multi-server environment.) An arbitrary primary key identifies a row, and is used in joins. That's it. Do anything else to it and someday, somehow, you'll break its ability to join tables.

Justin James
Justin James

Yes, ORMs over SPs for typical CRUD stuff, and in many cases, use the ORM to push data to the logic layer and let code handle the logic, not SPs. SPs still do (and always will) have a lot of valid use cases, but they are far less than they used to be. J.Ja

Tony Hopkinson
Tony Hopkinson

or at least you aren't the only one being picked on. :( Because default set up of an OS tends to default to en-us, a mixture of clients is a regular occurrence in the UK. When you tie it in with allied issue. sql ="Insert SomeTable Values('" + cstr(dtp.Date) + "')" .... :( :( For those of you who don't know the above method has a technical name. It's called the my next sql injection attack.

Tony Hopkinson
Tony Hopkinson

key is that sort of critical personal identifier. My response was just a straight design one.

Tony Hopkinson
Tony Hopkinson

the level of over-complication has got to be a deliberate choice. Oracle Forms for instance, sort of like access on crack. Some of it is extremely clever, a lot of it is so extremely clever, it's dumb. The pay is good, probably should be as well. In all but some vey rare situations the value for money proposition is in my opinion, dubious at best.

Editor's Picks