I don't know if my Databases teacher was terribly bad but I clearly remember her telling us that there were two types of primary keys: Artificial and Natural, artificial are like the ones you mention, auto increment values or random keys, however there are also natural keys and they are parts of the data that can be used as primary key because they are unique for sure.
Here in my country for example every citizen has an ID number that is unique country-wide (something like the SSN in the US) so I often use it as primary key for tables involving people like Customers tables. Am I wrong?
Discussion on:
View:
Show:
In very rare cases, that number can change. If I were dealing with a small system (say, a company directory for a medium sized company) it might not be a problem. But for a large system (a large bank or insurance company), that is a risk.
More importantly, that kind of data should never potentially be exposed. How many systems put the ID of the record in the URL or some other cacheable area? Do you really want something like:
editRecord.aspx?recordId=777-111-2211
where "777-111-22-11" is that person's SSN or equivalent sitting in the browser history? That is absolutely deadly from a security perspective.
J.Ja
More importantly, that kind of data should never potentially be exposed. How many systems put the ID of the record in the URL or some other cacheable area? Do you really want something like:
editRecord.aspx?recordId=777-111-2211
where "777-111-22-11" is that person's SSN or equivalent sitting in the browser history? That is absolutely deadly from a security perspective.
J.Ja
key is that sort of critical personal identifier. My response was just a straight design one.
While it's unlikely that a number like that would be changed by the issuing authority.
Two simple scenarios would cause you problems
What if some one made a data entry error and put the wrong number in and it took some time to discover the error? There are ways to cope, but they are much more code and testing than using a surrogate primary key and a natural unique key.
Even more likely someone comes along and says they want to put people in the system before they know what the ID number is...
Surrogate keys can seem to be a bit unnecessary, when you have a natural or compound one, but they give you low cost options when things change, and change is guaranteed.
That's not to say I haven't used a natural key myself a time or two, but I have been made to regret doing so a few times as well, and those times cost way more than the apparent extra complexity of a surrogate key.
Two simple scenarios would cause you problems
What if some one made a data entry error and put the wrong number in and it took some time to discover the error? There are ways to cope, but they are much more code and testing than using a surrogate primary key and a natural unique key.
Even more likely someone comes along and says they want to put people in the system before they know what the ID number is...
Surrogate keys can seem to be a bit unnecessary, when you have a natural or compound one, but they give you low cost options when things change, and change is guaranteed.
That's not to say I haven't used a natural key myself a time or two, but I have been made to regret doing so a few times as well, and those times cost way more than the apparent extra complexity of a surrogate key.
for enumerating type of primary keys (theory) but in practice you should distance yourself from the underlying data when trying to reference it. That is really what Artificial means. Justin and Tony point out very good reasons for doing this.
I suppose the use of Natural primary keys was a space saving measure similar to two digit years but it's practical need has come and gone.
P.S. good list Justin. Have a great weekend!
I suppose the use of Natural primary keys was a space saving measure similar to two digit years but it's practical need has come and gone.
P.S. good list Justin. Have a great weekend!
I take issue with your lecturer as true natural keys are almost mythical. What are called natural keys (such as the SSN and ID number) are really artificial keys issued by some organisation over which you have no control. Do you still want to bet the house on them? In many instances these identifiers are forged/changed by criminals (the VIN - vehicle id number) is easily changed on a stolen vehicle, the ICCID on your stolen phone's SIM card is re-flashed along with the handset ID, forged identity papers are given to illegal immigrantswho then use them to set up bank accounts etc etc.
So I agree 110% with Justin - NEVER use a key that has meaning.
So I agree 110% with Justin - NEVER use a key that has meaning.
Say they want a business account and a personal account. Completely separate from each other. If you're controlling the generation of primary keys yourself, you can overcome all of these issues and more,
I don't know about your country, Sergio, but in the US Social Security Numbers can and do change, usually in response to identity theft. When you hit a quarter-million personnel records (which I have seen), the probability of this happening is 1.
Social security numbers have significant privacy expectations. Even having them in your database opens you up to extra security audits (or, as an alternative, lawsuits from people who've been victimized by your lax security). If you are not cutting employee paychecks you probably should think twice about having them in your database at all. I expect the same thing is true in other countries as well. National ID numbers are a great doorway for identity theft.
Social security numbers have significant privacy expectations. Even having them in your database opens you up to extra security audits (or, as an alternative, lawsuits from people who've been victimized by your lax security). If you are not cutting employee paychecks you probably should think twice about having them in your database at all. I expect the same thing is true in other countries as well. National ID numbers are a great doorway for identity theft.
Even if it doesn't change, it may not be available. I ran a system that used SSN as a primary key for a training system. The idea was that HR would provide the SSN as part of the initial booking. Guess what, they didn't have SSNs for temporary staff. That meant the booking clerk would make up a number. Then, to make matters worse, someone actually showed up with the SSN made up for another student.
Never assume these natural keys will work. There's always something to make it fail. I always said the only exception was a lookup of US states and Canadian provinces. Then they went and changed Quebec from PQ to QU. There were codes for members of the European Union, but they changed Finland from SF to FIN.
Never assume these natural keys will work. There's always something to make it fail. I always said the only exception was a lookup of US states and Canadian provinces. Then they went and changed Quebec from PQ to QU. There were codes for members of the European Union, but they changed Finland from SF to FIN.
And waht happens if this "ID number" was captured wrong?
If it is a Primary Key, correct the data involves in some cases updates on every table that uses as foreign key.
That why subrogate keys are good.
If it is a Primary Key, correct the data involves in some cases updates on every table that uses as foreign key.
That why subrogate keys are good.
Don't store dates (and times and formatted floating/fixd point values) as strings!
Lost count of the number of times people have asked for help after making that decision and I have to spend the first hour explaining that "06/07/2012" is a string not a date never mind which one or the two possibilties it could be.
Lost count of the number of times people have asked for help after making that decision and I have to spend the first hour explaining that "06/07/2012" is a string not a date never mind which one or the two possibilties it could be.
I've actually had to deal with databases that allowed users to use both formats so there was no way to know what the date actually was.
or at least you aren't the only one being picked on. 
Because default set up of an OS tends to default to en-us, a mixture of clients is a regular occurrence in the UK.
When you tie it in with allied issue.
sql ="Insert SomeTable Values('" + cstr(dtp.Date) + "')" ....

For those of you who don't know the above method has a technical name. It's called the my next sql injection attack.
Because default set up of an OS tends to default to en-us, a mixture of clients is a regular occurrence in the UK.
When you tie it in with allied issue.
sql ="Insert SomeTable Values('" + cstr(dtp.Date) + "')" ....
For those of you who don't know the above method has a technical name. It's called the my next sql injection attack.
About Primary keys, it seems the hard rule out of the posts, is that if the natural key could possibly change, use an artifical key. Another design flaw I have seen with primary keys, is that the key includes every column that may be used for lookup(select,join,etc), instead of using seperate indexes.
The worst thing I have seen in a database is fields and tables named in multiple language, depending on which country was developed the part of the database.
I agree with all that you say.
To me there seems to be 3 classes of databases (systems).
1) the real time operational systems
These systems only store state data for a system at the current point n time.
2) Data warehouse systems,
These system only hold old data and are used to mine for trends, etc..
3) Effective dated system.
Due to auditing/validation requirements a lot of our systems today fall into this
category. These systems need to keep both "history" and "current" data.
There are two basic class for these types of system. Either the history
data is kept with the current data or the history data is kept in a separate
set of tables/files (e.g. a "log" table/file - either application or DBMS provided).
A design consideration is to keep the history data with the current data
in the same table or in a separate set of tables.
Keeping history and current data in physically separate elements (either
application or DBMS system provided) results in the need for processes
to inventory, track, verify, the two sets of data. to prove that the system
data is valid.
Keeping the two classes of data in the same table data eliminates the need to prove that
external history data is valid for the current separate table data.
The separate table/file method is the easiest to implement from a programming
view point. Using this implementation moves the design outside of the database.
Keeping the history and current data in the same physical table is a major
change in the design of the environment. In this situation, a "row" is replaced
by a "version set" of rows, of which only one is the current value.
One method to achieve this "version set" is use a "change date", column/field.
Each row in the version set has a different "change date" value. The date
value can be used to order the individual rows in history order.
At least one system that I know, has used this concept to its fullest extent.
This system uses only the "change date" to identify its data. The change
date value is allowed to range over all possible values (past, current, and future).
It is/was used n a HR system, where the HR staff would schedule raises, etc.
before they actually occur. Then when the future time becomes current, the
raise automatically occurs.
This idea carried out across an entire system multiplies the complexity of
of the SQL statements, and results (in some situations involving mutiple
table joins) in long execution times, and very large quantities of data
to be processed for a relatively small result set.
If "future" events can eliminated from the version set, then things become
easier.
By eliminating FUTURE events you need a mechanism to separate
the history rows from the current rows within the version set.
For this you suggest using a "delete" flag.
In this environment, the design consideration then moves to the delete flag
and its effect on the "primary key"; its use in indexes and relationships.
The initial use of a soft "delete" flag, only used "two" value "deleted" or
not "deleted".
To continue to use this type of delete flag, requires that a new primary key
be obtained for each new version of the row in a version set. You cannot
have two rows in the same table with the same primary key.
I suggest changing a "soft delete" flag to a "soft active" flag. The use
of the "flag" is to identify the active row out of all of the rows in the
version set. Each row in all of the versions sets that make up the
table has the same value. And this same value applies to all of
the tables that make up the application system.
When version of the row becomes apart of the history, its active
flag is set to a value other than the "active" value, and a new version
is inserted with active status value The value of the previous active
is set to different from all of the other values for the row in the
version set. There are many ways to choose how these other
non-active values are determined. (time, sequence number, etc..).
The result of this design is that the primary key becomes a union
(concatenation), of the original primary key and the "active" flag.
Implementing this type of design becomes simple. The tables are
accessed through a view that has a "where clause" contains a
condition on active flag column for the unique "active value".
Concerning stored procedures, use views, triggers and stored procedures to effectively implement the "effective dated" system so that the end user sees the view of
the data as they need. The predominant view used in most systems will be the "current
point in time" view. Use these tools to implement the effective data hiding.
Hope this helps someone who has to deal with these types of design choices.
Use this if it is helpful.
To me there seems to be 3 classes of databases (systems).
1) the real time operational systems
These systems only store state data for a system at the current point n time.
2) Data warehouse systems,
These system only hold old data and are used to mine for trends, etc..
3) Effective dated system.
Due to auditing/validation requirements a lot of our systems today fall into this
category. These systems need to keep both "history" and "current" data.
There are two basic class for these types of system. Either the history
data is kept with the current data or the history data is kept in a separate
set of tables/files (e.g. a "log" table/file - either application or DBMS provided).
A design consideration is to keep the history data with the current data
in the same table or in a separate set of tables.
Keeping history and current data in physically separate elements (either
application or DBMS system provided) results in the need for processes
to inventory, track, verify, the two sets of data. to prove that the system
data is valid.
Keeping the two classes of data in the same table data eliminates the need to prove that
external history data is valid for the current separate table data.
The separate table/file method is the easiest to implement from a programming
view point. Using this implementation moves the design outside of the database.
Keeping the history and current data in the same physical table is a major
change in the design of the environment. In this situation, a "row" is replaced
by a "version set" of rows, of which only one is the current value.
One method to achieve this "version set" is use a "change date", column/field.
Each row in the version set has a different "change date" value. The date
value can be used to order the individual rows in history order.
At least one system that I know, has used this concept to its fullest extent.
This system uses only the "change date" to identify its data. The change
date value is allowed to range over all possible values (past, current, and future).
It is/was used n a HR system, where the HR staff would schedule raises, etc.
before they actually occur. Then when the future time becomes current, the
raise automatically occurs.
This idea carried out across an entire system multiplies the complexity of
of the SQL statements, and results (in some situations involving mutiple
table joins) in long execution times, and very large quantities of data
to be processed for a relatively small result set.
If "future" events can eliminated from the version set, then things become
easier.
By eliminating FUTURE events you need a mechanism to separate
the history rows from the current rows within the version set.
For this you suggest using a "delete" flag.
In this environment, the design consideration then moves to the delete flag
and its effect on the "primary key"; its use in indexes and relationships.
The initial use of a soft "delete" flag, only used "two" value "deleted" or
not "deleted".
To continue to use this type of delete flag, requires that a new primary key
be obtained for each new version of the row in a version set. You cannot
have two rows in the same table with the same primary key.
I suggest changing a "soft delete" flag to a "soft active" flag. The use
of the "flag" is to identify the active row out of all of the rows in the
version set. Each row in all of the versions sets that make up the
table has the same value. And this same value applies to all of
the tables that make up the application system.
When version of the row becomes apart of the history, its active
flag is set to a value other than the "active" value, and a new version
is inserted with active status value The value of the previous active
is set to different from all of the other values for the row in the
version set. There are many ways to choose how these other
non-active values are determined. (time, sequence number, etc..).
The result of this design is that the primary key becomes a union
(concatenation), of the original primary key and the "active" flag.
Implementing this type of design becomes simple. The tables are
accessed through a view that has a "where clause" contains a
condition on active flag column for the unique "active value".
Concerning stored procedures, use views, triggers and stored procedures to effectively implement the "effective dated" system so that the end user sees the view of
the data as they need. The predominant view used in most systems will be the "current
point in time" view. Use these tools to implement the effective data hiding.
Hope this helps someone who has to deal with these types of design choices.
Use this if it is helpful.
It is true that databases are smarter and the performance advantages of stored procedures are not always as clear cut as it used to be; however there is one very clear advantage to stored proceudres over embedded SQL and that is they can be changed without a major release of software. This makes it possible to resolve a performance issue or even add some functionality without having to release any "software". And there is a very easy way to find out what's using a stored procedure; it's called a header added to the stored procedure. Also stored procedures are much more likely to be re-used; embedded SQL will just be copied or worse re-written (often in a slightly different way) and used throughout an application. A change in functionality then requires multiple changes throughout an application and greatly increasing the risk of bugs appearing in the system.
ORMs over stored procedures. SPs have some distinct advantages as do ORMs, but in general terms ORM is probably going to a better bet. Certainly the more modern ones, though they've a bad rep to overcome after some of the earlier sadly misguided efforts.
I can assure you he isn't arguing for bits of SQL dotted all over the code base, makes his hair stand on end that.
I can assure you he isn't arguing for bits of SQL dotted all over the code base, makes his hair stand on end that.
Yes, ORMs over SPs for typical CRUD stuff, and in many cases, use the ORM to push data to the logic layer and let code handle the logic, not SPs. SPs still do (and always will) have a lot of valid use cases, but they are far less than they used to be.
J.Ja
J.Ja
I have run into this far more often in production systems than I care to count. Why is this report taking un-Godly long to run; because the database was never index correctly. When you ask questions of your data please index the "where" portion when practical.
At first, what you say makes good sense, but, if you want to look up an individual's data, say an employee, would you say "What's your primary key?". I learned that primary keys were supposed to allow quick retrieval of data. I agree with the post about natural and artificial keys. There are times when one or the other are appropriate.
unless you're using a very broken DBMS, you should be able to search and index on non-keys
Your primary key should never be visible to a user. You should never place it in a report. You should never ascribe any meaning to it -- for instance, you should never write code that assumes that Key # 5 was assigned after Key #4, even when the Sequence keyword is used to generate them (Problems in a multi-server environment.)
An arbitrary primary key identifies a row, and is used in joins. That's it. Do anything else to it and someday, somehow, you'll break its ability to join tables.
An arbitrary primary key identifies a row, and is used in joins. That's it. Do anything else to it and someday, somehow, you'll break its ability to join tables.
I've lost count of all the different ways people have trouble looking me up by my name. All three have more popular alternate spellings. Then there's the appostrophe in my last name that may or may not be there. It may even confuse a system as being a single quote. Can I not just give you my ID number?
I'll agree that primary key values should be assigned by the system. That doesn't mean that it can't be used later by humans to find a record.
I'll agree that primary key values should be assigned by the system. That doesn't mean that it can't be used later by humans to find a record.
I mean you'd have to be trying, in fact succeeding to be a complete prick or a total idiot!
Nothing stopping you having a natural key in the db to index and search on, just don't make it the Primary Key.
Why is this such a difficult concept for people. They didn't call it primary key for laugh you know, secondary and tertiary etc, are quite acceptable
Nothing stopping you having a natural key in the db to index and search on, just don't make it the Primary Key.
Why is this such a difficult concept for people. They didn't call it primary key for laugh you know, secondary and tertiary etc, are quite acceptable
My opinions:
Data architecture is a science. The study of Set Theory and Relational Calculus is not a trivial undertaking.
Is it no more reasonable to expect to be an expert data architect without study and experience as it is to be an expert software developer.
In my experience, the talents and skills of architecture and development rarely overlap.
Overall, I appreciate the article, this subject receives too little attention.
Item 1. Natural keys can in some cases be used, however, it is generally much more straightforward and certain to continue working if surrogate keys are employed.
Item 2. Normalization is a science, not magic. Look up Functional Dependency, etc.
Item 3. Stored Procedures: If the architect has established only one mechanism to create and modify a specific set of data, there is generally little confusion about operations on that data set. Goes along with the Principle of Least Privilege.
Item 4. Yikes! Don't remind me how many jobs I've had fixing the no primary key issue! A prime example of how a great developer may not be a capable architect.
Item 5: Deletes? Not if you will ever audit the system, or need historical data.
Data architecture is a science. The study of Set Theory and Relational Calculus is not a trivial undertaking.
Is it no more reasonable to expect to be an expert data architect without study and experience as it is to be an expert software developer.
In my experience, the talents and skills of architecture and development rarely overlap.
Overall, I appreciate the article, this subject receives too little attention.
Item 1. Natural keys can in some cases be used, however, it is generally much more straightforward and certain to continue working if surrogate keys are employed.
Item 2. Normalization is a science, not magic. Look up Functional Dependency, etc.
Item 3. Stored Procedures: If the architect has established only one mechanism to create and modify a specific set of data, there is generally little confusion about operations on that data set. Goes along with the Principle of Least Privilege.
Item 4. Yikes! Don't remind me how many jobs I've had fixing the no primary key issue! A prime example of how a great developer may not be a capable architect.
Item 5: Deletes? Not if you will ever audit the system, or need historical data.
temporary one does not deserve the appelation great.
Its chapter 1 page1
I'm pretty good on the db design front, but if it was something really heavy I would recommend getting someone with your skills in, and a top whack DBA to tune the installation. Thing is there are a lot of developers who's sql doesn't go past simple crud on one table (often not even that far, look how many suck the entire table on to the client so they can process it in X) who passed all their exams , got their paper and are ajudged competent by the industry, not by me though...
Its chapter 1 page1
I'm pretty good on the db design front, but if it was something really heavy I would recommend getting someone with your skills in, and a top whack DBA to tune the installation. Thing is there are a lot of developers who's sql doesn't go past simple crud on one table (often not even that far, look how many suck the entire table on to the client so they can process it in X) who passed all their exams , got their paper and are ajudged competent by the industry, not by me though...
I only don't feel comfortable with the questioning about stored procedure overuse. We understand that it is a good practice to code at database level to mediate access and ensure data integrity. We've been doing that extensively for years with large application and we have found no problems.
basically on the level of abstraction the more modern ones provide I believe. A well done SP design is a thing of beauty, but say switching DBMS can be problematic to say the least, add in NoSql dbs..
So I've got to say if you've found no problems, ORMs have their own as well, you haven't had to face the correct sort of challenge.
So I've got to say if you've found no problems, ORMs have their own as well, you haven't had to face the correct sort of challenge.
I can only say that because so many development environments do not use competent, experienced database developers, or do not have enough of them to review the design and coding of databases, that I have had a wonderful, well-paid career fixing many database environments, for many companies over the years. It is a never-ending problem for companies and one that has assured that I have steady, challenging employment. ? Because I want non-SQL developers to be properly educated, I see this response to item 3 (Stored Procedure Overuse) as an opportunity to educate others about why stored procedures are widely used and strongly recommended instead of non-stored procedure SQL.
3: Stored procedure overuse my refutation
From the email link:
"???Stored procedures are useful, but for most modern applications (not all), using them as often as was done in the past is overkill. Modern ORMs and other techniques make stored procedures a lot less necessary, and are much more nimble in the development process. More importantly, stored procedures are a maintenance disaster. There is no good or easy way to audit what applications are definitively using them, which means that any major changes usually mean that you write another stored procedure instead of altering the existing one, creating problems knowing which one to use. The next time you consider writing a stored procedure, make sure it is truly justified."
This argument often surfaces from developers who do not understand the complexity of modern database engines. By gaining an understanding of the benefits of using stored procedures, I hope that readers find many more reasons to doubt future articles that question the use of procs.
I'll quote the words used in the emailed article link:
??? "Modern ORMs and other techniques make stored procedures a lot less necessary, and are much more nimble in the development process."
Seriously, the statement above is so wrong that it always astounds me when I see it repeated here and there by developers who really do not understand the purpose and power of modern database engines. I'll try to keep this refutation as short and direct as possible.
Stored procedures are precompiled by modern databases. That allows the database engine to generate and cache execution plans for evaluation by an optimization engine. Optimization engines are incredibly sophisticated and get more powerful with each release. They evaluate access and allocation of data memory, index memory, temp storage memory, cpu usage, table selectivity statistics, and other important criteria in order to provide an execution plan that is at least "good enough". "Good enough" means that after generating hundreds or thousands of possible execution plans, it chooses one or two that it thinks will be the best. Precompiled, properly written stored procedures allow reuse of cached execution plans. Without stored procedures, the optimizer is simply not nearly as fast or as smart. If the optimizer has to waste time doing something that it doesn't need to do, that is inefficient and not smart.
With development of modern databases and use of parallel processing, more speed and sophistication are critical. Parallel processing is being used in many more processes, "under the covers" in modern database engines. If multiple processors must generate an execution plan for dynamically generated database code, then that is more wasted effort.
Another major purpose of stored procedures is to keep developers from rewriting the same code over and over. Using stored procedures, it is extremely easy and very fast to search for stored procedures that use any database object (table columns, functions, assemblies, etc), even if one is not a SQL programmer. At least it is in SQL Server 2005 and later. I do this all the time, especially because I'm relatively new to the company and I am always searching for appropriate code. As a contractor at various times in my career, it has been critical for me to be able to find and use, or compare, or change existing code.
SQL, the language of non-assembly stored procedures, is "set-based", not row-based. Set-based processing is capable of vastly more powerful data manipulation than row-based, non-SQL processes. And before I move on, yes, there are also stored procedures that are written in other languages and packaged as DLL assemblies, with SQL proc wrappers. They exist because of the greater benefits some other languages provide when required to do intense manipulation of character data strings. They are not used for the vast majority of data manipulation, string or otherwise, for good reason. And yes, there are non-SQL databases that are optimized to handle vast quantities of unstructured text data. They are used for specialized situations in "big-data" shops like Amazon and Google. However, for most situations, for most companies, SQL procs are the preferred tool.
Stored procedures are the place to put business rules pertaining to the data (except for simple, front-end, data validation), not middle-tier apps. Middle-tier apps should use stored procedures to manage processing. Middle-tier and front end processes should be used to provide feedback in interactive applications, to users. The database is all about making it easy and fast to intelligently access and reuse common processes without causing the server to waste time essentially redoing the same things with each similar request. And when business changes, and it always does, it is easy to find and change applicable code in the database, regardless of what applications call that code.
Ideally, stored procedures should be managed in a source control environment. That adds even more value because then one does not need to look at individual servers to find or review code. And one can quickly compare versions and rollback code if necessary. Source control environments usually provide great search tools too.
???"More importantly, stored procedures are a maintenance disaster. There is no good or easy way to audit what applications are definitively using them, which means that any major changes usually mean that you write another stored procedure instead of altering the existing one, creating problems knowing which one to use."
This is stunningly wrong. The phrase "maintenance disaster" when combined with "stored procedures" is very clearly indicative of a development environment that is composed of very bad programming, lack of source control, poor application development management, or all of the above. In no case, should one ever feel that stored procedures are a "maintenance disaster" in a professional development environment. If one does, truly, one or more things are seriously wrong and require fixing with more training, proper tools, and education.
Next, I must point out that "???what applications are using them" hints at what the real problem may be in some environments. If a development environment is properly managed, one does not worry about which applications are using what stored procedures. If the logic in the stored procedures is valid and non-redundant, you have no worries, because whatever apps are using them are obviously using the correct logic and that is a major point. It simply does not matter what apps are using what stored procedures if the stored procedure code is correct. In fact, if one knows that all one needs to do is make sure the stored procedures are correct, one does not need to worry about finding and changing each and every app that refers to the stored procedures. If the stored procedures are not correct, you can change them in that one place and all the apps using it will now be correct, instead of having to change every application.
The person who maintains an application, in my vast experience, has never had a problem identifying what stored procedures are being used by the app. A stored procedure is either identified in the code or it is not being used in that code. When stored procedures call other stored procedures, that can appear to be a problem until one realizes that one should not be correcting stored procedures app by app. One should search directly for logic in only the stored procedures and correct it there. Doing this, one does not need to hunt down what apps are using what stored procedures. If one must know this for some reason, it is easy enough to run a simple trace to capture what stored procedures are being called by what apps, but really, that is usually done only when a bad development environment is being fixed so that it is properly managed and is not necessary after fixing the development process.
As far as auditing goes, whether one is talking about code audits or data audits, professional auditors practically cry when they must look outside the database for things that should be coded in stored procedures in the database, because they know that means that each programmer has the discretion of creating dozens or more of redundant and slightly variant versions of code meant to accomplish the same task.
If developers are coding new stored procedures to do something that is already in an existing stored procedure, then that probably means they either don't understand the stored procedure code or they don't know how to find or refactor existing stored procedures to split out logic parts into multiple sub procedures that allow for wider code reuse. As with any non-SQL development, the same rule applies do not change an interface to a stored procedure to avoid causing problems to applications that are using it. That does not mean that one can't refactor the code to allow use of part of the code, but not all of the code, when it makes sense to do so.
There are many other reasons that justify using stored procedures that I don't have time to mention including security and use of parameter data types. Many other knowledgeable people have written about this subject. Unfortunately, many under-educated, non-SQL developers have written misinformation, so all I can say is "good luck" with that. It is just more redevelopment work for people like me to fix things later.
Additionally, at least with SQL Server and the Microsoft tools stack, it is easy to allow object-oriented programmers to develop interfaces with the database using stored procedures in an object-oriented context using tools like "Link_to_SQL". I'm sure other environments must have these tools which make it easy for someone with an object-orientation background to make use of database objects that already exist.
3: Stored procedure overuse my refutation
From the email link:
"???Stored procedures are useful, but for most modern applications (not all), using them as often as was done in the past is overkill. Modern ORMs and other techniques make stored procedures a lot less necessary, and are much more nimble in the development process. More importantly, stored procedures are a maintenance disaster. There is no good or easy way to audit what applications are definitively using them, which means that any major changes usually mean that you write another stored procedure instead of altering the existing one, creating problems knowing which one to use. The next time you consider writing a stored procedure, make sure it is truly justified."
This argument often surfaces from developers who do not understand the complexity of modern database engines. By gaining an understanding of the benefits of using stored procedures, I hope that readers find many more reasons to doubt future articles that question the use of procs.
I'll quote the words used in the emailed article link:
??? "Modern ORMs and other techniques make stored procedures a lot less necessary, and are much more nimble in the development process."
Seriously, the statement above is so wrong that it always astounds me when I see it repeated here and there by developers who really do not understand the purpose and power of modern database engines. I'll try to keep this refutation as short and direct as possible.
Stored procedures are precompiled by modern databases. That allows the database engine to generate and cache execution plans for evaluation by an optimization engine. Optimization engines are incredibly sophisticated and get more powerful with each release. They evaluate access and allocation of data memory, index memory, temp storage memory, cpu usage, table selectivity statistics, and other important criteria in order to provide an execution plan that is at least "good enough". "Good enough" means that after generating hundreds or thousands of possible execution plans, it chooses one or two that it thinks will be the best. Precompiled, properly written stored procedures allow reuse of cached execution plans. Without stored procedures, the optimizer is simply not nearly as fast or as smart. If the optimizer has to waste time doing something that it doesn't need to do, that is inefficient and not smart.
With development of modern databases and use of parallel processing, more speed and sophistication are critical. Parallel processing is being used in many more processes, "under the covers" in modern database engines. If multiple processors must generate an execution plan for dynamically generated database code, then that is more wasted effort.
Another major purpose of stored procedures is to keep developers from rewriting the same code over and over. Using stored procedures, it is extremely easy and very fast to search for stored procedures that use any database object (table columns, functions, assemblies, etc), even if one is not a SQL programmer. At least it is in SQL Server 2005 and later. I do this all the time, especially because I'm relatively new to the company and I am always searching for appropriate code. As a contractor at various times in my career, it has been critical for me to be able to find and use, or compare, or change existing code.
SQL, the language of non-assembly stored procedures, is "set-based", not row-based. Set-based processing is capable of vastly more powerful data manipulation than row-based, non-SQL processes. And before I move on, yes, there are also stored procedures that are written in other languages and packaged as DLL assemblies, with SQL proc wrappers. They exist because of the greater benefits some other languages provide when required to do intense manipulation of character data strings. They are not used for the vast majority of data manipulation, string or otherwise, for good reason. And yes, there are non-SQL databases that are optimized to handle vast quantities of unstructured text data. They are used for specialized situations in "big-data" shops like Amazon and Google. However, for most situations, for most companies, SQL procs are the preferred tool.
Stored procedures are the place to put business rules pertaining to the data (except for simple, front-end, data validation), not middle-tier apps. Middle-tier apps should use stored procedures to manage processing. Middle-tier and front end processes should be used to provide feedback in interactive applications, to users. The database is all about making it easy and fast to intelligently access and reuse common processes without causing the server to waste time essentially redoing the same things with each similar request. And when business changes, and it always does, it is easy to find and change applicable code in the database, regardless of what applications call that code.
Ideally, stored procedures should be managed in a source control environment. That adds even more value because then one does not need to look at individual servers to find or review code. And one can quickly compare versions and rollback code if necessary. Source control environments usually provide great search tools too.
???"More importantly, stored procedures are a maintenance disaster. There is no good or easy way to audit what applications are definitively using them, which means that any major changes usually mean that you write another stored procedure instead of altering the existing one, creating problems knowing which one to use."
This is stunningly wrong. The phrase "maintenance disaster" when combined with "stored procedures" is very clearly indicative of a development environment that is composed of very bad programming, lack of source control, poor application development management, or all of the above. In no case, should one ever feel that stored procedures are a "maintenance disaster" in a professional development environment. If one does, truly, one or more things are seriously wrong and require fixing with more training, proper tools, and education.
Next, I must point out that "???what applications are using them" hints at what the real problem may be in some environments. If a development environment is properly managed, one does not worry about which applications are using what stored procedures. If the logic in the stored procedures is valid and non-redundant, you have no worries, because whatever apps are using them are obviously using the correct logic and that is a major point. It simply does not matter what apps are using what stored procedures if the stored procedure code is correct. In fact, if one knows that all one needs to do is make sure the stored procedures are correct, one does not need to worry about finding and changing each and every app that refers to the stored procedures. If the stored procedures are not correct, you can change them in that one place and all the apps using it will now be correct, instead of having to change every application.
The person who maintains an application, in my vast experience, has never had a problem identifying what stored procedures are being used by the app. A stored procedure is either identified in the code or it is not being used in that code. When stored procedures call other stored procedures, that can appear to be a problem until one realizes that one should not be correcting stored procedures app by app. One should search directly for logic in only the stored procedures and correct it there. Doing this, one does not need to hunt down what apps are using what stored procedures. If one must know this for some reason, it is easy enough to run a simple trace to capture what stored procedures are being called by what apps, but really, that is usually done only when a bad development environment is being fixed so that it is properly managed and is not necessary after fixing the development process.
As far as auditing goes, whether one is talking about code audits or data audits, professional auditors practically cry when they must look outside the database for things that should be coded in stored procedures in the database, because they know that means that each programmer has the discretion of creating dozens or more of redundant and slightly variant versions of code meant to accomplish the same task.
If developers are coding new stored procedures to do something that is already in an existing stored procedure, then that probably means they either don't understand the stored procedure code or they don't know how to find or refactor existing stored procedures to split out logic parts into multiple sub procedures that allow for wider code reuse. As with any non-SQL development, the same rule applies do not change an interface to a stored procedure to avoid causing problems to applications that are using it. That does not mean that one can't refactor the code to allow use of part of the code, but not all of the code, when it makes sense to do so.
There are many other reasons that justify using stored procedures that I don't have time to mention including security and use of parameter data types. Many other knowledgeable people have written about this subject. Unfortunately, many under-educated, non-SQL developers have written misinformation, so all I can say is "good luck" with that. It is just more redevelopment work for people like me to fix things later.
Additionally, at least with SQL Server and the Microsoft tools stack, it is easy to allow object-oriented programmers to develop interfaces with the database using stored procedures in an object-oriented context using tools like "Link_to_SQL". I'm sure other environments must have these tools which make it easy for someone with an object-orientation background to make use of database objects that already exist.
1. Yes, stored procs are pre-compiled. Modern DBs also cache plans for frequently used queries. As a result, what we're now seeing is that stored procs have lost the speed advantage they used to enjoy.
2. Stored procs save repetition. Yes, this is true. So does *any* ORM, or in fact, any system in which you can create functions, libraries, etc. And they will usually be much more "discoverable" to programmers than stored procs too. Stored procs are MISERABLE to work with as a developer.
3. "In no case, should one ever feel that stored procedures are a "maintenance disaster" in a professional development environment. If one does, truly, one or more things are seriously wrong and require fixing with more training, proper tools, and education."
I have yet to see an environment where the stored procs were handled anywhere close to as well as the rest of the software code, even when professional, experienced DBAs were involved. Take the following scenario:
* Table A is made with rows X, Y, Z. X is the primary key.
* Stored proc "GetFromA" is made that accepts a value for X, and returns X, Y, and Z.
* Column W is added to Table A.
* The DBA cannot change GetFromA, because who knows if a piece of code is depending on GetFromA returning three columns? The DBA is forced to make another version of GetFromA, because ANY CHANGES AT ALL to GetFromA will potentially break applications that depend upon GetFromA.
You see the problem?
A FUNDAMENTAL rule of development is that you can't change the "signature" of functions, and there is no 100% accurate way in the world to audit who the consumers of a stored procedure are. And the columns returned by a store proc are part of its "signature" to the developer.
Now do you see the problem?
4. "The person who maintains an application, in my vast experience, has never had a problem identifying what stored procedures are being used by the app. A stored procedure is either identified in the code or it is not being used in that code."
Your experience is lacking in a lot of the circumstances and situations that I have been in. Because in environments where folks are using languages like Ruby, Python, Perl, etc. it is nearly impossible to do what you describe.
Once you get into a full boat environment where folks can access the database through more than a few bottlenecked technologies, stored procs go from bad to worse on this stuff.
J.Ja
2. Stored procs save repetition. Yes, this is true. So does *any* ORM, or in fact, any system in which you can create functions, libraries, etc. And they will usually be much more "discoverable" to programmers than stored procs too. Stored procs are MISERABLE to work with as a developer.
3. "In no case, should one ever feel that stored procedures are a "maintenance disaster" in a professional development environment. If one does, truly, one or more things are seriously wrong and require fixing with more training, proper tools, and education."
I have yet to see an environment where the stored procs were handled anywhere close to as well as the rest of the software code, even when professional, experienced DBAs were involved. Take the following scenario:
* Table A is made with rows X, Y, Z. X is the primary key.
* Stored proc "GetFromA" is made that accepts a value for X, and returns X, Y, and Z.
* Column W is added to Table A.
* The DBA cannot change GetFromA, because who knows if a piece of code is depending on GetFromA returning three columns? The DBA is forced to make another version of GetFromA, because ANY CHANGES AT ALL to GetFromA will potentially break applications that depend upon GetFromA.
You see the problem?
A FUNDAMENTAL rule of development is that you can't change the "signature" of functions, and there is no 100% accurate way in the world to audit who the consumers of a stored procedure are. And the columns returned by a store proc are part of its "signature" to the developer.
Now do you see the problem?
4. "The person who maintains an application, in my vast experience, has never had a problem identifying what stored procedures are being used by the app. A stored procedure is either identified in the code or it is not being used in that code."
Your experience is lacking in a lot of the circumstances and situations that I have been in. Because in environments where folks are using languages like Ruby, Python, Perl, etc. it is nearly impossible to do what you describe.
Once you get into a full boat environment where folks can access the database through more than a few bottlenecked technologies, stored procs go from bad to worse on this stuff.
J.Ja
You have to remember in software development terms an SP design is / was preferred over a bunch of people who confuse a table in a Db with a table in word. People who make the other four and many more mistakes on a regular basis and don't even recognise them as mistakes.
A code first ORM compensates for that lack, it essentally replaces the SPs, and it does it in way that makes sense to the overall development process, when a database is simply Some where to store stuff when you application isn't running.
A few years ago I would have agreed with you without reservation, in some areas of software development I still would, but for the bulk of it, since the newer ORMs, not true anymore.
A code first ORM compensates for that lack, it essentally replaces the SPs, and it does it in way that makes sense to the overall development process, when a database is simply Some where to store stuff when you application isn't running.
A few years ago I would have agreed with you without reservation, in some areas of software development I still would, but for the bulk of it, since the newer ORMs, not true anymore.
I disagree with point 3 "Stored procedure overuse".
It seems that Justin doesn't use Oracle.
Store procedure it's all about in Oracle; they easy develop APIs and TAPI so the logic of bussinnes and others duties on database are common to any application who wants to use the database.
It seems that Justin doesn't use Oracle.
Store procedure it's all about in Oracle; they easy develop APIs and TAPI so the logic of bussinnes and others duties on database are common to any application who wants to use the database.
forever.
I however have used it, until the business got closed down. One of the reasons being they could not adapt their systems to the changing market conditions. Needless to say it was a massive oracle set up designed by people too clever for anyone's good who bogged off to pastures new before the wheels came off.
Oracle doesn't make SP design work, competent developers make SP designs work.
I however have used it, until the business got closed down. One of the reasons being they could not adapt their systems to the changing market conditions. Needless to say it was a massive oracle set up designed by people too clever for anyone's good who bogged off to pastures new before the wheels came off.
Oracle doesn't make SP design work, competent developers make SP designs work.
... to make me go anywhere near the Oracle ecosystem ever again. I think the only way I'd do it is if I could command "Oracle consultant rates".
J.Ja
J.Ja
the level of over-complication has got to be a deliberate choice. Oracle Forms for instance, sort of like access on crack. Some of it is extremely clever, a lot of it is so extremely clever, it's dumb.
The pay is good, probably should be as well. In all but some vey rare situations the value for money proposition is in my opinion, dubious at best.
The pay is good, probably should be as well. In all but some vey rare situations the value for money proposition is in my opinion, dubious at best.
really?
how about a situation where the object oriented approach is the wrong way to go?
a friend has developed an application that is currently being used in a few medical clinics locally, methadone maintenance program only clinics. he has found that a procedural approach dramatically improves performance, and reduces the code complexity for the app.
and due to original client requirement, the db engine this thing uses is access. :/
interestingly enough, the "triplicate" prescription requirements of methadone are lower in security focus than the medical records are. :/
methadone, like a few other medications, requires a "triplicate" prescription form, which is a carbon copy of a prescription, on file in the patient's chart and available for reference if any questions are asked about that patient's treatment, by the authorities with legal access to the records.
how about a situation where the object oriented approach is the wrong way to go?
a friend has developed an application that is currently being used in a few medical clinics locally, methadone maintenance program only clinics. he has found that a procedural approach dramatically improves performance, and reduces the code complexity for the app.
and due to original client requirement, the db engine this thing uses is access. :/
interestingly enough, the "triplicate" prescription requirements of methadone are lower in security focus than the medical records are. :/
methadone, like a few other medications, requires a "triplicate" prescription form, which is a carbon copy of a prescription, on file in the patient's chart and available for reference if any questions are asked about that patient's treatment, by the authorities with legal access to the records.
- Keyboard Shortcuts:
- Prev
- Next
- Toggle

































