Data Management

10+ ways to screw up your database design

Database developers often fall into certain design traps -- and it's generally the users who pay the price. See how you can improve your design process and build a more flexible, reliable, and efficient database.

Database developers are people too -- you make mistakes just like everyone else. The problem is, users are the ones who pay for those mistakes when they have to work with slow, clunky applications. If you're part of your company's IT team, a mistake could cost you your job. If you're a freelance developer, a bad database usually means a lost client -- and even more lost business, if word gets around. Here are some common design pitfalls to watch out for.

1: The data's unimportant; it's the architecture that matters

It doesn't matter if your code sings if you don't know the data. You want the two to work in harmony, and that means spending some time with the people who use and manipulate all that data. This is probably the most important rule: Before you do anything, you absolutely must get intimate with the data. Without that firsthand knowledge, you might make errors in judgment that have far-reaching consequences -- like bringing the whole application to a halt.

A nonchalant attitude toward the data isn't a sign of laziness. It's the mistaken perception that anything that doesn't work quite right early on can be fixed later. I just don't agree. Doing it right from the bottom up produces a foundation that can grow and accommodate change quickly. Without that foundation, any database is just a few Band-Aids away from disaster.

2: I can do anything with a little code

Some developers are so skilled that they can make just about anything happen with a bit of code. But you can take a good thing too far. One of my biggest complaints about the developers' psyche is that they want to solve everything with code, even when a system feature exists to handle the need. They claim it's just easier -- easier for them, maybe, but not necessarily easier for those maintaining the database. My recommendation is to use the built-in features, when available, unless you don't know the RDBMS well enough. If that's the case, see #3.

3: I can use whatever RDBMS you have

This brings me to the next point: developers who think the system is unimportant because their coding ability is the only magic they need. Wrong! Unless your hands are tied, choose the best system for the job. You'll save your client time and money and build a reputation as an honest and conscientious developer. You might not have a choice, of course. If you find yourself stuck with a system you're not familiar with, or one that's just not right for the job, you might want to excuse yourself from the project. You're going to take the fall for that decision eventually, even if you didn't make it.

4: That doesn't need an index

Very little affects performance like failing to apply an index or applying an index incorrectly. It isn't rocket science, and there are guidelines that help. But many developers still avoid the task altogether. Without proper indexing, your database will eventually slow down and irritate users. Perhaps the only thing that causes as much trouble as no indexing is too much indexing.

There's a lot of free information on indexing, but here are my simplest recommendations on the subject (a quick sketch follows the list):

  • Learn about the index structures the RDBMS uses.
  • Learn when to apply those indexes for optimum coverage.
  • Run the RDBMS's index analysis or tuning tool for hints.
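
For example, here's a minimal sketch in generic SQL (the Orders table and column names are hypothetical): index the columns you actually filter and join on.

    -- Index the foreign key used in joins and WHERE clauses.
    CREATE INDEX idx_orders_customer ON Orders (customer_id);

    -- A composite index to cover a common query pattern:
    -- WHERE customer_id = ? ORDER BY order_date
    CREATE INDEX idx_orders_customer_date ON Orders (customer_id, order_date);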

5: This database doesn't require referential integrity

Enforcing referential integrity protects the validity of your data by eliminating orphans (foreign keys that have no related primary key entity). For instance, in a sales database, you might have an ordered item that doesn't point to a customer -- not a good idea. If your RDBMS supports referential integrity, I recommend that you use it.
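
In most products, that enforcement is a declared foreign key. Here's a minimal sketch in generic SQL, using hypothetical sales tables:

    CREATE TABLE Customers (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    );

    CREATE TABLE Orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        -- The engine now rejects any order that points to no customer.
        FOREIGN KEY (customer_id) REFERENCES Customers (customer_id)
    );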

6: Natural keys are best

Relational database theory relies on keys, primary and foreign. Natural keys are based on data, which of course has meaning within the context of the database's purpose. They're largely obsolete now that systems can generate sequential values, known as surrogates. A surrogate key has no purpose beyond identifying the entity; it's usually an auto-incrementing data type.

The superiority of natural versus surrogate keys is a hotly debated topic. Just bring it up in your favorite development list or forum, sit back, and watch the show. Here's the nitty-gritty though:

  • Natural keys can be unwieldy and awkward to maintain. It might take several columns to create a unique key for each record. It's doable, but do you really want to accommodate that kind of structure if you don't have to?
  • Primary keys are supposed to be stable; nothing about data is stable. It changes all the time. In contrast, there's no reason to ever change a surrogate key. You might delete one, but if you have to change a surrogate key, something's wrong with your design.

The biggest argument for natural keys is one of association. Proponents insist that you need to be able to associate the key to the actual record. Why? Keys are used by the RDBMS, not users. The other most commonly heard argument is that surrogate keys allow duplicate records. My response is to apply indexes appropriately to avoid duplicate records.

I recommend surrogate keys -- always. That's an invitation to hate mail, but it's my recommendation just the same. I can think of no circumstance where a natural key would be preferable to a surrogate.
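
To answer the duplicates objection concretely, here's a minimal sketch with hypothetical names: a surrogate primary key plus a unique constraint on the natural candidate key. (Auto-increment syntax varies by product -- IDENTITY, AUTO_INCREMENT, sequences; the SQL-standard form is shown.)

    CREATE TABLE Students (
        -- Surrogate key: stable, meaningless, and never needs to change.
        student_id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        ssn        CHAR(9)     NOT NULL,
        last_name  VARCHAR(50) NOT NULL,
        -- The natural candidate key still gets a unique constraint,
        -- so duplicate records can't sneak in.
        UNIQUE (ssn)
    );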

7: Normalization is a waste of time

Just writing that title hurts. Unfortunately, I do run into developers who don't take normalization seriously enough. Normalization is the process of moving repeating groups and redundant data into related tables. This process supports the RDBMS in both theory and design. Without normalization, an RDBMS is doomed. Despite its importance, many developers make a cursory pass through the data and normalize very little, and that's a mistake you should avoid. Take the time to break down your data, normalizing at least to Second or Third Normal Form.
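
As a minimal sketch (hypothetical names), here's a classic repeating group and its normalized form:

    -- Unnormalized: repeating phone columns cap the data and invite nulls.
    -- CREATE TABLE Contacts (contact_id ..., phone1 ..., phone2 ..., phone3 ...);

    -- Normalized: the repeating group moves to a related table.
    CREATE TABLE Contacts (
        contact_id INTEGER PRIMARY KEY,
        name       VARCHAR(100) NOT NULL
    );

    CREATE TABLE ContactPhones (
        contact_id INTEGER NOT NULL REFERENCES Contacts (contact_id),
        phone      VARCHAR(20) NOT NULL,
        PRIMARY KEY (contact_id, phone)
    );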

8: You can't normalize enough

The previous point may seem to imply that normalization is the panacea of database design. But like code, too much of a good thing can slow things down. The more tables and joins involved in pulling data together into meaningful information, the slower the database will perform. Don't overdo it -- be thorough without being obsessed.

If your normalization scheme requires several tables to generate a common view, you've probably gone too far. In short, if performance slows and there's nothing wrong with the connection, the query, and so on, excessive normalization might be the culprit.
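
The symptom tends to look something like this hypothetical sketch: a routine listing that needs half a dozen joins because every attribute lives in its own table.

    SELECT c.name, ph.phone, ad.street, ct.city, st.state
    FROM Customers  c
    JOIN Phones     ph ON ph.customer_id = c.customer_id
    JOIN Addresses  ad ON ad.customer_id = c.customer_id
    JOIN Cities     ct ON ct.city_id     = ad.city_id
    JOIN States     st ON st.state_id    = ct.state_id;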

9: It'll perform just as well with real data

Failing to test a database for scalability is a huge mistake. During the development stage, it's acceptable to work with a scant amount of data. But a few rows of test data can't provide a realistic view of how the database will perform in a production environment. Before going live, be sure to test your database with real data, and lots of it. Doing so will expose bottlenecks and vulnerabilities.
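
If production-sized data isn't available, you can fabricate volume. One sketch in generic SQL, assuming a hypothetical ten-row Digits helper table (n = 0 through 9): cross-join it against itself to insert a million test orders.

    -- Six cross-joins of a 10-row table yield 10^6 rows.
    INSERT INTO Orders (customer_id, order_date)
    SELECT MOD(d1.n*100000 + d2.n*10000 + d3.n*1000 +
               d4.n*100 + d5.n*10 + d6.n, 5000) + 1,   -- spread across 5,000 customers
           DATE '2009-01-01'
    FROM Digits d1, Digits d2, Digits d3, Digits d4, Digits d5, Digits d6;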

You can blame the database engine for choking on real data -- nice work if you can get it (and the client believes you).

10: Only the most elegant code is good enough for my clients

This attitude is another example of how too much of a good thing can be bad. We all want to write the best code possible, but sometimes, good enough is, well, good enough. Time spent optimizing routines that already perform well and accurately can be money down the drain for your client. If the database runs great with a bit of ugly code, so what? Is the trade-off worth the extra time and money you'll spend to optimize the code to its fullest? I'm betting your client would answer in the negative. I'm not saying write clunky code. Nor am I suggesting that you write code that performs poorly because doing so makes your job easier. I'm saying, don't put your client's money into optimizing something that works fine as is. Put that time into good design and a solid foundation -- that's what will support the best performance.

11: You can back it up later

If the data is important enough to store, it's important enough to protect. Hardware breaks. Mistakes happen. A backup plan should be part of your development process, not an afterthought ("I meant to do that"). How often you'll back up the database, where you'll store those backups, and so on are questions to answer up front, not after your client loses important data.
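
The commands are the easy part; the plan is what matters. For instance, in SQL Server (other products differ -- MySQL has mysqldump, PostgreSQL has pg_dump), with hypothetical names and paths:

    -- Nightly full backup...
    BACKUP DATABASE Sales TO DISK = 'D:\Backups\Sales_full.bak';

    -- ...plus periodic log backups, so you can restore to a point in time.
    BACKUP LOG Sales TO DISK = 'D:\Backups\Sales_log.trn';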

12: You promised that wouldn't change

The client promised that a specific business rule would never change, and you believed it. Never believe them! Don't take the easy way out on this one; apply the best design and logic so that change is easy. The truth is, once users become accustomed to the database, they'll want more -- and that means change. It's just about the only part of the whole development process you can depend on.
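
One defensive habit, sketched here with hypothetical names: store volatile rules as data rather than code. If "a year has 12 months" lives in a table, a surprise thirteenth adjustment period is an INSERT, not an emergency rewrite.

    CREATE TABLE FiscalPeriods (
        fiscal_year INTEGER NOT NULL,
        period_no   INTEGER NOT NULL,
        start_date  DATE    NOT NULL,
        end_date    DATE    NOT NULL,
        PRIMARY KEY (fiscal_year, period_no)
    );

    -- The "impossible" 13th period, added without touching application code.
    INSERT INTO FiscalPeriods VALUES (2009, 13, DATE '2009-12-31', DATE '2009-12-31');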

13: Yes, I can give you the moon

Some developers are too ambitious. Wanting to give users everything they want in the first version is a nice sentiment, but it's also impractical. Unless the project is small with a specific focus, producing a foundation version that can go into production quickly is preferable. Users won't get everything they asked for, but they'll have a production database much sooner. You can add features in subsequent versions. The client gets a working database quickly, and you get job security.


About

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

97 comments
kerryjn

There is reason for both surrogate and natural keys. A prime example is if you are implementing a type 6 hybrid slowly changing dimension structure in the database. I believe that a database can exist with both natural and surrogate keys. If the key is simple and primarily belongs to a static reference table, then leave it natural. Why complicate things? On the other hand, if there is the slightest chance the natural key will change over time, or if you have 10s of 1000s of keys, then create a numeric surrogate key. The join performance will be more optimal on the numeric key than on some cryptic alpha key. Just my 2c.

verelse

...but since I code in large-scale, heterogeneous, multi-locational RDBs, I reckon I will continue to use what works where it works. Massive updates? Better use a surrogate unless you just love page splits and long I/O for reordering pages. Lots of queries against the target (i.e. all or part of the natural) key? Best use a natural key. Sorry, there ain't no one right answer, as Heinlein might not have said. Better learn how the damn thing works before you stick your key in and start 'er up. Most seem to forget that the point of a database is ONLY to get stuff OUT. The only reason you put stuff in, of course, being that you wanna get it back out again. Or is your database write-only? The answer to your key question then will be: how will I be fetching the data? Is it beneficial to slow the insert process to speed the retrieval? Most cases, youbetcha. The number of retrievals per insert is phenomenally high (and a big ole honkin hint on where your design effort should go). If you are joining on a dozen tables to get a regular query, YOUR DATABASE SUCKS. This does not mean you are a bad person, unless you never fix it. If you follow some arcane rule or another and it does not work for you, just F*CKING break the rule (even normalization). I guess I will repeat it just to be obnoxious: the point of a database is ONLY to get stuff OUT. Focus on that and forget what anyone tells you: if you get out of the database the right data every time, you will get the money and everyone else will be blogging for free.

RayJeff

...competence. Very ironic you say that. That cheap numptie... well, he was (and still is) the MIS director for the college. And as he told me, because he was in his current position, he was a systems analyst. Ummm... yea. ?:|

RayJeff

I tend to think that, depending on the situation -- for lack of a better word, depending on what the natural value in that specific table is -- using a natural key/value as a primary key may work. For the database I developed, even though I used the Autonumber as a primary key in some tables, I used the SSN (Social Security Number) for nearly all of the tables. The reason why is the users of the database. The database held data on students who were planning to work toward an Education degree, students who were formally admitted into the Education degree program, and the teachers who graduated from the Education program and are teaching in the field. Because the college is a very small liberal arts college, the faculty take a hands-on approach. They make it their business to know each student. With the students known on a first-name basis, any kind of information searching would be based on knowing the name. And with me working in the division and also conducting various training sessions with potential students going into the program, I knew them by name as well. Now, using a student's name as a natural key would obviously work in one table, but it can't be used in other tables that are referencing other sets of data. Hence, using the only type of data that uniquely references a student -- their SSN. In the case of this database, the Autonumber was a meaningless value/key; it was just used to number each record.

Dynic-IT

I'd just like to say that, while I agree with the points raised in this article, point 12 is the BIG one, and the first point I hold true in my own career. Take nothing for granted either. I once had a financial system crash because I hard-coded that there were 12 months in the year. The financial controller decided that he needed to close off the 12th month without all the transactions being in, so he added a thirteenth month to the system to post the outstanding transactions in.

pjboyles

Saw a database application that was tested solely in a LAN environment and then deployed in a WAN environment. Testing of the database itself was fine. Testing of the client on the LAN was fine. Queries that took seconds on the LAN took 15-plus minutes over the WAN. So don't overlook the client portion of the design.

jszivos

It's probably the Network Administrator inside of me speaking, but all I can say is "Typical DB Engineer article..." -- completely overlooking the security of sensitive data. It is the responsibility of the engineer to design the database in such a manner as to maintain the integrity of the secure data. Stored procedures should be created to simplify the management of the database. Granted, the Project Manager or DB Manager should be aware of security when the database is integrated into the system, but the DB Engineer needs to either design the database with security in mind or document steps for the integration/management team to ensure security.

TwistedDragon

I think that is quite possibly the best code article I have read in months. Well done indeed.

Jaqui

[b]3: I can use whatever RDBMS you have[/b] This is actually close to a truism. HOW you design the actual DB to work with the specific RDBMS is what is important, just as how you code the queries is different for each one. But if you pay attention to the specifics of the RDBMS, then pretty much any one can do the job for any project. I wouldn't use SQLite or Access for a major enterprise project, any more than you would, but if the structure and queries are done right, then MySQL, PostgreSQL, MSSQL, and Oracle are effectively interchangeable. MySQL would be more work for the full enterprise replication and security, so obviously not the best choice, but workable.

Oz_Media

While everyone seems fixated on technical issues, the big picture is always overlooked... USE. This is not the IT database; it is not used to store your update schedule, code snippets, links, etc. This is for the users in the company to work more efficiently, generate more revenue, and provide business operations statistics to the right people, who each have very unique needs. Sitting down with THE ENTIRE team and giving them a week to offer feedback on their specific needs is imperative. Sales reps may say they need to search by certain fields, for demographic marketing, but you still need to show them exactly what you are able to do with those fields; the input format can affect indexing immensely. Once they understand what they COULD do, they will tell you what they NEED to do and what they would LIKE to do. A database doesn't just have to operate correctly; it has to track and provide data to users. Nobody has even mentioned integrating a pipeline management system, which is imperative for the people who raise the money that pays you. Does anyone actually know what proper pipeline management entails? Of course sales management will have different needs; operations and administrators also have different needs. When it comes to designing the database, get input from ALL corners; then you can build it once and not be looking at it a month after implementation saying, "Yes, I will add that to the next revision." YOU as an IT manager or tech will never be able to determine what sales reps, management, and administrators WANT and NEED in a database, just as they wouldn't know how to set up your updating schedule. You must get feedback, be an open book, and listen to what the needs are; offer solutions and options they may not even be aware are possible. I've been on the losing end of this more than once, where a database is designed and just thrown into place. Within 5 minutes the holes and missed fields/indexing are found, and you then have to wait for revisions, and even then many aren't addressed properly. Do it right, do it once.

TheLinuxGuru

In the same way that you can overdo normalization, we've found that excessive 'referential integrity' has its downside: performance! We've taken an established SQL product, ripped out all references, hand-written updates rather than updating via views, and written routines to manually handle cascading deletes rather than letting the database do it, and achieved nearly a 30% increase in responsiveness! Fortunately for us, our customers only have read rather than read/write access to the databases from their query writer, so we're "safe" to do this. But it did surprise us just how much faster this made it (and a little more portable).

pnj2411

Here's one more: Any old name will do. It's always worth spending a bit of time up front defining a naming convention -- single or plural for table names, abbreviate or not, decide on the order of words in m:n association tables, etc. And use a unique prefix for all tables in a system so they can cohabit nicely with other databases that have a 'Customer' table. And agree not to touch the underscore key when typing any name...

brucelofland

Referential integrity and normalization are the most common problems I have found. Developers want to provide referential integrity in their code because they don't want the database telling them what they can and cannot do. They assume that their code is always bug-free and never crashes, though. Normalization is another area that most people just don't get. There have been many times when I was called in to develop an app that someone prototyped but got stuck on because they didn't understand this fundamental concept. Other times, professional developers de-normalized for the sake of efficiency and stuck their users with unneeded limitations in apps. Many times the efficiency gained was minuscule. http://blog.pmtechnix.com

Tony Hopkinson

defined as someone who erroneously believes they know what they are doing.

Tony Hopkinson

If halfway through the year, having collected loads of data, said student says, "That's not my SSN, it's..." Another simple change your design could have struggled with is wanting to enter some data about a student before they had an SSN. How likely such possibilities are is certainly a consideration, but I like to work on the basis that any change I can anticipate is likely to happen, mainly because I was caught flat-footed last time. :(

Tony Hopkinson

53 weeks in a year was a common manoeuvre as well. Semantic ambiguity: both parties assume they know exactly what a year is. The more common the concept, the more likely there are differences -- that's a good rule. A week with more than seven days is one I've seen as well.

Tony Hopkinson

The firm got rid of its own servers and switched to a datacentre. This was the backend to an online web function. Even with a timeout at 900 seconds, larger clients simply timed out. Turned out to be a short-sighted design optimisation. Basically it executed a lot of small queries: get all the vendor ids, build a list of unpopulated vendor objects, iterate through them with a details query, which included all the order numbers, iterate through those, get all the items, blah blah blah. All I did was flatten it, but that would have been a less optimal solution on the LAN...

Oz_Media

Security needs will vary by company. If you are storing client information (such as financial details) you will need a stronger level of security. If it is just a CRM or pipeline management tool, you could set it up outside a 7-11 without a password and it wouldn't make much difference; most of that information is publicly available anyway.

Tony Hopkinson

A DBA's role is privileges. Privileges are assigned to a user, then a system administrator controls who has access to what privilege set. One thing is not the other. I've seen designs where SPs were used to implement security; didn't think much of them at all. Wrong tool for the job. Privileges take a bit of effort and thought, but they won't ruin your schema. Bypassing the mechanism through reams of overhead and SQL code could do so easily.

beck.joycem

Back in the 70s I designed order processing and stock control systems. Users were usually surprised that I started by talking to management in each department, but then went on to sit with stock control clerks, purchasing clerks, sales order takers, etc., and learned about the actual needs of the computer component of the system I was to design. Only a component? The world does not end at the keyboard -- a system, to me, included the communication paths, flow of paperwork, working methods, and schedules, with the computer system strung between it all, holding it all together. I'd write a Client Requirements document and insist that everybody got to read it and offer comments and corrections before I got down to the actual system design. Now these methods -- time-consuming and expensive -- seem to be returning to favour.

Tony Hopkinson

They irritate developers nearly as much as customers.

jasmine2501

It should not be surprising that NOT doing something will result in a performance gain. However, as you said, this is a really bad idea. It allows users and client programs to break the data itself. We have this problem in a system here - all the DB rules are implemented in the API layer, so guess what? We have broken data all the time.

Tony Hopkinson

stored procedures first, and anything else before I tried that. If someone important comes along and says, "Our customers now need to be able to change the data; I told them it would be easy," once the tranquilisers have worn off, you are in for a world of hurt.

beck.joycem

How could he leave that one out? The joy of calling a field 'Name' in Access, when I was first learning it - chaos of hysterical proportions.

chris

This assumes that the application database will always be Access. Right now I'm upsizing an application from Access to Oracle, and guess what: table and column names with spaces are not allowed in Oracle.

Tony Hopkinson

The ones that really irritate are illegal names: [Order Number], or [Order], or [In_Con-Sists Ancies]. Hyphens are great: Select [Minus-One-Contour] works; now try Select Minus-One-Contour. No column called minus, one, or contour -- or maybe worse still 42 -- because there are!

g33kspeak

What's wrong with using underscores in the names? I've long advocated using conventions such as usp_ or _XREF type names.

Tony Hopkinson

Cookie cutters, possibly; incompetents, possibly; requirements that meant any other design would be hideously complex, maybe. I love referential integrity. If my code breaks the rules, I fix it; job done. Mind you, I've been a DBA as well... Without it, issues run from testers thinking you are a careless prat to becoming firmly convinced that little blue men from Rigel just played a practical joke on you. The thing about normalisation and referential integrity is that you can relax constraints with very little effort. Try to retrofit them into an established code base, though, and you just went high cost, high risk, and you are going to have to explain relational database theory to the men with the pointy hairstyles. Not worth it.... Denormalisation and reducing data integrity are the last resort.

RayJeff

I agree with you 100%. And I really didn't have a choice. I had to anticipate because I was working with people who didn't know what they wanted, needed, or didn't need in a database. Your example about a student with SSN issues -- that would be something that wouldn't come up. The reason is that once a student who plans on majoring in Education is fully admitted into the program, they have to get a mandatory background check through the state's law enforcement division, or SLED for short *lol*. It's a very thorough check, because the students will be in the schools doing practical learning and directed learning as a student teacher. And if it were to come up that the student had no SSN, it would not be a problem, because the database is one part of an entire system that tracks the student from the first time they say they want to major in Education until the end, when they have graduated and are teaching in a school. There are checkpoints built in within the system to check the student's information through the 5 areas of progression through the Education program. The system as a whole is designed to catch errors (among other things). That is one of the main guidelines and requirements of the accreditation. I'm by no means saying that the system as a whole (not just the database) is perfect. But being as error-free as possible is one of the major goals.

Oz_Media

I've seen what happens when 'they' get to hold the reins...garbage. It costs a lot more to send people back to the drawing board to start again than it does to investigate properly, as you described.

Oz_Media

They also pay your salary, as irritating as that may be. No them, no you. :)

TheLinuxGuru

"It should not be surprising that NOT doing something will result in a performance gain." I meant doing it 'manually' by us scripting it rather than relying on the underlying database features to carry out the same tasks. It's not that we're "not doing xxx" - it's more that we have switched from doing "xxx" via the built-in SQL features (e.g. referential integrity, cascading deletes) to doing "xxx" the long-winded, hand-written SQL procs checking way, which in theory should be slower - but was in fact faster. The logic is obviously far simpler in our SQL procs than everything that the server itself had to validate in the previous incarnation, hence the gain in speed. And yes, it would allow the client programs to break the DB if they chose to bypass our functions and procs that exist in the DB, and manipulate the data directly. Fortunately we have a 2-tier protection against that (the first being that their report writer is only granted read access to their databases) and the second being that there isn't a true SQL interface to the data: everything the end-user can do is parsed via the report writer, which we have full control of. Any read-write bespoke amendments are charged for and coded by ourselves. There is a downside to be honest: we sqldump the data for one of our larger clients, who then import it into a local database and produce reports via Business Objects. It would make building the initial BO universe a heck of a lot easier if the restrictions were in the SQL table schemas rather than in a separate bespoke control file (which we use for our own integrity checks). Guess I should just be happy that this isn't installed on customer's own sites and that they cannot update the data directly, I don't want to contemplate the potential mess if they did!

TheLinuxGuru

Yes, we have used SQL stored procedures for most of the updates and for the creates. Fortunately, because it is a 'real world' situation, the pools of App Servers are not on the same virtual machines as the SQL Servers, so we did at least get a good chunk of the data manipulation logic into the SQL DB itself rather than following the programmers' preference of coding *everything* into the scripts on the App Servers (hence dragging data unnecessarily across the LAN).

Tony Hopkinson

Wrapping with [] simply tells the parser not to test the names for legality. Seeing as the GUI does this for you...

hal

Right now I'm building an EDW from a bizarre source system where the developers took a very liberal approach to table and column names. In addition to the embedded spaces, I have to deal with column names like [display type(en)] and [10%taxtype]. Maybe my T-SQL code can deal with them but my ETL toolset is going nuts. Insane!

Tony Hopkinson

CamelCase on a case-sensitive database is just as much fun. Consistency is all I desire -- except if it's spaces; having to surround everything in the chosen delimiter is more irritating than anything else.

mattohare

That's been my approach. In a large environment, there would be different systems for production and decision support. Smaller environments (like a department/division of a bank in my case) it may seem that separate systems would be overkill. So, where there was a need for complex links that would bog down the system, I put in some stored procedures that would keep a separate set of decision support tables up to date. That kept the production system fast (the stored procedure to update the DSS table acted very fast) and the decision support system fast (use two or three tables instead of a dozen). My philosophy was to make a normalisation strategy that suited the purpose.

Tony Hopkinson

It's one of the crosses we have to bear: cheap numpties with delusions of competence. You'd think after a while businesses would learn a lesson and stop employing numpties to design it wrong and then professionals for the horrendous bodges to make it work. But what do we know? We're just propeller heads.

RayJeff

BOTH. He neither understood the data nor its purpose. And I'm not saying this because of the fact that I hated working with him anyway :-( And I'm sure he didn't test the queries before giving them to me. I know because the database was on one of the college's servers. The only time I did any work with it (the 2nd database) was when I updated the data from the 1st database (the one I designed) nightly before leaving work. I would check the timestamps for the tables and queries, and usually the last timestamp would be from me, from the night before or from earlier that day. The only time the IT guy had any type of contact with the data or the faculty was through me, when we had our "weekly meetings". I did all of the work... correction: I DID all the work. All of the system analysis; all of the interviews with the faculty, not just within my division but with the 4 other divisions as well, since the database needed the data of the other academic programs outside of Education. I did all of the grunt work. He already had a working database to use to create his version. Mind you, because developing a database of that size and importance was something I had never done before, I was always working with consultants provided by my supervisor -- consultants who either had experience working with schools going through the accreditation process, or actual database specialists who had developed similar databases for the schools that employed them for the accreditation. Plus, I was getting the databases (mine and the other one) peer-reviewed by other database specialists. And without fail, when the first database was compared to the second, everyone said that the second was very poorly designed. Not to say that the one I designed was perfect, but under the circumstances (me not having enough time before the accreditation date), the database was fine. All in all, it comes down to me being able to pull off designing a working database that got the job done within a very short period of time, when the IT guy had 3 years before I was hired to do all of the analysis -- to get an understanding of the data that needed to go into the database.

Tony Hopkinson

or its purpose? If a query takes hours to develop (as opposed to run, which might be the result of other, more necessary design compromises), someone forgot why the database existed in the first place. To use...

RayJeff

The data, and understanding how it meshes with the direct users of said data, is ALWAYS important. I had that understanding because I worked with the professors in the division where the data was coming from. I could go to them and vice versa to discuss the requirements and the needs of the data and the database, and likewise that influenced the database design. I had to "work" with an IT person on the database. They took all of the design info from the database I designed and created a "new" database. And while you and others have mentioned that a design might look textbook but in reality doesn't work -- well, same here. For a while, I thought I was looking at a textbook example. But for what that database was designed for, it just didn't work. The ultimate example was one crucial SQL query. It was a major query, as it gathered data from the majority of the main info tables. In essence, the query returned no data. The same query used in the database I created worked fine. What makes this example so important is that the database was being presented on the last day of a week-long visit by a group of examiners. The examiners were on campus because the program in the division I worked for was being reviewed for initial accreditation by a national body. The database was the last part of the review, as the examiners needed to see how academic, faculty, and college-wide data was being stored, aggregated, and reported. I came in that morning to do the final checks of the database, and that particular query didn't run. What it came down to was that I had 8 hours to fix the query before the examiners came in to review it. You would think that the IT person would be the one to fix the problem, but they didn't -- I wonder why? Everyone, from my supervisor to the division chairperson, even the IT person, was on pins and needles, coming by almost every hour to check my progress. I don't even remember how I fixed it -- I probably was able to modify the query to run like the query in the other database. But I did, with a little bit of time to spare. It goes to show that if the IT person had had an understanding of the data, there wouldn't have been a need to create another database, and there wouldn't have been a major problem coming up in the 11th inning of the game.

Tony Hopkinson

The business heard MySQL is free and Tony's twiddling his thumbs, so he can reinvent all the wheels and we'll not spend our ready cash on your preferred off-the-shelf solution... It's not daunting; it's just a lot of resource to expend before you can start adding value. If you can make them change their minds, more power to you; it doesn't always happen though. So do you like your wheels square or hexagonal? Round is no longer an option.

Oz_Media

Asset management software is available off the shelf for business -- Red Beam, for example. No such thing as double-selling a single item: you either get it and are told so right away, or you don't; no need for an IT department to code anything there. It just integrates with the database (which is so bloody easy!) and we're all done. As for selling what hasn't been done yet, I've seen that too. It's the fault of the sales rep, and his mistake to clear up with the client; nothing to do with IT again. It's actually a fairly common issue with bad sales reps; often they don't last though, as nobody needs such headaches and possible lawsuits. Sorry, but I've seen the issues you raise, and have addressed them and moved on too; didn't need IT or database changes in any way at all. These are rudimentary issues for any sales office, perhaps daunting 20 years ago though.

Tony Hopkinson

Sales or any other 'client' say they need this, but business sets the constraints. As a for-instance: all the sales reps' handhelds update a common database. Now it's quite possible that Tony and Oz could gabble away and convince two different buyers to buy the special edition vinyl Take That picture disc box set that some tasteless idiot stocked. So we both hit the button and mark it as sold. We can do this because sales did not want exclusive locking, i.e. one of us grabbing it before actually making a purchase just in case, leaving the other gabbling for no reason. Of course one of us, and hopefully the buyer, is going to be disappointed, but Oz, who's better at this, gets the commission, and Tony, who's really a developer at heart, goes back to what he's good at. Now if you want optimistic locking, in SQL Server it's easy because support is built in. In, say, MySQL (last I checked anyway) you have to invent it yourself for every table in the database... All too often constraints are set without any reference to requirements. And of course the other requirement from sales is they want it yesterday. Worse still, Oz, being an entrepreneurial type of formidable reputation, has also sold the software, which we haven't actually built yet, and it doesn't work like what he said when he sold it...

On a heavier note: if you are building an MIS system, which is probably the most likely product to mutate as it's designed and used, cast as little as possible in stone. Get the basic entities right, e.g. reps, products, customers, and prices. Attach useful properties. Then you can report products sold by rep. If your properties are deliveryid, orderid, and invoice, you can link and mine those systems. Do the minimum possible to get some high-value, accurate data; whatever you do, don't try to collect everything in one go. The general screw-up in this sort of system is replication of existing data in another format in a 'lossy' fashion. Lots of data, no useful information -- which is where you are now....

Never done an MIS system for sales, but I have done a fair few others. One of my rules is no data is better than bad data. Building one is an iterative series of tweaks if you have a solid foundation. Then when the sales manager wants, say, a report on how effective his regional boys are to set targets, that report might not yet exist, but the information does... Concentrate on correlation properties: Fred sold this at that on Fri; Bill sold those at this on Fri. Now you have sales by rep, sales on Friday, sales by product, and actual versus expected if you have a price book and you use a property to link it. Probably lots more stuff you could get out of it. The trick in the design is to collect accurate data that makes analysis and reporting, which are your value propositions, cost effective.

Another tip is to set up the collection first. Get something of value to show in your front end. Spending all the money building it and then having to wait months to have enough data to get the value out of it won't win you any points with the bean counters. Basically, phase your requirement so your ROI kicks in as early as possible, but don't engineer out future value. No one ever gets this right first time, because 'right' will change; try not to get anything horribly wrong first time is a better way to go.

Oz_Media

You write code; sales reps don't. Sales reps and administrators will tell you what data needs to be tracked, searched, and managed. They will tell you which fields are imperative and which can go on a tabbed page. They will tell you when NOTES fields are too short or too long, don't wrap properly, or can't be searched. They mine the data after all; they need to be able to mine the data that THEY want to see. I've seen databases which were designed by IT; sure, they physically work, more often than not, but the content is garbage, almost not even worth tracking. I am having just that issue now with a company I have joined as a national sales manager. I can't get sales staff to produce reports and data that I need to succeed, because the last IT guy threw it together himself without any real research as to what is needed, though I don't think such guidance would have really helped at the time anyway. Either way, I now have to convince a company to invest tens of thousands right out of the gate so I can be effective at my job. I am compiling stats the hard way; so far the owners are shocked at how the data is actually crunched and what they can find out about the needs and advantages of their sales teams. It's a hard way to make a business case, because the data simply isn't there to be quickly compiled in a click or two, which essentially is making my case the hard way.

Tony Hopkinson

a pile of crap, and that of course will be my fault! :p But I was referring to those DBA types who produce classroom schemas. Look great on paper, die on their arse in the real world.

Tony Hopkinson

comes from the foolish use of Access as a learning tool. Because in that, the data gets sucked down anyway, and the database driver is optimised to take advantage of it. So they get soon-to-be-poor habits and practice burnt in early....

Tony Hopkinson

Developers are notorious for taking potshots at their own feet. If they had to do what you are doing, they'd have learnt a valuable lesson. Mind you, SQL knowledge is patchy amongst developers; maybe they just didn't know they could do Select DisplayType as 'display type(en)' ...

mattohare

The decision support that was de-normalised was just along for the ride. The application got quite complicated with all the business requirements. We needed long lists of things for the calculation (Budget, Actual, etc.). The decision support tables would summarise the long lists for reporting and analysis. I have to say that the application was quite fun, except for the 2-hour meetings that ended up with 'ignore all that we discussed here, we'll have to try again'.

Tony Hopkinson

design; starting with one, well, that's an issue. :p If that's what you want or need, why in Cthulhu's name are you using a relational database in the first place?
