Software interacts with users, who are typically engaged in entering or maintaining data related to a business process. An interesting problem arises when developers are designing a system and attempting to determine how much information is needed, at minimum, for a user to move through the process. I’m not talking strictly about maintaining data integrity, although that’s part of it. I’m referring to information that should be captured for business reasons—reasons that are determined by business people.

Two extreme viewpoints
One school of thought says that after data integrity needs are met, you should require only the data necessary to feed any other systems that receive information from your application. Certainly you have to meet the data requirements of those other systems, but why stop there? Blindly following this approach can cause a few problems, the foremost being that your software won’t do everything it could to help the user perform a task. Instead, it will trust (or instruct) the user to do it. You also run the risk of invalid data if a new business process is later implemented that must operate on records already in your system, records that lack some crucial information the new process needs.

You can also go to the other extreme and enforce every imaginable data requirement. This isn’t an optimal approach, either. Here, you run the risk of creating software that’s simply too inflexible to use or that imposes oddball requirements, like forcing a user to enter a first name for a business. You’re also committing future programmer hours, should any of those data requirements change.

How do you determine how far to take business data requirements? Such a decision is usually a tradeoff between programmer time and someone else’s time, but what strategies or guidelines can you employ to help make this decision?

A behind-the-site illustration
Here’s an example taken from TechRepublic. Articles that appear on the TechRepublic Web site are processed through a content management system (CMS) that controls the flow of articles through our editing process. It also provides a way to specify metadata that affects how the Web site behaves when you view an article. This metadata enables the display of things like author photos and column signatures. Among the metadata items we specify for every article are search keywords and links to sites that relate to the article topic. Both of these are procedural data requirements for us but for different reasons.

Keywords are required because we use them for relational behaviors, such as promotions for our TechMails and for aggregating articles on similar topics into collections. Because a client application downstream of the CMS requires this information, the CMS won’t allow us to get an article into the system without at least one keyword.

On the other hand, the CMS doesn’t enforce certain process rules, like including links to related sites. These links are chosen by our editors to ensure that the sites are relevant to the issues an article discusses. Since the CMS will happily accept an article with no associated sites defined, our editorial staff makes sure that links to related sites are included, if appropriate, before publishing an article.

Links to external sites can be valuable but are not always helpful or available, so enforcing them in the CMS would require a continuing investment of programming hours to develop the necessary procedures for optional inclusion. Editorial time is cheaper than development time, and the absence of related sites doesn’t affect site functionality. We therefore decided not to enforce this process rule programmatically and opted to rely on user training and human inspection by editors to enforce it selectively, trading development time for editorial time.
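The split between the two rules can be sketched in code. The following is a minimal illustration in Python, not the actual TechRepublic CMS: the `Article` class, the `check_article` function, and the `MissingRequiredData` exception are hypothetical names invented for this example. It assumes a hard rule should reject the article outright, while a soft rule should only produce a reminder for a human editor.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical article record; field names are assumptions for this sketch.
    title: str
    keywords: list[str] = field(default_factory=list)
    related_links: list[str] = field(default_factory=list)


class MissingRequiredData(ValueError):
    """Raised when a system-enforced (hard) requirement is not met."""


def check_article(article: Article) -> list[str]:
    """Enforce hard rules by raising; report soft rules as warnings."""
    # Hard rule: downstream features depend on keywords, so refuse the article
    # unless at least one non-blank keyword is present.
    if not any(k.strip() for k in article.keywords):
        raise MissingRequiredData("at least one keyword is required")

    warnings: list[str] = []
    # Soft rule: related links are optional, so only flag the omission
    # for editorial review instead of blocking the article.
    if not article.related_links:
        warnings.append("no related links; editor should confirm none apply")
    return warnings
```

With this arrangement, an article that has keywords but no related links still enters the system, and the returned warning is simply a prompt for the editor who performs the manual check.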

Finding the middle ground
You have to locate a happy medium somewhere between the “enforce it all” and the “get only what you positively need” extremes, but is there a litmus test to tell you where that medium lies? Cost analysis, as in our CMS example, is one that should immediately jump out at you: Does the potential cost of having to regularly change this data requirement outweigh the cost of having that data periodically not supplied, including any human inspection needed to detect an omission?
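As a rough way to put numbers on that question, the comparison can be written as a small calculation. This is only a sketch under stated assumptions: the function name, its parameters, and the figures in the usage example are all hypothetical, and a real analysis would need your own estimates of hours, rates, and omission frequency.

```python
def annual_costs(
    changes_per_year: int,
    dev_hours_per_change: int,
    dev_hourly_rate: int,
    omissions_per_year: int,
    cost_per_omission: int,
    inspection_hours_per_year: int,
    inspector_hourly_rate: int,
) -> tuple[int, int]:
    """Return (cost of enforcing in software, cost of relying on people)."""
    # Cost of enforcement: every change to the rule consumes developer time.
    enforce_cost = changes_per_year * dev_hours_per_change * dev_hourly_rate
    # Cost of not enforcing: the impact of missed data plus the time people
    # spend inspecting records to catch omissions.
    manual_cost = (
        omissions_per_year * cost_per_omission
        + inspection_hours_per_year * inspector_hourly_rate
    )
    return enforce_cost, manual_cost


# Hypothetical figures, chosen only to illustrate the comparison.
enforce, manual = annual_costs(
    changes_per_year=4,
    dev_hours_per_change=10,
    dev_hourly_rate=80,
    omissions_per_year=20,
    cost_per_omission=10,
    inspection_hours_per_year=30,
    inspector_hourly_rate=40,
)
enforce_in_software = enforce < manual
```

With these made-up inputs, enforcement costs more per year than human inspection, which matches the direction of the decision in the CMS example. Different estimates could easily tip the result the other way.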