As consultants, we tend to concentrate on storing data securely and efficiently when designing database applications. This is especially true when an application is brand new.
But those in management and executive positions concentrate on what the stored data can reveal about their company and how they can use that information to steer the company toward higher returns for stakeholders while outwitting the competition. Here, we'll look at the differing requirements of data entry and report generation. We'll also look at some of the design strategies and compromises we can choose to address these differences.
Shorter reporting cycles
Before IT began to change the way businesses work and compete with each other, annual reports were the norm. It was considered adequate to look over a year's worth of accounting transactions, do some ratio analysis, and then plan to improve things during the coming year.
With the IT revolution, information technology was seen as a way to streamline day-to-day operations and facilitate the annual reporting process. As managers and executives realized the importance of the data collected by their new computerized processes, they demanded a look at the data quarterly, then weekly, and then daily, bringing us to what is now called the real-time enterprise. In the real-time enterprise, the reporting cycle has reached zero length; all information about the state of a business is available all the time.
Well-designed reports are necessary to make this information usable. With proper user training, the ability to create ad-hoc queries and reports on the fly extends the power of this information even further, creating added value for your clients.
The conflicting needs of reporting and data entry
One of the mantras of database design is "index all fields that will be searched regularly." The object is to speed up queries that reference those fields. At the same time, we're cautioned that unnecessary indexing will slow record insertions and updates because all the indexes must be reshuffled as data changes.
A typical compromise is to use a minimum of field indexing in busy tables, such as those used for order entry, and to index everything in tables that will be used regularly for analytical purposes, such as customer information. But what happens when someone in management wants to analyze customers' product ordering habits?
A client may have thousands of records in a nicely designed Order table. Columns would probably include OrderID, OrderDate, and CustomerID, along with all the customary indexing. Though perfectly structured, the information contained in such a table is all but useless to management in this form.
If the database follows the classic Order/Order Detail model, meaningful queries such as which customers spent the most money, where most customers are located, or which products generated the most orders on which days of the week or month will all involve joins with two or more related tables. Complex SQL commands designed to sort or group order information and relate it to customer and product information can slow to a crawl and might also affect ongoing data entry tasks.
A common solution is denormalization, in which that carefully crafted database schema is tossed in favor of increased performance. An example is the need for users to see information such as month-to-date and year-to-date sales information on the same screen as customer information. The argument could be made that this is information about orders, not customers, and rightly belongs in a different table.
Viewing the order and customer information together would therefore involve joining the two tables via a SQL command stored as a view or query. Even with today's fast processors and optimization tools, surfing through a series of customer records on-screen necessitates a lot of network activity. Denormalization in this case would entail adding MTD and YTD fields to the Customer table so that these figures are always available quickly and without executing joins.
The downside is that these fields must now be updated programmatically by way of triggers or stored procedures, where available, or by way of client-side code. Safeguards such as transaction-based rollbacks must be put in place along with automation of chores such as periodic "rolling" of monthly and yearly figures.
How is a consultant supposed to give frontline customer service people fast data entry and updates while at the same time giving management instant reporting at their fingertips?
Lots of data can be good and bad
As a newly deployed database application is used, the amount of data it contains grows. Unless data has been migrated from a legacy application, it may take some time before the new application contains enough data to be useful in trend analysis.
For example, if you're gathering sales data, a week's worth may show that Wednesday is the biggest day of the week for sales volume. Until a few Wednesdays have come and gone, it would be impossible to conclude that this is a continuing phenomenon and not just a random event.
As the amount of data collected grows, allowing more meaningful reporting, the insertion and update performance of large, heavily indexed tables, as well as queries that depend on multiple joins and grouping, can begin to degrade. It can seem that your perfectly executed database plan has become a victim of its own success. The database application that was praised for its speed when it was new and contained only a few thousand records may be criticized for its slowness a few months later when its tables contain hundreds of thousands or even millions of records.
But how much data is enough, and how much data is too much? One of my clients is a registered charity that operates a telephone help line that is logged in a database. Only now, after accumulating almost five years' worth of call data, does the client feel confident about using that data to help secure future funding requirements based on quantities and types of calls handled.
To another client, which places people in the financial industry, information about candidates and available positions that is more than a month old is almost meaningless. If a client with an order-processing database had 100 regular customers who placed an average of two orders per week, the Orders table would accumulate 10,000 records per year. If each order had an average of 10 line items, the Order Details table would accumulate 100,000 records per year. How could such a client, even with these relatively modest data storage requirements, hope to look back at 10 years' worth of data in a trend analysis report?
The case for a separate reporting database
Long-term storage is just one of the compelling reasons for considering the creation of a reporting database. Records can be appended to the separate reporting database from the "production" or live database immediately using database triggers, when available, or programmatically from the client application.
An alternative is a batchlike daily (or any other suitable time period) append. Records of business transactions such as product orders can be stored indefinitely in the reporting database for future reporting and long-term trend analysis. At the same time, records more than a year old can be removed from the production database to keep it trim and responsive. Defunct customers and discontinued products that are stored in the reporting database can be removed from the production database without losing related order records.
A couple of my clients with smaller databases didn't feel the need for a separate reporting database. They both ran into a similar problem. They wanted to delete former employees but soon realized they could not do so without losing related order information. As a less-than-perfect workaround, I had to create a Terminated status for employees as well as some code to remove their ability to log in to the database. Drop-down lists and searches had to be modified to filter out employees with a Terminated status. It would have been much simpler to be able to move them to their rightful place in a separate database of archived information.
By controlling the number of records the production database contains, not only will you see a performance boost in the database, but you'll get an added performance bonus from the fact that no long, complex queries are being run against the data by management reports while records are being added or updated. No one will be attempting to run a report against records, pages of data, or even entire tables that might be locked during updating.
The reporting database can be simply a copy of the production database in structure, but the real advantage of having separate production and reporting databases is the ability to design one for fast data entry and the other for fast data retrieval. Tables in the reporting database can be denormalized much further than those in the production database. Order records, for example, might include customer name and address information.
A long-term relationship
From a consultant's perspective, perhaps the best argument for a greater emphasis on reporting is the chance to develop a long-term relationship with your client. As data accumulates, clients will see more ways that it can be used to help guide their strategic planning. Each time they do so, they'll see yet another return on their original investment.