Big Data

Use this architecture to structure your business intelligence solutions

Business intelligence gives organizations a tactical advantage by transforming operational data into high-value information and putting it in the hands of the right people at the right time. Read on to learn about how BI solutions are commonly structured.

Despite big spending on information systems, many organizations are finding there’s still a shortage of truly useful information, due to the fragmentation of data. How can organizations resolve this “Information Paradox”?

Business intelligence (BI) solutions can put the information pieces back together to provide businesses with a more meaningful, complete picture. In this article, I’ll describe the conceptual architecture commonly used for BI.
Earlier this year, we published Dan Pratte’s “The intelligent organization: An introduction to BI,” followed by “Is your business intelligence suffering from the Information Paradox?” In the latter, Pratte described the causes of the “Information Paradox,” which occurs when despite spending huge amounts of money on information systems, an organization still finds itself with little useful business information. He cited fragmentation as a principal cause, owing to the traditional approach of chopping up information along application lines, which produces results that are “data rich” but “information poor.”
The common conceptual architecture diagram
The conceptual architecture common to the business intelligence solution is depicted in Figure A below. While all deployments don’t function exactly like the system seen below, this architecture is becoming more and more common.

(Note: We will use the term “data warehouse” in this article, but the architecture is very similar for data marts as well.)

Figure A
This is the conceptual architecture common to the business intelligence (BI) solution.

Source data
Source data comes from one or more operational databases and sometimes from third-party data, like census or industry data. Source databases are most often relational; however, delimited text files and spreadsheets are common as well.

Extraction, transformation, and loading (ETL)
ETL describes the processes used to copy data from source databases to the data warehouse, but it’s more complicated than simply moving bits. Quite often, data is transformed and validated on the fly. The end-to-end ETL tasks are combined into “packages,” which are scheduled to run automatically at preset times. Microsoft’s Data Transformation Services (DTS), part of SQL Server 2000, is an example of an ETL tool.

The data warehouse
Transformed source data is consolidated into a single relational database called a “data warehouse.” The warehouse physically resides on an industrial-strength relational system, such as Microsoft’s SQL Server, Oracle, or IBM’s DB2.

A data warehouse contains read-only data depicting the state of an organization’s information at regular points in time—weekly, daily, or even hourly. Keep in mind that data quality is especially important. You’ll be wasting time and money if end users have issues with the credibility of your data.

Another important consideration is query speed. Frustration results if many seconds pass between submitting a query and displaying results. Relational databases don’t always respond quickly to complex queries, so multidimensional “cubes” are increasingly used to bridge the gap, which we’ll discuss in the next section.

The cube
A cube is a complex, efficient, and proprietary data structure that includes data and data aggregations (precalculated summary information), as well as security information that controls who can access what. Cubes are lightning-fast when responding to complex queries—at least compared with relational databases. Also, the data within cubes is almost always compressed to reduce physical storage requirements.

A multidimensional cube may contain tens of millions of records (an individual scan at the supermarket is a record, for example) and may reach several gigabytes in size.

Finally, cube updates are often performed nightly as part of the ETL package so current information is available first thing in the morning.

End-user tools
A variety of applications is available to meet the requirements of end users, or you may write your own if you so choose. Client applications fall into one of seven areas:
  1. OLAP. This common term stands for “On-Line Analytical Processing,” meaning the end user has direct—or “on-line”—access to the cube by way of PC-based analysis software. Browsing the information freely permits spontaneity that makes spotting trends and relationships easier. Better client applications combine an intuitive interface with rich visuals designed for rapid comprehension.
  2. Static and live reporting. Static reports give you a view of information arranged in a predetermined way—sales by month by region, for example. Different views, say sales by product category, require an expensive special report. BI systems make it fantastically simple to generate special reports.

    Live reports allow end users to interactively manipulate information and drill down to more granular levels, in more or less predefined ways.
  3. Balanced scorecards. Scorecards represent a remarkable technology from the minds of Robert Kaplan and David Norton of the Harvard Business School. Scorecards emphasize frequent and timely relevant measuring of individual and team performance against key financial and nonfinancial objectives. Scorecards directly reinforce strategy because performance measures map directly to your organization’s strategic initiatives.
  4. Budgeting and forecasting. In most organizations, budgeting is traditionally painful, tedious, and distracting. Your accounting team painstakingly assembles and then distributes budget packets to departments and branches throughout the organization. Each department then spends many additional and painful hours completing the packets. Later, the packets are collected, consolidated, and reviewed. Often, this cycle is repeated another time or two before the budget is finalized.

    BI-based budgeting offers several advantages: better forecasting, faster consolidation, and the ability to analyze the up-to-the-minute consolidated view using the rich tools of OLAP. Translation? Better budgets—faster and more in-line with business realities.
  5. Data mining. Here the objective is to recognize patterns and relationships not apparent through simpler analysis methods. Data mining models generally describe buyer characteristics (which group is best targeted given a specific product) or predict a dependent value (which product is best targeted given a specific group). Data mining is the core of high-efficiency database marketing.
  6. Exceptions and notifications. When key performance measures are out of line, software agents instantly take notice and take immediate action. Traditional systems require that humans first observe and comprehend out-of-bounds facts before action is taken. Advanced systems allow end users or managers to link events with appropriate notifications. For example, if non-billable overtime hours were greater than 2 percent of total hours, Joe and his boss will receive an e-mail, a fax will go out to Sue, and Fred will be paged.
  7. Business process input. Human beings traditionally carry information from process to process. Now we’re designing information systems that will access information without human intervention. An automated purchase-order system may receive information directly from the BI system to establish a reorder quantity based on an analysis of what’s in the sales pipeline and possibly other factors. Because humans aren’t involved, value-chain processes move faster, better, and at a lower cost.

You’ll do well to remember that your users have different skills and requirements. Seek to facilitate rapid comprehension of the system. Often, I prefer to carefully consider the critical few measures and present them like the dashboard of a car, communicating high-value information at a glance. This dashboard scenario offers me the capability for direct-access drill down when high-fidelity detail is needed.

In upcoming articles, I’ll discuss in more detail these areas of BI conceptual architecture, which have only briefly been considered here. I’ll use that opportunity to explore some tools and technologies—beginning with Microsoft Analysis Services (part of SQL Server 2000)—converging to place BI solutions in the mainstream.
We’d like to hear your questions about business intelligence. E-mail us your queries, and we may address them in future articles.

Editor's Picks