SolutionBase: Deploying your data warehouse on the Web

You've got a lot of information about your organization stored in databases scattered everywhere. By placing a data warehouse on the Web, you can make information easily accessible to employees as well as potential partners. Here's how.

Most organizations don't realize the value of their historical data to outsiders (partner companies, marketers, parts suppliers, etc.). The prevalence of the data warehouse, however, along with the need to examine past successes and failures and the demand for enhanced business-performance monitoring, makes it likely that organizations will want to extend the availability of their data warehouses to the Web.

If you present the idea of a Web-accessible data warehouse to your clients, they may well go for it. But you must first have a plan to design and deploy it.

What will you do differently?

First, the change of venue is everything: A warehouse on the Web is, by definition, a different creature from your client's in-house vault. Why? For the same reason we have a Web (and, for that matter, an interstate highway system) in the first place. We want anybody to be able to get to it from anywhere. "Anybody" and "anywhere" mean that you must follow some new rules in the design and deployment of the warehouse.

Broader usage means a different granularity

More users, particularly remote users who aren't part of your client's organization, probably mean that the data in the warehouse will be put to a broader and less focused array of uses than would be the case in-house. In particular, the external user will have different ideas about the time periods from which to draw and the possible levels of summary detail.

While you may determine that the in-house user community will be satisfied with certain historical records summarized at a monthly level, your remote users may well want data resolved at a weekly or daily level. Likewise, the local users might never be interested in product sales summaries by anything other than region. But if the warehouse is open to the manufacturers who provide your client with product components, those manufacturers might be interested in summaries of activity on specific products by altogether different criteria.

The take-home lesson? In order to be useful to outsiders, your Web warehouse may need more data, aggregated by factors other than those defining the in-house warehouse.
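
To make the granularity point concrete, here is a minimal Python sketch. It assumes the warehouse keeps detail rows at a daily grain; the record layout, field names, and the roll_up helper are all hypothetical and chosen only for illustration. The same rows are then summarized by month and region for an in-house user, and by week and product for an outside manufacturer.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detail records kept at the finest grain (one row per product per day).
DAILY_SALES = [
    {"day": date(2024, 1, 1), "region": "East", "product": "widget", "units": 10},
    {"day": date(2024, 1, 2), "region": "East", "product": "gadget", "units": 4},
    {"day": date(2024, 1, 9), "region": "West", "product": "widget", "units": 7},
    {"day": date(2024, 2, 5), "region": "West", "product": "widget", "units": 3},
]


def month_of(row):
    """Time bucket for monthly summaries, e.g. '2024-01'."""
    return row["day"].strftime("%Y-%m")


def iso_week_of(row):
    """Time bucket for weekly summaries, e.g. '2024-W02' (ISO week numbering)."""
    year, week, _ = row["day"].isocalendar()
    return f"{year}-W{week:02d}"


def roll_up(rows, time_key, dimension):
    """Sum units by (time bucket, value of the chosen dimension)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(time_key(row), row[dimension])] += row["units"]
    return dict(totals)


# In-house view: monthly totals by region.
by_month_region = roll_up(DAILY_SALES, month_of, "region")

# Outside manufacturer's view: weekly totals by product.
by_week_product = roll_up(DAILY_SALES, iso_week_of, "product")
```

The weekly, per-product view is only possible because the detail rows were stored at a daily grain with the product recorded. Had the warehouse stored nothing finer than monthly totals by region, no amount of later processing could recover it.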

Summarize on the server side

The computational intensity of mass data summary is a key concern in warehouse design and is one of the reasons that predetermining appropriate granularity is drilled into warehouse designers early on. Warehouse data users want the shortest possible jump from the initial summary level to the level they desire.

This will be more difficult to achieve in a Web warehouse because you need to leave more possible levels of summary open to your potential remote users. So it's likely that there will be more, not less, subsequent summarizing going on before warehouse data is properly framed for a user's analysis.

For this reason, your default practice should be to perform these summary operations on the server side of a Web transaction. There are two consequences, both positive for the remote user. First, this will minimize the amount of data that must be moved across the Internet to the client by eliminating all detail records not of interest. Second, it will minimize the computational burden on the client once the data arrives. The flip side is that, in your design, you must allow for this computational burden on the server.
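
One way to picture server-side summarizing is the sketch below. It is an illustration under stated assumptions, not a prescribed design: it uses Python's built-in sqlite3 module as a stand-in for the warehouse, and the sales table, its columns, and the summarize_for_client function are hypothetical. The server does the grouping and filtering, and only the summary rows are serialized for the trip across the Internet.

```python
import json
import sqlite3


def build_demo_warehouse():
    """Create an in-memory stand-in for the warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_date TEXT, region TEXT, product TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [
            ("2024-01-03", "East", "widget", 100.0),
            ("2024-01-17", "East", "widget", 50.0),
            ("2024-01-20", "West", "widget", 75.0),
            ("2024-02-02", "East", "gadget", 20.0),
        ],
    )
    return conn


# Grouping columns a remote user may request. Column names cannot be bound as
# SQL parameters, so they are checked against this whitelist instead.
ALLOWED_DIMENSIONS = {"region", "product"}


def summarize_for_client(conn, dimension, product=None):
    """Aggregate on the server and return only the summary rows, as JSON."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {dimension}")
    sql = (
        f"SELECT strftime('%Y-%m', sale_date) AS month, {dimension}, "
        "SUM(amount) AS total FROM sales "
    )
    params = []
    if product is not None:
        # Drop detail records the user has no interest in before summarizing.
        sql += "WHERE product = ? "
        params.append(product)
    sql += f"GROUP BY month, {dimension} ORDER BY month, {dimension}"
    rows = conn.execute(sql, params).fetchall()
    return json.dumps([{"month": m, dimension: d, "total": t} for m, d, t in rows])


conn = build_demo_warehouse()
payload = summarize_for_client(conn, "region", product="widget")
```

In this toy case three widget detail rows collapse into two summary rows. In a real warehouse the ratio would be far larger, which is where the bandwidth savings come from. The cost is the GROUP BY work itself, which lands on the server.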

Perform analytics and metric operations on the client side

Once you've summarized the data to the level the user desires, all the analytical operations to be performed on that data should be handled by the client system. Why? To begin with, the client has cycles to burn, and your server doesn't. It's doing its part by handling the summary phase. Analytics consume only a small fraction of the computing power that vast summary operations consume.

In addition, analytics and performance metrics are often variable "what-if" frameworks that are, for all practical purposes, roll-your-own applets. To perform them on the server side would mean offering users a more static set of possible analytic operations (which wouldn't please them) or building in the tedious step of moving that code from the client to the server for execution, which would be silly.
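
As a rough illustration of the client's side of the bargain, the Python sketch below assumes the summary rows have already arrived from the server. The row shape, the sample values, and both function names are hypothetical. It computes a simple month-over-month metric and applies a user-defined "what-if" adjustment, all without another round trip.

```python
# Summary rows as they might arrive from the server (hypothetical shape and values).
SUMMARY = [
    {"month": "2024-01", "total": 200.0},
    {"month": "2024-02", "total": 250.0},
    {"month": "2024-03", "total": 225.0},
]


def month_over_month_change(rows):
    """Percent change in total from each month to the next, keyed by the later month."""
    changes = {}
    for prev, curr in zip(rows, rows[1:]):
        changes[curr["month"]] = (curr["total"] - prev["total"]) * 100 / prev["total"]
    return changes


def what_if(rows, adjust):
    """Apply a user-supplied adjustment to each total, leaving the input rows untouched."""
    return [{**row, "total": adjust(row["total"])} for row in rows]


changes = month_over_month_change(SUMMARY)

# A roll-your-own scenario: what if every month had sold 10 percent more?
scenario = what_if(SUMMARY, lambda total: total * 1.10)
```

Because the adjustment is just a function the user writes locally, changing the scenario means editing one line on the client. Nothing has to be shipped to the server or approved by whoever runs it.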

Is there some money in this idea?

Establishing an in-house warehouse is great for enabling detailed performance metrics within your client's walls. But outside users may want to get their hands on the same data for different reasons -- and they may be willing to pay handsomely to get it.

I consult in one of the domestic brokerage industries, and I cater to a client base that has passively gathered marketing data in every sales region of the continental United States for a decade. These brokers haven't been doing anything with that data other than comparing this year to last year, etc.

The value of that information to the manufacturers that these brokers represent, however, is incalculable. Analytics performed on this data by the manufacturers, if the brokers would pool it and make it available, could reveal volumes of secrets about the fluctuation of sales on thousands of products and what factors influence those fluctuations. The manufacturers would pay handsomely for that data, and it would be a bargain at any price. Discussions about how these brokers might work together to pool their data, and how they might profit by making it available via Web warehouses, are underway.

Are you in a similar position? Could a Web warehouse become an important strategic advantage to your partner companies, and would they pay for it? You won't know until you ask. First consider how to build such a warehouse for yourself, and then sit down with your partner companies and work through all the possibilities.