Decision support and performance-clarifying analytics aren't all there is to data warehousing. Your client's oceans of old data could have the obscure secrets of past successes and failures churning beneath the surface and awaiting discovery. So, how do you find this treasure?
If you read the literature on trend-hunting in data warehouses, it all sounds very reasonable and even methodical—a "How-To-Beat-The-Slot-Machines" sort of logic that promises riches from nowhere if you simply take the right steps in the right order. However, in the real world, the literature is a recipe for frustration, which many consultants have had to learn the hard way. There is no random stroll leading to a pot of gold. You must have some idea of where you're headed before you get underway.
In a follow-up article I will look at methods of analyzing your search results, and ways to spot data relationships that yield business improvements.
Make a wish list
If the point of the exploration is to identify performance-enhancing trends, then your natural starting point is to know where your client's performance needs to be enhanced. Before you begin exploring for intelligence that you can apply to the client's business processes, you need to assess which processes need enhancing, and have specifics in mind when you start out—a wish list. Then you and your client will be primed to spot something useful when it surfaces.
Suppose your client is a manufacturer of widgets, and they have a high quality rating and correspondingly low failure rate, but their competitor's numbers are even higher. You also notice they have random failures of widgets in commercial use, say one out of every two hundred, and the failures seem to be random. Are these failures truly random, or is some unknown factor at work? Finding this out is something you want on your wish list. You'd dig into their warehouse data to explore the possibility that something in their manufacturing process needs fixing.
Perhaps your client is simply trying to increase its market share, so you want to look at its customer base. With its warehouse data on customers and sales, you can analyze different ways to categorize its customers. You're out to explore which features of its products or services appeal to various characteristics of the client's customer base. No single database or application can give you this, but warehouse exploration will.
We'll return to these examples to see how warehouse data analysis might deliver answers, but the point here is this: Have an idea of what you'd like to see happen in your client's business when you begin exploring for ways to improve it.
Setting up data for exploration
Once you know, in general, what you want to accomplish, your next task is to set up the old data for search and analysis. Here's where a big mistake can occur (and I know because I made it). Data is configured in the warehouse for performance analytics and other regular uses that evolve over time. However, a warehouse also batches together data items by category and type in well-understood relationships. Unfortunately, you want exactly the opposite: to compare data items in hand to other items that do not have a well-understood relationship to your key item. By definition, you're not sure what data items these might be. That's why you're exploring.
Where is the big mistake? By failing to isolate data items of unknown influence from those with well-known relationships in comparing with your key item, you confound your results. This is pretty close to straightforward scientific inquiry: You must rid your comparison from possible confounding factors, as far as you possibly can. Failing to do so will give you false positives—relationships which seem to influence events in your client's company, but really don't. You'll see that you can send your client down some pretty destructive roads if you're not careful!
A step-by-step approach
Here is a guide to help you set up your searches:
- Understand that the successful result of this search is to find data items, previously thought to be unrelated, that rise and fall (or otherwise change) in tandem with one another. That is, you're looking for data items that clearly change in a way that demonstrate a relationship between them. This is the functional goal of your data exploration effort.
- Hone in on your key item(s). If the goal is to connect the fluctuation of a performance-oriented data item to the fluctuation of an unknown data item, you have to have that known item to get rolling. In the first example above, the known key item is faulty widgets, graphed by date of manufacture. In the second, it's customer characteristics, by age. There are many such items attenuating your client's business performance: profitability, market share, order fulfillment cycles, manufacturing costs, advertising costs, seasonal sales, commissions, lead times, turns-per-quarter—pick a few and work with them.
- Consider that most warehouse explorations are time-oriented; that is, you are asking how processes have improved, or failed to improve, over time. How have revenues fluctuated over time? It's important not to set this period too long or too short, and it can be different for every problem. When setting out to search for ways to improve your client's business, make time frame a central factor in your search. Does significant fluctuation occur in your known data item in a day, week, or quarter? Whichever time frame you choose, set that interval as the one you will study, and collect enough data to cover enough intervals to spot meaningful trends in the fluctuations. Also important is to decide on a meaningful measurement and make that it is the basis of your comparison.
- Check yourself before beginning to explore. Get your graph of key-item-over-time, in the right increment and across a sufficient period of intervals, and then do a comparison against some factor you know to be influential. For instance, if you're mapping sales of a specific product across the previous four quarters, you should be able to do a comparison graph against the dates of recent advertising campaigns and see some correlation between sales of the product and the days the promotions were in effect. In this way, you can verify that the tools you're using, the data you're pulling for comparison, and other factors are all effective.
Time to dig
Now it's simply a matter of digging in and exploring. To do the job thoroughly, you'll interview the relevant users to learn what factors should influence your key items, and in this way, have an idea of how to leave the path in order to begin sifting through variables that are unknown. You won't know that you've hit on something until you see it. The task will require creativity, diligence, insight, off-the-wall thinking from your client, and a bit of dumb luck.
Examples at work
How well does this work? Let's look at the possible results of such relationships in our previous examples. In the manufacturing problem, you have an unexplained and seemingly random occurrence of manufactured widgets that fail in the field. You go into your data, with your key item being the date of manufacture. You graph the occurrence of faulty manufacturing across time, and see that there are days during which several faulty widgets emerged from your client's factory in batches. But comparing these dates to the quality checks on components shows that no faulty components correspond to your graph. In your focused search, however, you looked at factors that seemed unrelated—including the physical plant's environmental logs. Bingo! The dates of faulty manufacture were smog alert days; air purity was at a low, and you have reason to investigate whether heavier air filtering in the plant can reduce the number of faulty widgets.
In the market share problem your analysis of customers' personal characteristics across time shows gaps; you're losing customer loyalty at various points in the aging spectrum. Your marketing people already know this, but when you look at these characteristics, you notice that in each group of loyal customers, there's a high incidence of style sense – surveys show that your client's customers care as much about style as they do about function in your client's products. And in matching these characteristics against the client's product lines, you see that there are drop-offs in customer loyalty as they hit these various gaps in age. There are few products to appeal to the customers' style sense as it evolves. Loyalty among customers age 18-24 is high, but then it vanishes, and the graph picks up again in the mid-30s. Now you can offer your client a business strategy: work on products that cater to customers in these transitional years, so that brand loyalty is cultivated as customers age.
Scott Robinson is a 20-year IT veteran with extensive experience in business intelligence and systems integration. An enterprise architect with a background in social psychology, he frequently consults and lectures on analytics, business intelligence and social informatics, primarily in the health care and HR industries.