Decision support and performance-clarifying analytics aren’t
all there is to data warehousing. Your client’s oceans of old data could have
the obscure secrets of past successes and failures churning beneath the surface
and awaiting discovery. So, how do you find this treasure?

If you read the literature on trend-hunting in data
warehouses, it all sounds very reasonable and even methodical—a “How-To-Beat-The-Slot-Machines”
sort of logic that promises riches from nowhere if you simply take the right
steps in the right order. However, in the real world, the literature is a
recipe for frustration, which many consultants have had to learn the hard way.
There is no random stroll leading to a pot of gold. You must have some idea of where you’re headed before you get underway.


In a follow-up article I will look at methods of analyzing
your search results, and ways to spot data relationships that yield business

Make a wish list

If the point of the exploration is to identify
performance-enhancing trends, then your natural starting point is to know where
your client’s performance needs to be enhanced. Before you begin exploring for
intelligence that you can apply to the client’s business processes, you need to
assess which processes need enhancing, and have specifics in mind when you
start out—a wish list. Then you and your client will be primed to spot
something useful when it surfaces.

Example one

Suppose your client is a manufacturer of widgets, and they
have a high quality rating and correspondingly low failure rate, but their
competitor’s numbers are even higher. You also notice they have random failures
of widgets in commercial use, say one out of every two hundred, and the
failures seem to be random. Are these failures truly random, or is some unknown
factor at work? Finding this out is something you want on your wish list. You’d
dig into their warehouse data to explore the possibility that something in their
manufacturing process needs fixing.

Example two

Perhaps your client is simply trying to increase its market
share, so you want to look at its customer base. With its warehouse data on
customers and sales, you can analyze different ways to categorize its
customers. You’re out to explore which features of its products or services
appeal to various characteristics of the client’s customer base. No single
database or application can give you this, but warehouse exploration will.

We’ll return to these examples to see how warehouse data
analysis might deliver answers, but the point here is this: Have an idea of
what you’d like to see happen in your client’s business when you begin
exploring for ways to improve it.

Setting up data for exploration

Once you know, in general, what you want to accomplish, your
next task is to set up the old data for search and analysis. Here’s where a big
mistake can occur (and I know because I made it). Data is configured in the
warehouse for performance analytics and other regular uses that evolve over
time. However, a warehouse also batches together data items by category and
type in well-understood relationships. Unfortunately, you want exactly the
opposite: to compare data items in hand to other items that do not have a well-understood
relationship to your key item. By definition, you’re not sure what data items
these might be. That’s why you’re exploring.

Where is the big mistake? By failing to isolate data items
of unknown influence from those with well-known relationships in comparing with
your key item, you confound your results. This is pretty close to straightforward
scientific inquiry: You must rid your comparison from possible confounding
factors, as far as you possibly can. Failing to do so will give you false
positives—relationships which seem to influence events in your client’s
company, but really don’t. You’ll see that you can send your client down some
pretty destructive roads if you’re not careful!

A step-by-step approach

Here is a guide to help you set up your searches:

  1. Understand
    that the successful result of this search is to find data items,
    previously thought to be unrelated, that rise and fall (or otherwise
    change) in tandem with one another. That is, you’re looking for data items
    that clearly change in a way that demonstrate a relationship between them.
    This is the functional goal of your data exploration effort.
  2. Hone
    in on your key item(s). If the goal is to connect the fluctuation of a
    performance-oriented data item to the fluctuation of an unknown data item,
    you have to have that known item to get rolling. In the first example
    above, the known key item is faulty widgets, graphed by date of
    manufacture. In the second, it’s customer characteristics, by age. There
    are many such items attenuating your client’s business performance:
    profitability, market share, order fulfillment cycles, manufacturing
    costs, advertising costs, seasonal sales, commissions, lead times,
    turns-per-quarter—pick a few and work with them.
  3. Consider
    that most warehouse explorations are time-oriented; that is, you are
    asking how processes have improved, or failed to improve, over time. How
    have revenues fluctuated over time? It’s important not to set this period
    too long or too short, and it can be different for every problem. When setting
    out to search for ways to improve your client’s business, make time frame
    a central factor in your search. Does significant fluctuation occur in
    your known data item in a day, week, or quarter? Whichever time frame you
    choose, set that interval as the one you will study, and collect enough
    data to cover enough intervals to spot meaningful trends in the
    fluctuations. Also important is to decide on a meaningful measurement and
    make that it is the basis of your comparison.
  4. Check
    yourself before beginning to explore. Get your graph of key-item-over-time,
    in the right increment and across a sufficient period of intervals, and
    then do a comparison against some factor you know to be influential. For instance, if you’re mapping sales
    of a specific product across the previous four quarters, you should be able
    to do a comparison graph against the dates of recent advertising campaigns
    and see some correlation between sales of the product and the days the
    promotions were in effect. In this way, you can verify that the tools
    you’re using, the data you’re pulling for comparison, and other factors
    are all effective.

Time to dig

Now it’s simply a matter of digging in and exploring. To do
the job thoroughly, you’ll interview the relevant users to learn what factors should influence your key items, and in
this way, have an idea of how to leave the path in order to begin sifting
through variables that are unknown. You won’t know that you’ve hit on something
until you see it. The task will require creativity, diligence, insight,
off-the-wall thinking from your client, and a bit of dumb luck.

Examples at work

How well does this work? Let’s look at the possible results
of such relationships in our previous examples. In the manufacturing problem,
you have an unexplained and seemingly random occurrence of manufactured widgets
that fail in the field. You go into your data, with your key item being the
date of manufacture. You graph the occurrence of faulty manufacturing across
time, and see that there are days during which several faulty widgets emerged
from your client’s factory in batches. But comparing these dates to the quality
checks on components shows that no faulty components correspond to your graph.
In your focused search, however, you looked at factors that seemed unrelated—including
the physical plant’s environmental logs. Bingo! The dates of faulty manufacture
were smog alert days; air purity was
at a low, and you have reason to investigate whether heavier air filtering in
the plant can reduce the number of faulty widgets.

In the market share problem your analysis of customers’
personal characteristics across time shows gaps; you’re losing customer loyalty
at various points in the aging spectrum. Your marketing people already know
this, but when you look at these characteristics, you notice that in each group
of loyal customers, there’s a high incidence of style sense – surveys show that
your client’s customers care as much about style as they do about function in
your client’s products. And in matching these characteristics against the
client’s product lines, you see that there are drop-offs in customer loyalty as
they hit these various gaps in age. There are few products to appeal to the
customers’ style sense as it evolves. Loyalty among customers age 18-24 is
high, but then it vanishes, and the graph picks up again in the mid-30s. Now
you can offer your client a business strategy: work on products that cater to
customers in these transitional years, so that brand loyalty is cultivated as
customers age.