Advances in data networks and storage mean organizations capture far more data
than they ever have – perhaps a stream of measurements from
manufacturing equipment, from vehicles, or from game-changers like web-enabled
refrigerators (no, I’ve never seen one either).

The enterprise CTO may have the data storage part all figured
out – their MongoDB cloud database is in place, or they rent DBaaS from Cloudant.
But why? What does an enterprise do with all this unstructured data?

The first thing is to identify what the enterprise wants.
Analytics can be an area of blind faith – if the enterprise is not
clear about its big data needs, it may just hope that something good pops out.

Identify the big data needs.

Big data analytics, like all IT, is subordinate to business needs. An
organization must figure out its requirements before working on big data.

No two organizations are the same, so needs always vary.
The IT department may receive requirements like these.

  • Crunch data for instant reports.
  • Decode telemetry on the fly.
  • Find a needle in a haystack in a vast quantity
    of signals.
  • Find the regular operational patterns in a vast
    quantity of signals.
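
The last two requirements pull in opposite directions, and a small sketch may make the difference concrete. It is only an illustration in Python, not a recommended product or method: the function names, the 3.5 cut-off and the idea of a fixed repeating cycle are all assumptions made for the example.

```python
from statistics import median


def find_needles(signal, threshold=3.5):
    """Return the positions of readings that sit far from the median.

    Uses a modified z-score built on the median absolute deviation (MAD).
    The 3.5 cut-off is a common rule of thumb, assumed here for illustration.
    """
    mid = median(signal)
    # MAD is a spread estimate that a handful of extreme readings barely move.
    mad = median(abs(x - mid) for x in signal)
    if mad == 0:
        # Every typical reading equals the median, so anything else stands out.
        return [i for i, x in enumerate(signal) if x != mid]
    # 0.6745 rescales MAD so the score is comparable to an ordinary z-score.
    return [i for i, x in enumerate(signal)
            if 0.6745 * abs(x - mid) / mad > threshold]


def regular_pattern(signal, period):
    """Average the readings at each position of a repeating cycle.

    `period` (readings per cycle) is assumed to be known in advance.
    """
    buckets = [[] for _ in range(period)]
    for i, x in enumerate(signal):
        buckets[i % period].append(x)
    return [sum(b) / len(b) if b else None for b in buckets]


# Hypothetical telemetry: a four-step cycle repeated six times, one bad reading.
clean = [10, 20, 30, 40] * 6
noisy = list(clean)
noisy[9] = 500

needles = find_needles(noisy)        # positions of the odd readings
pattern = regular_pattern(clean, 4)  # the typical shape of one cycle
```

The first function throws away the normal behaviour to find the exception; the second throws away the exceptions to find the normal behaviour. Which one the business wants is exactly the kind of requirement to pin down first.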

Analytics is a service-oriented area, so the CTO could stop
there and outsource the rest. A CTO who decides to keep it in-house
needs a few more things.

Get some analytics applications.

Analytics applications help turn large data sets into business
value. The enterprise uses analytics tools to tackle the difficult job of doing
something useful with its unstructured data.

Data analytics products are one of the big data technologies
and live in a data scientist’s toolbox. Analytics products
don’t usually deliver ready-made business value.

When an organization purchases analytics applications, it
must leave plenty of cash for the training budget. Complex tools are not
intuitive.

Write a big data policy.

Managing large data sets is a difficult job. The big data
manager has plenty of moving parts to configure, and the policy must answer
questions like these.

  • What is the retention policy? What parts of the
    data pool can be deleted, and when? What happens to the rest of the historical
    data?
  • What is the data protection policy? Who gets to
    view data? What are the privacy implications? What are the legal restrictions?
  • Where is the data stored? If a cloud provider is
    holding the data, how do we get it back?
  • What kind of metadata is required? How can
    anyone identify the purpose of a big data store?
  • How many data sets are there, and how can they
    be blended?
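
One way to stop these questions from living only in a document is to record the answers next to each data set. The sketch below is a hypothetical Python record, not a standard or any vendor's schema: every field name, the role names and the example values are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DataSetPolicy:
    """Hypothetical policy record for one big data store."""
    name: str
    purpose: str                  # metadata: why this store exists
    storage_location: str         # where the data is held
    retention_days: int           # how long records are kept
    allowed_viewers: set = field(default_factory=set)  # roles that may view
    contains_personal_data: bool = False               # privacy flag

    def is_expired(self, created: date, today: date) -> bool:
        """True when a record is older than the retention period."""
        return today - created > timedelta(days=self.retention_days)

    def can_view(self, role: str) -> bool:
        """True when the given role is allowed to view this data set."""
        return role in self.allowed_viewers


# Example values are made up for the sketch.
policy = DataSetPolicy(
    name="press-telemetry",
    purpose="Vibration readings from manufacturing equipment",
    storage_location="cloud-provider-eu",
    retention_days=90,
    allowed_viewers={"analyst", "plant-engineer"},
)

old_record_expired = policy.is_expired(date(2024, 1, 1), date(2024, 6, 1))
recent_record_expired = policy.is_expired(date(2024, 1, 1), date(2024, 3, 1))
```

A record like this answers the retention, protection, location and metadata questions for one data set. How the sets can be blended, and how to get data back from a cloud provider, still need answers written by people.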

Assemble an analysis team.

The first part of building a team is partnering up a business
executive and an IT sponsor. Both are required.

There may be a data warehouse and data miners in the
organization, but probably no data scientists. There are a few ways of
getting some.

  • Hire experts. Pros are in demand.
  • Hire people with the right capability and let
    them learn.
  • Spot the budding statisticians in your
    organization and grab them.

Spotting capability means looking for clues. John Foreman is chief scientist at
Mailchimp and writes a blog on data science. If someone is a fan of his work,
that’s a clue. Perhaps one
of the data miners has an artistic streak. The person obsessively dragging
consumer behaviour out of click trails is worth talking to.

That still leaves some gaps.

A few huge organizations, like telecoms companies and global
retailers, have been battling with the problem of analytics for decades. They
have specialist teams, home-grown tools, and years of experience. Alongside
their expensive specialized capabilities, a brave new world of big data and commoditized
data analytics is appearing. There is quite a way to go.

  • The enterprise is doing new things with existing data sets, rather than
    collecting new data.
  • Plenty of big data tools exist, but few are ready for business users.
  • Organizations in many parts of the world have
    not started exploiting big data.
  • Better machine learning is required to extract signal from noise.

It takes statistical, technical and business expertise to get value
from big data. Even where the analytics tools exist, they must be tailored for
business needs – it’s not a one-size-fits-all world.

Over to you, big data startups around the world.
Plug those gaps.