Advances in data networks and storage mean organizations capture far more data than they ever have - perhaps a stream of measurements from manufacturing equipment, from vehicles, or from game-changers like web-enabled refrigerators (no, I've never seen one either).
The enterprise CTO may have the data storage part all figured out - their MongoDB cloud database is in place, or they rent DBaaS from Cloudant. But why? What does an enterprise do with all this unstructured data?
The first thing is to identify what the enterprise wants. Analytics can be an area of blind faith - if the enterprise is not clear about its big data needs, it may just hope that something good pops out.
Identify the big data needs.
Big data analytics, like all IT, is subordinate to business needs. An organization must figure out its requirements before working on big data.
No two organizations are the same, so needs always vary. The IT department may receive requirements like these.
- Crunch data for instant reports.
- Decode telemetry on the fly.
- Find a needle in a haystack in a vast quantity of signals.
- Find the regular operational patterns in a vast quantity of signals.
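The last two requirements are two sides of the same job: work out what normal looks like, then flag whatever strays from it. Here is a minimal sketch of that idea using only the Python standard library. The function names, the sample readings and the threshold of 5 are illustrative assumptions, not part of any particular analytics product, and a real telemetry stream would need something far more robust.

```python
from statistics import median

def summarize(signals):
    """Describe the regular pattern: the typical level and the typical spread.

    Uses the median and the median absolute deviation, which are not
    thrown off by a handful of extreme readings.
    """
    centre = median(signals)
    spread = median(abs(x - centre) for x in signals)
    return centre, spread

def find_outliers(signals, threshold=5.0):
    """Find the needles: readings far from the typical level.

    threshold is an assumed cut-off, measured in multiples of the spread.
    """
    centre, spread = summarize(signals)
    if spread == 0:
        # Every reading is (almost) identical, so nothing can be ranked.
        return []
    return [
        (index, value)
        for index, value in enumerate(signals)
        if abs(value - centre) / spread > threshold
    ]

# Hypothetical temperature readings from one machine.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.7, 20.2, 20.1]

centre, spread = summarize(readings)
outliers = find_outliers(readings)
print("typical level:", centre)
print("outliers:", outliers)
```

The same summary serves both requests: the business user who wants the operational pattern gets the typical level and spread, and the one hunting for a needle gets the list of readings that break the pattern.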
Analytics is a service-oriented area, so the CTO could stop there and outsource the rest. A CTO who decides to keep it in-house needs a few more things.
Get some analytics applications.
Analytics applications help turn large data sets into business value. The enterprise uses analytics tools to tackle the difficult job of doing something useful with its unstructured data.
Data analytics products are one class of big data technology and live in a data scientist's toolbox. They don't usually deliver ready-made business value.
When an organization purchases analytics applications, it must leave plenty of cash for the training budget. Complex tools are not intuitive.
Write a big data policy.
Managing large data sets is a difficult job. The big data manager has plenty of moving parts to configure, and a policy has to answer questions like these.
- What is the retention policy? What parts of the data pool can be deleted, and when? What happens to the rest of the historical data?
- What is the data protection policy? Who gets to view data? What are the privacy implications? What are the legal restrictions?
- Where is the data stored? If a cloud provider is holding the data, how do we get it back?
- What kind of metadata is required? How can anyone identify the purpose of a big data store?
- How many data sets are there, and how can they be blended?
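One way to make answers to these questions checkable is to record them as metadata alongside each data set. The sketch below assumes a hypothetical `DatasetPolicy` record. The field names, roles and the 90-day retention period are invented for illustration and do not come from any real product or regulation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class DatasetPolicy:
    """Hypothetical policy record stored with a data set."""
    name: str
    purpose: str                    # why the data store exists
    location: str                   # where the data is held
    allowed_roles: FrozenSet[str]   # who gets to view the data
    retention_days: Optional[int]   # None means keep indefinitely

def is_due_for_deletion(policy, created, today):
    """Return True once the data is older than its retention period."""
    if policy.retention_days is None:
        return False
    return today > created + timedelta(days=policy.retention_days)

def can_view(policy, role):
    """Return True if the given role may view the data set."""
    return role in policy.allowed_roles

# Example values are assumptions made up for this sketch.
telemetry = DatasetPolicy(
    name="machine-telemetry",
    purpose="Spot equipment faults before they stop production",
    location="cloud provider, EU region",
    allowed_roles=frozenset({"data-scientist", "plant-engineer"}),
    retention_days=90,
)

print(is_due_for_deletion(telemetry, date(2024, 1, 1), date(2024, 4, 1)))
print(can_view(telemetry, "marketing"))
```

A record like this does not settle the legal or privacy questions, but it gives each data set a stated purpose, an owner-approved audience and a deletion date that a script can enforce.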
Assemble an analysis team.
The first part of building a team is partnering up a business executive and an IT sponsor. Both are required.
There may be a data warehouse and data miners in the organization, but probably no data scientists. There are a few ways of getting some.
- Hire experts. Pros are in demand.
- Hire people with the right capability and let them learn.
- Spot the budding statisticians in your organization and grab them.
Spotting capability means looking for clues. John Foreman is chief scientist at Mailchimp and writes a blog on data science. If someone is a fan of his work, that's a clue. Perhaps one of the data miners has an artistic streak. The person obsessively dragging consumer behaviour out of click trails is worth talking to.
That still leaves some gaps.
A few huge organizations, like telecoms companies and global retailers, have been battling with the problem of analytics for decades. They have specialist teams, home-grown tools, and years of experience. Alongside their expensive specialized capabilities, a brave new world of big data and commoditized data analytics is appearing. There is quite a way to go.
- The enterprise is doing new things with existing data sets, rather than collecting new data.
- Plenty of big data tools exist, but few are ready for business users.
- Organizations in many parts of the world have not started exploiting big data.
- Better machine learning is required to extract signal from noise.
It takes statistical, technical and business expertise to get value from big data. Even where the analytics tools exist, they must be tailored for business needs - it's not a one-size-fits-all world.
Over to you, big data startups around the world. Plug those gaps.
Nick Hardiman builds and maintains the infrastructure required to run Internet services. Nick deals with the lower layers of the Internet - the machines, networks, operating systems, and applications. Nick's job stops there, and he hands over to the designers and developers who build the top layer that customers use.