
Filing: When data really was big Photo: Shutterstock

Big data? How can data be big? I thought it was all just tiny 0s and 1s.
That it is, but when those tiny 1s and 0s get together, they can become a whole other phenomenon, known as big data.

As we get told repeatedly by vendors, analysts and anybody with half an interest in the subject, the volume of data we generate is growing at a rate of knots.

But big data is more than that. The type of data we collect, and where we store it, is also a factor in defining big data.

Tell me more.
Well, back in the day, 80 per cent of corporate information was kept on paper, while the remaining 20 per cent was kept in electronic form. Of that electronic 20 per cent, 80 per cent was held in databases.

Oh, how things have changed. Across the business spectrum, 80 per cent of companies’ information is now in electronic form, and at least 80 per cent of that sits outside a database – ad hoc data kept in files somewhere.

Add to that the range of information that is being gathered compared to what businesses stored five or 10 years ago, and you can see immediately how things have changed. Organisations are collecting data from all sorts of sources that they previously wouldn’t have studied – CCTV images, data from social networks, video and audio files, health metrics, sensor feeds, blogs, and web traffic logs.

Five years ago, some of these data sources didn’t exist or weren’t on the corporate radar. Now, thanks to a change in businesses’ attitudes and storage prices on a downward trajectory, companies can gather and keep hold of all this information.

This unstructured data is radically different from the statistics businesses have typically gathered and kept in relational databases – there’s a lot more of it, for one thing, and it resists easy analysis and isn’t over-keen on getting into databases.
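To make the structured/unstructured distinction concrete, here is a minimal sketch – the log format, field names and values are all invented for illustration. A structured record maps straight onto a relational table row, while an unstructured log line holds the same facts but has to be parsed into fields before a database can do anything with it.

```python
import re

# A structured record: fields are already named and typed,
# so it maps directly onto a relational table row.
structured_sale = {"store_id": 42, "sku": "A-1001", "qty": 3, "price": 9.99}

# An unstructured log line: the same facts, but buried in free text.
log_line = "2011-06-01 10:32:05 store=42 sold 3 units of A-1001 at 9.99"

def parse_log_line(line):
    """Extract structured fields from a free-text sales log line.
    The log format here is hypothetical, purely for illustration."""
    match = re.search(r"store=(\d+) sold (\d+) units of (\S+) at ([\d.]+)", line)
    if match is None:
        return None  # lines that don't fit the pattern resist easy analysis
    store, qty, sku, price = match.groups()
    return {"store_id": int(store), "sku": sku, "qty": int(qty), "price": float(price)}

print(parse_log_line(log_line) == structured_sale)  # True: the text held the same facts all along
```

Multiply that parsing problem across CCTV footage, audio files and social network chatter – where no tidy regular expression exists – and the scale of the unstructured data challenge becomes clear.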

This coupling of unstructured and structured data is called big data.

So let me get this right – when you say big data, I’m thinking of huge volumes of corporate data from all sorts of sources?
That would be a fair assessment, yes. Some definitions of big data also count speed as a factor – the information coming in is scrutinised on a near- to real-time basis. Where once a retail chain’s sales data may have been collated and analysed on a monthly or quarterly basis, now that information can be examined store-by-store, moment-by-moment.
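The shift from batch reporting to moment-by-moment analysis can be sketched in a few lines – the store names and amounts below are invented, and a real system would read from a live event stream rather than a list:

```python
from collections import defaultdict

# A stream of sale events as they happen; stores and amounts are hypothetical.
sales_stream = [
    {"store": "Dublin", "amount": 12.50},
    {"store": "Cork", "amount": 8.00},
    {"store": "Dublin", "amount": 3.25},
]

running_totals = defaultdict(float)

for sale in sales_stream:
    # Update the per-store total the moment each sale arrives,
    # rather than collating everything at month or quarter end.
    running_totals[sale["store"]] += sale["amount"]
    print(sale["store"], running_totals[sale["store"]])
```

The design point is that the aggregation happens per event, so a query of `running_totals` at any instant reflects sales up to that moment, store by store.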

So why are we talking about big data now?
Because a good few of tech’s big-name vendors – EMC, IBM, Oracle – have woken up to the sales potential behind this data explosion. Unstructured data needs systems and software in place before organisations can turn it into useful business insight, and those vendors are all queuing up to offer products aimed at taming the big data overload.

So what’s in it for me? Why would I be interested?
Most industries are collecting these large pools of data – gathering more and more bits and bytes on their customers, employees and suppliers – and so have a resource they can exploit.

Some of the more obvious scenarios where big data might come into play are…

…based around retail – by analysing weather and economic conditions, information on local events, supply chain details, real-time sales data, social network sentiment monitoring, information on employee performance and all sorts of other factors, retail chains can adapt their marketing promotions or sales strategy on the fly to maximise the number of units flying off shelves.

Financial services could also put big data to work by examining all sorts of data sources – voice recordings, demographic information, social networking behaviour, credit and transaction history for example – to determine likely sources of financial fraud.
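As a hedged sketch of how those sources might be combined – the signal names, weights and thresholds here are all invented, and a real system would learn its scoring model from data rather than hard-code rules:

```python
def fraud_score(transaction):
    """Combine several data sources into a simple additive risk score.
    Rules and weights are illustrative, not a real scoring model."""
    score = 0.0
    if transaction["amount"] > 10 * transaction["avg_historic_amount"]:
        score += 0.4  # transaction history: far above this customer's norm
    if transaction["country"] != transaction["home_country"]:
        score += 0.3  # demographic data: transaction from an unusual location
    if transaction["flagged_by_voice_analysis"]:
        score += 0.3  # voice-recording analysis raised a concern
    return score

# A hypothetical transaction: large amount, unfamiliar country, clean voice check.
tx = {"amount": 5000.0, "avg_historic_amount": 120.0,
      "country": "RU", "home_country": "IE",
      "flagged_by_voice_analysis": False}
print(round(fraud_score(tx), 2))  # 0.7 – high enough to merit a manual review
```

The interesting part is less the arithmetic than the inputs: each rule draws on a different data source, which is exactly the multi-source pooling that big data systems are meant to enable.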

Some industry watchers have even posited that big data might have a future in health: instead of going to the doctor to find out what’s wrong with you, big data systems could diagnose illnesses by crunching data from various sources – sensors detecting your temperature, information from your pacemaker or a study of your DNA. These systems could work out what might be going awry, or predict what condition might befall you in the future, enabling you or the health service to act accordingly.

Right, I get you. So the future of big data analysis is bright then?
Well, there are a couple of dark clouds on the big data analysis horizon at the moment. One, according to McKinsey & Company, is a lack of staff trained in big data – “particularly of people with deep expertise in statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from big data”, as the consultancy put it in a recent report. That talent deficit could run to a 190,000-job shortfall in deep statistical analysis in the US alone by 2018.

The other problem is a more superficial one – literally. The challenge with gathering data from so many sources and crunching it through huge amounts of computing resource is how to present it back to the poor old user at the other end. A bit of work on the front end of big data systems still has to be done before they’re as friendly as they might be.

Big data will remain problematic for large corporations over the coming years thanks to issues including how to present data gathered in a useful fashion, according to analyst house Gartner.

“Collecting and analysing the data is not enough – it must be presented in a timely fashion so that decisions are made as a direct consequence that have a material impact on the productivity, profitability or efficiency of the organisation. Most organisations are ill prepared to address both the technical and management challenges posed by big data; as a direct result, few will be able to effectively exploit this trend for competitive advantage,” it said recently.

It’s not a situation that’s likely to change any time soon, either: until at least the end of 2015, more than 85 per cent of Fortune 500 organisations will fail to effectively exploit big data for competitive advantage, Gartner reckons.