Big data: Cheat Sheet

You know all that information that fits into nice relational databases? It's not that...


Filing: When data really was big Photo: Shutterstock

Big data? How can data be big? I thought it was all just tiny 0s and 1s.
That it is, but when those tiny 1s and 0s get together, they can become a whole other phenomenon, known as big data.

As we get told repeatedly by vendors, analysts and anybody with half an interest in the subject, the volume of data we generate is growing at a rate of knots.

But big data is more than that. The type of data we collect, and where we store it, is also a factor in defining big data.

Tell me more.
Well, back in the day, 80 per cent of corporate information was kept on paper, while the remaining 20 per cent was kept in electronic form. Of that electronic 20 per cent, 80 per cent was held in databases.

Oh, how things have changed. Across the business spectrum, 80 per cent of companies' information is in electronic form, and at least 80 per cent of that is outside a database, as ad hoc data kept in files somewhere.
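Multiplying the two sets of percentages through shows what actually shifted. This is just a back-of-the-envelope sketch using the article's figures, and it assumes the "at least 80 per cent" for today is taken at its lower bound:

```python
# Shares quoted in the article, as fractions of all corporate information.
# "Now" uses the article's lower bound ("at least 80 per cent" outside databases).
then_electronic = 0.20
then_in_db_share = 0.80          # share of the electronic portion held in databases
now_electronic = 0.80
now_outside_db_share = 0.80      # share of the electronic portion outside databases (lower bound)

then_in_db = then_electronic * then_in_db_share               # about 0.16
then_outside_db = then_electronic * (1 - then_in_db_share)    # about 0.04
now_outside_db = now_electronic * now_outside_db_share        # at least about 0.64
now_in_db = now_electronic * (1 - now_outside_db_share)       # at most about 0.16

print(f"Then: {then_in_db:.0%} in databases, {then_outside_db:.0%} electronic but outside")
print(f"Now: at most {now_in_db:.0%} in databases, at least {now_outside_db:.0%} outside")
```

On these numbers, the share of all corporate information sitting in databases stays at roughly 16 per cent, while electronic data held outside databases grows from about 4 per cent to at least 64 per cent.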

Add to that the range of information that is being gathered compared to what businesses stored five or 10 years ago, and you can see immediately how things have changed. Organisations are collecting data from all sorts of sources that they previously wouldn't have studied – CCTV images, data from social networks, video and audio files, health metrics, sensor feeds, blogs, and web traffic logs.

Five years ago, some of these data sources didn't exist or weren't on the corporate radar. Now, thanks to a change in businesses' attitudes and storage prices on a downward trajectory, companies can gather and keep hold of all this information.

This unstructured data is radically different from the statistics businesses have typically gathered and kept in relational databases – there's a lot more of it, for one thing, and it resists easy analysis and isn't over-keen on getting into databases.

This coupling of unstructured and structured data is called big data.

So let me get this right – when you say big data, I'm thinking of huge volumes of corporate data from all sorts of sources?
That would be a fair assessment, yes. Some definitions of big data also count speed as a factor: the information coming in is scrutinised in near real time. Where once a retail chain's sales data may have been collated and analysed on a monthly or quarterly basis, now that information can be examined store-by-store, moment-by-moment.

So why are we talking about big data now?
Because a good few of tech's big-name vendors (EMC, IBM, Oracle) have woken up to the sales potential behind this data explosion. Unstructured data needs systems and software in place before organisations can turn it into useful business insight, and those vendors are all queuing up to offer products aimed at taming the big data overload.

So what's in it for me? Why would I be interested?
Most industries are collecting these large pools of data – gathering more and more bits and bytes on their customers, employees and suppliers – and so have a resource they can exploit.

Some of the more obvious scenarios where big data might come into play are...