
Big data: Two truths and five myths

Along with the hype, the concept of big data comes with its own collection of misconceptions and half-truths that one CTO is keen to dispel.

Mention big-data analytics to Bob Harris, CTO at UK broadcaster Channel 4, and he immediately produces a list of pet hates. Top of the bugbears are IT people moaning about the technology's inherent difficulties.

"All the time I come across people who tell me why they cannot do things. I don't know about you but my job is to do things, not can't do things. In reality, people hide behind the complexity," Harris said.

"If you use the communities, you can meet people who are doing the same stuff. It's just about finding out how people are overcoming problems and what people are using the technology for," he said.

Vast quantities of data

According to Harris, rapidly collecting and analysing vast quantities of data is at the heart of C4's drive to improve the experience of viewers and differentiate the channel from rivals.

"It starts in the R&D strand within Channel 4 and I'm always playing with the next thing. For me right now that tends to be that real-time Storm stuff and things like that," he said.

Business intelligence has been well established at C4 for years, Harris said, with industry-standard proprietary models and real-time data warehousing. But now Hadoop and Amazon's Elastic MapReduce are the organisation's primary big-data platforms, and Harris is also experimenting with the R statistical analytics language and Mahout for machine learning.

At the recent Whitehall Media Big Data Analytics conference in London, Harris set out his take on a list of preconceptions that bedevil the technology:

1. Relational databases can do big data

"I meet people who tell me this is nothing that can't be done on RDBMS. If you believe that fundamentally, quit now. This cannot be done on an RDBMS and I've been working with those since they started. If you think you can do it with last-generation technology, you're probably not doing big data."

Verdict: Myth

2. Big-data analytics is a completely different approach

"When I started in IT, it was called data processing and we did everything in batch. We crunched the data, we printed it out and we did it again. You look at the way Hadoop works, it takes that big dataset, breaks it into little pieces, rips through them sequentially, puts them into a shuffled sort and then whacks them through the reducer and out come the results. It is actually a batch pipeline. People like me started there originally."

Verdict: Myth
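
The batch pipeline Harris describes (split, map, shuffle and sort, reduce) can be sketched in a few lines of plain Python. This is only an in-memory illustration of the idea, not how Hadoop itself is implemented or how C4 runs it: the sample viewing log, its "viewer,programme" line format and every function name here are assumptions made up for the example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: one "viewer,programme" pair per line.
SAMPLE_LOG = ["v1,news", "v2,drama", "v3,news", "v1,film", "v2,news"]


def split_into_chunks(records, chunk_size):
    """Break the big dataset into little pieces."""
    for start in range(0, len(records), chunk_size):
        yield records[start:start + chunk_size]


def mapper(chunk):
    """Rip through one chunk sequentially, emitting (programme, 1) pairs."""
    for line in chunk:
        _viewer, programme = line.split(",")
        yield programme, 1


def shuffle_sort(pairs):
    """Sort the mapped pairs by key and group the values for each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _key, value in group]


def reducer(key, values):
    """Collapse all the values for one key into a single result."""
    return key, sum(values)


def run_batch(records, chunk_size=2):
    """Run the whole pipeline: split, map, shuffle and sort, reduce."""
    mapped = [
        pair
        for chunk in split_into_chunks(records, chunk_size)
        for pair in mapper(chunk)
    ]
    return dict(reducer(key, values) for key, values in shuffle_sort(mapped))


if __name__ == "__main__":
    print(run_batch(SAMPLE_LOG))
```

On a real cluster the chunks would be processed on different machines and the shuffle would move data across the network, but the shape of the job is the same batch run Harris remembers from data processing.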

3. Open source is the only option

"No, it isn't but I am amused by how many products from companies that are more associated with proprietary products are actually using open source in there somewhere. I'm a cloud man, I'm an open-source man but for me open source is largely the future."

Verdict: Myth

4. It's really difficult

"Well, it has got a steep learning curve, that's certainly true. To sell it to our own teams, I spent a very long weekend hacking Python code in MapReduce just to demonstrate that I could rip through a few million lines of data very quickly. When I was confident enough to think, 'I can write this stuff', that's when you go and find the people in your teams who really want to move forward with this technology."

Verdict: Myth

5. Big data is immature and lacks tools

"That's true. In reality Hadoop, which we're pretty much all hanging our futures on, went 1.0.0 in 2011. So if you've got a policy that says you do nothing before the 3.0 version, you're in trouble. Hive is 0.11, Pig is 0.11.1, so most of this stuff hasn't even got to 1.0 yet. It is immature."

Verdict: Truth

6. It's totally incompatible with your BI platform and tools

"Most importantly, this is not incompatible with what you've already got. When you crunch through 20 billion rows of data and get 10 million rows of results out the end, what is the best place to put that? It's in an RDBMS. You put it back into an RDBMS, you put it back into your current reporting system and you use your sunk investment in your current reporting to make use of that. Think of this as ETL on steroids."

Verdict: Myth
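
As a rough sketch of that last step, the reduced results of a batch job can be loaded into an ordinary relational table and queried with plain SQL by existing reporting tools. The example below uses Python's built-in sqlite3 module as a stand-in for whatever RDBMS is already in place; the table name, column names and sample rows are hypothetical and do not come from Channel 4.

```python
import sqlite3


def load_results(conn, rows):
    """Load (programme, views) result rows into a reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS programme_views ("
        "programme TEXT PRIMARY KEY, views INTEGER NOT NULL)"
    )
    # INSERT OR REPLACE lets a rerun of the batch job overwrite old totals.
    conn.executemany(
        "INSERT OR REPLACE INTO programme_views (programme, views) "
        "VALUES (?, ?)",
        rows,
    )
    conn.commit()


def top_programmes(conn, limit):
    """The kind of query an existing reporting tool might run."""
    cursor = conn.execute(
        "SELECT programme, views FROM programme_views "
        "ORDER BY views DESC, programme ASC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Hypothetical output of a batch job, small enough for an RDBMS.
    batch_output = [("news", 3), ("drama", 1), ("film", 1)]
    connection = sqlite3.connect(":memory:")
    load_results(connection, batch_output)
    print(top_programmes(connection, 2))
```

The heavy crunching happens upstream; the database only ever sees the much smaller result set, which is why the existing reporting investment still pays off.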

7. It's difficult to find skilled and experienced staff

"Yes it is. So go hang out where they hang out. Go to the Meetups, go to the user groups. Dress down a bit, put your baseball cap on, go mix with them - it's a lot of fun. I was at the Storm user group Meetup and we had Nathan Marz, the author of it, on Skype from the US. You get a chance to say to the guy, 'How are we meant to be doing this?', 'Did you think about that?' And it's brilliant depth you can get into. With the best will in the world, you can't do that with proprietary products."

Verdict: Truth

About

Toby Wolpe is a senior reporter at TechRepublic in London. He started in technology journalism when the Apple II was state of the art.

3 comments
ChaoticMike

>>It's the nature of the beast until things settle down again

My 20 years in the business have taught me one thing... things don't settle down. There is always change, always at the edges, sometimes in the middle of functional approaches to doing things, which is when disruption occurs. And for IT to deliver, we can't work in a vacuum: we do what the business wants (different sometimes from what it needs) and it can be a real battle to persuade a large public sector organisation to think strategically and then successfully execute against that strategy.

aureolin

... but no proofs. This article is a nice bit of eye candy, but there's no substance to it. You say "These are myths" yet you provide nothing (not even links!) to back it up. Unlike 'PeteDude' above, this is NOT something that can be referred to in the future when Big Data issues come up.

PeteDude

Hope to refer to this article in the future when I have Big Data questions! Technology paradigms are changing rapidly enough that we are being forced to work with a lot of tools that are immature. It's the nature of the beast until things settle down again. Best that we IT pros just get with the program and deliver the goods!
