
Hadoop: Cheat Sheet

An elephant-themed, open-source way to tackle big data...

...major backer of the open-source project since.

Earlier this year, Yahoo spun out its Hadoop efforts to form Hortonworks, a company that works on Hadoop development and provides services for companies wanting to install Hadoop.


The symbol of Hadoop is a yellow elephant. Photo: Creative Commons: Erik Eldridge

Do companies need help installing Hadoop then?
Well, one of the criticisms levelled at Hadoop is indeed that it isn't too easy to manage and use - it's more of a job for the technically minded than the average end user.

"Installing, configuring and administering a production-scale Hadoop cluster requires considerable system administration expertise. Interacting with Hadoop requires a detailed knowledge of programming languages," a recent report by analyst house Gartner said.

It's worth noting that a number of companies are working on solving the installation problem, including Dell, which recently announced Crowbar, a product that automates the installation of Hadoop onto commodity servers and has been generating a bit of buzz.

Other business problems that need solving before Hadoop can see more widespread uptake, according to Gartner, are the need for better integration with existing business intelligence tools and the development of a user interface for non-technical end users, perhaps focusing on data visualisation.

That shouldn't put companies off deploying it, mind. Organisations wanting to jump on the Hadoop bandwagon could lose the first-mover advantage if they're put off by technical considerations, Gartner said.

So where did Hadoop come from?
An interesting question. The inspiration for Hadoop was a couple of papers published by Google.

Think about it - when it comes to big data, there are few companies gathering quite as much as Google. After all, it's trying to index the entire web and more besides.

In the heady mid-2000s, Google came up with the idea for its own distributed computing system, publishing two papers to that effect - Google File System and MapReduce (both PDFs).

This inspired technologist Doug Cutting - who'd been involved in two open-source search projects, the software library Lucene and web crawler Nutch - to create Hadoop as a way of enabling these projects to take advantage of distributed computing.

Hadoop itself is named after a toy elephant owned by Cutting's son.

ZDNet UK's Jack Clark contributed to this report.

About Jo Best

Jo Best has been covering IT for the best part of a decade for publications including Guardian Government Computing and ZDNet in both London and Sydney.
