Big data skills: Should data scientist be your next job?

A look at how IT workers can prepare to meet business demand for big data analytics skills.

With big data analytics comes a new role for IT workers, the data scientist. But just what is a data scientist and how do you become one?

The data scientist role is designed to help organisations make sense of the large number of disparate data sources that analytics platforms like Hadoop can interrogate.

Data scientists help organisations work out which of the many different internal and external datasets they should link together and query to generate useful insights for their business.

Big data analytics is sometimes sold as a boon for IT workers, with analyst house Gartner predicting that within three years there will be 4.4 million staff working on big data projects. But data scientist isn't a role just for computer science graduates; it is also suited to experts in other scientific and mathematical disciplines.

Kirk Dunn, chief operating officer at Hadoop specialist Cloudera, said the job requires someone who on the one hand understands large-scale machine learning algorithms and programming and on the other is a statistician.

Dunn said that Cloudera has been training up scientists from inside and outside the IT industry to become data scientists - teaching statistical experts the necessary computer science and computer scientists the necessary statistical skills: "You can't hire this generation of data scientists, you have to build them."

Academics who specialise in data analytics and research, such as econometricians or epidemiologists, are well suited to the job, he said.

What's important for a data scientist is an ability to know how and why businesses should be looking to link up data, he said.

"The data scientist takes a higher order view of things: for example, if they're at a retailer, correlating weather data with point of sale information and looking at their relationship to the supply chain.

"There are these differing types of data that aren't normalised for the same use but a data scientist should be able to architect something that says 'When this happens over here let’s look over here to see if there's a result'.

"It's understanding the relationships between data and how they interact with each other."

Courses for aspiring data scientists are growing in number. Cloudera runs its own introduction to data science course, as do Columbia University, the University of Washington, UC Berkeley and storage giant EMC. As well as full, paid-for courses costing more than $1,000, there is also a variety of online and DVD courses offered by the likes of EMC and Cloudera.

Brendan Moran, data scientist at EMC, said that software developers who want to become data scientists need to be willing to reappraise how they approach problem solving.

"It is about the mindset and the difference between being an engineer and a scientist. A coder (engineer) will take a problem, and then start building the solution. A scientist will start questioning if it is possible, and if it is valid. Developers will therefore need to move away from the defined solution mindset," he said.

While CIOs may be holding off investing in big data, Cloudera's Dunn said the scarcity of data scientists means that companies are willing to pay well for qualified candidates, and that demand will only continue to grow.

"It’s a very rare and scarce commodity. As scarce as it is that makes it precious," he said.

Not just data scientists

Big data analytics requires more than data science skills, and Cloudera has also trained 15,000 Hadoop developers and administrators.

While queries can be written for Hadoop in SQL, taking full advantage of the system requires existing database administrators to update their skills to cover the breadth of uses to which Hadoop can be put.

"Hadoop has 15 open source projects - Pig, Hive, ZooKeeper etc," said Dunn.

"Each of those has their own particular capability and use. To be certified on Hadoop is to understand all those things and have some proficiency in all or some of them."

Cloudera offers a range of certification and training programmes for Hadoop administration and development - none longer than one week.

Prices vary but, as an indication, training as a Hadoop-certified administrator costs in the region of $3,000.

Certain courses have prerequisites, such as familiarity with SQL or database concepts, but there are no entrance exams yet.

About

Nick Heath is chief reporter for TechRepublic UK. He writes about the technology that IT decision-makers need to know about, and the latest happenings in the European tech scene.

5 comments
pawintx

I have eight years' experience as a data analyst and QC, primarily for Oracle, writing SQL, reports, etc. Who is the best resource to seek on how to move my career path towards big data analytics and a data scientist role, so I can be a precious commodity?

Dr_Zinj

They always use "s" in place of "z" in words like organize.

gauravkumar37

I'm currently a software engineer and am fanatical about cloud computing and big data analytics. Does it make sense for me to move from programming to a data scientist role? Thanks, GK

macmanjim

It must happen like magic. Just become a data scientist and employment awaits. With employers looking for unicorns because of the arrogance of supply, we'll just wave the wand and become data scientists. Maybe Joe Girardi can wave a magic wand and change Luke Murton into Derek Jeter.

kirkhill

Wow, there's not much that gets past you. UK writers use 's' because they are writing in English. Don't need to be any kind of scientist to work that out. :)