Learn these 3 languages now if you want to become a data scientist

Demand for developers with data science skills continues to grow. Here's what you need to learn to break into a career in the field.

Looking to expand your skills in the tech realm? Demand for developers with data science skills is currently "very strong" among businesses, according to Shu Wu, director of Indeed Prime, with "tremendous growth" over the last four years for data scientist job postings.

"Job outlook is strong and data science roles command a high average salary, but the competition is tough," Wu said. "A data scientist that is an expert at examining data is great, but someone who can make data digestible for the entire organization is pinnacle."

Technology advances and the massive volumes of online data available are affecting every sector and have tremendous impacts on the economy, said Karen Panetta, IEEE fellow and dean of graduate engineering at Tufts University. This so-called "data avalanche" is not just about the sheer volume of data, but also the speed at which it changes and grows, and the diverse types of data available.

"Knowing how to use a spreadsheet and a traditional database will not suffice in the emerging Big Data revolution," Panetta said. "Analyses need to be done in real-time, where decisions can be critical. Being able to simply know how to use the software tools is only part of this challenge. Understanding the data across disciplines, being able to communicate its meaning, and using statistics will be the differentiating factors from a traditional 'number cruncher.'"

In terms of learning a programming language that allows you to work with data, "the standard across the board for any language is to find something and do it," said Forrester analyst Mike Facemire. "The great thing about writing code is that doing it wrong is a great learning experience." Facemire recommends going to GitHub to see examples, then finding a data set that interests you and learning to analyze it.

Ultimately, it's more important to understand how to solve a problem by breaking it down into smaller pieces than it is to know the language itself, Facemire said. "At the end of the day it's just a way to interface with a computer," he said. "The computer doesn't care which language you use, it cares more that you broke down your problem properly and solved it properly to get the proper outcome."
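As a rough illustration of what breaking a problem into smaller pieces can look like, here is a minimal Python sketch that answers one question (the average temperature per city) in three small steps: load, clean, summarize. The data, column names, and function names are all hypothetical and chosen only for this example; they do not come from Facemire.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data; a real project would read a data set you care about.
RAW = """city,temp_c
Boston,21.5
Boston,19.0
Medford,22.0
Medford,
"""

def load_rows(text):
    """Step 1: parse the CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_rows(rows):
    """Step 2: drop rows with a missing temperature and convert the rest to float."""
    return [
        {"city": r["city"], "temp_c": float(r["temp_c"])}
        for r in rows
        if r["temp_c"]
    ]

def average_by_city(rows):
    """Step 3: group temperatures by city and average each group."""
    groups = {}
    for r in rows:
        groups.setdefault(r["city"], []).append(r["temp_c"])
    return {city: mean(temps) for city, temps in groups.items()}

result = average_by_city(clean_rows(load_rows(RAW)))
print(result)
```

Each function does one job and can be checked on its own, which is the habit that carries over to any language.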

Some educational institutions have created data science degree programs, including Northeastern University, Boston University, CUNY and Merrimack College. Some of these schools offer online courses, and lower-cost programs and seminars are available through the IEEE Computer Society, Panetta said.

If you want to pursue a career in data science, you should consider learning one of the following three languages.

1. R

R is a language and framework used by data miners for developing statistical software and performing data analysis, Panetta said.

The language saw a large surge as data analysis and data science became more prevalent in the past couple of years, Facemire said. Its popularity has since leveled off a bit, however. R has tooling that is built for data scientists, with extensions and plugins specifically for that purpose.

"It is essential when learning a language like R that individuals understand the fundamental mathematical skills," Panetta said. "It would be disastrous if we just trusted the outputs of software without knowing what we were truly measuring and without understanding the data we were providing it as input."
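A small, hypothetical example shows why the underlying statistics matter. The sketch below uses Python's standard `statistics` module on made-up salary figures (an assumption for illustration, not data from the article). One outlier pulls the mean far from what a typical person in the list earns, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars; the last value is an outlier.
salaries = [52, 55, 58, 61, 64, 900]

avg = mean(salaries)    # pulled upward by the single outlier
mid = median(salaries)  # middle of the sorted values, robust to the outlier

print(f"mean:   {avg:.1f}")
print(f"median: {mid:.1f}")
```

Both numbers are "correct" outputs of the software. Which one answers the question depends on knowing what you are trying to measure.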

2. Python

Python is a general-purpose language that is already robust, and it includes tooling that fits environments where visualizations need to appear on websites or mobile devices, Facemire said. It is also more readable than R, he added.

"If you're at the point in your career when you're thinking, 'I want to be a data scientist. Which language should I learn?' I would look at both R and Python and see which makes sense to you," Facemire said. "Both are absolutely viable." Businesses usually don't prioritize one over the other in terms of required skills for data scientists, he added.

3. Java

Java was recently ranked as one of the most favored and most versatile languages to write in, according to a survey from WP Engine. It's another general-purpose programming language, one specifically designed to have as few implementation dependencies as possible. It can be used to build virtually anything, particularly scalable, multithreaded platforms, and has a strong user base.

Java compiles to bytecode that runs on the Java Virtual Machine rather than directly on the hardware. As a result, unlike C and C++, Java doesn't require as much lower-level understanding of the hardware, Panetta said. That makes it easier for those studying in disciplines beyond computer science and engineering to learn it. Java is also the most in-demand coding language in terms of tech job postings, according to Indeed.

About Alison DeNisco

Alison DeNisco is a Staff Writer for TechRepublic. She covers CXO and the convergence of tech and the workplace.
