Cheat sheet: How to become a data scientist

If you are interested in pursuing a career in data science, this primer is a good reference for information about salary, hottest job markets, training, and more.

Data scientists are in high demand, taking the coveted No. 1 spot on Glassdoor's Best Jobs in America list in both 2016 and 2017 and commanding high salaries for those with the right skill set. In 2012, the Harvard Business Review billed data scientist as the "sexiest job of the 21st century."

"One of the big reasons we continue to see such demand for data scientists is every company out there is becoming a tech company," Allison Berry, Glassdoor community expert, told TechRepublic. "In any industry that has to deal with digitized data, or has an app or an online presence, you need people who can help support all of that and find insights from the data."

However, there is a shortage of professionals with data science skills: By 2020, the number of annual job openings for all data-savvy professionals in the US will increase to 2.7 million, IBM predicted. Data scientists commanded a median base salary of $96,441 in the US as of October 2017, with 0.9% year-over-year growth, according to Glassdoor.

To help those interested in the field better understand how to break into a career in data science, we've created a guide with the most important details and resources.

Executive summary

  • Why is there an increased demand for data scientists? Nearly every company now has the ability to collect data, and the amount of data is growing larger and larger. This has led to a higher demand for employees with specific skills who can effectively organize and analyze this data to glean business insights.
  • What are some of the data scientist job roles? Core data scientist, researcher, and big data specialist are the top job titles in the data science field.
  • What skills are required to be a data scientist? Python, R, and SQL are the top three skills found in data science job listings, according to Glassdoor. Nine out of every 10 data science job postings on Glassdoor require knowledge of at least one of those languages.
  • Which industries have the hottest markets for data scientists? Demand for data scientists is expected to grow in the coming years. Some 59% of all data science and analytics job demand is currently in the finance and insurance, professional services, and IT industries, according to IBM.
  • What is the average salary of a data scientist? The national median base salary for data scientists was $96,441 as of October 2017, with 0.9% year-over-year growth, according to Glassdoor. Salaries are much higher in some cities, such as San Francisco and Seattle.
  • What are typical interview questions for a career in data science? Typical interview questions for a data science job may include walking the interviewer through a past project, describing your experience working with teams and communicating to leadership, and completing an exercise.
  • Where can I find resources for a career in data science? The Data Science Association, The Institute for Operations Research and the Management Sciences, and the International Institute for Analytics are national and international organizations where you can seek out information about the profession as well as certification and training options. A number of online courses in programming languages such as Python, R, and SQL are available from many providers.

Why is there an increased demand for data scientists?

As every company becomes a tech company to some degree, the need grows for skilled professionals who can analyze the data these companies collect and glean business insights from it.

"As the size of data at companies grow larger and larger, there is higher demand for employees with specific skills who can effectively organize and analyze this data," Pablo Ruiz Junco, Glassdoor economic research fellow, told TechRepublic. "At the same time, the amount of people with these skills is still relatively low compared to the demand, which results in higher pay."

Technology advances and the massive volumes of online data available are affecting every sector, and have tremendous impacts on the economy, Karen Panetta, IEEE fellow and dean of graduate engineering at Tufts University, told TechRepublic. This so-called "data avalanche" is not just about the sheer volume of data, but also the speed at which it changes and grows, and the diverse types of data available.

"Knowing how to use a spreadsheet and a traditional database will not suffice in the emerging Big Data revolution," Panetta said. "Analyses need to be done in real-time, where decisions can be critical. Being able to simply know how to use the software tools is only part of this challenge. Understanding the data across disciplines, being able to communicate its meaning, and using statistics will be the differentiating factors from a traditional 'number cruncher.'"

What are some of the data scientist job roles?

Generally speaking, data scientists mine purchasing data and analyze it for specific company interests, and then work with marketing departments to capitalize on that knowledge. These workers must be familiar with data-gathering software, programming, and warehousing techniques.

Data science jobs fall into three main roles: core data scientists, researchers, and big data specialists, according to Glassdoor research.

Core data scientists make up 71% of open jobs, and are likely to have skills in Python, R, and SQL. These professionals command an average estimated salary of $116,203 in the US. Researchers account for 15% of open jobs, and are likely to have skills in SAS, MATLAB, Java, Hadoop, Python, and R. The average estimated researcher salary is $112,346. Big data specialists make up the remaining 14% of open data science jobs. With skills in Spark, Hive, Hadoop, Java, and Python, they command an average estimated salary of $121,246, Glassdoor found.
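
To see how the role-level figures combine, here is a minimal Python sketch that computes a share-weighted average of the three salary estimates quoted above. The `ROLES` table and the `blended_salary` function are illustrative names, not part of Glassdoor's methodology, and the result is a back-of-the-envelope figure rather than a published statistic.

```python
# Figures quoted above from Glassdoor:
# role -> (share of open data science jobs, average estimated US salary in USD)
ROLES = {
    "core data scientist": (0.71, 116_203),
    "researcher": (0.15, 112_346),
    "big data specialist": (0.14, 121_246),
}


def blended_salary(roles):
    """Return the share-weighted mean of the per-role salary estimates."""
    total_share = sum(share for share, _ in roles.values())
    weighted_sum = sum(share * salary for share, salary in roles.values())
    return weighted_sum / total_share


if __name__ == "__main__":
    print(f"Blended estimate: ${blended_salary(ROLES):,.0f}")
```

Under these assumptions the blended estimate comes to roughly $116,330, close to the core data scientist figure because that role accounts for most open jobs.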

These positions include specialties such as data engineers, who need skills such as Apache Hadoop, Java, and Python, and finance and risk analytics managers, who use skills such as risk management, financial analysis and planning, and SQL, according to IBM.

What skills are required to be a data scientist?

Here are the 10 skills mentioned most frequently in job postings for data science positions, and the percentage of postings in which each appears, according to a September 2017 Glassdoor report:

1. Python (72%)
2. R (64%)
3. SQL (51%)
4. Hadoop (39%)
5. Java (33%)
6. SAS (30%)
7. Spark (27%)
8. MATLAB (20%)
9. Hive (17%)
10. Tableau (14%)

Nine out of every 10 job postings in the sample required at least one of Python, R, or SQL, Glassdoor found. These skills are closely interconnected, which makes them "bread and butter skills" that every data science job seeker should learn, according to the report.
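
The nine-in-10 figure counts postings that mention any of the three languages, so it cannot be obtained by adding the individual percentages, which overlap. The minimal Python sketch below shows that counting logic. The postings are made up purely for illustration and are not Glassdoor data, and the helper names `share_with` and `share_with_any` are hypothetical.

```python
CORE_SKILLS = {"python", "r", "sql"}

# Made-up postings for illustration only; NOT Glassdoor data.
# Each posting is represented as the set of skills it lists.
POSTINGS = [
    {"python", "sql", "spark"},
    {"r", "sas"},
    {"python", "r", "sql"},
    {"java", "hadoop"},
    {"sql", "tableau"},
]


def share_with(postings, skill):
    """Fraction of postings that list one specific skill."""
    return sum(1 for posting in postings if skill in posting) / len(postings)


def share_with_any(postings, skills):
    """Fraction of postings that list at least one of the given skills."""
    return sum(1 for posting in postings if posting & skills) / len(postings)


if __name__ == "__main__":
    for skill in sorted(CORE_SKILLS):
        print(f"{skill}: {share_with(POSTINGS, skill):.0%}")
    print(f"any core skill: {share_with_any(POSTINGS, CORE_SKILLS):.0%}")
```

In this toy sample the three individual shares add up to more than 100%, while the share of postings listing at least one core skill is 80%, because a posting that lists several of the languages is counted only once.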

"If you're looking to enter the field of data science, and build a solid foundation of experience that will stand out in the eyes of future employers, there are three core skills you need: Python, R and SQL," Pablo Ruiz Junco, Glassdoor economic research fellow, told TechRepublic. "With these skills, you'll be eligible to apply to over 70 percent of all online job postings for data scientist roles. Plus, expanding your skills beyond these foundational languages can lead you to a higher salary and allow you to cast a wider net when applying."

Some 32% of full-time data scientists started learning machine learning or data science through a Massive Open Online Course (MOOC), while 27% said that they began picking up the needed skills on their own, according to a report from Kaggle. And 30% said they got their start in data science at a university.

Which industries have the hottest markets for data scientists?

IBM predicted in May 2017 that by 2020 the number of annual job openings for all data-savvy professionals in the US would increase to 2.7 million. Some 59% of all data science and analytics job demand is in the finance and insurance, professional services, and IT industries, IBM found. Annual demand for the fast-growing new roles of data scientist, data developer, and data engineer will reach nearly 700,000 openings by 2020.

Demand for developers with data science skills is currently "very strong" among businesses, according to Shu Wu, director of Indeed Prime, with "tremendous growth" over the last four years for data scientist job postings.

While the job outlook for data scientists in the coming years remains strong, and these professionals can command high salaries, the competition is tough, Wu told TechRepublic. "A data scientist that is an expert at examining data is great, but someone who can make data digestible for the entire organization is pinnacle," Wu said.

What is the average salary of a data scientist?

The US national median base salary for data scientists as of October 2017 was $96,441, with 0.9% year-over-year growth, according to Glassdoor. Data scientists in San Francisco are the highest paid, with a median base salary of $137,688, and 1.5% year-over-year pay growth, followed by Seattle ($125,825), Los Angeles ($117,093), and New York City ($114,598).
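
To put the city figures in perspective, the following minimal Python sketch expresses each city's median base salary as a percentage premium over the national median. It uses only the Glassdoor numbers quoted above. The names `CITY_MEDIANS` and `premium_pct` are illustrative, and the comparison ignores cost of living, which the figures above do not cover.

```python
NATIONAL_MEDIAN = 96_441  # US median base salary in USD, October 2017 (Glassdoor)

# City median base salaries in USD, as quoted above
CITY_MEDIANS = {
    "San Francisco": 137_688,
    "Seattle": 125_825,
    "Los Angeles": 117_093,
    "New York City": 114_598,
}


def premium_pct(city_median, national_median=NATIONAL_MEDIAN):
    """Percentage by which a city median exceeds the national median."""
    return (city_median / national_median - 1) * 100


if __name__ == "__main__":
    for city, median in CITY_MEDIANS.items():
        print(f"{city}: {premium_pct(median):.1f}% above the national median")
```

By this measure, San Francisco's median is roughly 43% above the national figure, Seattle's about 30%, Los Angeles's about 21%, and New York City's about 19%.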

As seen above with the salary differences between core data scientists, researchers, and big data specialists, the skills that individual data scientists bring to the table can have a large impact on pay. Job seekers should consider what role they are most interested in, and make a cost-benefit analysis of which skills are worth spending time learning.

What are typical interview questions for a career in data science?

"To assess if a candidate can be successful as a data scientist, I'm looking for a few things: baseline knowledge of the fundamentals, a capacity to think creatively and scientifically about real-world problems, exceptional communication about highly technical topics, and constant curiosity," said Kevin Safford, senior director of engineering at Umbel.

A junior data scientist can expect questions like the following in a job interview, according to Forrester analyst Kjell Carlsson:

  • Walk me through the project that you are most proud of where you used data/data science/machine learning/advanced analytics. What was your role on the project, and what did you do in each step?
  • Tell me about a project where you used (insert language or skill here, e.g., Python, R).
  • Tell me about a time you had to work with someone who is not data-savvy on a data science project.
  • Pretend I am not a data scientist. Explain (insert data science topic, e.g., cross validation or unsupervised learning) to me.
  • Tell me about a time you had to work with very messy data.
  • Tell me about your experience working in teams.
  • Tell me about a time when you had to become an expert on a new technique quickly.

The interviewee might be given a mini-case study based on a data science project the team has undertaken, with questions such as: What data would you need? What are the hypotheses you would like to test? What technique(s) would you use to evaluate them?

An interview may also include an exercise in which the interviewee is given a data set and a broad question, and asked to present their findings, Carlsson said.

For more senior positions, these questions may come up, according to Daniel Miller, vice president of recruiting at Empowered Staffing:

  • Have you built a data warehouse from scratch? If so, tell me about the process you created in order to successfully implement the data warehouse. (If they have not been part of it from scratch, you can ask if they have been part of a department that dealt with a company merger or acquisition of data and how they handled it.)
  • What types of customized dashboards have you built, and what information/analytics were being presented through your dashboard?
  • Tell me about the most complicated data project you have worked on, and what you were able to do in order to achieve success.
  • How are you with explaining and presenting data to executive and senior leadership?

Where can I find resources for a career in data science?

The Data Science Association, The Institute for Operations Research and the Management Sciences, and the International Institute for Analytics are national and international organizations where you can seek information about the profession as well as certification and training options.

Some educational institutions have created data science degree programs, including the University of California, Berkeley; Northwestern University; Carnegie Mellon University; and Kennesaw State University. Some of these schools offer online courses.

You can find a number of online programming courses, such as those in Python, R, and SQL, from many providers. Programs and seminars are also available through the IEEE Computer Society.

A number of certifications in data science are also available. These include the vendor-neutral Certified Analytics Professional (CAP), the Dell EMC Proven Professional certification program, the Microsoft Certified Solutions Expert (MCSE), and the SAS Data Science Certification.

About Alison DeNisco Rayome

Alison DeNisco Rayome is a Staff Writer for TechRepublic. She covers CXO, cybersecurity, and the convergence of tech and the workplace.