Image: NanoStockk, Getty Images/iStockphoto
Data science offers a wide range of career paths, and Glassdoor.com has named it the number one job in several recent years.
Northeastern University lists on its site a comprehensive set of potential jobs related to data science including business intelligence developer, data/applications/infrastructure architect, machine learning scientist/engineer, and, of course, the traditional data scientist role.
My colleague Alison DeNisco Rayome covered data science last year and provided a plethora of details related to the topic. I recently spoke with Martijn Theuwissen, co-founder of DataCamp, a data science educational organization, to learn more about the concept.
Scott Matteson: What skills are needed to be a data scientist?
Martijn Theuwissen: There is a common misconception that to become a data scientist one needs to know statistics, linear algebra, calculus, programming, databases, machine learning; I could go on. Some even say a Ph.D. is required. This couldn’t be further from the truth.
In fact, anyone can become a data scientist. All you need is a learning plan with measurable objectives, and a basic understanding of the popular data science languages like SQL, Python, and R.
But let’s step back a bit to see what data science really is. Data science can often be segmented as descriptive analytics, predictive analytics, and prescriptive analytics.
- Descriptive analytics is essentially describing data that your company already has in the form of reports, dashboards, or other ways to share data visualizations and summary statistics.
- Predictive analytics is the realm of prediction and machine learning: for example, classifying whether an email is spam based on its content; whether a customer will churn based on interactions with your company; or whether a tumor is benign or malignant based on diagnostic imaging.
- Prescriptive analytics, or decision science, brings rigor to decision making by tying it to the data world.
Sure, machine learning is sexy, but the lion’s share of the value data science has created today across most verticals is actually in descriptive analytics by serving relevant summary statistics, visualizations and dashboards to relevant internal stakeholders.
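To make the descriptive analytics category concrete, here is a minimal Python sketch of the kind of summary statistics a report or dashboard might serve. The order records, field names, and the `summarize_by` helper are all hypothetical and invented for illustration; a real team would more likely pull this data with SQL or a data-frame library.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical order records; field names and values are invented for illustration.
orders = [
    {"region": "North", "revenue": 120.0},
    {"region": "North", "revenue": 80.0},
    {"region": "South", "revenue": 200.0},
    {"region": "South", "revenue": 100.0},
    {"region": "South", "revenue": 150.0},
]


def summarize_by(records, key, value):
    """Group records by `key` and return count, mean, and median of `value`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {
        name: {"count": len(vals), "mean": mean(vals), "median": median(vals)}
        for name, vals in groups.items()
    }


# Summary statistics per region, the raw material for a simple report.
summary = summarize_by(orders, "region", "revenue")
for region, stats in sorted(summary.items()):
    print(
        f"{region}: n={stats['count']} "
        f"mean={stats['mean']:.2f} median={stats['median']:.2f}"
    )
```

Nothing here involves prediction or modeling: the sketch only describes data that already exists, which is the point of the descriptive category.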
And anybody can do this! I’ve seen data scientists on marketing, commercial, and product teams need to redefine their own roles as their “non-technical” teammates have learned some SQL and data visualization in Python or R to do work and create value that was previously inconceivable. These are just some of the skills we want to teach to help build data fluency throughout the world.
Scott Matteson: Can you look within your organization to find the useful skills? Should enterprises turn to education and training?
Martijn Theuwissen: Yes to both. There are data scientists at every company. Instituting a mentor program, for example, combined with a continuous learning curriculum can greatly improve data fluency across an organization.
And this is no longer an option — it’s an imperative. Data is king in business. Data science is a means by which you can use data to make business decisions. Without the basic data science skills, employees can’t make these important decisions.
As your team becomes more comfortable with the language of data, they’ll be more comfortable bringing data to bear on important business decisions. It will become clear that some team members are more comfortable using data skills than others are. Encourage the proficient ones to mentor others. Even at DataCamp, where data science is our business, some people don’t work with data continuously. When they need help on a complex problem, they pair up with those who do.
It’s all about shared tools, skills and responsibilities — they can dramatically improve communication and understanding between employees, which ultimately improves workplace culture.
Scott Matteson: Can employees be trained in data science?
Martijn Theuwissen: Absolutely. But first, companies need to create awareness that data science today is not exclusive to data scientists. In fact, many tasks at companies require some level of data science—finance, marketing, operations, and HR, just to name a few. It’s a cultural challenge as much as a skills challenge.
Second, companies need to implement upskilling initiatives that fit the lifestyle of their employees. Solutions like DataCamp that provide on-demand and interactive learning options were specifically built for busy people. This reflects a fundamental shift in the upskilling and reskilling initiatives taking place in many industries. We’re seeing a transition from L&D functions creating in-person training material to L&D functions curating personalized content for their employees using online resources.
Most importantly, don’t take your foot off the gas pedal. Learning isn’t a one-off, especially in a dynamic space like data science. Make sure the programs you’ve implemented are repeatable and that you’re measuring success and growth. In the future of work, continuous learning is the norm. The number of tools developed and skills needed to solve real business problems is growing quickly. We’ve entered an age where continual learning is essential to staying professionally relevant. This is true generally, but even more so in the data world.
Scott Matteson: Do data scientists need a Ph.D.?
Martijn Theuwissen: There are no shortcuts to writing code, but with practice, anyone can build the skills needed to solve problems using data, especially with the right education tools.
For example, one of our employees pivoted from account executive to data scientist using DataCamp. We’ve also heard similar stories from our customers. Then you have examples of well-known data scientists without formal degrees. Cloudera co-founder Jeff Hammerbacher, election forecaster Nate Silver (of FiveThirtyEight), and Moneyball brain Paul DePodesta are three that come to mind.
This is not to say there isn’t value in having a university degree in data science. In fact, we give DataCamp subscriptions for free to many universities because we stand for democratizing data science, regardless of the education medium.
Scott Matteson: Is being a data scientist about the skill, dedication, understanding, or education? A mix?
Martijn Theuwissen: A major part of being an effective data scientist, which goes beyond having any sort of degree or training program, is knowing how to conduct conversations and ask the right questions around such topics as:
- Data generation, collection, and storage
- What data looks and feels like to data scientists and analysts
- Statistical intuition and common statistical pitfalls
- Model building, machine learning, and artificial intelligence (AI)
- The ethics of data, big and small
Monica Rogati, who’s a total rock star in our field, wrote a great article on this topic called Data Science Hierarchy of Needs that’s worth seeking out. I’m biased, of course, but I also highly recommend our brand new Data Science for Business Leaders course to learn more.
Scott Matteson: Can you describe what the daily activities of a data scientist are, using subjective examples?
Martijn Theuwissen: Today’s data scientists add value on a daily basis by collecting and cleaning data; constructing dashboards and building reports; visualizing data; performing statistical inference; communicating results to key stakeholders; and providing quantifiable evidence to decision makers.
Data scientists in the tech industry now know how data science works and the value it provides. They begin each day by putting a solid data foundation in place, one that will support robust analytics. From there they utilize online experiments and other methods that will result in sustainable growth. Last, but not least, they construct machine learning pipelines and customized data products to help them gain a greater understanding of their business and customers and make better decisions.
Scott Matteson: How long would it take for an individual to learn the trade and launch a career in data science?
Martijn Theuwissen: A reasonable estimate is six months of full-time learning and project work, including writing the projects up in Jupyter or R Markdown notebooks. The work should also be published on GitHub and a personal blog. All of that work would equip someone well for an entry-level position like junior data analyst or junior data scientist. From that point on, the key is continuous learning that includes all of the latest tools, techniques, concepts, communications, and questions.