Though we keep mythologizing data science, its most important work is pretty basic.
Say the words "data science" and images of number-crunching propellerheads come to mind, or rocket scientists moonlighting as interpreters of an enterprise's Hadoop cluster. But, according to Noah Lorang, a data scientist for Basecamp: "Data scientists mostly just do arithmetic."
Basic math? Really? Is that what companies are paying outsized salaries to recruit and retain? People who can add and subtract?
Yes, there are two paths you can go by...
The answer, of course, is "maybe." It depends on the target audience. As I've outlined before, data science breaks down into two categories: Data science intended for human consumption and data science intended for machine consumption.
For the latter audience, data science "involves complex digital models that ingest large amounts of data and extract insights using machine learning and algorithms, then act autonomously to display certain ads or make stock trades in real time." As such, machine-oriented data scientists require "exceptionally strong mathematical, statistical, and computational fluency to build models that can quickly make good predictions," as former Google and Foursquare data scientist Michael Li wrote.
This may be the vision we have of data scientists, generally, but the aforementioned skillset is very different from that required to thrive in human-oriented data science.
"[N]umbers have no way of speaking for themselves. We speak for them. We imbue them with meaning," noted statistician Nate Silver. We bias our data the minute we start collecting it, as we determine what we'll collect, not to mention the types of questions we'll ask of it. There is no such thing as unbiased data, be it machine-oriented or human-oriented.
Bias is the natural state of all data.
Once we understand this, the role of the human-oriented data scientist becomes clear: Help the data tell clear stories. In an interview, ZoomData CEO Justin Langseth warned against the facile expectations of machine-driven data science, holding that "algorithmic insight...generates false positives that drive any human reviewer nuts or cause them to not trust the system."
By contrast, data visualization, with its explicit human involvement, facilitates "exploration [which can] lead...to 'aha insights.'"
In short, good data science requires good storytelling and data visualization, all of which starts with basic math.
And it makes me wonder...
This brings us back to Lorang, who suggests, "In the last two weeks, the most 'sophisticated' math I've done has been a few power analyses and significance tests."
So, what does he spend all his well-paid data scientist time doing?
"Mostly what I've done is write SQL queries to get data, performed basic arithmetic on that data (computing differences, percentiles, etc.), graphed the results, and wrote paragraphs of explanation or recommendation," Lorang said.
"I haven't coded up any algorithms, built any recommendation engines, deployed a deep learning system, or built a neural net."
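To make that workflow concrete, here is a minimal sketch of the "SQL plus arithmetic" routine Lorang describes. It is not Basecamp's code: the `daily_signups` table, its columns, and every number in it are hypothetical, invented purely for illustration. It uses only Python's standard library (`sqlite3` and `statistics`) to pull data with a query, compute a difference and a percentage change, and summarize the distribution with a median and a 90th percentile.

```python
import sqlite3
import statistics

# Hypothetical data: daily signup counts for two weeks (not from the article).
ROWS = [
    ("week1", 120), ("week1", 135), ("week1", 128), ("week1", 110),
    ("week1", 142), ("week1", 90), ("week1", 85),
    ("week2", 131), ("week2", 150), ("week2", 139), ("week2", 125),
    ("week2", 160), ("week2", 98), ("week2", 97),
]


def load(conn):
    """Create the hypothetical table and fill it with the sample rows."""
    conn.execute("CREATE TABLE daily_signups (week TEXT, signups INTEGER)")
    conn.executemany("INSERT INTO daily_signups VALUES (?, ?)", ROWS)


def weekly_totals(conn):
    """The 'write SQL queries to get data' step: total signups per week."""
    cur = conn.execute(
        "SELECT week, SUM(signups) FROM daily_signups "
        "GROUP BY week ORDER BY week"
    )
    return dict(cur.fetchall())


def summarize(conn):
    """The 'basic arithmetic' step: differences and percentiles."""
    totals = weekly_totals(conn)
    difference = totals["week2"] - totals["week1"]
    pct_change = 100.0 * difference / totals["week1"]

    daily = [row[0] for row in conn.execute("SELECT signups FROM daily_signups")]
    # quantiles(n=10) returns nine cut points; the last one is the 90th percentile.
    p90 = statistics.quantiles(daily, n=10)[-1]

    return {
        "totals": totals,
        "difference": difference,
        "pct_change": pct_change,
        "median_daily": statistics.median(daily),
        "p90_daily": p90,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn)
    for name, value in summarize(conn).items():
        print(f"{name}: {value}")
```

Nothing here is more advanced than a sum, a subtraction, and a sorted list. The remaining steps in Lorang's list, graphing the results and writing the explanation, are where the human judgment comes in.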
While he leaves room for more "sophisticated" data science down the road, he insists that Basecamp doesn't need it now, and others probably don't, either:
"The dirty little secret of the ongoing 'data science' boom is that most of what people talk about as being data science isn't what businesses actually need. Businesses need accurate and actionable information to help them make decisions about how they spend their time and resources. There is a very small subset of business problems that are best solved by machine learning; most of them just need good data and an understanding of what it means that is best gained using simple methods."
In his view, businesses need to better understand their data, which is an inherently human problem. Langseth echoed this sentiment when he told me, "The best [data] visual[ization] is the one that allows a normal human with understanding of a business system to quickly see how the visuals match up with the system."
Ultimately, declares Lorang, "Knowing what matters is the real key to being an effective data scientist." And that, it would seem, generally comes down to common sense, a bit of math, and the ability to tell a story with data.