How to avoid unrealistic data science project expectations: 8 tips

Organizations are enthusiastic about hiring data scientists, but must remember that data-driven projects will not solve every problem.


Organizations across all industries are collecting more data than ever, and looking to data scientists and analysts to glean insights that will help improve business. However, with all of the hype around big data, it's easy for data science project expectations to spiral out of control.

"Companies are super excited that data will solve every problem that they have," said Andrea Danyluk, a professor of computer science at Williams College and co-chair of the Association for Computing Machinery's taskforce on data science. "It very well may be that data and data science will solve many of their problems and will move their business forward. But with every project you do, you should sit back and think very hard about the specific data you're collecting and the potential implications about what that's going to mean."


For example, this means considering potential biases within the data itself, and how those biases could impact your business moving forward, Danyluk said.

Ultimately, "data science is not a silver bullet," said Dave McCarthy, vice president of Internet of Things (IoT) provider Bsquare. "Instead it's the highly advanced and ongoing mathematical analysis of extremely large data sets in search of unique and actionable insights."

Here are eight tips on how your organization can avoid setting unrealistic data science project expectations.

1. Start small

Start with a small, low-risk project, said Meta S. Brown, business analytics consultant and author of Data Mining for Dummies. This means something that you aren't very worried about at the moment, but that has a high chance of yielding success.

"One of the most common places to do that that most organizations are not really doing is testing something in your email," Brown said. For example, most email newsletter vendors offer the ability to test alternative versions of an email. You could start testing your subject lines and seeing which produce more opens and clicks.

"That's as low-risk as you possibly can go—you have nothing to lose, and you don't have to spend any money, because your vendor already provides the technical capabilities," Brown said. "And you might find out that, hey, this subject line works better than that subject line. It's a good example of something that might be right there for you to do, and where you could start to show value."
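
To make the subject-line test concrete, here is a minimal sketch of how the results of such a test might be compared. It is not Brown's method or any vendor's feature: it assumes you can export the number of emails sent and opened for each version, and it applies a standard two-proportion z-test. The function name and the counts are hypothetical.

```python
import math


def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Return (z, two-sided p-value) for the difference in open rates, B minus A."""
    rate_a = opens_a / sent_a
    rate_b = opens_b / sent_b
    # Pooled open rate under the assumption that both subject lines perform the same.
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if std_err == 0:
        raise ValueError("open rate is 0% or 100% in both groups; nothing to compare")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: subject line A vs. subject line B, 1,000 sends each.
z, p = two_proportion_z_test(opens_a=200, sent_a=1000, opens_b=250, sent_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the gap in open rates is unlikely to be chance alone. With small lists or a small gap, the test will often be inconclusive, which is itself a useful check on expectations.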


2. Create an analytics plan and process

Organizations need an analytics process, Brown said. "When people complain that analysts are not solving the right problems or giving them the right information, that's a reflection of a process problem," she added.

The process can begin with agreeing on what the organization's problems actually are, then choosing a small one that everyone can define and agree to work on, Brown said. After that, you have to evaluate whether you have the data to solve it.

3. Ignore the trends

Avoid starting with a flashy project, Brown said. "Don't worry about what's cool. Worry about what's cost-effective for you," she added. "The cool factor can be a really big problem."

4. Don't obsess over tools

When it comes to a data-driven project, "tools are the last thing you should think about," Brown said. However, companies need to determine which particular products are important and spend the money when they need to, instead of spending a lot of time waiting or seeking another solution, she added.

5. Understand the computational limits

While data analysis can improve many processes, "there are mathematically provably things that cannot be done unconditionally," Danyluk said. "It's a wonderful thing to think that one field would be able to do everything to solve the world's problems with data. But there are things that cannot be done through the end—unless we have a completely different framework for how we think about computation, it's just not going to happen."

6. Remember that not all data is useable

Organizations must remember that collecting a lot of data does not mean that data is clean or useable, McCarthy said.

"While organizations may have large volumes of data, it is not always the case that the right data is collected, is structured correctly, or is rich enough to be able to garner the insights they are looking for," he added. "Often the data needs to be refined, cleansed, restructured, and even combined with other data sources before it can truly add value. Failure to understand this is the principal reason expectations often go unmet."
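
As a rough illustration of the kind of work McCarthy describes, the sketch below cleans a handful of records and combines them with a second data source. Everything in it is an assumption made for the example: the field names (`device_id`, `temp_c`, `site`), the sample rows, and the rule that a row with a missing ID or an unreadable value is dropped.

```python
def clean_readings(raw_rows):
    """Drop unusable rows and normalize the rest (hypothetical rules)."""
    cleaned = []
    for row in raw_rows:
        # Normalize the ID: treat None as empty, trim whitespace, lowercase.
        device = (row.get("device_id") or "").strip().lower()
        value = row.get("temp_c")
        if not device or value in (None, ""):
            continue  # missing ID or missing reading
        try:
            temp = float(value)
        except (TypeError, ValueError):
            continue  # reading is not a number, e.g. "n/a"
        cleaned.append({"device_id": device, "temp_c": temp})
    return cleaned


def add_site(cleaned_rows, site_by_device):
    """Combine readings with a second source: a device-to-site lookup."""
    return [
        dict(row, site=site_by_device.get(row["device_id"], "unknown"))
        for row in cleaned_rows
    ]


# Hypothetical raw export with inconsistent IDs and missing or bad values.
raw = [
    {"device_id": " A1 ", "temp_c": "21.5"},
    {"device_id": "a1", "temp_c": ""},
    {"device_id": None, "temp_c": "19.0"},
    {"device_id": "B2", "temp_c": "n/a"},
    {"device_id": "b2", "temp_c": 18},
]
sites = {"a1": "plant-north"}  # second data source, also hypothetical

cleaned = clean_readings(raw)
enriched = add_site(cleaned, sites)
print(f"{len(raw)} raw rows, {len(cleaned)} usable rows")
print(enriched)
```

Even in this toy case, more than half of the collected rows are unusable, and one surviving row cannot be matched to a site. Counting rows before and after each step is a simple way to see how much of the data can actually support the analysis.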

7. Don't expect to find a data science unicorn

When hiring a data scientist, many companies are seeking a magical candidate who has every possible qualification, but have trouble finding them, Brown said. And when they do make a hire, expectations for what that professional can do are often too high, she added.

"Frankly, a lot of people hire a data scientist, and don't get what they want out of them," Brown said. "Start with something modest, and establish a good process as your mode of operations from the start."

8. Allow for a learning curve

Companies should make data-driven projects "special projects" that are given support and resources, but considered outside of day-to-day operations at the start, said Kristen Sosulski, clinical associate professor of information, operations, and management sciences in the Leonard N. Stern School of Business at New York University, and author of Data Visualization Made Simple.

"There's a learning process there for the organization to learn about the data," Sosulski said. "Be cautious about taking action too quickly without having an understanding of it."

By Alison DeNisco Rayome

Alison DeNisco Rayome is a Senior Editor for TechRepublic. She covers CXO, cybersecurity, and the convergence of tech and the workplace.