Ask your data scientists to call my data scientists

When introducing analytic products and services into the marketplace, be prepared for an unlikely cog in the decision-making process: other data scientists.



You may have recently hired a data scientist or two to put some analytic punch behind one of your current offerings. Guess what? Your customers are doing the same thing. So if you plan to meet with a customer to talk about your new analytic product or service, don't be surprised if they show up with their own whip-smart data scientists to evaluate your product.


What's driving the demand for more data scientists?

As the base competence of science, technology, engineering, and mathematics (STEM) students grows, so does the demand to cater to their needs. The progression started when insightful STEM students realized they needed to process huge volumes of data, in different and unconventional formats, at very high speeds.

Pure-play big data solutions (i.e., products like Hadoop that specifically cater to big data needs) entered the scene. From there, visionaries saw the competitive implications of using big data analytics in their offerings (e.g., Progressive rewards good drivers by analyzing real-time driving data). Now, as adoption accelerates, people want to know what's under the hood, and they don't mind learning and/or buying the analytic competence to get their questions answered.

It's a fair demand -- everyone knows it's naive to invest in something you don't understand. The recent financial meltdown in 2008 was caused in large part by people buying obscure financial derivatives like Collateralized Debt Obligations (CDOs) that they didn't understand. We all know how that turned out.

These fancy financial instruments remind me of some of the big data analytic solutions available today. There are a lot of colorful brochures with people in white coats explaining how dazzling analytics can solve your biggest problems. That's not enough anymore -- smart people want to know more details, and your offerings should make it easy to find those answers. You'd never buy a car that had its hood sealed shut, so why would you expect your customers to invest in your black-box analytic solution on blind faith?

Something for everyone

You should build your analytic solutions to cater to multiple needs, including other data scientists who want to know how the machine works. Consider how a database management system (e.g., Oracle) is built. It caters to the needs of developers who need to create tables and manage data; however, instrumentation features (e.g., logs and trace files) are also critical for database administrators (DBAs) to keep the lights on.

There's no need to divulge proprietary information. Oracle doesn't share the intricate details of how it does cost-based optimization, although it does allow you to generate a trace file with an amazing amount of detail on how the optimizer decided to execute a query. This attitude can be applied to the design of your analytic solution as well. Think about the kinds of questions other data scientists might ask about your offering, such as:

  • If your solution involves causation, are you using logistic regression, neural networks, or something else?
  • How did you come up with your variable set, both independent and dependent?
  • Did you do qualitative research, or did you use another data mining tool?

In your design, keep in mind that you're building commercial features into your solution and not some backdoor for the intellectually curious. You must treat the analytic community as actual end users, not just as hobbyists who take an interest in your solution.

When Oracle's cost-based optimizer first rolled out, you could get details on its logic, though it required a considerable degree of knowledge in the obscure art of reading trace files. Furthermore, these techniques weren't in the official documentation -- you needed a connection with the underground DBA network to garner this knowledge. Don't make it this difficult for data scientists to get basic information about how your analytic solution operates; you should build these features right into the solution.
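To make the idea concrete, here is a minimal Python sketch of what a built-in "lift the hood" feature might look like. Everything in it is an assumption made for illustration: the class name ExplainableScorer, the variable names, and the weights are hypothetical and do not describe any real product. The sketch uses a small logistic regression only because it is easy to read; the same pattern could wrap any model.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ExplainableScorer:
    """Hypothetical scoring model that can describe itself to a reviewer."""

    # All names and numbers below are illustrative placeholders.
    model_family: str = "logistic regression"
    target: str = "customer_churned"
    variable_selection: str = "placeholder: describe how the variables were chosen"
    intercept: float = -1.0
    weights: dict = field(default_factory=lambda: {
        "support_tickets": 0.5,
        "months_active": -0.25,
    })

    def score(self, row: dict) -> float:
        """Return the predicted probability for one input row."""
        z = self.intercept + sum(w * row[name] for name, w in self.weights.items())
        return 1.0 / (1.0 + math.exp(-z))

    def describe(self) -> dict:
        """Answer the basic design questions without exposing the weights."""
        return {
            "model_family": self.model_family,
            "dependent_variable": self.target,
            "independent_variables": sorted(self.weights),
            "variable_selection": self.variable_selection,
        }

    def trace(self, row: dict) -> dict:
        """Show how each input pushed this particular score up or down."""
        contributions = {name: w * row[name] for name, w in self.weights.items()}
        return {
            "inputs": dict(row),
            "contributions": contributions,
            "score": self.score(row),
        }


if __name__ == "__main__":
    scorer = ExplainableScorer()
    print(scorer.describe())
    print(scorer.trace({"support_tickets": 4, "months_active": 4}))
```

The split between describe() and trace() mirrors the Oracle example: the first answers the general questions a visiting data scientist would ask, while the second plays the role of a trace file for a single decision. How much detail each one reveals is a business choice, not a technical one.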


When considering the design of products that involve big data analytics, remember to address the customers' functional needs and the data scientists' non-functional needs, because the analysts are going to help the customers evaluate your product. You also need to be ready for a cross-examination from a fellow data scientist.

The bottom line: make sure you have a great engine to show off when customers and analysts lift your product's hood.

Note: TechRepublic and Tech Pro Research are CBS Interactive properties.