In 2013, Google made a splash that turned into a public relations belly-flop when its attempt to predict flu outbreaks missed the mark by 140%. Its algorithm searched for words like “cold” and “fever,” assuming that these searches pointed to flu, but the assumption proved wrong.
In 2017, computer scientists at Stanford developed a new deep learning algorithm that could diagnose 14 types of heart rhythm defects with a high degree of accuracy. The algorithm was tested on over 30,000 patients with a broad variety of heart rhythm abnormalities, and its diagnoses were subjected to rigorous review by cardiologists. Only then was the algorithm deployed. It worked.
The contrast between these two use cases demonstrates how important it is for companies to get the algorithms and queries that operate on their big data right. If you’re facing similar challenges, here are four tips:
1. If you’re dealing with repeatable data, uncover the pattern
Machines and systems usually execute repeatable processes and operations, which makes them the best place to look for predictable data patterns that an algorithm can use as “norms” in data analysis. For example, if an automated sensor reports water pressure on an hourly basis and you suddenly begin receiving readings every second, your algorithm should alert you to unusual behavior that requires intervention.
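The sensor example above can be sketched in a few lines. This is a minimal illustration, not a production monitoring system: the hourly cadence and the 50% tolerance are invented values for the sketch.

```python
from datetime import datetime, timedelta

# Hypothetical "norm" for this sensor: one reading per hour.
EXPECTED_INTERVAL = timedelta(hours=1)
TOLERANCE = 0.5  # allow +/-50% deviation from the norm before alerting

def interval_alerts(timestamps):
    """Yield the index of each reading that arrives far off the expected cadence."""
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        deviation = abs(gap - EXPECTED_INTERVAL) / EXPECTED_INTERVAL
        if deviation > TOLERANCE:
            yield i

readings = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 1, 0),
    datetime(2024, 1, 1, 2, 0),
    datetime(2024, 1, 1, 2, 0, 1),  # sudden per-second reading breaks the pattern
]
print(list(interval_alerts(readings)))  # flags the off-cadence reading at index 3
```

The point is that because the machine's behavior is repeatable, the “norm” can be hard-coded or learned once, and anything outside it becomes a trigger for human intervention.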
2. If you’re modeling human behavior to test out an algorithm, make sure your sample is accurate
In some cases, data quality is the problem; in others, organizations are too hasty and presumptuous when they aggregate data sources to analyze human behavior. For example, if you are a global retailer whose goal is to model consumer habits, and you find that US consumers like the idea of self-service, don’t stop there. You might find that in other markets you serve, such as China, people like being served and dislike self-service. If your initial data sources had been inclusive enough, you wouldn’t have missed this.
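One simple safeguard is to check, before modeling, that every market you serve is actually represented in the sample. The sketch below is hypothetical; the 5% minimum share is an arbitrary threshold chosen for illustration.

```python
from collections import Counter

def underrepresented_markets(sample_markets, served_markets, min_share=0.05):
    """Return served markets whose share of the sample falls below min_share."""
    counts = Counter(sample_markets)
    total = len(sample_markets)
    return sorted(
        m for m in served_markets
        if counts.get(m, 0) / total < min_share
    )

# Toy sample: heavily skewed toward US shoppers.
sample = ["US"] * 90 + ["UK"] * 8 + ["CN"] * 2
print(underrepresented_markets(sample, ["US", "UK", "CN"]))  # ['CN']
```

A check like this would surface the Chinese market's 2% share before the self-service conclusion was generalized to all customers.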
3. Iterate and fine-tune your algorithms
Algorithms are hypotheses. Merriam-Webster defines a hypothesis as a “tentative assumption made in order to draw out and test its logical or empirical consequences.” In other words, we cannot really know at the outset whether an algorithm will tell us what we think it will.
In the Stanford heart rhythm trials, the algorithm was run continuously and its results were assessed by a battery of cardiologists and experts. It was run iteratively until it diagnosed every cardiac case thrown at it with a high degree of accuracy. Only then was it deployed.
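The iterate-and-evaluate loop can be sketched in general terms. Everything here is a placeholder, not the Stanford pipeline: `train_step` stands in for a retraining round and `evaluate` for the expert review, and the toy lambdas below simply treat the model's value as its accuracy.

```python
def tune_until_accepted(model, train_step, evaluate, target=0.95, max_rounds=20):
    """Retrain and re-evaluate until the review meets the target accuracy."""
    for round_no in range(1, max_rounds + 1):
        model = train_step(model)          # refine the hypothesis
        accuracy = evaluate(model)         # test its consequences
        if accuracy >= target:
            return model, round_no, accuracy
    # If we never hit the target, the hypothesis itself may be wrong.
    raise RuntimeError("target accuracy not reached; revisit the hypothesis")

# Toy usage: start at 80% accuracy, gain 4 points per round.
_, rounds, acc = tune_until_accepted(
    0.80,
    train_step=lambda m: min(1.0, m + 0.04),
    evaluate=lambda m: m,
)
print(rounds)  # reaches the 95% target on round 4
```

The deploy-only-after-acceptance gate is the key design choice: the loop treats the algorithm as a hypothesis to be tested, not a finished product.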
4. Ask the right questions
The hardest thing to do is to ask the right questions with your algorithms. This is where it’s important to get subject matter experts from your business together with your data analysts and scientists. A business expert can ask a question like, “Is it the battery or the spring on our gizmo that fails most often?” instead of a general question like “Why does this gizmo fail?” The expert can ask this question because he or she already knows enough about the product and the problem to narrow the question down to a more direct and actionable answer. Once this is done, a trained data analyst or scientist can transform the question into an effective algorithm.
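The expert's narrow question translates directly into a narrow query. This sketch is purely illustrative; the failure records and field names are invented.

```python
from collections import Counter

# Hypothetical failure log for the gizmo.
failures = [
    {"component": "battery"}, {"component": "spring"},
    {"component": "battery"}, {"component": "housing"},
    {"component": "battery"},
]

# The narrow question scopes the analysis to the two suspect components.
suspects = {"battery", "spring"}
counts = Counter(f["component"] for f in failures if f["component"] in suspects)

worst, n = counts.most_common(1)[0]
print(worst, n)  # battery 3
```

Contrast this with the general question “Why does this gizmo fail?”, which has no obvious query at all; the expert's domain knowledge is what makes the data directly answerable.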