Top 5 ways humans bias machine learning

Tom Merritt explains the five ways human error can create issues with machine learning.

Machine learning (ML). We often focus on the machine part of it, but let's think about the learning. Who do machines learn from? Okay, Generative Adversarial Network (GAN) people—sometimes from each other. But at the start, even GANs learn from data provided by humans; sometimes we pollute that data, just a little. Here are five ways humans bias machine learning.

SEE: Managing AI and ML in the enterprise (ZDNet special report) | Download the report as a PDF (TechRepublic)

  1. The square peg bias. This is where you choose the wrong data set simply because it's what you have. For example: You want to model sportswear purchases for your online clothing store, but you only have data on what people have been buying at brick-and-mortar shops.
  2. Sampling bias. You choose your data to represent an environment. Generally, you choose a subset of data that is large and representative, but you have to watch out for the human biases in picking that data; it can be as innocent as forgetting to include nighttime data in a training set for facial recognition.
  3. Bias-variance trade-off. You may introduce bias by overcorrecting for variance. If your model is too sensitive to variance, small fluctuations cause it to model random noise. But if you add too much bias to correct for that, the model can miss real complexity in the data.
  4. Measurement bias. This is when the device you use to collect the data has bias built in, like a scale that consistently overestimates weight. The data looks sound and internally consistent, so no statistical correction would catch the error. Using multiple measuring devices can help prevent this.
  5. Stereotype bias. You're training a machine learning algorithm to recognize people at work, so you give it lots of images of male doctors and female teachers. This might even be mathematically sound, since the stereotype is social and might exist in the data without you even getting involved. But if you want a stronger ML model, you'll need to correct for that social stereotype.
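The bias-variance trade-off in item 3 is easy to see in code. Here's a minimal sketch using hypothetical data (a sine curve plus noise) and NumPy polynomial fits: a straight line underfits (high bias), a moderate-degree polynomial captures the underlying shape, and a very high-degree polynomial starts modeling the random noise (high variance). The degrees and noise level are illustrative choices, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a smooth underlying signal plus random noise.
x = np.linspace(0, 1, 40)
y_true = np.sin(2 * np.pi * x)
y = y_true + rng.normal(0, 0.3, size=x.shape)

def fit_error(degree):
    """Fit a polynomial of the given degree to the noisy data and
    return mean squared error against the noise-free signal."""
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    return np.mean((y_hat - y_true) ** 2)

underfit = fit_error(1)    # high bias: a line can't follow the sine curve
balanced = fit_error(5)    # enough flexibility for the underlying shape
overfit = fit_error(15)    # high variance: starts chasing the noise

print(underfit, balanced, overfit)
```

Both extremes land farther from the true signal than the balanced fit: the line misses the curve entirely, while the degree-15 polynomial faithfully reproduces fluctuations that are just noise.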

SEE: Free machine learning courses from Google, Amazon, and Microsoft: What do they offer? (Tech Pro Research)

Recognizing that the machines are only as good as their masters is essential to getting useful data out of them. And, you know, keeping them from getting mad at how badly we messed them up as children.


By Tom Merritt

Tom is an award-winning independent tech podcaster and host of regular tech news and information shows. Tom hosts Sword and Laser, a science fiction and fantasy podcast and book club, with Veronica Belmont. He also hosts Daily Tech News Show.