Bias in machine learning, and how to stop it

AI and machine learning fuel the systems we use to communicate, work, and even travel. But bias seeps into the data in ways we don't always see. Here's why blocking bias is critical, and how to do it.


As AI becomes increasingly interwoven into our lives, fueling our experiences at home, at work, and even on the road, it is imperative that we question how and why our machines do what they do. Most AI operates in a "black box" that hides its decision-making process (think: why did my GPS re-route me?), yet transparency in AI is essential to building trust in our systems.

But transparency is not all we need: to fully trust AI's abilities, we must also ensure that its decision-making is unbiased.

Bias in the tech industry is no secret, especially when it comes to the underrepresentation of women and the pay gap they face. But bias can also seep into the very data that machine learning systems train on, influencing the predictions they make.

"Any time you have a dataset of human decisions, it includes bias," said Roman Yampolskiy, director of the Cybersecurity Lab at the University of Louisville. "Whom to hire, grades for student essays, medical diagnosis, object descriptions, all will contain some combination of cultural, educational, gender, race, or other biases."

So how exactly has biased data impacted algorithms?

After Pokémon Go was released, several users noted that there were fewer Pokémon locations in primarily black neighborhoods. That's because the creators weren't spending time in those neighborhoods. "It was a bias that came in because of the fact that people who wrote these algorithms were not a diverse group," said Anu Tewary, chief data officer for Mint at Intuit.

Tewary pointed to several other examples of bias in machine learning. On LinkedIn, for instance, it was discovered that high-paying jobs were not displayed as frequently for women as they were for men. "Again, it was biases that came in from the way the algorithms were written. The initial users of the product features were predominantly male for these high-paying jobs, and so it just ended up reinforcing some of the biases," she said.

Google's face recognition software has also experienced problems with racial bias. When it was initially rolled out, it tagged a lot of black faces as gorillas. "That's an example of what happens if you have no African American faces in your training set," Tewary said. "If you have no African Americans working on the product. If you have no African Americans testing the product. When your technology encounters African American faces, it's not going to know how to behave."

It's not until the algorithm is used "in the wild" that people discover these built-in biases, which are then amplified, Tewary said. At Intuit, she devotes a lot of effort toward preventing biased data from being used in the company's products.

Diversifying tech and reducing bias

Tewary has a background in physics and math, with a computer science degree from MIT. After working at mobile advertising startup AdMob, which was acquired by Google, Tewary said she began to notice certain trends around women and tech. Women and girls who were early adopters of tech "tended to view themselves as consumers of technology, especially mobile technology," Tewary said. "Yet they didn't view themselves as creators of that technology."

To inspire more women to get involved in tech, and therefore reduce gender bias in tech products, Tewary started the Technovation Challenge in 2009, a program meant to empower girls and women to see themselves as creators of technology. To date, 10,000 girls have completed the global program.

Tewary said her experience in tech, including stints at LinkedIn and at Level Up Analytics, which she founded, has informed her focus on the importance of inclusion.

Bias in AI has larger implications, as well. Take driverless cars, which rely on a collection of algorithms. "There are a lot of different algorithms that make up a large part of technology, and if women are left out of the process at any of these steps, or in any part of the various technologies, that's where biases can really harm women as a group," said Tewary.

"Imagine if there were no women on the team that either built the cars or tested the cars. Then if the technology was faced with a woman either operating or interacting with the car, it might have problems trying to understand the voice, or understand the person, and so on," she said. "That's an extreme case."

Machine learning is integral to many of Intuit's products, such as QuickBooks Self-Employed. Other important services at Intuit, like QuickBooks Financing, help people with small business loans. "We have to make sure the bias doesn't creep into these models," said Tewary.

Why? Because data determines whether someone is deemed "creditworthy," and biased data could skew who qualifies for a loan. The government regulates which features can be used to determine loan eligibility.

"We have to be cognizant of the fact that there is potential for bias, and make sure the features that we use to determine credit-worthiness for a small business don't fall into the trap of having these biases determining credit-worthiness," Tewary said.

How would bias creep in? Through names, for example. "Generally, female names versus male names show different patterns of behavior," she said, which "reinforces the bias." The same can happen when determining loan eligibility. ZIP codes, she said, are another element that could reinforce bias. "If you have predominantly a minority neighborhood, then you have bias that creeps in through that," said Tewary.
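To make the ZIP code concern concrete, here is a minimal sketch of one way a team might audit loan decisions for this kind of proxy effect: compare approval rates across groups and see how far apart they are. This is not Intuit's method. The records are made up, and the field names (`zip_group`, `approved`) and function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def approval_rates(records, group_key):
    """Return the share of approved applications for each value of group_key."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec["approved"]:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means the rates are equal."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved anywhere, so there is no gap to report
    return min(rates.values()) / highest


# Hypothetical, made-up decisions; "zip_group" stands in for a ZIP code area.
applications = [
    {"zip_group": "A", "approved": True},
    {"zip_group": "A", "approved": True},
    {"zip_group": "A", "approved": True},
    {"zip_group": "A", "approved": False},
    {"zip_group": "B", "approved": True},
    {"zip_group": "B", "approved": False},
    {"zip_group": "B", "approved": False},
    {"zip_group": "B", "approved": False},
]

rates = approval_rates(applications, "zip_group")
ratio = disparity_ratio(rates)
print(rates, ratio)
```

A ratio well below 1.0 does not prove the model is biased, but it flags a feature worth a closer look, especially when the groups line up with a protected characteristic.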

It's no surprise that many algorithms contain bias, because they are written by people who have both conscious and unconscious biases, Tewary said.

Still, it's critical that organizations check their data for bias. "The data we collect, from climate, health, energy, and human behavioral data equally should represent all our world," said Manuela Veloso, head of machine learning at Carnegie Mellon. "Our social responsibility is now transferred to collecting as much representative data as possible."
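One simple way to check whether data is representative is to compare each group's share of a dataset with its share of the population the system is meant to serve. The sketch below assumes such reference shares are available. The group labels, numbers, and the function name `representation_gaps` are hypothetical and for illustration only.

```python
from collections import Counter


def representation_gaps(samples, reference_shares):
    """Return observed share minus expected share for each reference group.

    A negative value means the group is underrepresented in the samples.
    """
    counts = Counter(samples)
    total = len(samples)
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        gaps[group] = observed - expected
    return gaps


# Hypothetical training labels: 8 examples from group "x", 2 from group "y".
training_groups = ["x"] * 8 + ["y"] * 2

# Hypothetical population shares the dataset is supposed to reflect.
population = {"x": 0.5, "y": 0.5}

gaps = representation_gaps(training_groups, population)
print(gaps)
```

Here group "y" falls about 30 percentage points short of its population share, the kind of gap that only shows up "in the wild" if no one measures it first.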

About Hope Reese

Hope Reese is a Staff Writer for TechRepublic. She covers the intersection of technology and society, examining the people and ideas that transform how we live today.
