Algorithms can be racist: Why CXOs should understand the assumptions behind predictive analytics

With predictive models making decisions about hiring, pricing, and even policing, here's how IT leaders can understand and mitigate fears that these mathematical models are biased.

Accurate prediction has long been a sought-after goal in business and public policy. A computer system that can consistently and accurately predict future events or model complex behavior on a massive scale seems almost magical, and such algorithms are already used to price our car and health insurance and to model what we'll buy next when shopping.

Government agencies are also beginning to use predictive analytics to identify perpetrators and victims of crime, under the assumption that these models can allow for early police or social services interventions that can ultimately save lives. One of the more public examples of this technology is a Chicago program that attempts to reduce gun violence by using arrest and demographic data to identify potential shooters and victims, allowing police to increase surveillance or intervene directly in an attempt to avert tragedy. This program has come under increasing scrutiny. Skeptics argue that it is a direct violation of civil liberties, since "suspects" are targeted for activities a computer predicts they'll commit, while advocates maintain that violence is averted by focusing scarce city resources more effectively. While most IT leaders won't have to deal with these types of accusations, examining both sides of this argument is worthwhile as businesses increasingly employ these types of algorithms.

The pro-predictive argument

Whether it is attempting to identify criminals or price car insurance, a predictive algorithm can do what computers do best: apply a defined set of rules with speed and scale. No human team could decide for each shopper, in real time, which of a hundred discounts to offer on Black Friday, but a well-designed system performs this task with ease. Similarly, predictive algorithms increasingly help us on a daily basis, whether by providing a faster commute through the predictive traffic analysis in Waze or Google Maps, or through complex fraud detection algorithms that silently protect our financial data and networks. In each case, these tools quickly identify patterns, separate the signal from the noise, and allow humans to focus their limited attention where it is most needed.

Proponents of predictive algorithms also argue that algorithms, ultimately a series of mathematical functions, are inherently unbiased. The designers of these algorithms may have included assumptions and shortcuts to model complex environments, or over- or underrepresented some variables, but these can be tweaked and improved with relative ease. Like any system, a predictive algorithm is only as good as its model and the data that are available, once again validating the old computing axiom of GIGO (Garbage In, Garbage Out). Proponents ultimately argue that any "bias" inherent in an algorithm is the fault of the creators, not the math itself. In the case of Chicago's gun violence predictive toolkit, proponents also argue that any flaws in the system are outweighed by the benefit of saving lives.

The anti-algorithm stance

The subtle nuances of algorithms, such as the potential for overreliance on some variables or for simplified modeling and assumptions, are the foundational concern for those who oppose using algorithms for decisions that have significant human implications, like granting financing or "predicting" behaviors in ways that might limit civil liberties. These concerns combine with questions around transparency and oversight. What actions should a government or company be allowed to take based on a predictive model, and what recourse do those affected by the model have should they be unfairly targeted?

In the case of Chicago, analysts have questioned the weighting the algorithm places on a criminal record or demographic data, factors that would preemptively punish certain segments of the population. The City has declined to release the models that power its predictive algorithm, fueling accusations that range from the model being overly simplistic to outright racial profiling, such as the equivalent of an "Is this person black or Hispanic" flag that tilts the outcome toward assuming future criminal behavior.

In the business context, credit scores have often been incorporated into service pricing and even employment models. The theory is that those with good credit are more responsible, and in some cases people with good credit and a drunk driving conviction have received better car insurance rates than those with poor credit and a flawless driving record.
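To see how a heavily weighted stand-in variable can produce that result, consider a minimal Python sketch. The pricing formula, the weights, and the names `Driver` and `annual_premium` are invented for illustration and do not describe any insurer's actual model.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only for illustration.
# No real insurer's pricing model is implied.
BASE_PREMIUM = 1000.0
DUI_SURCHARGE = 400.0            # flat surcharge per drunk driving conviction
CREDIT_PENALTY_PER_POINT = 3.0   # surcharge per credit-score point below 800


@dataclass
class Driver:
    credit_score: int
    dui_convictions: int


def annual_premium(driver: Driver) -> float:
    """Toy linear pricing rule: base price plus surcharges."""
    credit_gap = max(0, 800 - driver.credit_score)
    return (
        BASE_PREMIUM
        + DUI_SURCHARGE * driver.dui_convictions
        + CREDIT_PENALTY_PER_POINT * credit_gap
    )


good_credit_with_dui = Driver(credit_score=780, dui_convictions=1)
poor_credit_clean_record = Driver(credit_score=550, dui_convictions=0)

print(annual_premium(good_credit_with_dui))      # 1460.0
print(annual_premium(poor_credit_clean_record))  # 1750.0
```

With these made-up weights, the driver with good credit and one conviction is quoted 1,460 while the driver with poor credit and a clean record is quoted 1,750, because the credit penalty outweighs the conviction surcharge. Nothing in the arithmetic is "wrong"; the outcome follows directly from the assumptions someone chose to encode.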

Some of the more extreme detractors of predictive models have suggested they could be manipulated and misused. For example, a rogue politician could manipulate an algorithm to flag his or her political opponents for additional police scrutiny, or opaque tools shielded from public scrutiny could be used to detain people for "pre-crime" with no recourse, since it would be impossible to prove you won't do something that hasn't yet happened.

Where should IT leaders stand?

While few of us will deal with algorithms that could lock up our fellow citizens, we may well price or discount our products and services based on complex analytical models where we're forced to rely on others' assurances that the math is sound. Rather than assuming your data scientists will do the right thing, at a minimum ensure you understand and can articulate the assumptions and key variables that are baked into these models. Work to help your peers understand what predictive analytics can and cannot do, and the risks of treating what are essentially mathematical models as if they were some sort of infallible crystal ball. As you design your models, consider whether you'd want the details or assumptions you've included appearing in a news headline, especially if you'll be using demographic data like race, gender, or even locality. If you're placing significant emphasis on a single variable, like credit score, take the time to understand whether that variable accurately reflects the behavior you're trying to predict, or whether that behavior is too complex for a single "stand-in" variable to represent.
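One way to start asking these questions is to compare how often a model produces a favorable outcome for different groups. The following is a minimal Python sketch under stated assumptions: the group labels, the sample decisions, and the function names `favorable_rate_by_group` and `rate_ratio` are all hypothetical, and a large gap between groups is a prompt for further investigation rather than proof of bias.

```python
from collections import defaultdict


def favorable_rate_by_group(records):
    """Return the share of favorable decisions for each group.

    `records` is an iterable of (group_label, got_favorable_outcome) pairs.
    """
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, got_favorable_outcome in records:
        totals[group] += 1
        if got_favorable_outcome:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Ratio of the lowest group rate to the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favorable outcome
    return min(rates.values()) / highest


# Made-up model decisions: did a customer in each locality get the discount?
decisions = (
    [("zip_A", True)] * 8 + [("zip_A", False)] * 2
    + [("zip_B", True)] * 4 + [("zip_B", False)] * 6
)

rates = favorable_rate_by_group(decisions)
print(rates)              # share of favorable outcomes per locality
print(rate_ratio(rates))  # lowest rate divided by highest rate
```

In this invented sample, customers in `zip_B` receive the discount half as often as those in `zip_A`. A result like that doesn't say why the gap exists, but it tells you which assumptions and variables to ask your data scientists about first.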

Predictive analytics are a powerful tool, but one that is often misunderstood and carries significant potential for misuse, whether through malfeasance or simple carelessness. Understanding the pros and cons of this technology, and the arguments for and against its use, will help you design and deploy it appropriately.

About Patrick Gray

Patrick Gray works for a global Fortune 500 consulting and IT services company and is the author of Breakthrough IT: Supercharging Organizational Value through Technology as well as the companion e-book The Breakthrough CIO's Companion. He has spent ...
