How AI Bias Actually Occurs (and Why It’s Difficult to Fix)

May 22, 2019 - 8 minutes read

Today, the overwhelming majority of artificial intelligence (AI) applications are made possible through deep learning.

This subset of machine learning relies on deep neural network architectures to make decisions or solve problems. Basically, multiple layers of computation analyze an input and produce a “probability vector,” which might say something like: “95% confident the object is a human, 5% confident the object is a fruit.”
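
To make the “probability vector” idea concrete, here's a minimal Python sketch of softmax, one common way a network's final layer turns raw scores into probabilities that add up to 100%. The labels and raw scores below are made up for illustration; they don't come from any real model.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from the final layer of a two-class model
labels = ["human", "fruit"]
raw_scores = [3.0, 0.0]

probs = softmax(raw_scores)
for label, p in zip(labels, probs):
    print(f"{label}: {p:.0%}")
# human: 95%
# fruit: 5%
```

Because the entries share a single budget of 100%, more confidence in one label always means less confidence in the others.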

You can think of deep learning as machine learning on steroids. Deep neural networks enhance machines’ abilities to identify even the smallest of patterns. And this pattern-finding augmentation is behind the biggest advancements in AI today.

But with great power comes great responsibility. For all the benefits that deep learning offers, it also has vast potential to cause catastrophic damage across industries due to bias. And being aware of this bias isn’t enough; we must understand the mechanics of how it occurs to properly address it.

How AI Bias Creeps In

Bias can enter the equation during numerous stages of the deep learning process. It doesn’t always boil down to biased training data. In reality, bias can start forming before the data is even collected! Let’s cover two key stages where bias can creep in.

Framing Your Problem

When a team of machine learning developers sets out to create a deep learning model, it first has to decide what it actually wants to achieve. For example, a credit card company may want to use AI to help predict a customer’s creditworthiness.

But “creditworthiness” is a nebulous (and subjective) concept. To translate it into a discrete category that can be computed, the credit card company uses its priorities to guide the solution. Maybe the company wants to maximize the number of loans that get repaid. Or maybe it wants to optimize its profit margins. Either way, this business context determines how the company defines creditworthiness.
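
Here's a small Python sketch of how the framing alone changes who counts as “creditworthy.” Everything in it is hypothetical: the applicants, the two estimates attached to each one, and both thresholds are invented purely to illustrate the point.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    name: str
    repay_probability: float  # assumed estimate of full repayment
    expected_profit: float    # assumed expected profit from fees and interest

# Hypothetical applicants
applicants = [
    Applicant("A", 0.95, 40.0),
    Applicant("B", 0.70, 220.0),
    Applicant("C", 0.55, 310.0),
]

def creditworthy_by_repayment(a, threshold=0.9):
    """Framing 1: maximize the number of loans that get repaid."""
    return a.repay_probability >= threshold

def creditworthy_by_profit(a, threshold=200.0):
    """Framing 2: optimize profit margins."""
    return a.expected_profit >= threshold

repayment_set = {a.name for a in applicants if creditworthy_by_repayment(a)}
profit_set = {a.name for a in applicants if creditworthy_by_profit(a)}

print("Creditworthy under framing 1:", sorted(repayment_set))
print("Creditworthy under framing 2:", sorted(profit_set))
```

In this toy data the two framings approve completely different people, and that happens before any model has been trained.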

Data Preparation

Bias can also be introduced when you’re preparing the data for training. This stage relies on your AI team selecting the attributes you want the algorithm to consider and prioritize. This selection of attributes is known as the “art” of deep learning, and it can have a substantial impact on your model’s prediction accuracy.

In terms of creditworthiness, attributes could be the customer’s income, age, or the number of loans successfully paid off. In the case of Amazon’s recruiting debacle, attributes could have been the job candidate’s gender, experience, and education.
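
One check a team might run while choosing attributes is whether a seemingly neutral attribute closely tracks a sensitive one. The sketch below is just one possible approach, not a standard recipe: it computes a plain Pearson correlation between age and two candidate attributes. The numbers are toy data invented for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical customers (one entry per customer in each list)
age = [22, 25, 31, 38, 45, 52, 60, 67]
income_thousands = [28, 35, 48, 61, 70, 74, 66, 40]
loans_repaid = [0, 1, 1, 2, 3, 4, 5, 6]

candidates = {"income": income_thousands, "loans_repaid": loans_repaid}
for name, values in candidates.items():
    print(f"correlation of {name} with age: {pearson(age, values):.2f}")
```

A correlation close to 1 or -1 suggests that the attribute can stand in for age, so leaving age out of the model wouldn't necessarily keep age from influencing its predictions.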

Why Bias Is So Hard to Address

So, why is bias so darn difficult to identify and address? Let’s look at one of the biggest challenges in mitigating it.

The Root of the Problem Isn’t Readily Apparent

During your deep learning model’s construction, the downstream impact of your data and choices isn’t easy to see. Thus, the introduction of bias isn’t always obvious, and retroactively identifying where it came from (and how to get rid of it) can be difficult.

When Amazon’s engineers initially discovered their recruiting tool’s sexist behavior, they tried to fix it by reprogramming it to ignore explicitly gendered words like “men’s” or “women’s.” But soon after, the team found out that the updated system was still using implicitly gendered words to make decisions.
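
The toy Python sketch below shows why that kind of patch can fall short. It is not Amazon's system: the word weights and both résumés are hypothetical, chosen only to show how scores can still diverge after the explicit terms are stripped out.

```python
# Explicitly gendered terms the patch removes before scoring
EXPLICIT_TERMS = {"men's", "women's"}

# Hypothetical weights a model might have learned from skewed historical data
LEARNED_WEIGHTS = {
    "women's": -1.0,
    "men's": 0.5,
    "sorority": -0.8,   # implicitly gendered
    "fraternity": 0.6,  # implicitly gendered
}

def scrub(tokens):
    """Drop explicitly gendered terms."""
    return [t for t in tokens if t not in EXPLICIT_TERMS]

def score(tokens):
    """Sum the learned weight of every token; unknown words count as 0."""
    return sum(LEARNED_WEIGHTS.get(t, 0.0) for t in tokens)

# Two hypothetical resumes that differ only in gendered wording
resume_a = "president of the women's coding society and sorority treasurer".split()
resume_b = "president of the men's coding society and fraternity treasurer".split()

score_a = score(scrub(resume_a))
score_b = score(scrub(resume_b))
print("scores after scrubbing:", score_a, score_b)
```

Even with “men’s” and “women’s” gone, the two résumés still score differently, because the remaining words carry the same signal.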

What Now?

If you’ve come to the conclusion that bias is extremely difficult to fix, you’re not alone. “‘Fixing’ discrimination in algorithmic systems is not something that can be solved easily,” says Selbst. “It’s a process ongoing, just like discrimination in any other aspect of society.”

Fortunately, many AI developers and researchers are hard at work on various solutions. Some are even building algorithms to help detect and mitigate biases that may be hidden away in training data.
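
As a rough idea of what a simple detection check can look like, here's a Python sketch that compares approval rates across groups. This gap is often called the demographic parity difference, and it's only one of several fairness measures in use. The decisions and group labels are invented for illustration.

```python
def selection_rate(decisions):
    """Share of positive decisions (1 = approved, 0 = rejected)."""
    return sum(decisions) / len(decisions)

def demographic_parity_gap(decisions, groups):
    """Return (gap, rates): the largest difference in approval rate
    between any two groups, plus the per-group rates."""
    by_group = {}
    for decision, group in zip(decisions, groups):
        by_group.setdefault(group, []).append(decision)
    rates = {g: selection_rate(ds) for g, ds in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical model decisions and the group each person belongs to
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_gap(decisions, groups)
print("approval rate per group:", rates)
print("gap:", gap)
```

A gap of zero means every group is approved at the same rate. A large gap doesn't prove discrimination on its own, but it does flag a model that deserves a closer look.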

Do you think this big problem of AI bias can be properly addressed? And if so, what do you think the best solution is? Let us know in the comments!
