Decoding Bias in AI: A No-Nonsense Guide to Better Machine Learning


Let’s talk biases. You have them, I have them. These biases are deeply ingrained within us (whether we like it or not) and they shape our perceptions, beliefs, and behaviors, influencing everything from the way we interact with others to the decisions we make. And guess what? They don’t just stop there. These biases have a ripple effect, shaping our social structures, institutions, and ultimately, the future of our species. More recently, we have been forced to face these biases turning up in places we didn’t expect, from our phones to search engines and even the tech we use to communicate with each other.

Today, I want to take you on an adventure into the world of artificial intelligence (AI) – more specifically, the biases that we bake into AI. Now, before you roll your eyes and think, “Isn’t this stuff for software developers or data scientists?” let me assure you, it impacts all of us, whether we’re aware of it or not.

How Bias Manifests in AI

Now, before we dive in, let’s get one thing straight: AI is not a sentient being with its own beliefs and prejudices (yet). These algorithms are only as ‘good’, or as ‘bad’, as the data they’re trained on. So when we talk about bias in AI, we’re really talking about human biases that have been inadvertently incorporated into these systems.

A robot learning, generated by Bing Creator

One of the most well-known examples of this is Amazon’s AI recruitment tool. This AI was trained to vet job applications by learning from the patterns in the company’s past hiring decisions. Now, the problem here is that the existing data was already biased – male-identifying people were overrepresented in the company’s tech workforce. So, the AI learned to prefer male candidates over female ones, because that’s what the data told it to do. As a result, Amazon scrapped the system because of its clear gender bias, as was widely reported in 2018.

But it’s not just recruitment. AI bias can affect all sorts of industries. Let’s look at facial recognition technology. Independent evaluations have found that these systems can have significantly higher error rates when identifying people of color than when identifying white people. That’s a huge problem when these systems are used in security or law enforcement applications. In one case, a man in Detroit was wrongfully arrested due to a false facial recognition match.

Bias in Training Data

So, why does this happen? Well, it’s all about the training data. AI learns by analyzing huge amounts of data and identifying patterns. If the data it’s trained on is biased, the AI will learn those biases too. It’s kind of like teaching someone how to cook by showing them only YouTube videos from 2014 – they are going to end up eating a lot of cinnamon and cold water thanks to the prevalence of the cinnamon and ice bucket challenges.

Think of AI like a super-advanced parrot. It mimics what it sees without understanding the context. It’s not inherently biased, but it can learn to be biased if the data it’s trained on is skewed in some way. So, when we talk about bias in AI, we’re not blaming the parrot – we’re questioning the data it’s been fed. So how can this bias sneak into our algorithms? There are a few common ways:

Skewed Training Data

So, what do we mean by “skewed training data”? In essence, it’s when the data used to train the AI doesn’t accurately reflect the diversity and complexity of the real world. Just like a student can’t learn about world history by only reading about Tasmania (as great as we are down here), an AI can’t learn to understand the world from a limited or biased dataset (but its knowledge of potatoes and apples will be outstanding).

Here’s a crunchy example for you: let’s say we’re training an AI on facial recognition. We feed it a bunch of photos so it can learn to distinguish one face from another. But here’s the catch: if most of the faces in those photos are of light-skinned people, the AI is going to get really good at recognizing light-skinned faces, and not so good at recognizing darker-skinned faces.

This is exactly what happened with some commercial facial recognition systems. They were found to have higher error rates for darker-skinned and female faces. This isn’t because the AI is prejudiced by design, but because it was trained on a dataset that didn’t represent the full range of human faces and skin tones.

But skewed training data isn’t just a problem for facial recognition. It can impact any area where AI is used, from voice recognition to medical diagnosis. If an AI designed to diagnose skin cancer is mostly trained on images of light-skinned patients, it may not perform as well when presented with images of dark-skinned patients.

And it’s not just about race or gender. AI can be skewed by any factor that’s over- or under-represented in the training data, like age, location, income level, enthusiasm for tea, or even something like whether photos were taken indoors or outdoors. The key takeaway here is that an AI is only as good as the data it’s trained on. If that data is skewed, the AI’s performance will be skewed too.
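One practical way to spot this kind of skew is to stop looking at a single overall accuracy number and break the errors down by group instead. Here’s a minimal sketch of that idea in Python. The function name, the group names, and the toy results are all made up for illustration, so treat it as a starting point rather than how any real facial recognition system is evaluated.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Return {group: share of wrong predictions}.

    `records` is an iterable of (group, true_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, prediction in records:
        totals[group] += 1
        if truth != prediction:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Made-up toy results, NOT real data: (group, true label, predicted label)
toy_results = [
    ("lighter", "match", "match"),
    ("lighter", "match", "match"),
    ("lighter", "no_match", "no_match"),
    ("lighter", "match", "no_match"),
    ("darker", "match", "no_match"),
    ("darker", "no_match", "match"),
    ("darker", "match", "match"),
    ("darker", "no_match", "no_match"),
]

rates = error_rate_by_group(toy_results)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} of predictions were wrong")
```

In this toy data the overall error rate looks like a single unremarkable number, but the per-group view shows one group getting twice as many wrong answers as the other. That gap is the thing to go hunting for.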

Bias in Labeling

In order to understand the world, AI relies on labeled data. For example, to teach an AI to recognize an apple, you’d show it lots of pictures of apples, each labeled “apple” (or “delicious crunchy orb of happiness”, it doesn’t really matter, as long as the label is consistent). Pretty simple, right? But here’s the thing: not all labels are as straightforward as “apple”.

Imagine you’re training an AI to moderate online content and flag anything “inappropriate”. This term can be quite subjective, varying greatly depending on cultural, personal, or societal norms. What one person finds inappropriate, another might find perfectly acceptable. So, if the team labeling the data has a narrow or biased view of what’s “appropriate”, what do you think will happen? That’s right, the AI will adopt that same narrow view.

For instance, if the team labeling the data deems any image showing skin as “inappropriate”, the AI will learn to flag any such image – from a picture of a baby’s chubby arm to a photo of clothes that happen to be a similar color to the wearer’s skin. On the other hand, if the team has a more permissive approach, the AI might let through content that others might find offensive or harmful. Anyone who has run Facebook Ads for products that you put on your body (such as clothes or jewelry) or that help your body (such as health supplements or fitness products) knows the frustration of having an ad disapproved for something that might not even be skin (I had an issue with pictures of eggs in a carton being picked up as body parts).

And it’s not just about content moderation. This problem can manifest in other areas as well. Let’s consider an AI designed to diagnose medical images. If the doctors labeling the data have different standards or levels of experience, the labels – and thus the AI’s learning – could be inconsistent. One doctor might label a scan as showing early signs of a disease, while another might consider the same scan normal. The AI would be left confused, leading to less accurate diagnoses.

The bottom line is this: AI learns from the labels we give it, so it’s crucial to ensure those labels are as objective and unbiased as possible. It’s a complex issue, but it’s one we need to tackle if we want AI to make fair decisions.
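If you want to put a number on how consistent your labelers are, one common statistic is Cohen’s kappa, which measures how often two people agree after discounting the agreement you’d expect by pure chance. Below is a small hand-rolled sketch. The two “doctors” and their labels are invented for illustration, and a real project would likely lean on an established statistics library rather than this function.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Share of items where the two annotators gave the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, based on how often each annotator uses each label
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label for everything
    return (observed - expected) / (1 - expected)


# Made-up labels from two hypothetical doctors reading the same eight scans
doctor_a = ["disease", "disease", "normal", "normal", "disease", "normal", "normal", "normal"]
doctor_b = ["disease", "normal", "normal", "normal", "disease", "normal", "disease", "normal"]

kappa = cohens_kappa(doctor_a, doctor_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

A kappa of 1 means perfect agreement and 0 means no better than chance. Here the doctors agree on six of eight scans, which sounds fine until you account for chance, so a low score is a hint to tighten the labeling guidelines before training anything on those labels.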

Problem Framing

First off, what is “problem framing”? Well, when we’re developing an AI system, we need to give it a clear goal or problem to solve. This is kind of like telling your GPS where you want to go. But the destination we choose for the AI can influence the route it takes and the decisions it makes along the way.

Let’s say we’re developing an AI for hiring (a bit like the one mentioned above). We need to define what a “successful” hire looks like. Now, one approach might be to train the AI on data from past employees who were considered successful. Sounds reasonable, right?

But hold on a minute! What if our definition of “success” is based on past biases? For example, what if the “successful” employees in our data set were disproportionately from a certain educational background or economic background, or fit a certain personality type? Our well-meaning AI doesn’t know any better – it just follows the data. So, it might end up favoring applicants who fit that same old mold, while overlooking others who could be just as successful, if not more so, given the chance. That can be a real diversity killer.

This is the crux of bias in problem framing. When we frame a problem based on biased or narrow definitions of success, we risk perpetuating those biases in the AI’s decisions. It’s a subtle form of bias, but it can have significant real-world impacts. The takeaway here is that when we’re developing AI, we need to think critically about how we’re framing the problem. We need to ensure our definitions of success are fair and inclusive, reflecting the diversity of the real world.

How Do We Transcend Bias in AI?

We know that being able to transcend the limitations of bias is key to having AI work for us and not against us. So, what can we do about it? How do we start untangling this web of biases?

Diverse Datasets

It is important to ensure that the data used for training represents various demographic groups, including different genders, races, ethnicities, and socioeconomic backgrounds. By incorporating a wide spectrum of people and scenarios, AI models can learn to handle diversity more effectively in real-life applications. Additionally, datasets should be carefully curated to avoid reinforcing existing biases present in society.
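A simple first check is to compare how much of your dataset each group makes up against the mix you expect in the population the AI will actually serve. The sketch below does exactly that. The group names and the reference shares are placeholders I made up, and choosing the right reference mix for a real project is a judgment call that code can’t make for you.

```python
from collections import Counter


def representation_gaps(groups, reference_shares):
    """Return {group: dataset share minus reference share}.

    Negative values mean the group is under-represented in the dataset.
    """
    if not groups:
        raise ValueError("The dataset has no records")
    counts = Counter(groups)
    total = len(groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


# Hypothetical dataset of 100 records, each tagged with a made-up group
dataset_groups = ["group_a"] * 70 + ["group_b"] * 25 + ["group_c"] * 5
# Assumed real-world mix, for illustration only
reference = {"group_a": 0.5, "group_b": 0.3, "group_c": 0.2}

gaps = representation_gaps(dataset_groups, reference)
for group, gap in sorted(gaps.items()):
    print(f"{group}: {gap:+.0%} compared with the reference mix")
```

Groups with a large negative gap are the ones to collect more data for (or at least to keep a close eye on) before the model goes anywhere near real people.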


Transparency

AI developers should be open about the data sources they use, the methodology they employ, and the limitations of their AI systems. By being transparent, developers can foster trust, allow external scrutiny, and encourage the identification and rectification of biases. Additionally, transparency can help users understand the limitations and potential biases of AI systems, allowing for more informed interactions.

Regular Auditing

Continuous testing and auditing of AI systems for bias is essential. This involves evaluating the system’s outputs using various methods and metrics to detect any biases that may emerge. By actively monitoring and assessing AI systems, developers can identify areas where biases may be present and take corrective measures. Regular audits also help in identifying and addressing biases that may arise due to evolving data patterns or changes in the user base.
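To make “methods and metrics” a bit more concrete, here is one of the simplest audit checks: compare how often the system says “yes” to each group. The sketch below works out a selection rate per group and the ratio between the lowest and highest rates. A commonly cited rule of thumb from US employment guidance (the “four-fifths rule”) treats a ratio under 0.8 as a warning sign, but that threshold, the group names, and the data here are illustrative assumptions, not a universal standard.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return {group: share of positive decisions} from (group, selected) pairs."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means perfectly even)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so there is no gap to measure
    return min(rates.values()) / highest


# Made-up audit log: (group, did the system say yes?)
audit_log = (
    [("group_x", True)] * 6 + [("group_x", False)] * 4
    + [("group_y", True)] * 3 + [("group_y", False)] * 7
)

audit_rates = selection_rates(audit_log)
ratio = impact_ratio(audit_rates)
print(audit_rates)
print(f"Impact ratio: {ratio:.2f}")
if ratio < 0.8:  # illustrative threshold, see the note above
    print("Worth a closer look: one group is selected far less often")
```

Running a check like this on a schedule, rather than once at launch, is what turns it into an audit. A single metric won’t prove a system is fair, but a sudden drop in the ratio is a useful early alarm.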

Ethical AI Guidelines

Adopting a set of guidelines for ethical AI development provides a framework for building AI systems that prioritize fairness and user diversity. These guidelines should encompass principles such as fairness, accountability, transparency, and inclusivity. By incorporating these guidelines into AI development practices, developers can ensure that bias mitigation becomes an integral part of the process from the early stages. The guidelines also give the rest of us a yardstick for spotting when organizations may be building bias into their AI development practices for self-serving reasons (e.g. monetary or political gain).

User Education

Educating end-users about how AI works and its potential for bias is crucial. When users are given information about AI systems, their limitations, and their potential biases, they can develop a critical understanding of AI technology. User education can empower individuals to question and challenge AI systems, helping to identify biases and hold developers accountable. It also encourages users to be aware of the limitations and potential risks associated with AI, fostering more informed and responsible use of AI systems.

Final Thoughts

Addressing bias in AI isn’t a one-and-done deal. It is not enough to simply implement bias mitigation techniques during the initial development stages of an AI system. As societal norms and understanding evolve, new biases may emerge, or existing biases may manifest in different ways. Developers and researchers must remain proactive in their efforts to identify and rectify biases as they arise. Let’s be real, they are the gatekeepers of this technology at this point in time.

But it’s not all about big corporations and data scientists here. As I see it, we all have a role to play in staying informed about the latest research and best practices in bias mitigation. I’m not saying you should quit your day job or stop seeing your friends so you can endlessly scroll through whitepapers and other documented research. Just be aware, that’s all I’m saying – don’t automatically take what the media or an organization has to say about a topic as the truth.

More broadly, as a community we need to be actively seeking out diverse perspectives and engaging with experts in fields such as ethics, social sciences, and human rights. By keeping abreast of developments in these areas, AI practitioners can gain valuable insights into the nuances of bias and discrimination, and apply that knowledge to improve their AI systems.

Learning is a fundamental aspect of addressing bias in AI. It involves acknowledging that biases exist and being willing to challenge one’s own assumptions and biases. Developers must continuously educate themselves about the social, cultural, and historical contexts that can contribute to bias. This includes understanding the biases that may be present in the training data, the algorithms being used, and the potential biases that can arise from the interactions between AI systems and users. By actively seeking knowledge and fostering a learning mindset, developers can better navigate the complexities of bias and work towards creating fairer AI systems.

Striving for fairer and more equitable AI systems should be an ongoing objective for us all. It requires a commitment to not only identify and rectify biases but also to continuously improve the overall fairness of AI systems and to participate in the process. After all, AI is a tool, and like any tool, its effectiveness depends on the user’s awareness and intention.

if ('You Have Feedback' == true) {
  return 'Message Me Below!';
}
Picture of neobadger


I'm a Technology Consultant who partners with visionary people who want to solve human problems using data and technology (and having fun doing it)!


Want to dig a little deeper? Send me a message!