With any emergent technology, there are always challenges to overcome as adoption and awareness grow. Those hurdles can seem mountainous when two things happen at once: millions of people start using and discussing the technology immediately, and the technology itself has the potential to transform our productivity and how we interact with our devices.
Generative AI is in this nuanced position right now. One of the most immense and complex issues it faces is bias, which can show up both in the powerful large language models behind these tools and in the outputs they produce. But addressing bias in AI is much more difficult than telling a tool like Jasper Chat or ChatGPT, “Don’t be biased in your generations” or “Give me objective results.”
For this series on ethics and responsible AI use, I asked experts for their opinions on bias in model training and outcomes. Where does it come from? How does it show up? How can we stop it from happening? What does the future of combating bias look like?
First up to address these obstacles is Rosie Campbell, who is on the Trust and Safety team at OpenAI. She develops policies to ensure products like GPT-3, DALL·E and ChatGPT are used responsibly, with a particular focus on risk mitigations for increasingly advanced AI systems. Prior to that, she worked as Head of Safety-Critical AI at the Partnership on AI and as Assistant Director at the Center for Human-Compatible AI.
Let’s hear her thoughts on AI bias in LLMs and in what they produce.
Where does bias within AI models come from? Where does this underlying issue start?
Bias is an overloaded term in AI. Sometimes it’s used to mean “the training data was poorly sampled and therefore does not reflect the real world accurately”, and sometimes it means “the training data does reflect the real world, but the world itself is biased in ways we don’t endorse, and we don’t want AI systems to amplify these biases”.
For example, imagine you’re training a language model using text from the internet. If you try to fairly sample the text available online, your model will likely be stronger in English than in any other language, since over half of the most popular websites are in English. However, less than 20 percent of the world’s population speaks English! How you judge the bias of your model depends on whether your goal is to represent the internet or the world’s population.
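The sampling mismatch she describes can be sketched in a few lines. The percentages below are the rough figures from her answer (about half of popular websites in English, under 20 percent of people speaking it); the function and numbers are illustrative, not a real measurement.

```python
# Two different notions of "representative" for training data:
# sampling the internet vs. sampling the world's population.
web_share = {"english": 0.55, "other": 0.45}          # rough share of popular sites
population_share = {"english": 0.18, "other": 0.82}   # rough share of speakers

def representation_gap(sample, target):
    """How over- or under-represented each group is relative to a target."""
    return {k: sample[k] - target[k] for k in sample}

gap = representation_gap(web_share, population_share)
# English is heavily over-represented relative to the world's population,
# even though the sample "fairly" reflects the internet.
```

The same dataset is unbiased against one target distribution and badly skewed against another, which is her point: "bias" only has meaning relative to a chosen reference.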
There are many decisions to be made in designing a neural network beyond just the training data, such as the architecture, the loss function and the training method; these can all affect the performance of the system. And if not chosen carefully and tested thoroughly, they could result in biased outputs that have real-world consequences when deployed, especially in high-stakes domains such as healthcare.
What are some of the ways you’ve seen bias creep into the results generative AI tools produce?
An obvious example is how image generation tools can perpetuate outdated stereotypes. If you ask for a CEO, they will often return images of a middle-aged white man in a suit. If you ask for a flight attendant, you’re likely to get a young woman. At OpenAI, we try to test our models for these kinds of issues and be transparent about the limitations, for example via system cards. We also experiment with different ways to improve the performance and reduce bias.
These kinds of stereotypes can also surface in text models in more subtle ways. For example, a model might tend to use different adjectives when describing male characters versus female characters, or it might make offensive assumptions about someone’s background or personality based on their race.
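One simple way to surface the adjective skew she mentions is to tally word usage across many generated descriptions. This is a hypothetical sketch with made-up sample text, not a real evaluation pipeline:

```python
from collections import Counter

# Illustrative adjective list and sample outputs; a real audit would use
# a large, curated lexicon and thousands of model generations.
ADJECTIVES = {"brilliant", "ambitious", "gentle", "emotional"}

def adjective_counts(descriptions):
    """descriptions: list of (group, text) pairs from model outputs."""
    counts = {"male": Counter(), "female": Counter()}
    for group, text in descriptions:
        for word in text.lower().split():
            word = word.strip(".,")
            if word in ADJECTIVES:
                counts[group][word] += 1
    return counts

samples = [
    ("male", "A brilliant, ambitious engineer."),
    ("female", "A gentle, emotional nurse."),
]
counts = adjective_counts(samples)
```

Large asymmetries in these tallies across groups would be a signal worth investigating, even when no single output looks objectionable on its own.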
It can be particularly challenging in areas such as summarization, where the model has to make subjective judgements about what information is relevant to include in a summary. For example, if an AI system is asked to summarize a news article with a strong political bias, should it try to represent the article accurately by ensuring the summary is also biased, or should it try to overcome the bias of the article and produce a more balanced summary? Again, the desired behavior strongly depends on the intended use case and requires a conscious design choice.
"If an AI system is asked to summarize a news article with a strong political bias, should it try to represent the article accurately by ensuring the summary is also biased, or should it try to overcome the bias of the article?"
What’s one of the most difficult parts of addressing bias in AI?
One difficult aspect is how hard it is to detect. It’s usually impossible to tell from an individual output whether a system is biased overall. For example, if an AI system is being used to make decisions about creditworthiness, any bias in its decision-making would likely only show up at scale, by analyzing decisions across different populations. Similarly, it would be very difficult to detect if your AI-recommended news feed was biased in some way without comparing it with many other people’s.
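Her "only visible at scale" point maps onto a standard aggregate check: compare approval rates across groups (sometimes called a demographic parity gap). The data below is hypothetical, and a real audit would control for legitimate creditworthiness factors before concluding anything:

```python
# Individual decisions reveal nothing; aggregates can. 1 = approved, 0 = denied.
def approval_rate(decisions):
    return sum(decisions) / len(decisions)

def parity_gap(decisions_by_group):
    """Largest difference in approval rate between any two groups."""
    rates = {g: approval_rate(d) for g, d in decisions_by_group.items()}
    return max(rates.values()) - min(rates.values())

decisions = {
    "group_a": [1, 1, 1, 0, 1, 1, 0, 1],  # 75% approved
    "group_b": [1, 0, 0, 1, 0, 0, 1, 0],  # 37.5% approved
}
gap = parity_gap(decisions)  # a large gap is a signal to investigate, not proof of bias
```

No single row in `decisions` looks suspicious; only the population-level comparison does, which is exactly why per-output inspection fails here.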
A related issue is inspecting why an AI system produced a certain output. Unlike conventional computer programs, we can’t just take a look at the source code and deduce what it’s doing. Currently, the internal representations developed by neural networks are largely incomprehensible to humans. In practice, this means that if we suspect an AI system made a biased credit decision, there’s no easy way to check whether or not it’s acting fairly. Even if we ask the system to explain its decision, it may produce something that sounds plausible and unobjectionable, but there’s no guarantee the explanation maps to what actually happened.
What are some of the major ways bias is being curbed in generative AI outcomes?
Since it’s easy to introduce bias through the training data, one of the main ways you can curb bias in AI is to pay special attention to how you collect and sanitize your training data, keeping in mind whether it is representative of your intended use case. In addition, it can be a good idea to fine-tune the model (i.e., take a pretrained model and tailor it to your specific use case with carefully selected data).
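One concrete curation step along the lines she describes is resampling a skewed corpus toward a target distribution before training or fine-tuning. This is a minimal sketch with invented categories and shares, not a description of how OpenAI or Jasper actually prepare data:

```python
import random

def resample(corpus, target_share, n, seed=0):
    """Draw n examples so each category matches a target share.

    corpus: dict of category -> list of examples (assumed non-empty).
    target_share: dict of category -> desired fraction, summing to 1.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    sample = []
    for category, share in target_share.items():
        k = round(n * share)
        sample.extend(rng.choices(corpus[category], k=k))
    rng.shuffle(sample)
    return sample

# A corpus skewed 10:1 toward English, resampled to a 50/50 split.
corpus = {"english": ["en"] * 1000, "other": ["xx"] * 100}
balanced = resample(corpus, {"english": 0.5, "other": 0.5}, n=200)
```

The trade-off is that heavily resampling a small minority category repeats the same examples, so in practice teams also collect more data rather than only reweighting what they have.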
One of the most important things is to thoroughly test your system in its intended domain. Red-teaming can be a useful technique here, which involves tasking a group of people with finding creative ways to “break” your system: trying weird edge cases and seeing how it handles them. Increasingly, we’re seeing audits and benchmark research (such as Gender Shades) that can help identify signs of bias. Publishing what you learn from these tests (via model cards, for example) helps keep people informed about the risks and limitations of your system.
Even if the model itself might be biased, it is sometimes possible to mitigate negative consequences in deployment. For example, prompt engineering (including instructions in the prompt “behind the scenes,” hidden from the end user) can help keep a model on topic and avoid it veering into areas where bias could cause harm. In ChatGPT, we make use of our Moderation API to reduce the chance that it will respond to inappropriate requests.
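The behind-the-scenes prompting she mentions usually amounts to the application wrapping the user's input with instructions the user never sees. This sketch is purely illustrative; the instruction text, function name, and prompt format are assumptions, not OpenAI's actual setup:

```python
# Hypothetical hidden instructions a deployed app might prepend to keep the
# model on topic and away from areas where bias could cause harm.
SYSTEM_INSTRUCTIONS = (
    "You are a customer-support assistant for a cooking website. "
    "Only answer questions about recipes and cooking; politely decline "
    "anything else, including questions about people or politics."
)

def build_prompt(user_input):
    """Combine hidden instructions with the user's visible input."""
    return f"{SYSTEM_INSTRUCTIONS}\n\nUser: {user_input}\nAssistant:"

prompt = build_prompt("How long should I roast vegetables?")
```

The end user only ever types the question; the scoping instructions travel with every request, which constrains the model without retraining it.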
While these are useful techniques, unfortunately it’s not always possible to entirely prevent bias via technical solutions. Having an accountable “human in the loop” to verify the outputs and decisions of AI systems is a useful mitigation. At OpenAI, our use case policies restrict the use of models for certain applications where we know the model may not perform adequately, and we collect user feedback on the quality of our model outputs in order to continuously improve them.
I mentioned previously the fact that the internal representations of neural networks are largely incomprehensible to humans, which makes it hard to identify bias. Fortunately, there is a lot of interesting research happening on AI “interpretability” that may give us insights into what is going on inside a network and help us understand and address the source of bias.
What are some important bias-related considerations that entities like Jasper, OpenAI and other gen AI producers have to keep in mind as these models get more complex and use cases widen?
One of the key issues is how these biases can be reinforced and amplified. Imagine if for some reason we did a terrible job of sampling images for an image generation model and the result was that it was much more likely to produce purple-toned images than anything else. As people share these images online, they end up in the training dataset for future image generation models. Even if we fix our sampling method so that it’s more representative of the images that are out there, there are still way more purple-toned images online now! So our dataset will skew even more purple which means the new model is now more likely to produce purple-toned images and the cycle continues. We need to pay attention to the way that the outputs of our current systems could affect the performance of future systems and whether that’s what we intend.
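The feedback loop in her purple-image thought experiment can be simulated in a few lines. All the numbers here are invented purely to make the dynamic visible:

```python
# Toy model of the loop: a generator over-produces one style, its outputs
# re-enter the training pool, and the skew compounds each generation.
def next_share(pool_share, outputs_added=0.2, model_boost=0.1):
    """pool_share: fraction of purple-toned images in the training pool.
    The model over-produces purple by `model_boost`, and its outputs make
    up `outputs_added` of the next pool. All parameters are illustrative."""
    model_share = min(1.0, pool_share + model_boost)
    return (1 - outputs_added) * pool_share + outputs_added * model_share

share = 0.3  # starting fraction of purple-toned images
history = [share]
for _ in range(5):
    share = next_share(share)
    history.append(share)
# The purple share climbs every generation even though the sampling method
# never changes again, matching the amplification she describes.
```

The drift is slow per generation but monotonic, which is why she argues the outputs of today's systems need to be treated as inputs to tomorrow's.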
Another issue is considering how a biased system could disproportionately benefit or harm different groups. AI is useful in so many ways that it’s very tempting to assume that there’s no downside to incorporating it into different workflows, especially when it seems to improve efficiency. However, especially with biased systems, even if it seems like on average things are more efficient, it’s possible that the distribution of benefits has changed without most people realizing. For example, a medical AI tool might be great at designing treatment plans for common conditions, but it may be much worse than the status quo when addressing rare health concerns — or maybe the other way around! Even if the average performance doesn’t change or it improves, it’s important to consider how distribution changes and who might be negatively affected by that.
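Her "average versus distribution" warning has a simple numerical form: aggregate accuracy can look healthy while one group is badly served. The figures below are hypothetical and only exist to show the arithmetic:

```python
# Each pair is (prediction, truth); 1 = correct treatment plan chosen.
def accuracy(pairs):
    return sum(p == t for p, t in pairs) / len(pairs)

# Invented outcomes: strong on common conditions, weak on rare ones.
common = [(1, 1)] * 90 + [(0, 1)] * 10  # 90% correct
rare = [(1, 1)] * 4 + [(0, 1)] * 6      # 40% correct

overall = accuracy(common + rare)        # dominated by the larger group
per_group = {"common": accuracy(common), "rare": accuracy(rare)}
# overall looks fine (~85%), but patients with rare conditions fare far worse.
```

Because the common group is ten times larger, its performance dominates the headline number, which is exactly how a shift in who benefits can go unnoticed.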
"We need to pay attention to the way that the outputs of our current systems could affect the performance of future systems and whether that’s what we intend."
What would you tell someone worried about future generative AI tools being biased as more hit the market in the years to come?
Bias in AI is a major challenge. The good news is that there are a lot of smart and motivated people working on this problem and there are already techniques that can help address it.
Unfortunately, bias is far from the only issue we need to solve to ensure AI is beneficial to humans. For example, one thing we often see with large language models is their propensity to “hallucinate”, or generate plausible-sounding statements that are not actually based in reality. This is a huge issue for anyone using AI tools to assist with writing or knowledge work since you can never be totally sure that what is being produced is accurate. In addition, many people don’t know about this problem, so they may not even realize that the outputs shouldn’t be trusted. As these tools become more pervasive, we’ll need to improve our collective literacy on the strengths and limitations of these systems.
AI has the potential to be hugely transformative for society. We’re already seeing it enable new forms of creativity, expression and communication. We’re seeing signs that it might be helpful for solving some of the world’s most pressing problems. The flipside of course, is that such powerful technology comes with risks: it could cause societal disruption, be misused by malicious actors and/or be misaligned with human values. The challenge ahead of us is to ensure we are enabling the positive use cases while minimizing the risks.