Remember the computer-science maxim “Garbage in, garbage out”? So goes big data in artificial intelligence (AI). The historical data we use to train the machines in AI research continue to reflect biases that many of us have hoped to relegate to the past. But, if you ask the machines, women belong in the home and Black men belong in prison. So what should you do if your company requires you to design systems that rely on big data that might be faulty or technologies such as voice or facial recognition that have proven to perpetuate gender and racial biases? Start by understanding the problem so you can avoid the mistakes of the past.
Voice Recognition: The Can-You-Hear-Me-Now? Problem
Most of us now have voice-recognition tools in the palms of our hands. Many of us have them on our kitchen counters, too. Every time you tell Alexa to play music, ask Siri to set a timer, or request directions from Google, you are relying on your Echo, iPhone, or Android phone to recognize your speech. Your device could respond with either perfect accuracy or seemingly capricious inaccuracy. Because these are low-stakes requests, you might either persist or give up, depending on how badly you want to listen to “Blush” by Wolf Alice, while stirring your risotto.
You’ll shout, cram your face into your phone, and swear at it. But nothing good happens because, as multiple research studies have shown, the accuracy rates for speech recognition are not dependent on the volume of your voice or how close you are to the microphone—they’re dependent on your gender and the color of your skin.
A team of scientists published a study earlier this year that delved into the major commercial speech-to-text tools from Amazon, Apple, Google, IBM, and Microsoft, assessing their potential racial disparities. With word error rates (WER) as high as 45%, each of the tools misunderstood Black speakers about twice as often as White speakers. Plus, the average error rate was 35% for Black speakers, but only 19% for White speakers. Similar research by Ethics in Natural Language Processing, on the accuracy of AI-generated captions on YouTube, demonstrated that the underlying algorithm is “13% more accurate for men than it is for women.”
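If the metric is new to you, word error rate counts the word substitutions, deletions, and insertions needed to turn a tool's transcript into what the speaker actually said, then divides by the number of words spoken. The Python sketch below is only an illustration of that definition. The function name and the sample sentences are invented for this example and do not come from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: the tool dropped a spoken word
                curr[j - 1] + 1,     # insertion: the tool added a word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: what was said versus what the tool heard.
spoken = "please send a text to oliver"
heard = "please send a text to ali"
print(round(word_error_rate(spoken, heard), 3))
```

In this made-up example, one wrong word out of six gives a WER of about 17%. The disparity in the study is the ratio of the two averages: 35 divided by 19 is roughly 1.8, which is where "about twice as often" comes from.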
The Ethics in NLP study included only native English speakers, but found further variations in error rates when they introduced dialects. Speakers from Scotland had less than a 50–50 chance of the speech-recognition tools understanding them. New Zealanders came out just ahead of the Scots and New Englanders just ahead of the Kiwis—unless you were a woman from New Zealand or New England. Then your chances of being understood went down to about 50%, while men from NZ and NE held steady at 70%. Speakers from California were the most-often understood group. This is not surprising. The technology companies that control the major voice-recognition tools are based in Silicon Valley. Plus, the teams responsible for developing this software are majority white and male. According to the Guardian, “Women make up just 12% of AI researchers.”
I’ve experienced this problem myself. As a White woman from Connecticut, my voice should be well inside the wheelhouse of my Android phone’s speech-recognition software. But, apparently, my speech resembles that of a Gilmore Girl who has lived in Boston for 26 years and occasionally places the emphasis on the wrong syllable, thanks to my Scottish-Italian husband. Every time I try to text my 14-year-old, my phone insists that I’m trying to speak to someone named Ali. The phone forces me to accommodate its deficiencies and call my son Oliver. I have to remember this every time I message him, which is often because he’s 14 and stays locked in his room 16 hours a day. I get replies full of attitude because I usually call him Oliver only when I’m shouting at him. Who needs a voice assistant that doesn’t assist? Admittedly, this example is far from a life-and-death matter, although it does increase the mental burden of this nightmare of a year and impairs my shaky relationship with my teenager.
What if you’re one of the 1.14 billion people on the planet who speak English as a second language? Speech-recognition error rates increase even more when English is the second language of the speaker, risking further marginalization of already vulnerable communities. Claudia Lopez Lloreda is a Puerto Rican–American freelance science writer whose iPhone refuses to recognize her name unless she pronounces it in North-Americanized English. She captures this struggle succinctly: “Either you assimilate, or you are not understood.”
Facial Recognition: The Invisible-Man Problem
Depending on your point of view, the sexist and racist biases of image-recognition platforms have persisted—or even worsened. In June 2020, police used facial-recognition software on grainy, surveillance footage of what appeared to be a Black man stealing watches from a store in Detroit, Michigan. Almost predictably, the police arrested the wrong man because the machines cannot tell Black faces apart. Now, despite their dropping the charges against Robert Williams, he has an arrest record. As the ACLU explained:
“Robert’s DNA sample, mugshot, and fingerprints—all of which were taken when he arrived at the detention center—are now on file. His arrest is on the record. Robert’s wife, Melissa, was forced to explain to his boss why Robert wouldn’t show up to work the next day. Their daughters can never un-see their father being wrongly arrested and taken away….”
This is happening not only in the criminal justice system, which historically has been rife with bias, but in the media that we consume daily, too. The Guardian revealed in May that MSN.com had made plans to fire their human editors and replace them with a computer algorithm. Then the Guardian revealed in June that “Microsoft’s decision to replace human journalists with robots has backfired, after the tech company’s artificial-intelligence software illustrated a news story about racism with a photo of the wrong mixed-race member of the band Little Mix.” The AI hailed for changing the workplace couldn’t tell the difference between Jade Thirlwall and Leigh-Anne Pinnock’s very different faces.
Research by MIT and Stanford has shown how high the error rates go when computers try to identify dark-skinned faces. Joy Buolamwini and Timnit Gebru pointed three of the major facial-recognition algorithms at 1200 diverse photos and asked the software to identify the gender of the person in the image. They discovered error rates as high as 46%. The resulting 54% accuracy rate means that the machines performed barely better than they would have by flipping a coin.
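Gaps like these show up only when you break error rates down by group instead of reporting one overall number. The sketch below shows one simple way to do that in Python. It is not the researchers' method, and the group names and records are invented purely for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return the share of wrong predictions for each group.

    records: iterable of (group, true_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical audit data: (group, true label, label the software returned).
sample = [
    ("group_a", "female", "female"),
    ("group_a", "male", "male"),
    ("group_a", "female", "female"),
    ("group_a", "male", "male"),
    ("group_b", "female", "male"),
    ("group_b", "male", "male"),
    ("group_b", "female", "male"),
    ("group_b", "female", "female"),
]

for group, rate in sorted(error_rates_by_group(sample).items()):
    print(f"{group}: {rate:.0%} error rate")
```

In this invented sample, the software is right six times out of eight overall, which hides the fact that every one of its mistakes falls on a single group. On a task with only two possible answers, a group error rate approaching 50% is about what random guessing would produce.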
This is happening on Twitter, too. In September, The Verge reported that tweaks to Twitter’s facial-recognition algorithm had resulted in photo previews that heavily favored White faces over Black faces. Developers at Twitter had trained the software to focus on the most prominent face it could identify. Apparently the machines couldn’t see the Black faces. One user illustrated this flaw by posting his own image preview. In a tweet about Mitch McConnell and Barack Obama, Twitter’s image preview offered the user two images: one of Mitch McConnell and another of Mitch McConnell. The neural network even replaced Carl’s face with Lenny’s, demonstrating that this bias also applied to Black cartoon faces from “The Simpsons.”
In response to Twitter’s massive gaffe, Ayanna Howard and Charles Isbell speculated about why the developers hadn’t tested this feature better: “Sometimes people of color are present, and we’re not seen [or heard]. Other times we are missing, but our absence is not noticed. It is the latter that is the problem here.”
How Can You Eliminate Bias from Your User Experience?
If you’re starting to feel that users are out of luck if they’re not White dudes from Connecticut, Colorado, or California—that makes sense. The sexism and racism that are pervasive in artificial intelligence create a dangerous echo chamber in which White men continue to hold all the power. They quite literally hold the keys to the machine-learning algorithms, so the so-called smart devices using this software can see and understand only them. As Jeff Link puts it, these biases “could create a self-fulfilling feedback loop,” enabling only White men to use voice assistants and facial-recognition apps. The software relies on data that overrepresents White men because White men have designed it and optimized it for use by White men.
The answer is not to expect everyone who is not a White man to suffer through lengthy training so the machines can understand us, as the big car makers have suggested, when the real problem is the underlying data. At least one major technology company trains its facial-recognition software on a collection of images that is “more than 77 percent male and more than 83 percent white,” according to Buolamwini and Gebru’s research. Regarding the datasets supporting voice-recognition software, the technology companies aren’t sharing the gender or racial makeup of those voices. Nevertheless, it’s safe to assume that the data include more White men’s voices than naturally occur in the wild. The answer is to fix the data we use to train the machines.
As the owner of your company, product, or service’s user experience, you can do several things to address these inequities. First, know what’s in your data. Work with your clients’ or your in-house data-analytics team to understand what data they’re capturing. Are they collecting gender and racial data? If so, do those data reflect your company’s market segment? Get to know your customers, collaborate with your Sales and Marketing teams, and make sure that everyone is on the same page about who you are targeting and who you should be targeting for sales of your product or service. As the voice of the user, you should feel empowered to push your way into these conversations.
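One simple way to check whether your data reflect your market segment is to compare each group's share of the dataset with its share of the audience you want to reach. The Python sketch below assumes you already have a count of records per group and a set of target shares. The function name, the group labels, and the numbers are all hypothetical.

```python
def representation_gaps(sample_counts, target_shares):
    """Return each group's share of the sample minus its target share.

    Positive values mean the group is overrepresented in the data.
    Negative values mean it is underrepresented.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("the sample contains no records")
    groups = set(sample_counts) | set(target_shares)
    return {
        group: sample_counts.get(group, 0) / total - target_shares.get(group, 0.0)
        for group in groups
    }


# Hypothetical numbers: records per group in a dataset, and the audience
# shares agreed with Sales and Marketing.
dataset_counts = {"men": 770, "women": 230}
target_audience = {"men": 0.5, "women": 0.5}

for group, gap in sorted(representation_gaps(dataset_counts, target_audience).items()):
    print(f"{group}: {gap:+.0%} relative to target")
```

With these invented figures, men are overrepresented by 27 percentage points and women are underrepresented by the same amount. A gap that large is a good prompt for the conversation with your data-analytics team about what to keep.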
If your data do not match the demographics of your desired customers, consider throwing them out—either all of them or just the demographic data. To use the language of statistics, you’ve committed a sample frame error—that is, you’ve included the wrong people in your sample dataset. If the data don’t reflect your target audiences, they aren’t helping you. Trash them or at least ignore them.
Replace your existing data with synthetic training data. These are artificially generated but statistically realistic data that you can feed into your company’s predictive algorithms. Several services provide such datasets. But beware: if the synthetic data are modeled on biased sources, they might perpetuate the same racism and sexism.
If you’re concerned about copying and pasting biases from old datasets to new ones, develop your own training data that looks and sounds like your target audiences. If you’re unsure who your audience is, look up U.S. Census data to ensure that your demographics reflect the world around you. Then go beyond your friends-and-family network when recruiting and engage research participants who mirror the Census. Use a recruiting company if you must to find the faces and voices that match your target audience. Ask for permission to photograph and record them. Turn them into better training data for your voice- and facial-recognition software. Then turn what you’ve learned about your participants into better personas that reflect your users’ real goals and frustrations and create design solutions that bring joy to all of your clients, patients, and customers—whether they’re Black, White, female, male, or belong to any other group of humans. You’ll be less likely to translate racial and gender biases into your designs if you’re aware of these biases and actively fight against them.
Although it is important to note that the machine-learning algorithms and the software based on them are improving all the time, these gender and racial biases could be with us for a long time to come. Juniper Research anticipates that there will be 8 billion digital voice assistants in use by 2023, with smart TVs growing more than 120% a year and wearables growing 40% a year. We’ll even have voice-activated cars in the near future. Plus, 90% of the new vehicles manufactured globally will include built-in voice assistants by 2028, according to Automotive World.
Meanwhile, the use of facial-recognition software is expanding into helping us find missing and exploited children and disoriented adults. Healthcare professionals are now using facial recognition to track medication adherence and pain management and to detect genetic diseases. Plus, its use is growing in retail markets to analyze shopping behavior and improve customer experiences in real time. This technology will become ubiquitous in the next five years, so we need to be prepared for the design challenges these technologies present.
Sarah is Founder and UX Guru at Black Pepper, a digital studio that provides customer research, UX design, and usability consulting services. Sarah designs complex mobile and desktop Web apps for healthcare, financial services, not-for-profit organizations, and higher-education organizations. Her focus is on UX best practices, creating repeatable design patterns, and accessible design solutions that are based on data-driven user research. Sarah researches and writes about bias in artificial intelligence (AI), harnessing big data and machine learning to improve the UX design process, and Shakespeare. Sarah teaches user-centered design and interaction design at the Brandeis University Rabb School of Graduate Professional Studies and Lesley University College of Art + Design.