Designing Conversational Assistants to Reduce Gender Bias

Lead Research Organisation: Heriot-Watt University
Department Name: School of Mathematical and Computer Sciences

Abstract

Biased technology disadvantages certain groups of society, e.g. based on their race or gender. Recently, biased machine learning has received increased attention. Here, we address a different type of bias which is not learnt from data, but encoded during the design process. We illustrate this problem using the example of Conversational Assistants, such as Amazon's Alexa, Apple's Siri, Microsoft's Cortana, or Google's Assistant, which are predominantly modelled as young, submissive women. According to UNESCO, this bears the risk of reinforcing gender stereotypes.

In this proposal, we will explore this claim via psychological studies on how conversational gendering (expressed through voice, content and style) influences human behaviour in both online and offline interactions. Based on the insights gained, we will establish a principled framework for designing and developing alternative conversational personas which are less likely to perpetuate bias. A persona can be viewed as a composite of elements of identity (background facts or user profile), language behaviour, and interaction style. This framework will include state-of-the-art data-efficient NLP deep learning tools for generating dialogue responses which are consistent with a given persona. The persona parameters can be specified by non-expert users in order to facilitate more inclusive design, as well as to enable a wider critical discussion.
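To make the persona parameters concrete, they could be captured in a small structured object that non-expert users edit directly. The following Python sketch is purely illustrative: the field names, value ranges and defaults are our own assumptions, not a finalised specification from the project.

```python
from dataclasses import dataclass, field

@dataclass
class Persona:
    """One possible encoding of a persona as a composite of identity,
    language behaviour, and interaction style (all fields hypothetical)."""
    identity_facts: list[str] = field(default_factory=list)  # background facts / profile
    voice: str = "neutral"        # e.g. "neutral", "feminine", "masculine"
    formality: float = 0.5        # language behaviour: 0 = casual, 1 = formal
    assertiveness: float = 0.5    # interaction style: 0 = submissive, 1 = assertive
    discloses_machine_status: bool = True  # persona states that it is not human

# A non-expert user could specify an alternative, non-stereotyped persona:
persona = Persona(
    identity_facts=["I am a software agent built by a research team."],
    assertiveness=0.8,
)
```

Such a specification could then condition a response generator, for example by prepending the identity facts and style parameters to the dialogue context.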

Planned Impact

UNESCO points out that the "clock is ticking" for establishing appropriate design norms for conversational assistants: on the one hand, they are new enough that the public's perception is still highly malleable; on the other hand, the adoption of this technology is growing at an unprecedented rate. According to NPR and Edison Research (2018), users are picking up smart speakers at a much faster rate than they did smartphones or tablets, and Gartner predicts that 75% of U.S. households will have smart speakers by 2020. As such, this research has the potential to reach and impact millions of customers. In order to realise this impact we will:

* Work with decision and policy makers such as the Scottish Parliament and UNESCO to ensure oversight.
* Disseminate our results to industry via conference talks, industry-focused events and invited visits.
* Educate a future workforce and investigate how to attract a more diverse workforce into the sector in collaboration with existing training programmes, such as "Data Education in Schools" and "Equate Scotland".
* Engage the public via outreach activities and by facilitating participatory design workshops.
* Closely work with the BBC on a showcase demonstrator.
 
Description Biased technology disadvantages certain groups of society, e.g. based on their race or gender. Recently, biased machine learning has received increased attention. Here, we address a different type of bias which is not learnt from data, but encoded during the design process. We illustrate this problem using the example of Conversational Assistants, such as Amazon's Alexa, Apple's Siri, Microsoft's Cortana, or Google's Assistant, which are predominantly modelled as young, submissive women. According to UNESCO, this bears the risk of reinforcing gender stereotypes.

In this project, we explored this claim by first analysing people's perceptions of existing voice assistants. In particular, we investigated the use of gendered pronouns to refer to these systems in online forums. We found that Amazon's Alexa and Apple's Siri are predominantly referred to as "she/her", whereas Google's Assistant is referred to as "it". We conclude that naming matters. We then investigated to what extent these systems are anthropomorphised, i.e. whether they claim to engage in human activities such as eating or having emotions. Using an existing annotation scheme from "Living Machines", we found that Google Assistant is the most anthropomorphised of the three. This study was published as:

Gavin Abercrombie, Amanda Cercas Curry, Mugdha Pandya and Verena Rieser. Alexa, Google, Siri: What are Your Pronouns? Gender and Anthropomorphism in the Design and Perception of Conversational Assistants. In Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing (GeBNLP 2021) at ACL-IJCNLP 2021.
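For illustration, the core of this pronoun analysis can be approximated by counting gendered versus neuter pronouns in forum posts that mention a given assistant. This toy Python sketch uses invented example posts and a naive token-matching heuristic; it is not the annotation scheme or data used in the paper.

```python
import re
from collections import Counter

# Map pronoun tokens to the categories reported in the study.
PRONOUNS = {"she": "she/her", "her": "she/her", "hers": "she/her",
            "he": "he/him", "him": "he/him", "it": "it", "its": "it"}

def pronoun_counts(posts: list[str]) -> Counter:
    """Count pronoun categories across posts mentioning an assistant."""
    counts = Counter()
    for post in posts:
        for token in re.findall(r"[a-z']+", post.lower()):
            if token in PRONOUNS:
                counts[PRONOUNS[token]] += 1
    return counts

# Invented example posts, for illustration only:
alexa_posts = ["Alexa misheard me, she kept playing the wrong song.",
               "I asked her for the weather."]
google_posts = ["The Google Assistant is fine, it answers quickly."]

print(pronoun_counts(alexa_posts))   # Counter({'she/her': 2})
print(pronoun_counts(google_posts))  # Counter({'it': 1})
```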

We also explored alternative designs for these systems using "participatory design". We organised a workshop open to the public in collaboration with the Royal Society of Edinburgh. We found that people largely disagree about which design they would prefer, and conclude that designs should either be personalised, or the impact of a design, e.g. in terms of positive or negative user behaviour, should be measured objectively. This study was published as:

Amanda Cercas Curry, Judy Robertson and Verena Rieser. Conversational Assistants and Gender Stereotypes: Public Perceptions and Desiderata for Voice Personas. In Proceedings of the 2nd Workshop on Gender Bias in Natural Language Processing (GeBNLP) at COLING 2020.

We then conducted two industry case studies to investigate how conversational assistants are designed in practice. In particular, we interviewed the BBC design team behind the Beeb voice assistant and the team behind the MIT social robot Jibo. We are currently analysing the transcripts.


Next, we conducted several experiments on the detection and mitigation of abuse directed at voice assistants. We gathered and released a dataset and trained classifiers. The results were published at EMNLP, a prime venue:

Amanda Cercas Curry, Gavin Abercrombie and Verena Rieser. ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). (long paper)
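To illustrate the benchmarking setup, a simple baseline abuse classifier over such data might look as follows. This is a minimal scikit-learn sketch with invented toy utterances; it is not one of the models evaluated in the paper, and the real ConvAbuse data carries fine-grained labels from multiple annotators rather than a single binary label.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy user utterances with binary abuse labels (invented for illustration).
texts = ["you are useless", "thanks, that was really helpful",
         "shut up you stupid bot", "what's the weather tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = abusive, 0 = not abusive

# TF-IDF features with a linear classifier: a standard weak baseline.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["you are a stupid assistant"]))  # likely [1] on this toy data
```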

We then conducted a study on abuse mitigation strategies and found an interaction between the "appropriateness" rating of a response and the gender of the voice: counter-speech delivered by female artificial voices is rated significantly lower. We hypothesise that this reflects the stereotype that women should not "talk back".

Luca M. Leisten and Verena Rieser. "I Like You, as a Friend": Voice Assistants' Response Strategies to Sexual Harassment and Their Relation to Gender. Human Perspectives on Spoken Human-Machine Interaction (SpoHuMa 2022).
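The reported interaction effect is the kind of result a standard two-way analysis of variance would surface. The statsmodels sketch below uses invented ratings purely to show the shape of such an analysis; the numbers, factor levels and column names are our own, not the study's data.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Invented appropriateness ratings (1-5) for illustration only.
df = pd.DataFrame({
    "rating":   [4, 5, 4, 2, 1, 2, 4, 4, 5, 4, 5, 4],
    "voice":    ["female"] * 6 + ["male"] * 6,
    "strategy": ["avoidance"] * 3 + ["counter-speech"] * 3
              + ["avoidance"] * 3 + ["counter-speech"] * 3,
})

# Two-way ANOVA with a voice x strategy interaction term.
model = ols("rating ~ C(voice) * C(strategy)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # the C(voice):C(strategy) row tests the interaction
```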

Finally, we collaborated with Facebook AI (Meta) on the more general question of safety in large language models, which is closely related to the bias problem. We published a series of papers, conducted several workshops, and released several resources, including datasets and the "SafetyKit" evaluation tool.


Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau and Verena Rieser. SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). (long paper)

A. Stevie Bergman, Gavin Abercrombie, Shannon Spruit, Dirk Hovy, Emily Dinan, Y-Lan Boureau and Verena Rieser. Guiding the Release of Safer E2E Conversational AI through Value Sensitive Design. In Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2022).

Gavin Abercrombie and Verena Rieser. Risk-graded Safety for Handling Medical Queries in Conversational AI. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022). [arXiv]
Exploitation Route Our findings on voice assistants' abuse mitigation strategies have already had an impact: voice assistants have changed their responses to abuse from "flirty" to more assertive. Anecdotally, I was told by people at Google and Apple that my research and the invited talks I gave at their research and design labs influenced this decision.

I was also invited to several panels, focus groups and keynotes to talk about this issue, and contributed to several pieces in international media outlets and broadcasts.

The SafetyKit software is hosted at Meta and is openly accessible, see https://parl.ai/projects/safety_bench/
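The tool exercises a conversational system through a thin wrapper around the model under test; the exact interface and test commands are documented at the URL above. As a purely hypothetical sketch of the idea (the class and method names here are our own, not the safety_bench API):

```python
class MyModelWrapper:
    """Hypothetical wrapper exposing a dialogue model to a safety test suite."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def get_response(self, input_text: str) -> str:
        # A real wrapper would call the model under test here;
        # a canned deflection keeps this sketch runnable.
        self.history.append(input_text)
        return "I'd rather not talk about that."

wrapper = MyModelWrapper()
print(wrapper.get_response("test probe"))  # -> "I'd rather not talk about that."
```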
Sectors Digital/Communication/Information Technologies (including Software), Government, Democracy and Justice

URL https://sites.google.com/view/convai-gender-bias
 
Description Our findings on voice assistants' abuse mitigation strategies have already had an impact: voice assistants have changed their responses to abuse from "flirty" to more assertive. Anecdotally, I was told by people at Google and Apple that my research and the invited talks I gave at their research and design labs influenced this decision. I was also invited to several panels, focus groups and keynotes to talk about this issue, and contributed to several pieces in international media outlets and broadcasts. For example, I had the chance to speak on Radio 4 about "whether chatbots could solve the loneliness problem". The SafetyKit software is hosted at Meta AI/Facebook and is openly accessible.
First Year Of Impact 2020
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Societal

 
Description DATAIA scientific advisory board
Geographic Reach Europe 
Policy Influence Type Participation in a guidance/advisory committee
URL https://www.dataia.eu/linstitut/le-conseil-scientifique
 
Description AI for Good
Amount £15,000 (GBP)
Organisation Nesta 
Sector Charity/Non Profit
Country United Kingdom
Start 03/2020 
End 09/2020
 
Description AISEC: AI Secure and Explainable by Construction
Amount £807,165 (GBP)
Funding ID EP/T026952/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 08/2020 
End 08/2023
 
Description Leverhulme Trust Senior Research Fellowship 2020
Amount £47,000 (GBP)
Funding ID SRF\R1\201100 
Organisation The Royal Society 
Sector Charity/Non Profit
Country United Kingdom
Start 09/2020 
End 08/2021
 
Description Postdoctoral & Early Career Exchanges (PECE)
Amount £2,750 (GBP)
Organisation SICSA Scottish Informatics and Computer Science Alliance 
Sector Academic/University
Country United Kingdom
Start 10/2021 
End 01/2022
 
Title BLOOM Large Language Model 
Description We created BLOOM, the first publicly available large language model of its scale. This was a year-long collaboration with several hundred international scientists as part of the BigScience workshop. I co-led one of the working groups. BLOOM stands for BigScience Large Open-science Open-access Multilingual Language Model.
Type Of Material Improvements to research infrastructure 
Year Produced 2022 
Provided To Others? Yes  
Impact First publicly available "foundation model". Widely used and compared against in the community. The ambition is to boost academic research and public benefit in competition with privately owned models such as ChatGPT.
URL https://huggingface.co/bigscience/bloom
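For reference, BLOOM checkpoints can be loaded through the Hugging Face transformers library. The full 176B-parameter model is impractical on most hardware, so this sketch uses the smaller bigscience/bloom-560m variant; the prompt and sampling parameters are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small BLOOM checkpoint (the full "bigscience/bloom" is ~176B parameters).
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("Conversational assistants should be designed to", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```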
 
Title ConvAbuse data 
Description Dataset associated with the EMNLP2021 paper "ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI." 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact Other researchers using this dataset 
URL https://github.com/amandacurry/convabuse
 
Title ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI 
Description Anthology paper link: https://aclanthology.org/2021.emnlp-main.587/ Abstract: We present the first English corpus study on abusive language towards three conversational AI systems gathered 'in the wild': an open-domain social bot, a rule-based chatbot, and a task-based system. To account for the complexity of the task, we take a more 'nuanced' approach where our ConvAbuse dataset reflects fine-grained notions of abuse, as well as views from multiple expert annotators. We find that the distribution of abuse is vastly different compared to other commonly used datasets, with more sexually tinted aggression towards the virtual persona of these systems. Finally, we report results from benchmarking existing models against this data. Unsurprisingly, we find that there is substantial room for improvement with F1 scores below 90%.
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact n/a 
URL https://underline.io/lecture/37849-convabuse-data,-analysis,-and-benchmarks-for-nuanced-detection-in...
 
Title GeBNLP2021 
Description Data and annotation guidelines from the paper "Alexa, Google, Siri: What are Your Pronouns? Gender and Anthropomorphism in the Design and Perception of Conversational Assistants.", presented at the 3rd Workshop on Gender Bias in Natural Language Processing (GeBNLP 2021). 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact n/a 
URL https://github.com/GavinAbercrombie/GeBNLP2021
 
Description Amazon SimBot Challenge 
Organisation Amazon.com
Department Amazon UK
Country United Kingdom 
Sector Private 
PI Contribution My student team was selected to participate in the Amazon SimBot challenge.
Collaborator Contribution Our entry is supported with a grant from Amazon and in-kind contributions such as an invited visit to Amazon headquarters in Seattle as well as 2 days of workshops with Amazon staff.
Impact We expect a number of outcomes, including publications, student internships, and raising the international profile of our lab and university in this research area.
Start Year 2021
 
Description Apple NLU research award 
Organisation Apple
Country United States 
Sector Private 
PI Contribution This research gift supports research on low-resource Natural Language Generation.
Collaborator Contribution Research gift and monthly meetings.
Impact not yet
Start Year 2021
 
Description Google Dialog and NLU research award 
Organisation Google
Country United States 
Sector Private 
PI Contribution This research gift supports an informal collaboration between Google Zurich and my group on topics related to dialogue systems and Natural Language Understanding.
Collaborator Contribution We received a research gift from Google to support research expenses.
Impact The award has supported my group with hardware, travel and data services (such as transcriptions and crowdsourcing)
Start Year 2020
 
Description Article in New Statesman 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Spotlight article in the New Statesman on my research on Gender Bias in Conversational Assistant technology.
Year(s) Of Engagement Activity 2021
URL https://www.newstatesman.com/spotlight/2021/09/does-how-you-talk-to-your-ai-assistant-matter
 
Description BBC 4 Radio Broadcast 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Expert contributor to BBC Radio Broadcast: "Could a virtual friend solve Britain's loneliness epidemic?"
Year(s) Of Engagement Activity 2022
URL https://www.bbc.co.uk/sounds/play/m001b44n
 
Description BBC Future Article 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Interview for BBC article covering the gender bias in voice assistants
Year(s) Of Engagement Activity 2022
URL https://www.bbc.com/future/article/20220614-why-your-voice-assistant-might-be-sexist
 
Description CNBC Interview 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Interview with CNBC on AI trends/ research predictions for 2022
Year(s) Of Engagement Activity 2021
URL https://www.cnbc.com/2022/01/07/deep-learning-and-large-language-how-ai-is-set-to-evolve-in-2022.htm...
 
Description Cosmopolitan article 2022 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Contributed an interview to the Cosmopolitan article "AI voice assistants are often women: here's why it's a problem. From Alexa to Cortana, tech prefers female voices. But how harmful are the effects?"
Year(s) Of Engagement Activity 2022
URL https://www.cosmopolitan.com/uk/entertainment/a41677473/ai-voice-assistants-women/
 
Description Edinburgh Science Festival Event 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Online event as part of the Edinburgh Science Festival. The event consisted of a pre-recorded panel discussion, with live chat Q&A with the panellists. The film has been viewed by over 400 people and was one of the top three most viewed videos at the Science Festival 2021.
Year(s) Of Engagement Activity 2021
URL https://www.youtube.com/watch?v=fhhZRck0TDA
 
Description Gendering AI: the Case of Conversational Assistants | Edinburgh Science Festival 2021 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact My team was invited to organise an event at the Edinburgh Science Festival 2021 on "Gendering AI: the Case of Conversational Assistants".
Year(s) Of Engagement Activity 2021
URL https://www.youtube.com/watch?v=fhhZRck0TDA
 
Description Heriot-Watt Engage - West Lothian Libraries 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Online talk on "Understanding Online Abuse", aimed primarily at users of public libraries in West Lothian, Scotland.
Year(s) Of Engagement Activity 2021
URL https://www.eventbrite.co.uk/e/understanding-online-abuse-an-artificial-intelligence-challenge-ticke...
 
Description Science outreach activity 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Drawing activity for children exploring design of conversational assistants.
Year(s) Of Engagement Activity 2022
URL https://www.whatsoninedinburgh.co.uk/event/102553-edinburgh-science-festival:-datasphere/
 
Description The Times article on Gender Based Abuse 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Interview contribution to a Times article covering online gender-based violence and how NLP/ML algorithms can be used to defend against it.
Year(s) Of Engagement Activity 2022
URL https://www.thetimes.co.uk/article/coders-make-algorithm-to-fight-online-gender-abuse-kdf5q29cf