Designing Conversational Assistants to Reduce Gender Bias
Lead Research Organisation:
Heriot-Watt University
Department Name: School of Mathematical and Computer Sciences
Abstract
Biased technology disadvantages certain groups of society, e.g. based on their race or gender. Recently, biased machine learning has received increased attention. Here, we address a different type of bias which is not learnt from data, but encoded during the design process. We illustrate this problem on the example of Conversational Assistants, such as Amazon's Alexa, Apple's Siri, Microsoft's Cortana, or Google's Assistant, which are predominantly modelled as young, submissive women. According to UNESCO, this bears the risk of reinforcing gender stereotypes.
In this proposal, we will explore this claim via psychological studies on how conversational gendering (expressed through voice, content and style) influences human behaviour in both online and offline interactions. Based on the insights gained, we will establish a principled framework for designing and developing alternative conversational personas which are less likely to perpetuate bias. A persona can be viewed as a composite of elements of identity (background facts or user profile), language behaviour, and interaction style. This framework will include state-of-the-art data-efficient NLP deep learning tools for generating dialogue responses which are consistent with a given persona. The persona parameters can be specified by non-expert users in order to facilitate more inclusive design, as well as to enable a wider critical discussion.
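To make the proposed persona parameters concrete, here is a minimal, purely illustrative sketch of how a persona might be represented and serialised into a conditioning prefix for a response generator. All names and the prompt format are hypothetical assumptions for illustration, not the project's actual implementation.

```python
# Illustrative sketch only: a persona as a composite of identity facts,
# language behaviour and interaction style, as described in the abstract.
# All names and the prompt format are hypothetical assumptions.
from dataclasses import dataclass, field


@dataclass
class Persona:
    identity_facts: list[str] = field(default_factory=list)  # background facts / profile
    language_behaviour: str = "neutral"                       # e.g. formal, casual
    interaction_style: str = "assertive"                      # e.g. assertive, deferential


def build_conditioning_prompt(persona: Persona, user_utterance: str) -> str:
    """Serialise persona parameters into a prefix that a persona-consistent
    response generator (e.g. a fine-tuned language model) could condition on."""
    facts = " ".join(persona.identity_facts)
    return (
        f"persona: {facts} "
        f"style: {persona.language_behaviour}, {persona.interaction_style} "
        f"user: {user_utterance} assistant:"
    )


if __name__ == "__main__":
    # A non-expert user could specify these parameters via a simple form.
    persona = Persona(
        identity_facts=["I am a gender-neutral voice assistant."],
        language_behaviour="plain and direct",
        interaction_style="assertive",
    )
    print(build_conditioning_prompt(persona, "Tell me about yourself."))
```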
Planned Impact
UNESCO points out that the "clock is ticking" for establishing appropriate design norms for conversational assistants: on the one hand, they are new enough that the public's perception is still highly malleable; on the other hand, the adoption of this technology is growing at an unprecedented rate. According to NPR and Edison Research (2018), users are picking up smart speakers at a much faster rate than they did smartphones or tablets, and Gartner predicts that 75% of U.S. households will have smart speakers by 2020. As such, this research has the potential to reach and impact millions of customers. In order to realise this impact we will:
* Work with decision and policy makers such as the Scottish Parliament and UNESCO to ensure oversight.
* Disseminate our results to industry via conference talks, industry-focused events and invited visits.
* Educate a future workforce and investigate how to attract a more diverse workforce into the sector in collaboration with existing training programmes, such as "Data Education in Schools" and "Equate Scotland".
* Engage the public via outreach activities and by facilitating participatory design workshops.
* Closely work with the BBC on a showcase demonstrator.
Publications
Abercrombie G
(2023)
Mirages. On Anthropomorphism in Dialogue Systems
Bergman S. A.
(2022)
Guiding the Release of Safer E2E Conversational AI through Value Sensitive Design
Cercas Curry A
(2021)
ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI
Description | Biased technology disadvantages certain groups of society, e.g. based on their race or gender. Recently, biased machine learning has received increased attention. Here, we address a different type of bias which is not learnt from data, but encoded during the design process. We illustrate this problem on the example of Conversational Assistants, such as Amazon's Alexa, Apple's Siri, Microsoft's Cortana, or Google's Assistant, which are predominantly modelled as young, submissive women. According to UNESCO, this bears the risk of reinforcing gender stereotypes.

In this project, we first explored this claim by analysing people's perceptions of existing voice assistants. In particular, we investigated the use of gendered pronouns to refer to these systems in online forums. We found that Amazon's Alexa and Apple's Siri are predominantly referred to as "she/her", whereas Google's Assistant is referred to as "it". We conclude that naming matters. We then investigated to what extent these systems are anthropomorphised, i.e. presented as engaging in human activities such as eating or having emotions. Using an existing annotation scheme from "Living Machines", we found that Google Assistant is the most anthropomorphised. This study is published as: Gavin Abercrombie, Amanda Cercas Curry, Mugdha Pandya and Verena Rieser. Alexa, Google, Siri: What are Your Pronouns? Gender and Anthropomorphism in the Design and Perception of Conversational Assistants. 3rd Workshop on Gender Bias in Natural Language Processing (GeBNLP 2021) at ACL-IJCNLP 2021.

We also explored alternative designs for these systems using participatory design: we organised a workshop open to the public in collaboration with the Royal Society of Edinburgh. We found that participants largely disagreed about which design they would prefer, and conclude that designs should either be personalised, or the impact of a design, e.g. in terms of positive or negative user behaviour, should be measured objectively. This study was published as: Amanda Cercas Curry, Judy Robertson and Verena Rieser. Conversational Assistants and Gender Stereotypes: Public Perceptions and Desiderata for Voice Personas. 2nd Workshop on Gender Bias in Natural Language Processing (GeBNLP) at COLING 2020.

We then conducted two industry case studies to examine the design process of conversational assistants. In particular, we conducted interviews with the BBC's design team of BEEP and with the team behind the social robot Jibo from MIT. We are currently analysing the transcripts.

Next, we conducted several experiments on abuse detection and abuse mitigation for voice assistants. We gathered and released a dataset and trained a classifier. The results are published at EMNLP, a premier venue: Amanda Cercas Curry, Gavin Abercrombie and Verena Rieser. ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). (long paper) We then conducted a study on abuse mitigation strategies and found an interaction between the "appropriateness" rating and the gender of the voice: counterspeech delivered by female artificial voices is rated significantly lower. We hypothesise that this reflects the stereotype that women should not "talk back". Luca M. Leisten and Verena Rieser. "I Like You, as a Friend": Voice Assistants' Response Strategies to Sexual Harassment and Their Relation to Gender. Human Perspectives on Spoken Human-Machine Interaction (SpoHuMa 2022).

Finally, we collaborated with Facebook AI / Meta on the more general question of safety in large language models, which is closely related to the bias problem. We published a series of papers, conducted several workshops, and released several resources, including datasets and a "SafetyKit" evaluation tool. Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau and Verena Rieser. SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). (long paper) A. Stevie Bergman, Gavin Abercrombie, Shannon Spruit, Dirk Hovy, Emily Dinan, Y-Lan Boureau and Verena Rieser. Guiding the Release of Safer E2E Conversational AI through Value Sensitive Design. 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2022). Gavin Abercrombie and Verena Rieser. Risk-graded Safety for Handling Medical Queries in Conversational AI. 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022). [arXiv] |
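As a rough illustration of the pronoun analysis described above, the sketch below counts which gendered pronouns co-occur with each assistant's name in a collection of forum posts. It is a toy version under stated assumptions, not the study's actual pipeline or data.

```python
# Toy sketch of the gendered-pronoun analysis: for each assistant, count the
# pronouns that appear in sentences mentioning it. Illustration only.
import re
from collections import Counter, defaultdict

PRONOUNS = {"she", "her", "hers", "he", "him", "his", "it", "its"}
ASSISTANTS = {"alexa", "siri", "cortana", "google assistant"}


def pronoun_counts(posts: list[str]) -> dict[str, Counter]:
    counts: dict[str, Counter] = defaultdict(Counter)
    for post in posts:
        for sentence in re.split(r"[.!?]+", post.lower()):
            tokens = re.findall(r"[a-z']+", sentence)
            for name in ASSISTANTS:
                if name in sentence:
                    counts[name].update(t for t in tokens if t in PRONOUNS)
    return counts


posts = [
    "I asked Alexa and she didn't know.",
    "Google Assistant is useful; it answers quickly.",
]
print(dict(pronoun_counts(posts)))  # e.g. {'alexa': Counter({'she': 1}), ...}
```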
Exploitation Route | Our findings on abuse mitigation strategies for voice assistants have already had an impact: voice assistants have changed their responses to abuse from flirty to more assertive. Anecdotally, I was told by people at Google and Apple that my research, and the invited talks I gave at their research and design labs, influenced this decision. I was also invited to several panels, focus groups and keynotes to talk about this issue, and contributed to several pieces in international media outlets and broadcasts. The SafetyKit software is hosted at Meta and is openly accessible, see https://parl.ai/projects/safety_bench/ |
Sectors | Digital/Communication/Information Technologies (including Software); Government, Democracy and Justice |
URL | https://sites.google.com/view/convai-gender-bias |
Description | Our findings on abuse mitigation strategies for voice assistants have already had an impact: voice assistants have changed their responses to abuse from flirty to more assertive. Anecdotally, I was told by people at Google and Apple that my research, and the invited talks I gave at their research and design labs, influenced this decision. I was also invited to several panels, focus groups and keynotes to talk about this issue, and contributed to several pieces in international media outlets and broadcasts. For example, I had the chance to speak on BBC Radio 4 about whether chatbots could solve the loneliness problem. The SafetyKit software is hosted at Meta AI / Facebook and is openly accessible. |
First Year Of Impact | 2020 |
Sector | Digital/Communication/Information Technologies (including Software) |
Impact Types | Societal |
Description | DATAIA scientific advisory board |
Geographic Reach | Europe |
Policy Influence Type | Participation in a guidance/advisory committee |
URL | https://www.dataia.eu/linstitut/le-conseil-scientifique |
Description | AI for Good |
Amount | £15,000 (GBP) |
Organisation | Nesta |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2020 |
End | 09/2020 |
Description | AISEC: AI Secure and Explainable by Construction |
Amount | £807,165 (GBP) |
Funding ID | EP/T026952/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 08/2020 |
End | 08/2024 |
Description | Leverhulme Trust Senior Research Fellowship 2020 |
Amount | £47,000 (GBP) |
Funding ID | SRF\R1\201100 |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 08/2020 |
End | 08/2021 |
Description | Postdoctoral & Early Career Exchanges (PECE)
Amount | £2,750 (GBP) |
Organisation | SICSA Scottish Informatics and Computer Science Alliance |
Sector | Academic/University |
Country | United Kingdom |
Start | 09/2021 |
End | 01/2022 |
Title | BLOOM Large Language Model |
Description | We created BLOOM, the first openly available large language model of its scale. This was a year-long collaboration, as part of the BigScience workshop, involving several hundred international scientists. I co-led one of the working groups. BLOOM stands for BigScience Large Open-science Open-access Multilingual Language Model. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | The first openly available "foundation model" of its scale, widely used and compared against in the community. The ambition is to boost academic research and public benefit as an alternative to privately owned models such as ChatGPT. |
URL | https://huggingface.co/bigscience/bloom |
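For reference, a short sketch of loading a BLOOM checkpoint from the Hugging Face Hub with the `transformers` library. The 560m-parameter variant is used here because the full 176B-parameter model requires specialised hardware; this assumes the `transformers` and `torch` packages are installed.

```python
# Sketch: load a small BLOOM checkpoint and generate a continuation.
# Assumes the `transformers` and `torch` packages are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # smaller sibling of bigscience/bloom
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Conversational assistants should", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```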
Title | ConvAbuse data |
Description | Dataset associated with the EMNLP2021 paper "ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI." |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Other researchers using this dataset |
URL | https://github.com/amandacurry/convabuse |
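As a hedged illustration of how such a dataset might be benchmarked, the sketch below trains a simple TF-IDF baseline and reports F1. The file and column names are assumptions for illustration only; consult the repository for the actual data format and the paper for the evaluation protocol used.

```python
# Hypothetical baseline on ConvAbuse-style data: TF-IDF features plus
# logistic regression, evaluated with F1. File/column names are assumed.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("convabuse.csv")           # hypothetical file name
texts, labels = df["text"], df["is_abuse"]  # hypothetical column names

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0)

vec = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_train), y_train)

preds = clf.predict(vec.transform(X_test))
print(f"F1: {f1_score(y_test, preds):.3f}")
```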
Title | ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI |
Description | Anthology paper link: https://aclanthology.org/2021.emnlp-main.587/ Abstract: We present the first English corpus study on abusive language towards three conversational AI systems gathered 'in the wild': an open-domain social bot, a rule-based chatbot, and a task-based system. To account for the complexity of the task, we take a more 'nuanced' approach where our ConvAbuse dataset reflects fine-grained notions of abuse, as well as views from multiple expert annotators. We find that the distribution of abuse is vastly different compared to other commonly used datasets, with more sexually tinted aggression towards the virtual persona of these systems. Finally, we report results from benchmarking existing models against this data. Unsurprisingly, we find that there is substantial room for improvement with F1 scores below 90%. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | n/a |
URL | https://underline.io/lecture/37849-convabuse-data,-analysis,-and-benchmarks-for-nuanced-detection-in... |
Title | GBV-Resources |
Description | This repository serves as a comprehensive collection of resources for the automated identification of online Gender-Based Violence (GBV) and related phenomena. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Research paper describing the repository available at https://aclanthology.org/2023.woah-1.17/ |
URL | https://github.com/HWU-NLP/GBV-Resources |
Title | GeBNLP2021 |
Description | Data and annotation guidelines from the paper "Alexa, Google, Siri: What are Your Pronouns? Gender and Anthropomorphism in the Design and Perception of Conversational Assistants.", presented at the 3rd Workshop on Gender Bias in Natural Language Processing (GeBNLP 2021). |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | n/a |
URL | https://github.com/GavinAbercrombie/GeBNLP2021 |
Description | Amazon SimBot Challenge |
Organisation | Amazon.com |
Department | Amazon UK |
Country | United Kingdom |
Sector | Private |
PI Contribution | My student team was selected to participate in the Amazon SimBot challenge. |
Collaborator Contribution | Our entry is supported with a grant from Amazon and in-kind contributions such as an invited visit to Amazon headquarters in Seattle as well as 2 days of workshops with Amazon staff. |
Impact | We expect a number of outcomes, including publications, student internships, and raising the international profile of our lab and university in this research area. |
Start Year | 2021 |
Description | Apple NLU research award |
Organisation | Apple |
Country | United States |
Sector | Private |
PI Contribution | This research gift supports research on low-resource Natural Language Generation. |
Collaborator Contribution | Research gift and monthly meetings. |
Impact | not yet |
Start Year | 2021 |
Description | Google Dialog and NLU research award |
Organisation | Google
Country | United States |
Sector | Private |
PI Contribution | This research gift supports an informal collaboration between Google Zurich and my group on topics related to dialogue systems and Natural Language Understanding.
Collaborator Contribution | We received a research gift from Google to support research expenses. |
Impact | The award has supported my group with hardware, travel and data services (such as transcriptions and crowdsourcing) |
Start Year | 2020 |
Description | 2nd Workshop on Perspectivist Approaches to NLP (NLPerspectives 2023) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | This full-day workshop was held at the European Conference on AI (ECAI) in Krakow, Poland on 30th September 2023. It was attended by 42 researchers and included a keynote and a panel with international guests. Nine research papers were presented, including five archival papers published in the workshop proceedings at https://ceur-ws.org/Vol-3494/ |
Year(s) Of Engagement Activity | 2023 |
URL | https://nlperspectives.di.unito.it/w/2nd-workshop-on-perspectivist-approaches-to-nlp/ |
Description | Article in New Statesman |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Spotlight article in the New Statesman on my research on Gender Bias in Conversational Assistant technology. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.newstatesman.com/spotlight/2021/09/does-how-you-talk-to-your-ai-assistant-matter |
Description | BBC Radio 4 Broadcast
Form Of Engagement Activity | A broadcast e.g. TV/radio/film/podcast (other than news/press) |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Expert contributor to BBC Radio Broadcast: "Could a virtual friend solve Britain's loneliness epidemic?" |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.bbc.co.uk/sounds/play/m001b44n |
Description | BBC Future Article |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Interview for a BBC article covering gender bias in voice assistants.
Year(s) Of Engagement Activity | 2022 |
URL | https://www.bbc.com/future/article/20220614-why-your-voice-assistant-might-be-sexist |
Description | CNBC Interview |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Interview with CNBC on AI trends/ research predictions for 2022 |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.cnbc.com/2022/01/07/deep-learning-and-large-language-how-ai-is-set-to-evolve-in-2022.htm... |
Description | Cosmopolitan article 2022 |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Contributed an interview to the Cosmopolitan article "AI voice assistants are often women: Here's why it's a problem. From Alexa to Cortana, tech prefers female voices. But how harmful are the effects?"
Year(s) Of Engagement Activity | 2022 |
URL | https://www.cosmopolitan.com/uk/entertainment/a41677473/ai-voice-assistants-women/ |
Description | Edinburgh Science Festival Event |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Online event as part of the Edinburgh Science Festival. The event consisted of a pre-recorded panel discussion, with live chat Q&A with the panellists. The film has been viewed by over 400 people and was one of the three most viewed videos at the Science Festival 2021.
Year(s) Of Engagement Activity | 2021 |
URL | https://www.youtube.com/watch?v=fhhZRck0TDA |
Description | Gendering AI: the Case of Conversational Assistants | Edinburgh Science Festival 2021 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Public/other audiences |
Results and Impact | My team was invited to organise an event at the Edinburgh Science Festival 2021 on "Gendering AI: the Case of Conversational Assistants".
Year(s) Of Engagement Activity | 2021 |
URL | https://www.youtube.com/watch?v=fhhZRck0TDA |
Description | Heriot-Watt Engage - West Lothian Libraries |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Online talk on "Understanding Online Abuse," aimed primarily at users of public libraries in West Lothian, Scotland.
Year(s) Of Engagement Activity | 2021 |
URL | https://www.eventbrite.co.uk/e/understanding-online-abuse-an-artificial-intelligence-challenge-ticke... |
Description | Invited talk at Google DeepMind, London
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | An invited talk at the Sociotechnical AI Research group, Google DeepMind. It led to a lively discussion on fairness and ethics in NLP with industry practitioners.
Year(s) Of Engagement Activity | 2023 |
Description | Invited talk at the National Robotarium |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Other audiences |
Results and Impact | An invited talk presenting recent work related to the Equally Safe Online and Gender Bias in Conversational Agents projects. Attended by members of faculty, students and members of the public. |
Year(s) Of Engagement Activity | 2023 |
Description | Science outreach activity |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Drawing activity for children exploring design of conversational assistants. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.whatsoninedinburgh.co.uk/event/102553-edinburgh-science-festival:-datasphere/ |
Description | The 1st Workshop on CounterSpeech for Online Harms (CS4OA) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | The workshop was co-located with the 24th meeting of SIGDial (Special Interest Group on Discourse and Dialogue), and brought together researchers from computer science and experts in policy on tackling online abuse and hate speech. Eight research papers were presented, along with two invited keynote talks and a panel discussion.
Year(s) Of Engagement Activity | 2023 |
URL | https://sites.google.com/view/cs4oa |
Description | The Times article on Gender Based Abuse |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Interview contribution to a Times article covering online Gender-Based Violence and how to use NLP/ML algorithms to defend against it.
Year(s) Of Engagement Activity | 2022 |
URL | https://www.thetimes.co.uk/article/coders-make-algorithm-to-fight-online-gender-abuse-kdf5q29cf |