Designing Conversational Assistants to Reduce Gender Bias
Lead Research Organisation:
Heriot-Watt University
Department Name: School of Mathematical and Computer Sciences
Abstract
Biased technology disadvantages certain groups of society, e.g. based on their race or gender. Recently, biased machine learning has received increased attention. Here, we address a different type of bias which is not learnt from data, but encoded during the design process. We illustrate this problem on the example of Conversational Assistants, such as Amazon's Alexa, Apple's Siri, Microsoft's Cortana, or Google's Assistant, which are predominantly modelled as young, submissive women. According to UNESCO, this bears the risk of reinforcing gender stereotypes.
In this proposal, we will explore this claim via psychological studies on how conversational gendering (expressed through voice, content and style) influences human behaviour in both online and offline interactions. Based on the insights gained, we will establish a principled framework for designing and developing alternative conversational personas which are less likely to perpetuate bias. A persona can be viewed as a composite of elements of identity (background facts or user profile), language behaviour, and interaction style. This framework will include state-of-the-art data-efficient NLP deep learning tools for generating dialogue responses which are consistent with a given persona. The persona parameters can be specified by non-expert users in order to facilitate more inclusive design, as well as to enable a wider critical discussion.
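To make the proposed persona parameters concrete, here is a minimal, purely illustrative sketch of how a persona might be represented and serialised into a conditioning prefix for a response generator. All names and the prompt format are hypothetical assumptions for illustration, not the project's actual implementation.

```python
# Illustrative sketch only: a persona as a composite of identity facts,
# language behaviour and interaction style, as described in the abstract.
# All names and the prompt format are hypothetical assumptions.
from dataclasses import dataclass, field


@dataclass
class Persona:
    identity_facts: list[str] = field(default_factory=list)  # background facts / profile
    language_behaviour: str = "neutral"                       # e.g. formal, casual
    interaction_style: str = "assertive"                      # e.g. assertive, deferential


def build_conditioning_prompt(persona: Persona, user_utterance: str) -> str:
    """Serialise persona parameters into a prefix that a persona-consistent
    response generator (e.g. a fine-tuned language model) could condition on."""
    facts = " ".join(persona.identity_facts)
    return (
        f"persona: {facts} "
        f"style: {persona.language_behaviour}, {persona.interaction_style} "
        f"user: {user_utterance} assistant:"
    )


if __name__ == "__main__":
    # A non-expert user could specify these parameters via a simple form.
    persona = Persona(
        identity_facts=["I am a gender-neutral voice assistant."],
        language_behaviour="plain and direct",
        interaction_style="assertive",
    )
    print(build_conditioning_prompt(persona, "Tell me about yourself."))
```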
Planned Impact
UNESCO points out that the "clock is ticking" for establishing appropriate design norms for conversational assistants: on the one hand, they are new enough that the public's perception is still highly malleable; on the other hand, the adoption of this technology is growing at an unprecedented rate. According to NPR and Edison Research (2018), users are picking up smart speakers at a much faster rate than they did smartphones or tablets, and Gartner predicts that 75% of U.S. households will have smart speakers by 2020. As such, this research has the potential to reach and impact millions of customers. In order to realise this impact we will:
* Work with decision and policy makers such as the Scottish Parliament and UNESCO to ensure oversight.
* Disseminate our results to industry via conference talks, industry-focused events and invited visits.
* Educate a future workforce and investigate how to attract a more diverse workforce into the sector in collaboration with existing training programmes, such as "Data Education in Schools" and "Equate Scotland".
* Engage the public via outreach activities and by facilitating participatory design workshops.
* Closely work with the BBC on a showcase demonstrator.
Publications
Abercrombie G
(2023)
Mirages. On Anthropomorphism in Dialogue Systems
Bergman S. A.
(2022)
Guiding the Release of Safer E2E Conversational AI through Value Sensitive Design
Cercas Curry A
(2021)
ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI
Description | Biased technology disadvantages certain groups of society, e.g. based on their race or gender. Recently, biased machine learning has received increased attention. Here, we address a different type of bias which is not learnt from data, but encoded during the design process. We illustrate this problem on the example of Conversational Assistants, such as Amazon's Alexa, Apple's Siri, Microsoft's Cortana, or Google's Assistant, which are predominantly modelled as young, submissive women. According to UNESCO, this bears the risk of reinforcing gender stereotypes.

In this project, we first explored this claim by analysing people's perceptions of existing voice assistants. In particular, we investigated the use of gendered pronouns to refer to these systems in online forums. We found that Amazon's Alexa and Apple's Siri are predominantly referred to as "she/her", whereas Google's Assistant is referred to as "it". We conclude that naming matters. We then investigated to what extent these systems are anthropomorphised, i.e. presented as engaging in human activities such as eating or having emotions. Using an existing annotation scheme from "Living Machines", we found that Google Assistant is the most anthropomorphised. This study is published as: Gavin Abercrombie, Amanda Cercas Curry, Mugdha Pandya and Verena Rieser. Alexa, Google, Siri: What are Your Pronouns? Gender and Anthropomorphism in the Design and Perception of Conversational Assistants. 3rd Workshop on Gender Bias in Natural Language Processing (GeBNLP 2021) at ACL-IJCNLP 2021.

We also explored alternative designs for these systems using participatory design: we organised a workshop open to the public in collaboration with the Royal Society of Edinburgh. We found that participants largely disagreed about which design they would prefer, and conclude that designs should either be personalised, or the impact of a design, e.g. in terms of positive or negative user behaviour, should be measured objectively. This study was published as: Amanda Cercas Curry, Judy Robertson and Verena Rieser. Conversational Assistants and Gender Stereotypes: Public Perceptions and Desiderata for Voice Personas. 2nd Workshop on Gender Bias in Natural Language Processing (GeBNLP) at COLING 2020.

We then conducted two industry case studies to examine the design process of conversational assistants. In particular, we conducted interviews with the BBC's design team of BEEP and with the team behind the social robot Jibo from MIT. We are currently analysing the transcripts.

Next, we conducted several experiments on abuse detection and abuse mitigation for voice assistants. We gathered and released a dataset and trained a classifier. The results are published at EMNLP, a premier venue: Amanda Cercas Curry, Gavin Abercrombie and Verena Rieser. ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). (long paper) We then conducted a study on abuse mitigation strategies and found an interaction between the "appropriateness" rating and the gender of the voice: counterspeech delivered by female artificial voices is rated significantly lower. We hypothesise that this reflects the stereotype that women should not "talk back". Luca M. Leisten and Verena Rieser. "I Like You, as a Friend": Voice Assistants' Response Strategies to Sexual Harassment and Their Relation to Gender. Human Perspectives on Spoken Human-Machine Interaction (SpoHuMa 2022).

Finally, we collaborated with Facebook AI / Meta on the more general question of safety in large language models, which is closely related to the bias problem. We published a series of papers, conducted several workshops, and released several resources, including datasets and a "SafetyKit" evaluation tool. Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau and Verena Rieser. SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). (long paper) A. Stevie Bergman, Gavin Abercrombie, Shannon Spruit, Dirk Hovy, Emily Dinan, Y-Lan Boureau and Verena Rieser. Guiding the Release of Safer E2E Conversational AI through Value Sensitive Design. 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2022). Gavin Abercrombie and Verena Rieser. Risk-graded Safety for Handling Medical Queries in Conversational AI. 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022). [arXiv] |
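As a rough illustration of the pronoun analysis described above, the sketch below counts which gendered pronouns co-occur with each assistant's name in a collection of forum posts. It is a toy version under stated assumptions, not the study's actual pipeline or data.

```python
# Toy sketch of the gendered-pronoun analysis: for each assistant, count the
# pronouns that appear in sentences mentioning it. Illustration only.
import re
from collections import Counter, defaultdict

PRONOUNS = {"she", "her", "hers", "he", "him", "his", "it", "its"}
ASSISTANTS = {"alexa", "siri", "cortana", "google assistant"}


def pronoun_counts(posts: list[str]) -> dict[str, Counter]:
    counts: dict[str, Counter] = defaultdict(Counter)
    for post in posts:
        for sentence in re.split(r"[.!?]+", post.lower()):
            tokens = re.findall(r"[a-z']+", sentence)
            for name in ASSISTANTS:
                if name in sentence:
                    counts[name].update(t for t in tokens if t in PRONOUNS)
    return counts


posts = [
    "I asked Alexa and she didn't know.",
    "Google Assistant is useful; it answers quickly.",
]
print(dict(pronoun_counts(posts)))  # e.g. {'alexa': Counter({'she': 1}), ...}
```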
Exploitation Route | Our findings on abuse mitigation strategies for voice assistants have already had an impact: voice assistants have changed their responses to abuse from flirty to more assertive. Anecdotally, I was told by people at Google and Apple that my research, and the invited talks I gave at their research and design labs, influenced this decision. I was also invited to several panels, focus groups and keynotes to talk about this issue, and contributed to several pieces in international media outlets and broadcasts. The SafetyKit software is hosted at Meta and is openly accessible, see https://parl.ai/projects/safety_bench/ |
Sectors | Digital/Communication/Information Technologies (including Software); Government, Democracy and Justice |
URL | https://sites.google.com/view/convai-gender-bias |
Description | Our findings on abuse mitigation strategies for voice assistants have already had an impact: voice assistants have changed their responses to abuse from flirty to more assertive. Anecdotally, I was told by people at Google and Apple that my research, and the invited talks I gave at their research and design labs, influenced this decision. I was also invited to several panels, focus groups and keynotes to talk about this issue, and contributed to several pieces in international media outlets and broadcasts. For example, I had the chance to speak on BBC Radio 4 about whether chatbots could solve the loneliness problem. The SafetyKit software is hosted at Meta AI / Facebook and is openly accessible. |
First Year Of Impact | 2020 |
Sector | Digital/Communication/Information Technologies (including Software) |
Impact Types | Societal |
Description | DATAIA scientific advisory board |
Geographic Reach | Europe |
Policy Influence Type | Participation in a guidance/advisory committee |
URL | https://www.dataia.eu/linstitut/le-conseil-scientifique |
Description | AI for Good |
Amount | £15,000 (GBP) |
Organisation | Nesta |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2020 |
End | 09/2020 |
Description | AISEC: AI Secure and Explainable by Construction |
Amount | £807,165 (GBP) |
Funding ID | EP/T026952/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 08/2020 |
End | 08/2024 |
Description | Leverhulme Trust Senior Research Fellowship 2020 |
Amount | £47,000 (GBP) |
Funding ID | SRF\R1\201100 |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 08/2020 |
End | 08/2021 |
Description | Postdoctoral & Early Career Exchanges (PECE)
Amount | £2,750 (GBP) |
Organisation | SICSA Scottish Informatics and Computer Science Alliance |
Sector | Academic/University |
Country | United Kingdom |
Start | 09/2021 |
End | 01/2022 |
Title | BLOOM Large Language Model |
Description | We created BLOOM, the first openly available large language model of its scale. This was a year-long collaboration, as part of the BigScience workshop, involving several hundred international scientists. I co-led one of the working groups. BLOOM stands for BigScience Large Open-science Open-access Multilingual Language Model. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | The first openly available "foundation model" of its scale, widely used and compared against in the community. The ambition is to boost academic research and public benefit as an alternative to privately owned models such as ChatGPT. |
URL | https://huggingface.co/bigscience/bloom |
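For reference, a short sketch of loading a BLOOM checkpoint from the Hugging Face Hub with the `transformers` library. The 560m-parameter variant is used here because the full 176B-parameter model requires specialised hardware; this assumes the `transformers` and `torch` packages are installed.

```python
# Sketch: load a small BLOOM checkpoint and generate a continuation.
# Assumes the `transformers` and `torch` packages are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # smaller sibling of bigscience/bloom
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Conversational assistants should", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```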
Title | ConvAbuse data |
Description | Dataset associated with the EMNLP2021 paper "ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI." |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Other researchers using this dataset |
URL | https://github.com/amandacurry/convabuse |
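As a hedged illustration of how such a dataset might be benchmarked, the sketch below trains a simple TF-IDF baseline and reports F1. The file and column names are assumptions for illustration only; consult the repository for the actual data format and the paper for the evaluation protocol used.

```python
# Hypothetical baseline on ConvAbuse-style data: TF-IDF features plus
# logistic regression, evaluated with F1. File/column names are assumed.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("convabuse.csv")           # hypothetical file name
texts, labels = df["text"], df["is_abuse"]  # hypothetical column names

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0)

vec = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_train), y_train)

preds = clf.predict(vec.transform(X_test))
print(f"F1: {f1_score(y_test, preds):.3f}")
```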
Title | ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI |
Description | Anthology paper link: https://aclanthology.org/2021.emnlp-main.587/ Abstract: We present the first English corpus study on abusive language towards three conversational AI systems gathered 'in the wild': an open-domain social bot, a rule-based chatbot, and a task-based system. To account for the complexity of the task, we take a more 'nuanced' approach where our ConvAbuse dataset reflects fine-grained notions of abuse, as well as views from multiple expert annotators. We find that the distribution of abuse is vastly different compared to other commonly used datasets, with more sexually tinted aggression towards the virtual persona of these systems. Finally, we report results from benchmarking existing models against this data. Unsurprisingly, we find that there is substantial room for improvement with F1 scores below 90%. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | n/a |
URL | https://underline.io/lecture/37849-convabuse-data,-analysis,-and-benchmarks-for-nuanced-detection-in... |
Title | GBV-Resources |
Description | This repository serves as a comprehensive collection of resources for the automated identification of online Gender-Based Violence (GBV) and related phenomena. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Research paper describing the repository available at https://aclanthology.org/2023.woah-1.17/ |
URL | https://github.com/HWU-NLP/GBV-Resources |
Title | GeBNLP2021 |
Description | Data and annotation guidelines from the paper "Alexa, Google, Siri: What are Your Pronouns? Gender and Anthropomorphism in the Design and Perception of Conversational Assistants.", presented at the 3rd Workshop on Gender Bias in Natural Language Processing (GeBNLP 2021). |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | n/a |
URL | https://github.com/GavinAbercrombie/GeBNLP2021 |
Description | Amazon SimBot Challenge |
Organisation | Amazon.com |
Department | Amazon UK |
Country | United Kingdom |
Sector | Private |
PI Contribution | My student team was selected to participate in the Amazon SimBot challenge. |
Collaborator Contribution | Our entry is supported with a grant from Amazon and in-kind contributions such as an invited visit to Amazon headquarters in Seattle as well as 2 days of workshops with Amazon staff. |
Impact | We expect a number of outcomes, including publications, student internships, and raising the international profile of our lab and university in this research area. |
Start Year | 2021 |
Description | Apple NLU research award |
Organisation | Apple |
Country | United States |
Sector | Private |
PI Contribution | This research gift supports research on low-resource Natural Language Generation. |
Collaborator Contribution | Research gift and monthly meetings. |
Impact | not yet |
Start Year | 2021 |
Description | Google Dialog and NLU research award |
Organisation | Google
Country | United States |
Sector | Private |
PI Contribution | This research gift supports an informal collaboration between Google Zurich and my group on topics related to dialogue systems and Natural Language Understanding.
Collaborator Contribution | We received a research gift from Google to support research expenses. |
Impact | The award has supported my group with hardware, travel and data services (such as transcriptions and crowdsourcing) |
Start Year | 2020 |
Description | 2nd Workshop on Perspectivist Approaches to NLP (NLPerspectives 2023) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | This full-day workshop was held at the European Conference on AI (ECAI) in Krakow, Poland on 30th September 2023. It was attended by 42 researchers and included a keynote and a panel with international guests. Nine research papers were presented, including five archival papers published in the workshop proceedings at https://ceur-ws.org/Vol-3494/ |
Year(s) Of Engagement Activity | 2023 |
URL | https://nlperspectives.di.unito.it/w/2nd-workshop-on-perspectivist-approaches-to-nlp/ |
Description | Article in New Statesman |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Spotlight article in the New Statesman on my research on Gender Bias in Conversational Assistant technology. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.newstatesman.com/spotlight/2021/09/does-how-you-talk-to-your-ai-assistant-matter |
Description | BBC Radio 4 Broadcast
Form Of Engagement Activity | A broadcast e.g. TV/radio/film/podcast (other than news/press) |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Expert contributor to BBC Radio Broadcast: "Could a virtual friend solve Britain's loneliness epidemic?" |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.bbc.co.uk/sounds/play/m001b44n |
Description | BBC Future Article |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Interview for a BBC article covering gender bias in voice assistants.
Year(s) Of Engagement Activity | 2022 |
URL | https://www.bbc.com/future/article/20220614-why-your-voice-assistant-might-be-sexist |
Description | CNBC Interview |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Interview with CNBC on AI trends/ research predictions for 2022 |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.cnbc.com/2022/01/07/deep-learning-and-large-language-how-ai-is-set-to-evolve-in-2022.htm... |
Description | Cosmopolitan article 2022 |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Contributed an interview to the Cosmopolitan article "AI voice assistants are often women: Here's why it's a problem. From Alexa to Cortana, tech prefers female voices. But how harmful are the effects?"
Year(s) Of Engagement Activity | 2022 |
URL | https://www.cosmopolitan.com/uk/entertainment/a41677473/ai-voice-assistants-women/ |
Description | Edinburgh Science Festival Event |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Online event as part of the Edinburgh Science Festival. The event consisted of a pre-recorded panel discussion, with live chat Q&A with the panellists. The film has been viewed by over 400 people and was one of the three most viewed videos at the Science Festival 2021.
Year(s) Of Engagement Activity | 2021 |
URL | https://www.youtube.com/watch?v=fhhZRck0TDA |
Description | Gendering AI: the Case of Conversational Assistants | Edinburgh Science Festival 2021 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Public/other audiences |
Results and Impact | My team was invited to organise an event at the Edinburgh Science Festival 2021 on "Gendering AI: the Case of Conversational Assistants".
Year(s) Of Engagement Activity | 2021 |
URL | https://www.youtube.com/watch?v=fhhZRck0TDA |
Description | Heriot-Watt Engage - West Lothian Libraries |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Online talk on "Understanding Online Abuse," aimed primarily at users of public libraries in West Lothian, Scotland.
Year(s) Of Engagement Activity | 2021 |
URL | https://www.eventbrite.co.uk/e/understanding-online-abuse-an-artificial-intelligence-challenge-ticke... |
Description | Invited talk at Google DeepMind, London
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | An invited talk at the Sociotechnical AI Research group, Google DeepMind. It led to a lively discussion on fairness and ethics in NLP with industry practitioners.
Year(s) Of Engagement Activity | 2023 |
Description | Invited talk at the National Robotarium |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Other audiences |
Results and Impact | An invited talk presenting recent work related to the Equally Safe Online and Gender Bias in Conversational Agents projects. Attended by members of faculty, students and members of the public. |
Year(s) Of Engagement Activity | 2023 |
Description | Science outreach activity |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Drawing activity for children exploring design of conversational assistants. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.whatsoninedinburgh.co.uk/event/102553-edinburgh-science-festival:-datasphere/ |
Description | The 1st Workshop on CounterSpeech for Online Harms (CS4OA) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | The workshop was co-located with the 24th meeting of SIGDial (Special Interest Group on Discourse and Dialogue), and brought together researchers from computer science and experts in policy on tackling online abuse and hate speech. Eight research papers were presented, along with two invited keynote talks and a panel discussion.
Year(s) Of Engagement Activity | 2023 |
URL | https://sites.google.com/view/cs4oa |
Description | The Times article on Gender Based Abuse |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Interview contribution to a Times article covering online Gender-Based Violence and how to use NLP/ML algorithms to defend against it.
Year(s) Of Engagement Activity | 2022 |
URL | https://www.thetimes.co.uk/article/coders-make-algorithm-to-fight-online-gender-abuse-kdf5q29cf |