Counterfactual Data Augmentation for Mitigating Gender Bias in Morphologically Rich Languages

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

One of the biggest challenges facing modern machine learning systems is removing bias from data. As AI algorithms are
increasingly being used to shape and determine important decisions in private and public life, it is essential to ensure that
these systems are devoid of human prejudice and error. Nonetheless, the utility and accuracy of these algorithms
depend on data which "are not objective; they are creations of human design" (Crawford, 2013). One area in which data
are inherently biased is language. My current Master's thesis aims to create a novel generative model that transforms
sentences of one gender form into another gender form, for languages such as Spanish and Hebrew which possess
masculine and feminine inflections for nouns, verbs, and adjectives. The PhD will seek to build on this research and
investigate three main questions: (1) To what extent can the efficacy and efficiency of the generative model used to
balance gendered language be improved? (2) What other techniques can be designed and
implemented to de-bias corpora in English and other languages? (3) To what extent can these models and
techniques be applied to study and correct other types of bias, such as racial and cultural bias? The PhD will aim to
build systems that will avoid bias amplification, facilitate interdisciplinary applications, and offer robustness to future
models developed for NLP tasks.

People

ORCID iD

Ran Zmigrod (Student)

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513180/1                                   01/10/2018  30/09/2023
2276290            Studentship   EP/R513180/1  01/10/2019  31/03/2023  Ran Zmigrod