Counterfactual Data Augmentation for Mitigating Gender Bias in Morphologically Rich Languages

Lead Research Organisation: University of Cambridge

Department Name: Computer Science and Technology

Abstract

One of the biggest challenges facing modern machine learning systems is removing bias from data. As AI algorithms are
increasingly used to shape and determine important decisions in private and public life, it is essential to ensure that
these systems are free of human prejudice and error. However, the utility and accuracy of these algorithms depend on
data, which "are not objective; they are creations of human design" (Crawford, 2013). Language is one area in which data
are inherently biased. My current Master's thesis aims to create a novel generative model that transforms sentences from
one grammatical gender into the other for languages such as Spanish and Hebrew, which have masculine and feminine
inflections for nouns, verbs, and adjectives. The PhD will build on this research and investigate three main questions:
(1) To what extent can the efficacy and efficiency of the generative model used to balance gendered language be
improved? (2) What other techniques can be designed and implemented to de-bias corpora in English and other languages?
(3) To what extent can these models and techniques be applied to study and correct other types of bias, such as racial
and cultural bias? The PhD will aim to build systems that avoid bias amplification, facilitate interdisciplinary
applications, and remain robust for future models developed for NLP tasks.

Student:

Ran Zmigrod

Period of Study:

Oct 19 - Mar 23

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2276290

Research Topic:

Unclassified

Organisations

University of Cambridge (Lead Research Organisation)

People

Ran Zmigrod (Student)


Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/R513180/1			01/10/2018	30/09/2023
2276290	Studentship	EP/R513180/1	01/10/2019	31/03/2023	Ran Zmigrod
