Towards Globally Equitable Language Technologies (EQUATE)

Lead Research Organisation: University of Cambridge
Department Name: Linguistics

Abstract

Language technologies can now offer effective support for communication, education, healthcare, and many other aspects of human life. Yet these technologies are not distributed equally. They are available to only a small part of the world's population of 7.9 billion, mainly people living in the Global North. This is because the resources needed to build them are limited or lacking for the vast majority of the world's more than 7,000 living languages. This situation has significant scientific and socioeconomic consequences. The goal of our project is to investigate the methodological challenges in developing globally equitable language technologies and to design transformative approaches to overcome them, with the overall aim of creating a realistic methodological basis for multilingually equitable NLP. We will first develop an understanding of the (in)equalities in language technologies, and produce a novel index that profiles the world's languages and language populations in terms of their readiness for language technologies. We will then develop new methods for multilingual NLP that address critical aspects of equity (including sample efficiency, modularity, model compactness, transparency, and fairness), along with a novel unified approach that integrates these methods to support NLP at different levels of readiness. Working with local language populations, we will also produce new, equity-aware evaluation resources that are representative of the world's low-resource languages in terms of geographic regions, linguistic characteristics, and NLP readiness. Our novel methodology will be evaluated on downstream NLP tasks as well as in useful real-life applications in languages that such applications currently under-serve. This project can transform the way we approach multilingual NLP, and substantially improve our understanding of how language technologies can be made fair and inclusive at a global level.
