Identifying & Classifying Bias in Cultural Heritage Catalogues: Applying Natural Language Processing to University of Edinburgh Archival Descriptions

Lead Research Organisation: University of Edinburgh
Department Name: School of Informatics

Abstract

The objective of this project is to develop a context-informed approach to bias detection, executed as a series of case studies beginning with the University of Edinburgh's Archive. Motivated by separate yet related strands of research in the fields of Natural Language Processing (NLP) and Cultural Heritage, the project identifies an opportunity to improve large-scale, automated bias detection. Taking a cross-disciplinary approach, the project applies NLP and data visualisation to archival descriptions. NLP approaches such as topic modelling and sentiment analysis will be used to analyse and classify the language of the Archive's descriptions. Because bias is context-dependent, data visualisation provides a suitable approach to presenting the results of the NLP analysis. Interactive data visualisations will present the results in their associated geographic areas and time periods, enabling people to see the associations that Archive items have with different types of bias. The project will propose a visualisation framework for presenting bias in human language content which, to the author's knowledge, does not yet exist. Rather than eliminate bias, the project seeks to identify and classify bias, arguing that bias deserves a place in cultural heritage institutions.
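As a concrete illustration of the two NLP approaches named above (not the project's actual pipeline), the sketch below applies topic modelling and sentiment analysis to a handful of invented sample descriptions, assuming gensim for LDA topic modelling and NLTK's VADER lexicon for sentiment scoring. The sample texts and parameter choices are placeholders for illustration only.

```python
# Minimal sketch: topic modelling and sentiment analysis over description text.
# The descriptions below are invented placeholders, not real catalogue records.
import nltk
from gensim import corpora, models
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

descriptions = [
    "Letters from a merchant describing trade voyages to the East Indies",
    "Photographs of students at the medical school, with brief captions",
    "Diary of a missionary recording daily life and local customs",
]

# Topic modelling: tokenise, build a bag-of-words corpus, fit an LDA model.
tokenised = [d.lower().split() for d in descriptions]
dictionary = corpora.Dictionary(tokenised)
corpus = [dictionary.doc2bow(tokens) for tokens in tokenised]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)
for topic_id, top_words in lda.print_topics():
    print(f"topic {topic_id}: {top_words}")

# Sentiment analysis: score the language of each description.
sia = SentimentIntensityAnalyzer()
for d in descriptions:
    print(round(sia.polarity_scores(d)["compound"], 3), d)
```

In practice, per-description topic and sentiment scores like these could be aggregated by the geographic area and time period of each item to drive the interactive visualisations described above.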

Bias, though problematic when one-sided, is informative when presented transparently. Bias communicates the perspective of specific groups of people during specific periods in history; recording historical biases informs understandings of societal evolution and the various perspectives that have existed on a topic [1]. Identifying different types of bias also helps researchers understand how representative their dataset is: the presence of more types of bias suggests a more representative dataset. This project seeks to develop techniques for identifying and classifying bias that will bring value to cultural heritage institutions and the public they serve, making bias transparent in human language content anywhere from an archival description to a social media post.

The project seeks to develop bias-detecting technology, beginning with a case study of free-text, human-written archival descriptions. Cataloguers first wrote archival descriptions on paper in the 1930s, and then in databases beginning in the 1970s. Explicitly, the language of archival descriptions reflects its historical context, using terms considered racist, sexist or otherwise inappropriately biased today. Implicitly, information missing from archival descriptions about certain groups of people reflects historical biases. Both types of bias appear in textual data beyond cultural heritage catalogues, such as newspapers and social media posts. As a result, while improving the transparency of the Archive's descriptions, the outcomes of this project could also inform research on returning representative search results [5], implementing fair algorithms [2], and identifying bias in social media [3, 4].
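To make the distinction between explicit and implicit bias concrete, the sketch below flags explicit bias by matching description text against a small lexicon of dated terms, and flags one form of implicit bias with a missing-information heuristic. The lexicon, the record fields (`description`, `female_names`) and the heuristic are hypothetical examples for illustration, not the project's actual classification scheme.

```python
# Illustrative sketch of flagging explicit and implicit bias cues in records.
import re

# Explicit bias: dated or loaded terms, grouped by a hypothetical bias type.
BIAS_LEXICON = {
    "racist": ["savage", "primitive"],
    "sexist": ["spinster", "hysterical"],
}

def flag_explicit(text: str) -> dict:
    """Return bias types whose lexicon terms appear in the text."""
    hits = {}
    for bias_type, terms in BIAS_LEXICON.items():
        found = [t for t in terms
                 if re.search(rf"\b{t}\b", text, re.IGNORECASE)]
        if found:
            hits[bias_type] = found
    return hits

def flag_implicit(record: dict) -> list:
    """Flag missing information, e.g. a woman described only in relation
    to a man and never named in the record."""
    issues = []
    relational = re.search(r"\b(wife|daughter|widow) of\b",
                           record["description"], re.IGNORECASE)
    if relational and not record.get("female_names"):
        issues.append("woman mentioned but not named")
    return issues

record = {"description": "Portrait of the wife of Professor J. Smith",
          "female_names": []}
print(flag_explicit(record["description"]))  # -> {} (no lexicon terms here)
print(flag_implicit(record))                 # -> ['woman mentioned but not named']
```

A lexicon-and-heuristic approach like this is only a baseline; the project's context-informed classification would need to account for quoted historical titles and terms whose offensiveness depends on usage.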

References

1. Holterhoff, K. (2017). "From Disclaimer to Critique: Race and the Digital Image Archivist." In: Digital Humanities Quarterly 11(3). URL: http://digitalhumanities.org:8081/dhq/vol/11/3/000324/000324.html

2. IEEE (2016). Ethically Aligned Design: A Vision for Prioritizing Human Wellbeing with Artificial Intelligence and Autonomous Systems. Version 1. URL: http://standards.ieee.org/develop/indconn/ec/autonomous%20systems.html (accessed 12.05.2018)

3. Recasens, M., Danescu-Niculescu-Mizil, C., Jurafsky, D. (2013). "Linguistic Models for Analyzing and Detecting Biased Language." In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 1650-1659.


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513209/1                                   30/09/2018  29/09/2023
2356289            Studentship   EP/R513209/1  31/03/2020  29/09/2023  Lucy Havens