Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship

Lead Research Organisation: University of Sussex
Department Name: Sch of History, Art History & Philosophy


"Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship" will develop a platform for a transformational impact in digital scholarship within cultural institutions by opening up new and important directions for computational, critical, and curatorial analysis of collection catalogues. Extensive digital and digitised sets of curatorial descriptions from legacy catalogues are increasingly available and we seek to realise their potential as valuable resources for cross-disciplinary research into curatorial practice, and for enhancing access to and analysis of collections at scale.

Catalogues are fundamental to cultural institutions: they represent their objects, provide the basis of searches for their objects, and communicate knowledge about their objects into the future. Catalogues are also fundamental to the history of cultural institutions, as artefacts of late-nineteenth and early-twentieth century professionalisation that have evolved from physical objects such as printed books to digital databases, online discovery services, and linked open data. Catalogues, then, create a lasting legacy. The digitisation of collections has furthered that legacy with the use of descriptions from legacy print catalogues as a starting point for indexing digitised material.

Thus we see that the writers of catalogue descriptions are powerful interlocutors not only between objects and viewers, but also between the past and now. And when legacy catalogues are reused as the basis for contemporary descriptions of collection items, a powerful and often difficult to detect "curatorial voice" remains. This voice is a product of the historical and social contexts in which the descriptions were written. This has serious consequences for the trust which users can have in federated catalogues, particularly when parts of those catalogues are steeped in unacknowledged or unidentifiable past voices and colonial gathering. There is therefore an urgent need to elucidate and foreground curatorial "voices", and to do this at scale. Curators and researchers alike require methods that can comprehensively articulate the choices, preferences, and omissions made by curators in what are often large bodies of text produced during decades of work.

This project will foreground some of the fundamental ways in which cultural institutions represent their objects and create a pathway to transform the reuse of legacy catalogues for access, scholarship, and research. Our pilot research will investigate the temporal and spatial legacy of a landmark catalogue: the 1.1 million word British Museum 'Catalogue of Political and Personal Satires', which is the basis of related catalogue data at the Lewis Walpole Library and the British Library. We will demonstrate how methods combining corpus linguistic and archival research can be used to produce an empirical account of curatorial "voice" across a large catalogue. We ask questions about the enduring legacies of curatorial labour, methods for defining and highlighting curatorial voice, the role of digital scholarship in responding to the ways in which legacy descriptions work against contemporary ambitions of cultural institutions, and how to develop sectoral capability in digital scholarship. We will co-produce training materials and reports, deliver proofs-of-concept for changing how legacy descriptions are presented to diverse publics, and release transparent code, data, and methods to enable the reuse of our methods.

The project team comprises researchers, curators and technologists from University of Sussex, Yale Digital Humanities Lab, the Lewis Walpole Library and the British Library. The partnerships between the team members, and the wider community, will be developed through pilot research, workshops, and research residencies.

Planned Impact

1. *Collaborating Cultural Institutions* will benefit from computational, critical, and curatorial analysis of their collection catalogues. In particular, they will benefit from our focus on elucidating difficult to detect curatorial voices whose continued presence works against the ability of cultural institutions to respond to larger societal shifts. This will impact on the collaborating cultural institutions by, for example, transfers of knowledge about computational approaches to defining features of "curatorial voice" and by giving them agency to co-produce capability building outputs (e.g. a training module) based on their own catalogue data. During and after the project, the British Library and Lewis Walpole Library will benefit from being at the centre of a new transatlantic network and collaboration focused on the temporal and spatial legacies of early-twentieth century Anglophone curatorial labour, and on producing practical, implementable, and public facing responses in light of new knowledge and data about these legacies.
2. *Cultural Institutions with Significant Legacy Catalogue Data* will benefit from the project as attendees at our capability building and partnership development workshops. UK and US based invitees will be drawn from participants at workshops run during Baker and Salway's recent British Academy "Curatorial Voice" project, and from the networks of British Library and Lewis Walpole Library, for example the British Museum, Cleveland Museum of Art, and Wellcome Collection. For those responsible for delivering access to collections, the workshop will be an opportunity to reflect on curatorial practice and inherited knowledges, and how they impact search and discovery of collections. There is also the potential for a big practical "pay-off" if we demonstrate that descriptions can be classified automatically, e.g. for checking curatorial practice against institutional guidelines, and for selecting (parts of) descriptions to be used as the basis for structured representations of digital images of collection items.
3. *Policy Networks relating to Cultural Institutions* will benefit from interventions that underscore the political urgency of knowing where, from whom, and under what circumstances catalogue data was produced. For example, during the development of our training module we will work with curators and cataloguers to assess the applicability of corpus linguistic techniques to current and future professional practice. Given a set of guidelines for producing curatorial descriptions, corpus techniques can check the extent to which guidelines are being followed at a macro-level, e.g. by identifying what aspects of objects tend to be referred to or not. Further, such analysis can form a basis for plans to enhance a catalogue by providing areas to focus on and estimates on the person time required. A corpus-based characterisation of the language used in an exemplary catalogue could also be used to develop or refine cataloguing guidelines by identifying that catalogue's distinctive linguistic features.
4. *Publics Using Legacy Catalogue Data* will benefit from the production of proofs-of-concept for presenting legacy descriptions and their linguistic features. It is common, for example, for news agencies to clearly "flag" old stories on their websites, but no such custom exists for legacy descriptions. We will develop implementable pathways for achieving such interventions, disseminated as an easy-to-digest pamphlet. Further, when using a legacy catalogue as the basis for accessing a collection through text-based search, users may benefit from having an overview of the common vocabulary in order to understand what search terms are likely to be effective. Our project considers in what form users of legacy descriptions would find this type of data useful, and creates a pathway towards change.
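
The corpus-based checks mentioned in points 3 and 4 (which aspects of objects descriptions tend to mention, and which vocabulary is common enough to be a useful search term) can be illustrated with a minimal Python sketch. It is not the project's own code: the sample descriptions, the stopword list, and the function names `document_frequency` and `coverage` are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical descriptions, invented for illustration only;
# they are not taken from any real catalogue.
DESCRIPTIONS = [
    "A stout man stands on the left holding a paper.",
    "A lady sits on the right; a man stands behind her chair.",
    "Two men stand facing each other; one holds a paper.",
]

# A tiny, assumed stopword list; a real study would choose this deliberately.
STOPWORDS = {"a", "the", "on", "one", "her", "each", "other", "two"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequency(descriptions, stopwords=STOPWORDS):
    """Count, for each word, how many descriptions contain it at least once."""
    counts = Counter()
    for description in descriptions:
        counts.update(set(tokenize(description)) - set(stopwords))
    return counts


def coverage(descriptions, terms):
    """Return the share of descriptions that mention each term (0.0 to 1.0)."""
    total = len(descriptions)
    if total == 0:
        return {term: 0.0 for term in terms}
    df = document_frequency(descriptions, stopwords=set())
    return {term: df[term] / total for term in terms}


if __name__ == "__main__":
    # Most widely used words: a rough "common vocabulary" overview.
    print(document_frequency(DESCRIPTIONS).most_common(5))
    # How often do descriptions mention terms a guideline might ask for?
    print(coverage(DESCRIPTIONS, ["left", "right", "colour"]))
```

Counting the number of descriptions that contain a word, rather than raw word totals, keeps one long entry from dominating the overview. Exact-match counting of this kind treats "man" and "men" as different words, which is one reason dedicated corpus tools are likely to be preferable for real analysis.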


Title Legacies of Catalogue Descriptions and Curatorial Voice: Infographics 
Description The project 'Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship' (2020-22, AHRC, Project Reference AH/T013036/1) sought to develop a platform for a transformational impact in digital scholarship within cultural institutions by opening up new and important directions for computational, critical, and curatorial analysis of collection catalogues. Our pilot research investigated the temporal and spatial legacy of a landmark catalogue: the 'Catalogue of Political and Personal Satires Preserved in the Department of Prints and Drawings in the British Museum', entries in which form the basis of related catalogue data at institutions including the Lewis Walpole Library and the British Library. Towards the end of the project we invited together members of our community to discuss shared agendas and actions: where we should focus our collective resources, what things we need to work differently, and the role of computational technologies in both shaping and constraining change. These infographics respond to those conversations. 'Legacy catalogue entries: actions and agendas' presents priorities that emerged from a longlist of project findings. And 'Tracing the transmission of legacy catalogue entries' visualises the journey of legacy catalogue data in our case study from conception, through cataloging infrastructures, and into the future. Both graphics were designed by Lucy Havens, with editorial support from James Baker and Rossitza Atanassova. We thank our community of participants for their encouragement and suggestions during the design process. 
Type Of Art Image 
Year Produced 2022 
Impact The infographics were developed with community input. Members of that community from the cultural heritage sector have already expressed interest in using the infographics as a novel way of introducing colleagues (especially those starting out in the sector) to the issues at stake with regards to legacy catalogue data. 
URL https://doi.org/10.5281/zenodo.6221868
Description From a long list of 21 project findings in February 2022 we worked with beneficiaries to create a shortlist of five project findings most pertinent to future work with legacy catalogue data. These are:

- Language structures that perpetuate inequality, without being directly offensive or harmful, are present in contemporary catalogues
- Computational analysis of legacy catalogue entries enables the creation of important new knowledge about legacy catalogue entries at scale
- Computational analysis of catalogue data can form the basis of reparative cataloguing
- Computational analysis of catalogue data can form a part of professional development in the cataloguing community
- Creative computation can be used to foreground the inequalities present in and harms created by machine use of legacy catalogue entries

These findings combine the computational, curatorial, and archival, and confirm our hypothesis that the catalogue is a pivot around which digital scholarship can be fostered in cultural institutions. Cataloguing professionals are fully aware of the presence of historic voices in their catalogue data (though they may not be able to accurately locate them), are keen to invest time understanding if these voices should continue to be amplified, and are looking for tools to start this work. Our research-led approach, which combines corpus linguistics, archival research, and technological experiments with catalogue data, seems to have struck a chord (e.g. an invitation to speak at the National Library of Scotland) and generated positive feedback, which we are taking into the revised partnership development stage.

Note that international partnership development activities continued to be disrupted by the pandemic, leading to a further project extension to complete these.

In our research pilot, we have also found that transmission of 'curatorial voice' did take place between the British Museum and Lewis Walpole Library with regards to descriptions of satirical print collections at each institution. Our computational, critical, and curatorial approach continued to bear fruit, and we are finding - though this is provisional, as this work continues into the project extension period - that the computational approach can go beyond identifying exact or light transmission of text, and that the curatorial approach provides a bottom-up route to understanding the historically-specific features of our data and identifying looser transmission (e.g. a Lewis Walpole Library description that responds to the character of a British Museum description).
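
As an illustration of what detecting exact or light transmission can involve, the following minimal Python sketch scores how much wording two descriptions share, using overlapping three-word sequences and Jaccard similarity. This is a generic text-reuse technique chosen for illustration, not the method in the project's published code, and the function names and example sentences are assumptions invented here.

```python
import re


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def transmission_score(source, target, n=3):
    """Score from 0.0 (no shared n-word sequences) to 1.0 (identical sets)."""
    return jaccard(shingles(source, n), shingles(target, n))


if __name__ == "__main__":
    # Hypothetical descriptions, invented for illustration only.
    source = "A man stands on the left holding a paper"
    target = "A man stands on the left with a hat"
    print(transmission_score(source, target))
```

A score near 1.0 suggests near-verbatim copying and a middling score suggests lightly edited reuse. Looser transmission, where a description responds to the character of another without sharing its wording, would score close to zero on a measure like this, which is consistent with the role the paragraph above gives to the curatorial approach.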

The major achievement of the project thus far is a training module on computational analysis of catalogue data https://cataloguelegacies.github.io/antconc.github.io/. A provocation and a worksheet on the presentation of legacy catalogue data have been published. Findings from the research pilot have been written up and accepted for publication with Digital Humanities Quarterly. And the technological experiments - focused on automated writing and computational creativity - are establishing new directions for analysing catalogue data and enabling sectoral change.
Exploitation Route Through our training materials, worksheets, and the publication of documented datasets and code, heritage professionals and researchers will be able to take forward our approach to catalogue data in their work.
Sectors Culture, Museums and Collections

URL https://cataloguelegacies.github.io/
Description Based on feedback from our training events, catalogue professionals are exploring taking forward our computational approach to catalogue data in their catalogue related work, with associated impacts on users of cultural heritage collections.
First Year Of Impact 2020
Sector Culture, Heritage, Museums and Collections
Impact Types Cultural

Description Legacies of curatorial voice in the descriptions of incunabula collections at the British Library and their future reuse
Amount £25,000 (GBP)
Organisation Arts & Humanities Research Council (AHRC) 
Sector Public
Country United Kingdom
Start 07/2022 
End 07/2023
Title Analysis of transmission from BMSat to LWL 
Description This repository release contains code in relation to the analysis of "transmission" from the Catalogue of Political and Personal Satires Preserved in the Department of Prints and Drawings in the British Museum (a corpus of which is described at James Baker & Andrew Salway (2019). Creation of the BMSatire Descriptions corpus (Version v1.0) http://doi.org/10.5281/zenodo.3245037) to catalogue records at the Lewis Walpole Library. 
Type Of Material Data analysis technique 
Year Produced 2021 
Provided To Others? Yes  
Impact It underpinned a paper in which results of analysis are published. 
URL https://doi.org/10.5281/zenodo.5148228
Title Catalogue records of photographs (1850-1950) 
Description A set of catalogue records for photographs (created 1850-1950) that are held at the British Library. Export from the Integrated Archives and Manuscripts System of only CC0 published records. Personal or sensitive information has been removed. This dataset was created specifically for the Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship project to demonstrate computational analysis of catalogue data using corpus linguistic methods and tools. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact The dataset was extracted from the IAMS catalogue based on our research query. The publication of the dataset makes available data otherwise only held internally. The use of the dataset by the project team has identified data quality issues (e.g. faulty concatenation of fields created during migration of dataset from photography catalogue to IAMS). Usage of the dataset will be monitored. 
URL https://bl.iro.bl.uk/work/d765556d-24ee-48b9-8bc2-8c1c9ae229ea
Description Computational Analysis of Catalogue Data training (1) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This training event provided instruction in using AntConc and approaches from computational linguistics for the purposes of examining catalogue data to enable important catalogue related work. This fed into the design of our training materials.
Year(s) Of Engagement Activity 2020
URL https://cataloguelegacies.github.io/antconc-training
Description Computational Analysis of Catalogue Data training (2) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This training event provided instruction in using AntConc and approaches from computational linguistics for the purposes of examining catalogue data to enable important catalogue related work. This fed into the design of our training materials.
Year(s) Of Engagement Activity 2020
URL https://cataloguelegacies.github.io/antconc-training
Description Computational Analysis of Catalogue Data training (3) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This half-day training event provided instruction in using AntConc and approaches from computational linguistics for the purposes of examining catalogue data to enable important catalogue related work. It represented the final test of our developed training materials.
Year(s) Of Engagement Activity 2020
URL https://cataloguelegacies.github.io/antconc-training2
Description Computational analysis of catalogue descriptions: Collaboration as a catalyst for change 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact As part of the DCDC 2021 conference and the panel on "Accessible and inclusive catalogues", Baker and Atanassova introduced the project as a transformative collaboration that has brought together historical research using computational methods and the investigation of catalogue practice at cultural institutions, and that has stimulated an interest amongst GLAM professionals in using corpus linguistic tools and approaches for the computational, critical and curatorial analysis of collection catalogues. We explained the background to the project, and the engagement with the GLAM community as part of the development of online training materials.
Year(s) Of Engagement Activity 2021
URL https://pheedloop.com/dcdc21/site/schedule/
Description Europeana Research Community Café - Legacies of Catalogue Descriptions, Data Quality and Ethics 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Roughly 40 attendees, largely from the GLAM sector, attended to learn more about project outcomes and to discuss the use of legacy catalogue data in the cultural sector, and the ethical implications of that use in the context of a changing technological landscape (e.g. machine learning based applications for discovery analysis; user navigation of cultural collections).
Year(s) Of Engagement Activity 2022
URL https://pro.europeana.eu/event/europeana-research-community-cafe-legacies-of-catalogue-descriptions-...
Description Legacies of Catalogue Descriptions: outputs and next steps 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This workshop provided an opportunity to discuss work-in-progress outputs from the Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship project and to work together to develop next steps.
Year(s) Of Engagement Activity 2021
URL https://cataloguelegacies.github.io/July-2021-workshop
Description Legacies of Catalogue Descriptions: prioritising agendas and actions 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The past, present, and future of anglophone cataloguing, including their spatial and temporal resonances, have provided the 'Legacies' team with a focal point for research, technological experimentation, the development of training materials, and partnership development. At our workshops, talks, and discussion forums, we have benefited from the insight, curiosity, and enthusiasm of colleagues across the cultural heritage sector committed to investigating the inequities in legacy catalogue data and to mitigating their harms.

As our current round of funding drew to a close, we invited members of that community to join us to set shared agendas and agree next steps. We asked where we should focus our collective resources, what things we need to work differently, and the role of computational technologies in both shaping and constraining action.

Both events worked towards a single co-produced output: an infographic explaining the problem area, our shared priorities, and the next steps for action.
Year(s) Of Engagement Activity 2022
URL https://cataloguelegacies.github.io/agendas-actions