Evolving Hands: Building Workflows and Scalable Practices for Handwriting Recognition and Text Encoding in Cultural Institutions

Lead Research Organisation: Newcastle University
Department Name: School of English Literature, Language and Linguistics

Abstract

Transcription has evolved dramatically in the twenty-first century. Historically, it has been volunteer-driven, providing 'added value'. However, this results in a flat and undemocratic user experience, one that requires a pre-existing understanding of the terms, subjects, people, places, and their spellings within the digitised collection. Two tools are changing this model within digital curation: Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR). OCR is ubiquitous in mass digitisation but has substantial limitations, while HTR is still unfamiliar in cultural institutions.

This project undertakes three case studies across a range of document forms to demonstrate how these digital tools can be incorporated iteratively into curation. The materials include 19th- and 20th-century handwritten letters and diaries from the UNESCO Gertrude Bell Archive; 16th-century scribal hands in Spanish and Nahuatl; 18th-century German documents; 20th-century French correspondence; and a range of printed materials from the 19th century onward in English and French. A joint case study converts legacy printed material from the Records of Early English Drama (REED) project. By covering a wide variety of periods and document forms, the project has a real opportunity to foster responsible and responsive support for cultural institutions. It seeks to establish more effective workflows that fill the gap between digitisation, semantically oriented encoding, and data discoverability.

Newcastle University Case Study -- The Gertrude Bell Archive: This case study uses the UNESCO Gertrude Bell Archive (http://gertrudebell.ncl.ac.uk/) held by Newcastle University Special Collections. The archive documents the activities of the explorer, archaeologist, and political agent who was instrumental in establishing the Kingdom of Iraq in 1921. Bell has been the subject of plays, documentaries, and a feature film, and was recently nominated as a BBC 20th Century Icon. A separate centenary project is digitising and cataloguing her archive of diaries, letters, and photographs. Building on that work, we will select the richest materials to train HTR base models of Bell's hand, and use the up-converted transcriptions to produce training materials.

Bucknell University Case Study -- Scholarly Production at Scale: The Bucknell case study centres on processes used across multiple discrete projects by staff with a range of digital experience. These projects represent different models for testing the HTR-to-TEI conversion process. Their sources are drawn from Bucknell's Special Collections and from the research of faculty working with archives in the US, UK, Europe, and Asia. They include scribal hands, life papers, correspondence, and semi-legible typed government files dating from 1500 to 1970, in English, French, German, Latin, Nahuatl (an indigenous Mexican language), Spanish, and Vietnamese. This case study will directly benefit multiple projects at the university, and is designed to be shared with smaller cultural institutions around the world.

Joint Newcastle/Bucknell Case Study -- Transforming REED Print Collections: Cummings (AHRC PI) and Jakacki (NEH PI) will collaborate on a case study converting print collections produced by the Records of Early English Drama project (http://reed.utoronto.ca), which since 1979 has published edited documentary records of performance in England, Scotland, and Wales before 1642. The print volumes convey semantic information through special symbols and formatting, but this information is lost in OCR. Previous tests by Jakacki and Cummings have demonstrated that HTR can preserve these distinctions so that they can then be transformed into TEI. The project will document shared workflows for consistently up-converting the volumes into materials ready to enter REED's digital publication workflow. These workflows could potentially be applied to all of REED's other legacy print volumes (well over 20,000 pages of rich scholarly material).
 
Description

Although the project is still underway, it is progressing well towards its original goals. The project has been undertaking three large case studies in the full-text digitisation of cultural heritage textual resources, partly to improve the accessibility of those resources and partly to serve as models for small cultural institutions looking to apply handwritten text recognition to a variety of texts. While we are still documenting the workflows used in the case studies, we have already identified the applicability and limits of the proposed workflows. Undertaking this work as an AHRC/NEH project has created a transatlantic collaborative network with our NEH-funded partner project at Bucknell University, building both capacity and skills in our respective institutions. One of the Newcastle University case studies is nearing completion and is leading to significant developments in the Gertrude Bell Archive website. The project has already presented some of its outputs at the TEI 2022 conference.

Exploitation Route

A primary goal of the project has been to produce outputs that act as cookbook-style exemplars or how-to guides for text-creation projects in small cultural institutions. The documentation of the workflows and the XSLT stylesheets for converting Handwritten Text Recognition (HTR) output to TEI P5 XML are still in development, but will be released freely and openly on GitHub and elsewhere. The pedagogical materials on using these technologies will be disseminated to the community in a variety of ways.

Sectors Education, Culture, Heritage, Museums and Collections

URL https://research.ncl.ac.uk/evolvinghands/