Unlocking Digital Texts: Towards an Interoperable Text Framework

Lead Research Organisation: University of Oxford
Department Name: Library Services

Abstract

A key challenge faced by digital text projects is encouraging cultural institutions, researchers and the wider public to reuse and build upon their resources. Despite the scholarly effort put into creating them, digital texts do not seem to have the same 'long tail' use patterns as data from other disciplines. One of the chief impediments has been that texts are produced and stored in formats that are hard to reuse. Many contain detailed contextual, semantic and presentational markup embedded in the underlying text. Even when these texts are encoded according to robust standards, such as the Text Encoding Initiative (TEI) Guidelines, the content and style of the encoding are fundamentally shaped by the editors' specific fields of study, languages or cultural norms. Reusing such materials often requires project-specific code that embodies those principles and norms, and sometimes even a replica of the infrastructure that delivers them. This sets a high bar that only the most skilled, determined and well-funded researchers are able to surmount.
We aim to rectify this situation by defining an Interoperable Text Framework (ITF) and implementing exemplary test cases to demonstrate its strengths. We are not proposing a new format for encoding or storing text, but rather a method for accessing and delivering textual resources (either whole documents or fragments) in a form that is both readable by humans and amenable to computational analysis. When ITF is combined with other frameworks, such as the International Image Interoperability Framework (IIIF) and the W3C Web Annotation Data Model, it becomes possible to link texts, images, annotations and other online resources to construct narratives that can be visualised and navigated online.
ITF has the potential to transform online texts and editions into active online discourse by allowing multiple new narratives and analyses to be created around texts, without compromising the integrity of the originals. The partner projects all require such a capability and have already developed specific approaches that can inform the development of a more general and flexible standard.
ITF will enable researchers studying Samuel Beckett's works on the Beckett Digital Manuscript Project to construct their own narratives about his writing process. They will be able to connect, display and analyse fragments from books in Beckett's library, the places in his notebooks where he copied them, and his subsequent intertextual reuse of them in his works. Readers can then see and compare these multiple narratives and draw their own inferences.
For digital pedagogy and digital editions, ITF will be the starting point for unprecedented global and local collaborations. The rich but disorganised papers of the early modern mathematician Thomas Harriot, for instance, will finally benefit from a flexible framework that does not assume linearity from front cover to back cover but instead enables multiple points of entry for different readers. Enhanced navigability and annotation will make Harriot's papers legible not only to researchers collaborating worldwide but also in classrooms, where teachers seek ready ways to contextualise mathematical discoveries within their cultural moments.
ITF will also enable users to apply computational analysis tools to heterogeneous collections of text from diverse sources, which have typically been avoided because they are difficult to work with. For example, a researcher could use existing text-mining and machine-learning tools to study patterns of citation and reference in the correspondence collections catalogued in Early Modern Letters Online, performing comparative topic and sentiment analyses of the letter texts and of the referenced works digitised by the Text Creation Partnership.
By removing these technical and infrastructural barriers, ITF will help textual resources live up to the promise of the FAIR principles [https://www.go-fair.org/fair-principles/], making them Findable, Accessible, Interoperable and Reusable.

Description We have now largely completed the initial "Research" phase of the project, which sought to understand the diversity of text resources and approaches "in the field".

One of the key findings is that graph-like approaches to both textual versioning and text-fragment coordinate systems need to be supported, in addition to the more conventional linear or hierarchical approaches. The community also saw a niche for a semi-structured text format, sitting between a purely visual representation and a linear, document-style serialisation.
Exploitation Route Ultimately, we hope to establish a standard that future developments in textual scholarship can use to improve their interoperability and sustainability. Such a standard could also benefit large-scale text mining and analytics.
Sectors Digital/Communication/Information Technologies (including Software); Culture, Heritage, Museums and Collections; Other

 
Description Presentation at "War, Trade and the Divisive Power of Citizenship" conference Sept 2022. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation at "War, Trade and the Divisive Power of Citizenship" conference Sept 2022. Unlocking Digital Texts (UDT) was one of several projects discussed in the context of long-term access to Humanities resources.
Year(s) Of Engagement Activity 2022
URL https://europa.unibas.ch/en/war-trade-citizenship/
 
Description Presentation to CLARIN Workshop on Libraries May 2022 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation to practitioners working on national and international infrastructures for the Humanities with a linguistic focus. Since the project aims to put in place standards that could take such endeavours to the next stage of development, there was keen interest in collaboration.
Year(s) Of Engagement Activity 2022
URL https://www.clarin.eu/event/2022/clarin-and-libraries
 
Description Short presentation and discussion with the Oxford Digital Editions Community of Practice 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Short presentation and discussion with scholars in Oxford working on or around Digital Editions.
Year(s) Of Engagement Activity 2023
URL https://digitalscholarship.web.ox.ac.uk/event/digital-editions-community-practice
 
Description UDT Cambridge Workshop Feb 2023 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Workshop bringing together a variety of projects and practitioners working on scholarly workflows and textual editions to discuss how existing and future practice would map onto a text fragment, manifest and web annotation model. The workshop also sought to recruit contributors to a text manifest specification. Importantly, it included contributions from the Text+ NFDI initiative in Germany and the Humanities Cluster in the Netherlands.

A full write-up is in progress.
Year(s) Of Engagement Activity 2023
URL https://osf.io/aqdmh/
 
Description UDT Open Science Framework Site 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Primary website for running the day-to-day operations of the UDT project (linked from the project's various institutional web pages).
Year(s) Of Engagement Activity 2022
URL https://osf.io/r78gx/
 
Description UDT Oxford Workshop Jan 2023 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Workshop bringing together a variety of projects and practitioners working on access to text corpora and text fragments to discuss the requirements for a text fragment API and to recruit potential contributors to such a specification. Importantly, this workshop included contributions from the Text+ NFDI initiative in Germany and the Humanities Cluster in the Netherlands.

A full write-up is in progress.
Year(s) Of Engagement Activity 2023
URL https://osf.io/z2ybv/