Unlocking Digital Texts: Towards an Interoperable Text Framework

Lead Research Organisation: University of Oxford
Department Name: Library Services

Abstract

A key challenge faced by digital text projects is encouraging cultural institutions, researchers and the wider public to reuse and build upon their resources. Despite the scholarly effort put into creating them, digital texts do not seem to have the same 'long tail' use patterns that data from other disciplines have. One of the chief impediments has been that texts are produced and stored in formats that are hard to reuse. Many contain detailed contextual, semantic and presentational markup embedded with the underlying text. Even when these texts are encoded according to robust standards, such as the Text Encoding Initiative (TEI), the content and style of the encoding is fundamentally shaped by the editors' specific fields of study, languages or cultural norms. Reusing materials often requires project-specific code that embodies those principles and norms, and sometimes even replicating the infrastructure that delivers them. This sets a high bar that only the most skilled, determined and well-funded researchers are able to surmount.
We aim to rectify this situation by defining an Interoperable Text Framework (ITF) and implementing exemplary test cases to demonstrate its strengths. We are not proposing a new format for encoding or storing text but rather a method for accessing and delivering textual resources (either whole documents or fragments) that are both readable by humans and also machine-friendly for computational analysis. When ITF is combined with other frameworks, such as IIIF and the W3C Web Annotation Data Model, it becomes possible to link texts, images, annotations and other online resources to construct narratives that can be visualised and navigated online.
ITF has the potential to transform online texts and editions into active online discourse by allowing multiple new narratives and analyses to be created around texts, without compromising the integrity of the originals. The partner projects all require such a capability and have already developed specific approaches that can inform the development of a more general and flexible standard.
ITF will enable researchers studying Samuel Beckett's works on the Beckett Digital Manuscript Project to construct their own narratives about his writing process. They can connect, display and analyse fragments from books in Beckett's library, where they were copied in notebook(s), and Beckett's subsequent intertextual reuse in his works. Readers can see and compare these multiple narratives and make their own inferences.
For digital pedagogy and for digital editions, ITF will be the starting point of unprecedented global and local collaborations. The rich but disorganized papers of the early modern mathematician Thomas Harriot, for instance, will finally benefit from a flexible framework that does not assume linearity from front cover to back cover, but rather enables multiple points of entry for various readers. Enhanced navigability and annotation will make Harriot's papers legible not only for researchers collaborating worldwide but for classrooms, where teachers seek ready ways to contextualize mathematical discoveries within their cultural moments.
ITF will also enable users to apply computational analysis tools to heterogeneous collections of text from diverse sources, which have typically been avoided because they are difficult to use. A researcher could use existing text mining and machine learning tools to study patterns of citation and reference in the correspondence collections catalogued in Early Modern Letters Online by performing comparative topic and sentiment analyses of letter texts and the referenced works digitised by the Text Creation Partnership.
By removing the technical and infrastructural barriers, ITF will help to ensure that textual resources will then be better able to live up to the promises of the FAIR principles [https://www.go-fair.org/fair-principles/]; they will be Findable, Accessible, Interoperable, and Reusable.

Publications

 
Description We have now largely completed the initial "Research" phase of the project, seeking to gain an understanding of the diversity of text resources and approaches "in the field."

One of the key discoveries is that graph-like approaches to both textual versioning and text fragment co-ordinate systems need to be supported, in addition to more conventional linear or hierarchical approaches. In addition, there was a community view that there was a niche for a semi-structured text format, in between a purely visual representation and a linear document-style serialisation.
Exploitation Route Ultimately we hope to establish a standard which future developments in textual scholarship can use to improve their interoperability and sustainability. This has the potential to impact large scale text mining and analytics.
Sectors Digital/Communication/Information Technologies (including Software)

Culture

Heritage

Museums and Collections

Other

 
Description Collaboration with Notre-Dame 
Organisation University of Notre Dame
Country United States 
Sector Academic/University 
PI Contribution Notre-Dame is our NEH-funded partner on this project. We provide the conceptual basis for their work - establishing the concept of a Text Fragment API and the use of Web-annotation-based workflows and publication models to implement a variety of current textual practices in a consistent manner.
Collaborator Contribution Notre-Dame's focus in the project has been to investigate new approaches to textual scholarship that would be facilitated by an Interoperable Text Framework - with a focus on collaborative transcription, analysis and annotation.
Impact A number of collaborative transcription/annotation workshops, with reports due shortly, e.g. https://textframe.io/news/2022/09/12/harriot-de-infinitis-workshop/
Start Year 2022
 
Title Interoperable Text Framework API 
Description The ITF specification is intended to facilitate systematic referencing and reuse of textual resources in repositories in a manner that is both user- and machine-friendly. The specification defines two API forms: one to request a text fragment, and a second to request technical information about the underlying source text. Both forms convey the request's information in the path segment of the URL, rather than as query parameters. This makes responses more easily cacheable by the server and standard web-caching infrastructure.
Type Of Technology New/Improved Technique/Technology 
Year Produced 2023 
Open Source License? Yes  
Impact The specification is still undergoing development and refinement. However, it has already drawn interest from the broader digital text community through workshops and requests for Git access. 
URL https://github.com/UDT-ITF
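
The path-based request style described for the API can be illustrated with a minimal sketch. It assumes a hypothetical URL layout: the base URL, the segment names "fragment" and "info", the segment ordering, and the function names are all placeholders chosen for illustration, not the layout defined by the ITF specification. The sketch only shows the general idea of carrying request values in percent-encoded path segments instead of a query string.

```python
from urllib.parse import quote


def build_path_url(base_url: str, *segments: str) -> str:
    """Join request values into the URL path, percent-encoding each segment."""
    # safe="" also encodes "/" so a value can never add an extra path segment.
    encoded = [quote(segment, safe="") for segment in segments]
    return base_url.rstrip("/") + "/" + "/".join(encoded)


def fragment_url(base_url: str, text_id: str, fragment: str) -> str:
    """Hypothetical form 1: request a fragment of a text."""
    return build_path_url(base_url, text_id, "fragment", fragment)


def info_url(base_url: str, text_id: str) -> str:
    """Hypothetical form 2: request technical information about the source text."""
    return build_path_url(base_url, text_id, "info")


if __name__ == "__main__":
    base = "https://example.org/texts"  # placeholder server, not a real endpoint
    print(fragment_url(base, "letter 42", "100,50"))
    print(info_url(base, "letter 42"))
```

Because every request value sits in the path, two requests for the same fragment produce the same URL with no query string, which is the property that helps ordinary web caches store and reuse responses.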
 
Description Harriot De Infinitis Workshop 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Workshop on collaboratively transcribing and annotating the papers of Thomas Harriot
Year(s) Of Engagement Activity 2022
URL https://textframe.io/news/2022/09/12/harriot-de-infinitis-workshop/
 
Description Online Text Strand at Digital Humanities at Oxford Summer School 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact An online Textual Approaches strand organised by the UDT project for the Summer School, aimed at new and experienced Humanists, Librarians and Technologists with an interest in the range of techniques and approaches being used in textual scholarship. Fourteen online sessions were spread over the week of the Summer School. Sessions were delivered primarily by speakers from outside Oxford who had participated in one or more UDT workshops, supported by some sessions from Oxford specialists. Details of the programme can be found here: https://digitalscholarship.web.ox.ac.uk/sitefiles/main-programme.pdf
Year(s) Of Engagement Activity 2023
URL https://digitalscholarship.web.ox.ac.uk/digital-humanities-oxford-summer-school
 
Description Presentation at "War, Trade and the Divisive Power of Citizenship" conference Sept 2022. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation at "War, Trade and the Divisive Power of Citizenship" conference Sept 2022. UDT was one of a number of projects discussed in the context of long term access to Humanities resources.
Year(s) Of Engagement Activity 2022
URL https://europa.unibas.ch/en/war-trade-citizenship/
 
Description Presentation at the CLARIN and Libraries Workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation on progress to the CLARIN and Libraries Group at the National Library of Norway with a focus on machine access to text and Large Language Models.
Year(s) Of Engagement Activity 2023
URL https://www.clarin.eu/event/2023/clarin-and-libraries-2023-large-language-models-and-libraries
 
Description Presentation to CLARIN Workshop on Libraries May 2022 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation to practitioners working on national and international infrastructures for the Humanities with a linguistic focus. Since the project aims to put in place standards that have the potential to take such endeavours to the next stage of development there was keen interest in collaboration.
Year(s) Of Engagement Activity 2022
URL https://www.clarin.eu/event/2022/clarin-and-libraries
 
Description Short presentation and discussion with the Oxford Digital Editions Community of Practice 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Short presentation and discussion with scholars in Oxford working on or around Digital Editions.
Year(s) Of Engagement Activity 2023
URL https://digitalscholarship.web.ox.ac.uk/event/digital-editions-community-practice
 
Description UDT Cambridge Workshop Feb 2023 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Workshop bringing together a variety of projects and practitioners working on scholarly workflows and textual editions to discuss how existing and future practice would map onto a text fragment, manifest and web annotation model. This activity also sought to recruit contributors towards a text manifest specification. Importantly, this workshop included contributions from the Text+ NFDI initiative in Germany and the Humanities Cluster in the Netherlands.

A full write up is ongoing.
Year(s) Of Engagement Activity 2023
URL https://osf.io/aqdmh/
 
Description UDT Open Science Framework Site 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Primary website for running the operations of the UDT project (linked from the various institutional presences).
Year(s) Of Engagement Activity 2022
URL https://osf.io/r78gx/
 
Description UDT Oxford Workshop Jan 2023 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Workshop bringing together a variety of projects and practitioners working on access to text corpora and text fragments to discuss the requirements for a text fragment API and recruit potential contributors to such a specification. Importantly, this workshop included contributions from the Text+ NFDI initiative in Germany and the Humanities Cluster in the Netherlands.

A full write up is ongoing.
Year(s) Of Engagement Activity 2023
URL https://osf.io/z2ybv/
 
Description Website for the Interoperable Text Framework
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This is the website for publishing a readable form of the ITF specification, along with links to online versions of the various reports and other outputs (with references back to the citable versions with DOIs). It also includes details of how to participate and a calendar of activities.
Year(s) Of Engagement Activity 2023,2024
URL https://textframe.io