Unlocking Digital Texts: Towards an Interoperable Text Framework
Lead Research Organisation:
UNIVERSITY OF OXFORD
Department Name: Library Services
Abstract
A key challenge faced by digital text projects is encouraging cultural institutions, researchers and the wider public to reuse and build upon their resources. Despite the scholarly effort put into creating them, digital texts do not seem to have the same 'long tail' use patterns that data from other disciplines have. One of the chief impediments has been that texts are produced and stored in formats that are hard to reuse. Many contain detailed contextual, semantic and presentational markup embedded with the underlying text. Even when these texts are encoded according to robust standards, such as the Text Encoding Initiative (TEI), the content and style of the coding is fundamentally shaped by the editors' specific fields of study, languages or cultural norms. Reusing materials often requires project-specific code that embodies those principles and norms, and sometimes even replicating the infrastructure that delivers them. This sets a high bar that only the most skilled, determined and well-funded researchers are able to surmount.
We aim to rectify this situation by defining an Interoperable Text Framework (ITF) and implementing exemplary test cases to demonstrate its strengths. We are not proposing a new format for encoding or storing text but rather a method for accessing and delivering textual resources (either whole documents or fragments) that are both readable by humans and also machine-friendly for computational analysis. When ITF is combined with other frameworks, such as IIIF and the W3C Web Annotation Data Model, it becomes possible to link texts, images, annotations and other online resources to construct narratives that can be visualised and navigated online.
ITF has the potential to transform online texts and editions into active online discourse by allowing multiple new narratives and analyses to be created around texts, without compromising the integrity of the originals. The partner projects all require such a capability and have already developed specific approaches that can inform the development of a more general and flexible standard.
ITF will enable researchers studying Samuel Beckett's works on the Beckett Digital Manuscript Project to construct their own narratives about his writing process. They can connect, display and analyse fragments from books in Beckett's library, the notebook(s) into which he copied them, and his subsequent intertextual reuse of them in his works. Readers can see and compare these multiple narratives and make their own inferences.
For digital pedagogy and for digital editions, ITF will be the starting point of unprecedented global and local collaborations. The rich but disorganized papers of the early modern mathematician Thomas Harriot, for instance, will finally benefit from a flexible framework that does not assume linearity from front cover to back cover, but rather enables multiple points of entry for various readers. Enhanced navigability and annotation will make Harriot's papers legible not only for researchers collaborating worldwide but for classrooms, where teachers seek ready ways to contextualize mathematical discoveries within their cultural moments.
ITF will also enable users to apply computational analysis tools to heterogeneous collections of text from diverse sources, which researchers would typically have avoided because they are difficult to use. A researcher could use existing text mining and machine learning tools to study patterns of citation and reference in the correspondence collections catalogued in Early Modern Letters Online by performing comparative topic and sentiment analyses of letter texts and the referenced works digitised by the Text Creation Partnership.
By removing the technical and infrastructural barriers, ITF will help textual resources to live up to the promises of the FAIR principles [https://www.go-fair.org/fair-principles/]: they will be Findable, Accessible, Interoperable, and Reusable.
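The linking of texts and images through the W3C Web Annotation Data Model, mentioned in the abstract, can be pictured with a small Python sketch. It builds one annotation whose body is a text fragment URL and whose target is a rectangular region of an image. The function name, the example URLs and the choice of the `linking` motivation are assumptions made for illustration; the project's own annotation profile is not described in this record.

```python
import json


def linking_annotation(anno_id: str, text_fragment_url: str,
                       image_url: str, region: str) -> dict:
    """Build a Web Annotation that links a text fragment to an image region.

    `region` is an "x,y,w,h" string in pixels (assumed convention for this sketch).
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "linking",
        # Body: the text fragment, referenced by its URL only.
        "body": text_fragment_url,
        # Target: a region of the image, selected with a media fragment.
        "target": {
            "type": "SpecificResource",
            "source": image_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={region}",
            },
        },
    }


if __name__ == "__main__":
    # All URLs below are placeholders, not real project endpoints.
    anno = linking_annotation(
        "https://example.org/annotations/1",
        "https://example.org/text/example-id/fragment-1.txt",
        "https://example.org/images/page-1.jpg",
        "10,20,300,400",
    )
    print(json.dumps(anno, indent=2))
```

Because the annotation only points at the text and the image by URL, neither original resource is modified, which matches the stated aim of building new narratives without compromising the integrity of the originals.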
Publications

Hawkins M (2024) UDT Report on Cambridge Workshop

Jefferies N (2024) UDT Report on Oxford Workshop

Jefferies N S (2022) UDT Position Paper December 2022
Description | We have now largely completed the initial "Research" phase of the project, seeking to gain an understanding of the diversity of text resources and approaches "in the field." One of the key discoveries is that graph-like approaches to both textual versioning and text fragment co-ordinate systems need to be supported, in addition to more conventional linear or hierarchical approaches. In addition, there was a community view that there was a niche for a semi-structured text format, in between a purely visual representation and a linear document-style serialisation. |
Exploitation Route | Ultimately we hope to establish a standard which future developments in textual scholarship can use to improve their interoperability and sustainability. This has the potential to impact large scale text mining and analytics. |
Sectors | Digital/Communication/Information Technologies (including Software); Culture, Heritage, Museums and Collections; Other |
Description | Collaboration with Notre-Dame |
Organisation | University of Notre Dame |
Country | United States |
Sector | Academic/University |
PI Contribution | Notre-Dame is our NEH-funded partner on this project. We provide the conceptual basis for their work - establishing the concept of a Text Fragment API and the use of Web-annotation-based workflows and publication models to implement a variety of current textual practices in a consistent manner. |
Collaborator Contribution | Notre-Dame's focus in the project has been to investigate new approaches to textual scholarship that would be facilitated by an Interoperable Text Framework - with a focus on collaborative transcription, analysis and annotation. |
Impact | A number of collaborative transcription/annotation workshops, with reports due shortly, e.g. https://textframe.io/news/2022/09/12/harriot-de-infinitis-workshop/ |
Start Year | 2022 |
Title | Interoperable Text Framework API |
Description | The ITF specification is intended to facilitate systematic referencing and reuse of textual resources in repositories in a manner that is both user- and machine-friendly. The specification defines two API forms: one to request a text fragment, and a second to request technical information about the underlying source text. Both forms convey the request's information in the path segment of the URL, rather than as query parameters. This makes responses more easily cacheable by the server and standard web-caching infrastructure. |
Type Of Technology | New/Improved Technique/Technology |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | The specification is still undergoing development and refinement. However, it has already drawn interest from the broader digital text community through workshops and requests for Git access. |
URL | https://github.com/UDT-ITF |
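As a rough illustration of the path-based request style described for the ITF API, the Python sketch below builds one URL for a text fragment and one for technical information about the source text. The base URL, the order of the path segments, the `char:0,100` fragment syntax and the `info.json` name are all assumptions made for this example; they are not taken from the ITF specification, which is still under development.

```python
from urllib.parse import quote


def _segment(value: str) -> str:
    # Percent-encode every reserved character, including "/",
    # so that each value occupies exactly one path segment.
    return quote(value, safe="")


def fragment_url(base: str, identifier: str, fragment: str, fmt: str) -> str:
    """Hypothetical text-fragment request: {base}/{identifier}/{fragment}.{format}"""
    return (
        f"{base.rstrip('/')}/{_segment(identifier)}"
        f"/{_segment(fragment)}.{_segment(fmt)}"
    )


def info_url(base: str, identifier: str) -> str:
    """Hypothetical technical-information request: {base}/{identifier}/info.json"""
    return f"{base.rstrip('/')}/{_segment(identifier)}/info.json"


if __name__ == "__main__":
    base = "https://example.org/text"  # placeholder server, not a real endpoint
    print(fragment_url(base, "harriot/ms-1", "char:0,100", "txt"))
    print(info_url(base, "harriot/ms-1"))
```

With every parameter in the path, two requests for the same fragment always produce the same URL string, with no query parameters whose order could vary. That is one reason a path-based design tends to work well with ordinary web caches.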
Description | Harriot De Infinitis Workshop |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Workshop on collaboratively transcribing and annotating the papers of Thomas Harriot |
Year(s) Of Engagement Activity | 2022 |
URL | https://textframe.io/news/2022/09/12/harriot-de-infinitis-workshop/ |
Description | Online Text Strand at Digital Humanities at Oxford Summer School 2023 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | An online Textual Approaches strand organised by the UDT project for the Summer School, aimed at new and experienced Humanists, Librarians and Technologists with an interest in the range of techniques and approaches being used in textual scholarship. Fourteen online sessions were spread over the week of the Summer School. Sessions were delivered primarily by external (to Oxford) speakers that had participated in one or more UDT workshops, supported by some sessions from Oxford specialists. Details of the programme can be found here: https://digitalscholarship.web.ox.ac.uk/sitefiles/main-programme.pdf |
Year(s) Of Engagement Activity | 2023 |
URL | https://digitalscholarship.web.ox.ac.uk/digital-humanities-oxford-summer-school |
Description | Presentation at "War, Trade and the Divisive Power of Citizenship" conference Sept 2022. |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation at "War, Trade and the Divisive Power of Citizenship" conference Sept 2022. UDT was one of a number of projects discussed in the context of long term access to Humanities resources. |
Year(s) Of Engagement Activity | 2022 |
URL | https://europa.unibas.ch/en/war-trade-citizenship/ |
Description | Presentation at the CLARIN and Libraries Workshop |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation on progress to the CLARIN and Libraries Group at the National Library of Norway with a focus on machine access to text and Large Language Models. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.clarin.eu/event/2023/clarin-and-libraries-2023-large-language-models-and-libraries |
Description | Presentation to CLARIN Workshop on Libraries May 2022 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation to practitioners working on national and international infrastructures for the Humanities with a linguistic focus. Since the project aims to put in place standards that have the potential to take such endeavours to the next stage of development there was keen interest in collaboration. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.clarin.eu/event/2022/clarin-and-libraries |
Description | Short presentation and discussion with the Oxford Digital Editions Community of Practice |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Short presentation and discussion with scholars in Oxford working on or around Digital Editions. |
Year(s) Of Engagement Activity | 2023 |
URL | https://digitalscholarship.web.ox.ac.uk/event/digital-editions-community-practice |
Description | UDT Cambridge Workshop Feb 2023 |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Workshop bringing together a variety of projects and practitioners working on scholarly workflows and textual editions to discuss how existing and future practice would map onto a text fragment, manifest and web annotation model. This activity also sought to recruit contributors towards a text manifest specification. Importantly, this workshop included contributions from the Text+ NFDI initiative in Germany and the Humanities Cluster in the Netherlands. A full write-up is ongoing. |
Year(s) Of Engagement Activity | 2023 |
URL | https://osf.io/aqdmh/ |
Description | UDT Open Science Framework Site |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Primary website for running the operations of the UDT project (linked from the various institutional presences). |
Year(s) Of Engagement Activity | 2022 |
URL | https://osf.io/r78gx/ |
Description | UDT Oxford Workshop Jan 2023 |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Workshop bringing together a variety of projects and practitioners working on access to text corpora and text fragments to discuss the requirements for a text fragment API and recruit potential contributors to such a specification. Importantly, this workshop included contributions from the Text+ NFDI initiative in Germany and the Humanities Cluster in the Netherlands. A full write-up is ongoing. |
Year(s) Of Engagement Activity | 2023 |
URL | https://osf.io/z2ybv/ |
Description | Website for the Interoperable Text Framework |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This is the website for publishing a readable form of the ITF specification, along with links to online versions of the various reports and other outputs (with references back to the citable versions with DOIs). It also includes details of how to participate and a calendar of activities. |
Year(s) Of Engagement Activity | 2023,2024 |
URL | https://textframe.io |