Mapping Manuscript Migrations: Digging into data for the history and provenance of pre-modern European manuscripts

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

Hundreds of thousands of European pre-modern manuscripts have survived until the present day. As the result of changes in their ownership over the centuries, they are now spread all over the world. Collectively they constitute a great cultural and scholarly treasure. There are many sources of data relating to them, and new sources continue to proliferate in the digital environment. This project will link disparate datasets from Europe and North America to provide an international view of the history and provenance of these manuscripts. The aggregated data will enable researchers to analyse and visualize these topics at scales ranging from individual manuscripts to thousands of manuscripts. We will be able to show how these manuscripts have travelled across time and space to their current locations, where they continue to find new audiences.

The project will also be of particular relevance and value to libraries and other collecting institutions. The results of its analyses will situate their manuscript collections in the broader historical context of patterns and trends in collecting, while its methodology and its body of data will provide a very important resource for further aggregation and exploration in the future. The data linkage techniques and visualization methodologies deployed by the project will be of wider applicability to all kinds of cultural heritage objects and collections as well as manuscripts.

Planned Impact

The target audiences for the work of the Project are: manuscript researchers, collection and data custodians in libraries and museums, the digital humanities community, and the Linked Data community. The benefits to these groups will include: integrated and sophisticated access to data about manuscripts, with the potential to carry out their own research; new large-scale analyses and visualizations of manuscript histories; and a working environment for Linked Data in medieval and Renaissance studies, which can be built on in the future. Manuscript researchers and data custodians will be able to use the visualizations as the basis for outreach and dissemination of their work and information about their collections to a wider community audience.

The Project will engage with these target audiences through the connections of the Project Team members and through the following communication channels. A detailed Outreach Plan will be developed in the first stage of the Project. The Project will establish its own Web site to report progress, discuss issues, and link to datasets and data products. The site will include a blog, to which all members of the Project Team will be able to contribute. The Project will also establish a Twitter account, which will draw on the existing Twitter accounts of Project Team members, and on the followers of @DiggingIntoData, for retweets and for recruiting followers. Videos explaining the work of the Project will be posted to the Schoenberg Institute's YouTube channel.

Training for researchers in the digital humanities and manuscript studies will take place in 2018/9 through workshops and short courses organized by the Schoenberg Institute, the IRHT, and the Digital Humanities Summer School at Oxford University. Workshops will also be offered for librarians and curators through professional bodies. Mentoring will be embedded in the Project through the employment of postdoctoral researchers. Members of the Project Team will present reports on progress at conferences relevant to medieval and Renaissance studies (Medieval Academy, Renaissance Society, ICMS at Kalamazoo, and IMC at Leeds), digital humanities (international and national conferences), Linked Data and the Semantic Web (ESWC), and library and museum conferences. Members of the Project Team will submit articles for publication in refereed journals across the range of disciplines covered by the Project: manuscript studies, medieval and Renaissance studies, digital humanities, library and museum practice and collecting, and Linked Data and Semantic Web research.
 
Description Data from three heterogeneous sources have been combined into a single Web portal, which can be browsed and searched in a variety of ways. Evaluation through the comparative use of a suite of 25 research questions shows that the Web portal makes it possible to answer questions which cannot be answered in the native interfaces to the component sources.
The project has demonstrated the applicability and effectiveness of Linked Open Data approaches to modelling and organizing knowledge about the history and provenance of medieval and Renaissance manuscripts.
The project has, for the first time, demonstrated a method for transforming TEI (Text Encoding Initiative) encoded documents describing manuscripts into Resource Description Framework (RDF) triples.
Exploitation Route Further use and refinement of the Unified Data Model for describing manuscript histories developed by the project, based on the CIDOC-CRM and FRBR ontologies.
Contribution of additional data sources to the project's Web portal, using transformation pipelines developed by the project.
Adoption as a new platform for aggregating information about medieval and Renaissance manuscripts, replacing superseded approaches.
Use as a basis for the wider adoption of Linked Open Data identifiers and vocabularies in medieval and Renaissance studies, to integrate knowledge bases relevant to these subject areas.
Sectors Culture, Heritage, Museums and Collections

 
Description The findings of the project are being considered by the Consortium of European Research Libraries (CERL) in the context of redeveloping its own, obsolete manuscripts portal. CERL represents national and regional libraries (e.g., British Library, Royal Library of the Netherlands, Bibliothèque nationale de France) as well as university libraries.
First Year Of Impact 2020
Sector Culture, Heritage, Museums and Collections
 
Title Conversion of TEI-XML documents to RDF Linked Data 
Description Descriptions of medieval manuscripts in the Bodleian Library and other Oxford libraries are made available as XML documents encoded in accordance with the TEI (Text Encoding Initiative) Guidelines for manuscript descriptions. As part of the Mapping Manuscript Migrations project, we have developed a pipeline for converting these XML documents into RDF triples which can be ingested into a Linked Data triple store. This pipeline involves extracting relevant portions of the XML documents, converting these extracts into a single XML document, mapping this document to RDF triples using a Unified Data Model developed by the project, and uploading the RDF triples to the project's triple store. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact We are just beginning the process of publicizing this method to the community of manuscript librarians and researchers. 
URL https://github.com/bodleian/medieval-mss
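
The extract-and-map stages of the TEI-to-RDF pipeline described in this entry can be illustrated with a minimal sketch. It is not the project's code. The sample TEI fragment, the `http://example.org/mmm/` base URI, the `schema/Manuscript` class and the `schema/heldBy` property are all hypothetical placeholders, not terms from the project's Unified Data Model. The intermediate "single XML document" is simplified to an in-memory list, and the upload to a triple store is omitted.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical base URI and vocabulary, for illustration only.
BASE = "http://example.org/mmm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Invented sample record; real catalogue files are far richer.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc>
    <msDesc xml:id="manuscript_1">
      <msIdentifier>
        <settlement>Oxford</settlement>
        <repository>Bodleian Library</repository>
        <idno>MS. Example 1</idno>
      </msIdentifier>
    </msDesc>
  </sourceDesc></fileDesc></teiHeader>
</TEI>"""


def extract_records(tei_xml):
    """Step 1: pull the relevant fields out of each msDesc element."""
    root = ET.fromstring(tei_xml)
    records = []
    for ms in root.iterfind(".//tei:msDesc", TEI_NS):
        ident = ms.find("tei:msIdentifier", TEI_NS)
        records.append({
            "id": ms.get(XML_ID),
            "shelfmark": ident.findtext("tei:idno", default="", namespaces=TEI_NS),
            "repository": ident.findtext("tei:repository", default="", namespaces=TEI_NS),
        })
    return records


def literal(text):
    """Quote a string as an N-Triples literal (backslash and quote escaping only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(records):
    """Step 2: map each extracted record to RDF triples in N-Triples syntax."""
    lines = []
    for rec in records:
        subject = f"<{BASE}manuscript/{rec['id']}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}schema/Manuscript> .")
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(rec['shelfmark'])} .")
        lines.append(f"{subject} <{BASE}schema/heldBy> {literal(rec['repository'])} .")
    return lines


triples = to_ntriples(extract_records(SAMPLE_TEI))
for line in triples:
    print(line)
```

Each manuscript yields three triples here. A production mapping would use ontology classes and properties in place of the placeholder terms, and would represent repositories as resources rather than plain literals.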
 
Title Mapping Manuscript Migrations Web portal 
Description This database combines data from three existing databases (Schoenberg Database of Manuscripts, Bibale, and Medieval Manuscripts in Oxford Libraries) into a unified RDF Triple Store. Outputs from the three databases are transformed into RDF using a Unified Data Model based on two published ontologies which are widely deployed in the cultural heritage knowledge sector: CIDOC-CRM and FRBR. Entities referenced in each database (persons, places, organizations) have been reconciled and matched using Linked Open Data services such as the Getty Thesaurus of Geographic Names and VIAF. The MMM portal provides a user interface for visualizing and exploring the combined data relating to 216,000 manuscripts. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact The MMM portal was launched in January 2020. Initial feedback via social media and at conference presentations has been very positive. Usage in the first month of public availability was 1,500 users with 2,000 sessions. Most of the usage came from the United States, France, the United Kingdom, Germany, and Finland. 79% of users were on a desktop, with 21% on a mobile or tablet. 
URL http://mappingmanuscriptmigrations.org
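
The reconciliation step mentioned in this entry, in which entities named differently in each source are matched to a shared identifier, can be sketched roughly as follows. This is an illustrative assumption about the general technique, not the project's implementation. The `AUTHORITY_IDS` lookup table, the `authority:place/...` identifiers and the sample labels are invented; a real workflow would obtain identifiers from an external authority service.

```python
from collections import defaultdict

# Hypothetical lookup from normalised labels to authority identifiers.
# The identifiers are placeholders, not real authority-file IDs.
AUTHORITY_IDS = {
    "paris": "authority:place/0001",
    "paris (france)": "authority:place/0001",
    "oxford": "authority:place/0002",
}


def normalise(label):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def reconcile(records):
    """Group (source, label) pairs by shared authority identifier.

    Returns the matched groups and a list of labels left unmatched,
    which would need manual review.
    """
    merged = defaultdict(list)
    unmatched = []
    for source, label in records:
        key = AUTHORITY_IDS.get(normalise(label))
        if key is None:
            unmatched.append((source, label))
        else:
            merged[key].append((source, label))
    return dict(merged), unmatched


# Invented sample rows standing in for place references in three sources.
records = [
    ("SDBM", "Paris"),
    ("Bibale", "Paris (France)"),
    ("Oxford", "  Oxford "),
    ("SDBM", "Lutetia"),
]
merged, unmatched = reconcile(records)
print(merged)
print(unmatched)
```

Once labels from different sources share an identifier, statements about the same place, person or organization can be queried together in the combined triple store.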
 
Title Mapping Manuscript Migrations underlying data 
Description The underlying data created by the Mapping Manuscript Migrations project have been made available for reuse through the Zenodo data repository, in the form of RDF triples in a Turtle serialization. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact The dataset was downloaded seven times in its first month of availability. 
URL https://zenodo.org/record/3667486
 
Title Unified Data Model for manuscript provenance data 
Description This Unified Data Model has been developed using elements from the CIDOC-CRM and FRBR ontologies. The model is used to aggregate data from three different manuscript databases and transform them into RDF Linked Data. 
Type Of Material Data handling & control 
Year Produced 2020 
Provided To Others? Yes  
Impact We have begun the process of publicizing the model at relevant conferences and workshops. 
URL https://drive.google.com/open?id=18ZcCXNljPPlBZVJt4wLGjMNKPYhDqVVr
 
Description Mapping Manuscript Migrations 
Organisation Aalto University
Country Finland 
Sector Academic/University 
PI Contribution The research team at the Oxford e-Research Centre is contributing: specialist knowledge in computer science (Linked Data, ontologies, data modelling) and manuscript studies as well as project coordination and conceptualization.
Collaborator Contribution The other partners are contributing: specialist knowledge in computer science (Linked Data, ontologies, data modelling), database design and manuscript studies.
Impact Multi-disciplinary: historical studies, library and information science, computer science
Start Year 2017
 
Description Mapping Manuscript Migrations 
Organisation Institute for Research and History of Texts (IRHT)
Country France 
Sector Charity/Non Profit 
PI Contribution The research team at the Oxford e-Research Centre is contributing: specialist knowledge in computer science (Linked Data, ontologies, data modelling) and manuscript studies as well as project coordination and conceptualization.
Collaborator Contribution The other partners are contributing: specialist knowledge in computer science (Linked Data, ontologies, data modelling), database design and manuscript studies.
Impact Multi-disciplinary: historical studies, library and information science, computer science
Start Year 2017
 
Description Mapping Manuscript Migrations 
Organisation University of Pennsylvania
Country United States 
Sector Academic/University 
PI Contribution The research team at the Oxford e-Research Centre is contributing: specialist knowledge in computer science (Linked Data, ontologies, data modelling) and manuscript studies as well as project coordination and conceptualization.
Collaborator Contribution The other partners are contributing: specialist knowledge in computer science (Linked Data, ontologies, data modelling), database design and manuscript studies.
Impact Multi-disciplinary: historical studies, library and information science, computer science
Start Year 2017
 
Title Mapping Manuscript Migrations (MMM) software and scripts 
Description (1) Data conversion pipeline for converting source data (in RDF triples) to the MMM data model; (2) Software in Node.js for creating the MMM Web portal interface; (3) Docker container for running a Fuseki triplestore with the MMM Knowledge Graph; (4) Scripts for producing RDF triples from TEI-XML for the MMM project. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact Download statistics are not available. Software availability has only been publicized since January 2020. 
 
Description DReAM Lab workshop: Linked Data for the Humanities 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact DReAM Lab (Digital Resources and Methods) is a week-long digital humanities training opportunity hosted at the University of Pennsylvania and designed to help humanists become more confident and thoughtful users, creators and critics of digital technology.

19 people attended the 'Linked Data for the Humanities: a semantic web of scholarly data' workshop run and taught by Kevin Page and David Lewis. This drew upon data and tools developed at the University of Oxford e-Research Centre, including from the FAST, Unlocking Musicology, Linked Art, and Mapping Manuscript Migrations projects.

Delegates left the workshop with practical experience, new knowledge, and enthusiasm for Linked Data approaches, alongside new familiarity with these projects.
Year(s) Of Engagement Activity 2019
 
Description Focus Group for manuscript researchers 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A group of manuscript researchers, drawn primarily from the Oxford area, was convened to discuss their requirements for a digital discovery service which would improve their access to data about the history and provenance of medieval and Renaissance manuscripts. Many of the participants provided useful feedback for the project in relation to the requirements for designing such a service and their research questions and interests. Most of the participants subsequently followed the project on Twitter.
Year(s) Of Engagement Activity 2017
 
Description Presentation at the Digital Humanities 2019 conference, July 2019, Utrecht, Netherlands (Kevin Page) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Kevin Page presented "A Layered Digital Library for Cataloguing and Research: Practical Experiences with Medieval Manuscripts, from TEI to Linked Data" at the DH2019 conference, the largest annual gathering of those working on and in the digital humanities. Co-authors of the work presented were Toby Burrows, Andrew Hankinson, Matthew Holford, Andrew Morrison, David Lewis, and Athanasios Velios.
Year(s) Of Engagement Activity 2019
 
Description Presentation to CERL Working Group on a New Manuscripts Portal 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The results of the project were presented to a Working Group of the Consortium of European Research Libraries (CERL). The Working Group is considering the future replacement of its existing Portal for data about medieval and Renaissance manuscripts, and our presentation was positively received as outlining an innovative technical framework for a new CERL Portal.
Year(s) Of Engagement Activity 2020
 
Description Workshop at Digital Humanities 2019 conference, July 2019, Utrecht, Netherlands 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The workshop "Tracing the Complex History of Manuscripts Using Linked Data" at DH2019 focussed on approaches for deploying Linked Data methodologies to aggregate complex data relating to the history and provenance of manuscripts, and to address large-scale research questions in this field through analysis and visualization. The workshop was delivered by members of the Mapping Manuscript Migrations project team: Toby Burrows, Kevin Page, David Lewis, Emma Cawlfield, and Jouni Tuominen.
Year(s) Of Engagement Activity 2019
 
Description Workshops co-chair of ACM/IEEE Joint Conference on Digital Libraries, June 2019 (Kevin Page) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Kevin Page was invited to be co-chair of the workshops and tutorials programme for the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2019, held at the University of Illinois Urbana-Champaign, USA, in June 2019. The conference is a key annual event for the digital libraries community; 4 workshops and 5 tutorials were accepted and co-ordinated after review.
Year(s) Of Engagement Activity 2019
URL https://2019.jcdl.org/