
Infrastructure for Digital Arts and Humanities: a National Repository for Literature and Linguistics Continuation 2023-2025

Lead Research Organisation: University of Oxford
Department Name: Linguistics Philology and Phonetics

Abstract

The project will operate a repository service for researchers across the UK for the curation of digital outputs of projects in literary and linguistic disciplines, acting as a hub in the new national digital curation service from June 2023, funded by AHRC as part of the Infrastructure for Digital Arts and Humanities (iDAH) programme. The project will continue to carry out the necessary organizational, administrative and technical tasks to maintain and enhance the service. The repository will build on the collections, reputation, expertise and experience of the Oxford Text Archive (OTA), which has long been a de facto pillar of the research infrastructure in the UK. The work will focus on three thematic areas: (i) collections development, (ii) delivering research data for re-use, and (iii) connecting to the national and international research infrastructure.

 
Description Fifteen digital datasets which were outputs of research projects were deposited in the Oxford Text Archive collections and made available to researchers and the general public. Information on how to find and use these (and the more than 70,000 legacy resources) is available to researchers worldwide on our website and via the CLARIN Virtual Language Observatory, alongside resources from other repositories. The award also supported the ongoing coordination of the CLARIN-UK network, through which leading UK experts in corpus and computational linguistics were able to promote interoperability of their resources across the UK and Europe, and to participate in collaborative research projects and training initiatives. A new CLARIN Knowledge Centre for Digital Resources for the Languages in Ireland and Britain was set up and put into operation, offering advice and support on how to find and use data and software for Irish, Scottish Gaelic, Welsh, English and other languages, including regional and historic varieties. Two training workshops were also held for researchers, in order to share expertise and promote good practice in the creation of digital resources.
Exploitation Route The main outcomes are re-usable datasets, shared via open licences, for others to use in their research.
The CLARIN Knowledge Centre for Digital Resources for the Languages in Ireland and Britain is an ongoing initiative which will provide advice and support on how to find and use data and software in this domain.
Sectors Education

 
Description The project continues to ensure the long-term preservation of works of national and international importance. While these works are the output of academic research projects, they include the only digital editions of printed works, and reference information about a number of languages and language varieties, e.g. the British National Corpus, the most important reference snapshot of usage of the English language in speech and writing from the 1980s. These works are of interest and use to the commercial sector (publishers of dictionaries, grammars and teaching materials, developers of large language models), the general public (for whom the works are made available free for reading and research), journalists and policymakers.
First Year Of Impact 2024
Sector Creative Economy,Digital/Communication/Information Technologies (including Software),Education
Impact Types Cultural,Societal

 
Title Building Corpora for Lesser-resourced Languages 
Description How to build textual corpora for research into under-resourced languages. This workflow was produced collaboratively with other scholars. It was designed during the workshop "Creating, Managing and Archiving Textual Corpora in Under-resourced Languages". The workshop was conceived by the DARIAH Working Groups Research Data Management and Multilingual DH, financed by the DARIAH-EU Funding Scheme for Working Group Activities 2023-25, and hosted by the University of Hamburg from 28 to 30 August 2024.
Type Of Material Data analysis technique 
Year Produced 2024 
Provided To Others? Yes  
Impact None known so far. 
URL https://marketplace.sshopencloud.eu/workflow/67KJnp
 
Title CLARIN Resource Family: Corpus Query Tools 
Description A curated catalogue of research software packages and online interfaces for corpus linguistic research.
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
Impact Poster presentations at Corpus Linguistics 2023 and CLARIN Annual Conference 2023. 
URL https://www.clarin.eu/resource-families/corpus-query-tools
 
Title Frequently repeated clusters of words in Early English Books Online 
Description Lists of repeated clusters of words, lemmata and part-of-speech tags derived from the 60,238 works in the public domain from the Early English Books Online collection, as made available in the Oxford Text Archive collections in late 2023. In each case, the list contains the 4000 most frequent clusters (or "n-grams"). The lists are made available as a lexical resource for exploring n-grams in historical English texts.
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact A conference presentation and open access publication were based on the analysis of this dataset. See https://zenodo.org/records/10497492. 
URL http://hdl.handle.net/20.500.14106/2570
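
As an illustration of what a list of frequent clusters contains, the following minimal sketch counts n-grams in a tokenised text and returns the most frequent ones. It is not the method used to build the dataset: the function name `top_ngrams`, the whitespace tokenisation and the sample sentence are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Tuple


def top_ngrams(tokens: List[str], n: int, k: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Return the k most frequent clusters of n consecutive tokens, with counts."""
    # Zipping n staggered copies of the token list yields each n-token window once.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common(k)


# Hypothetical sample text, split on whitespace (an assumption; the real
# dataset also covers lemmata and part-of-speech tags).
sample = "in the name of god amen in the name of the king".split()
result = top_ngrams(sample, n=3, k=2)

for gram, count in result:
    print(" ".join(gram), count)
```

Applied to a full corpus with k set to 4000, the same counting approach would produce a list of the shape described in the record above.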
 
Description Anglo Norman Dictionary 
Organisation Aberystwyth University
Department Department of Modern Languages
Country United Kingdom 
Sector Academic/University 
PI Contribution We provided advice about the long term storage of the Dictionary's data and the best ways to archive it. We also provided a letter of support for their funding application.
Collaborator Contribution Our partners agreed to deposit a copy of the dictionary data when it was available.
Impact A funding application for the Anglo-Norman Dictionary was submitted with our letter of support.
Start Year 2024
 
Description CLARIN ERIC 
Organisation Clarin EU
Country Netherlands 
Sector Charity/Non Profit 
PI Contribution The award has helped to facilitate and fund the activities of the national research infrastructure which is a node in the CLARIN European Research Infrastructure Consortium, supporting the activity of the national repository, the role of the national coordinator, and the activities of the national CLARIN-UK consortium. The PI has acted as UK representative in the CLARIN General Assembly, National Coordinators' Forum, User Involvement Committee, Standing Committee of CLARIN Technical Centres and in training and education activities. The PI has advised AHRC officers and government on matters relating to membership of the CLARIN ERIC, and helped to facilitate the UK joining as an 'Extraordinary Observer'.
Collaborator Contribution The partners in the CLARIN-UK consortium, coordinated by the research team, have contributed resources to CLARIN repositories, conduc
Impact The UK has joined CLARIN ERIC as an 'Extraordinary Observer', having been a normal Observer since 2016. This status effectively gives researchers in the UK access to all of the benefits available to researchers in full member countries. As a result, all members of UK HEIs are able to access secure resources via institutional single sign-on, have access to CLARIN funding opportunities, and are able to participate in collaborative research activities and working groups.
Start Year 2021
 
Description DR-LIB: CLARIN Knowledge Centre for Digital Resources for the Languages in Ireland and Britain 
Organisation Clarin EU
Country Netherlands 
Sector Charity/Non Profit 
PI Contribution We initiated the collaboration, found partners, submitted the proposal to CLARIN ERIC, and host the centre and its web presence. We also took the lead in proposals for papers, posters and a panel session for conferences in 2025.
Collaborator Contribution Partners in numerous universities and other research organizations agreed to join the panel of experts, and collaborate on joint proposals.
Impact The centre received accreditation as an official CLARIN Knowledge Centre (see https://www.clarin.eu/k-centres-catalogue?search_api_fulltext=DR-LIB). Panel and poster proposals have been submitted for the Corpus Linguistics 2025 conference and the UK-Ireland Digital Humanities Association annual event.
Start Year 2024
 
Description European Holocaust Research Infrastructure 
Organisation Masaryk Institute and Archive
Country Czech Republic 
Sector Public 
PI Contribution Initiated and co-organized the first and subsequent workshops, panel session, conference presentations and project proposals.
Collaborator Contribution Co-organized workshops, conference submissions and project proposals.
Impact Using Holocaust Testimonies as Research Data, King's College, London, 15-17 May 2023
Natural Language Processing Meets Holocaust Archives, workshop at Charles University, Prague, 27-28 March 2024
Holocaust Testimonies as Language Resources, pre-conference workshop at LREC-COLING 2024, Turin, 21 May 2024
Project proposal Holocaust Testimonies as Open Research Data submitted to OSCARS 1st Open Call for Open Science Projects and Services, May 2024
Paper and poster presented at the EHRI Academic Conference, Warsaw, 18 June 2024
Natural Language Processing Meets Holocaust Research, panel session at the Lessons and Legacies Conference, Los Angeles, 14-17 November 2024
Unlocking Holocaust Testimonies Datathon, ELTE, Budapest, 26-27 February 2025
Dataset, theme and domain expert organized for Helsinki Digital Humanities Hackathon 2025
Start Year 2023
 
Description CLARIN and Libraries workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The workshop built upon the first CLARIN and Libraries workshop, held in the Hague in May 2022, and for the second time brought together people associated with both CLARIN (or other research infrastructures) and libraries. It investigated areas of collaboration between CLARIN-related initiatives and libraries, with a special emphasis on building (large) language models in, and in cooperation with, libraries. Whereas the first workshop was particularly concerned with digital content delivery for researchers, the main theme of the second was large language models and library collections, e.g. technical challenges in building such models and legal implications of model training and use. The host, the National Library of Norway (NLN), has been digitising its entire text collections since 2005, amounting at present to a corpus of 160 billion words for Norwegian, and has built large language models for text (BERT, GPT-2, T5) and speech (wav2vec, Whisper) on these collections. There were keynotes from the National Libraries of Norway and Germany on the technical aspects of building such models in a library setting, as well as a keynote from the Swedish National Library on the legal aspects of building large language models.
Year(s) Of Engagement Activity 2023,2024
URL https://www.clarin.eu/event/2023/clarin-and-libraries-2023-large-language-models-and-libraries
 
Description CLARIN-UK poster presentation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation of the CLARIN-UK network and associated research activity at an event to discuss and further collaboration in the digital humanities in the UK, and to promote collaboration with European research infrastructures.
Year(s) Of Engagement Activity 2024
URL https://dcch.leeds.ac.uk/events/uk-dariah-day/
 
Description Computational Literary Studies UK Training School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The event was the first on-site Computational Literary Studies Training School in the UK. Computational Literary Studies (CLS) uses innovative methods and tools that offer different kinds of analyses by investigating large corpora. Scholars use text analysis software and programming languages to analyse hundreds of texts, which can be used to track character knowledge; illuminate dramatic development of plays; and help reveal the author identity of previously anonymous texts. Staff from our project delivered sessions on creating a literary corpus, and using corpus analysis tools to explore and analyse literary corpora.
Year(s) Of Engagement Activity 2024
URL https://www.wlv.ac.uk/research/research-centres/cttr---centre-for-transnational-and-transcultural-re...
 
Description DR-LIB poster presentation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We presented the new CLARIN Knowledge Centre for Digital Resources for the Languages in Ireland and Britain at the CLARIN annual conference in Barcelona to raise awareness of the new centre. We made new contacts and began discussions with researchers to help support their research on topics in the K-centre's remit.
Year(s) Of Engagement Activity 2024
URL https://www.clarin.eu/event/2024/clarin-annual-conference-2024
 
Description Datathon: Unlocking Holocaust Testimony 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A hands-on workshop in which participants used an online digital interface to annotate early Holocaust testimonies.
Year(s) Of Engagement Activity 2024
URL https://www.ehri-project.eu/call-for-applications-unlocking-holocaust-testimony-ehri-clarin-datathon...
 
Description LancsBox X free online training event 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact An online CLARIN-UK webinar given by Vaclav Brezina of Lancaster University on using the LancsBox X software tool for linguistic research, sponsored and supported by CLARIN-UK.
Year(s) Of Engagement Activity 2023
URL https://www.clarin.ac.uk/event/lancsbox-x-free-online-training
 
Description Libraries as Data Infrastructures: Towards a CENL Dialogue Forum 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation at the academic conference 'DARIAH Annual Event 2024' in Budapest, Hungary. This was a joint presentation based on a collaboration initiated by this project, with the following co-presenters:
Sally Chambers, DARIAH-EU, KBR Royal Library of Belgium & Ghent University
Peter Leinen, German National Library
Andreas Witt, Leibniz Institute for the German Language & University of Mannheim
Martin Wynne, University of Oxford
Martijn Kleppe, National Library of the Netherlands
Frédéric Lemmers, KBR Royal Library of Belgium
Hélène Bergès, Bibliothèque nationale de France
Marie Carlin, Bibliothèque nationale de France
Year(s) Of Engagement Activity 2024
URL https://b2drop.eudat.eu/s/GCEqWWqrJwic6Qx
 
Description London Rare Books School Digital Editing Short Course 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Nine students attended a hybrid short course on digital editing co-taught by the School of Advanced Study and the Oxford Text Archive. Our goal was to raise awareness of the OTA in order to increase users and deposits. We have seen sign-ins to the LLDS service (where the OTA is hosted) increase since the event.
Year(s) Of Engagement Activity 2025
URL https://ies.sas.ac.uk/events/digital-editing
 
Description Natural Language Processing meets Holocaust Research 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation at the EHRI Academic Conference in Warsaw on the topic of the collaboration with EHRI.
Year(s) Of Engagement Activity 2024
URL https://www.ehri-project.eu/ehri-academic-conference-researching-holocaust-digital-age/
 
Description Panel session: Natural Language Processing meets Holocaust Research 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Panel session convened at the Lessons and Legacies Conference in Los Angeles.
Year(s) Of Engagement Activity 2024
URL https://hef.northwestern.edu/lessons-and-legacies-conference/2024-ll-program-final.pdf
 
Description SunoikisisDC Analysing and visualising texts session
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact We co-taught a session on text analysis in the SunoikisisDC programme, which provides online lectures on digital classics to supplement in-person learning. We used this session to raise awareness of the Oxford Text Archive as a useful repository of digital texts. We have seen a rise in LLDS sign-ins since the session.
Year(s) Of Engagement Activity 2025
URL https://www.youtube.com/watch?v=xNihExxxOy0
 
Description Training workshop: Create a Digital Edition using LEAF Commons 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Workshop held at the School of Advanced Study in February 2025, with an invited tutor from Newcastle University, to train participants in the use of the LEAF software suite for encoding texts for literary and linguistic study.
Year(s) Of Engagement Activity 2025
URL https://www.sas.ac.uk/news-events/events/create-digital-edition-using-leaf-commons
 
Description Training workshop: Getting started with Transkribus 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A training workshop hosted at the University of Edinburgh, with an invited tutor from the University of Sheffield, for creators of digital resources based on handwritten sources.
Year(s) Of Engagement Activity 2025
URL https://www.cdcs.ed.ac.uk/events/getting-started-with-transkribus
 
Description What can you do with the CLARIN research infrastructure? 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Pre-conference tutorial workshop, demonstrating to researchers how to access and use the CLARIN European Research Infrastructure in their research.
Year(s) Of Engagement Activity 2023
URL https://www.clarin.eu/event/2023/clarin-corpus-linguistics-2023
 
Description Workshop: Holocaust Testimonies as Language Resources 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Pre-conference workshop organized at the LREC-COLING international conference in Turin, Italy in May 2024, with keynotes and papers from scholars around the world.
Year(s) Of Engagement Activity 2024
URL https://www.clarin.eu/HTRes2024