Towards a National AI-Enabled Repository for Wales

Lead Research Organisation: Aberystwyth University
Department Name: Computer Science

Abstract

The project will develop a prototype distributed national trusted data repository (TDR) for Wales (NTDRW), which will have artificial intelligence (AI)-enabled functionality for knowledge discovery. The project will be developed in co-creation with stakeholders, assessing readiness both with appropriate toolsets and through a series of focus groups. These two strands will concentrate on four TDR areas: data capture, data repositories, AI-enabled search, and knowledge discovery. The project will provide a detailed specification for the NTDRW, including technical requirements, post-construction operational costs, and training requirements.

One of the aims of the project is to bring the AHRC born-digital data community together. This is a collaborative project involving the following academic partners: Aberystwyth University, Bangor University, Cardiff Metropolitan University, Cardiff University, Open University in Wales, Swansea University, and University of South Wales; and the following non-academic partners: Archives and Records Council Wales, Cadw, Canolfan Bedwyr, the Digital Preservation Coalition, Eisteddfod Genedlaethol Cymru, the National Library of Wales, the Royal Commission on the Ancient and Historical Monuments of Wales, and Welsh Government.

Publications

Description 5. Top 5 recommendations
5.1. Establishment of a bilingual AI-enhanced interoperable discovery layer for a trusted repository for Wales based on a distributed custody and storage model, enabling cross-search, knowledge discovery, metadata enhancement and linked data capabilities for digital arts and humanities data.
5.2. Development of an API for both display and re-ingest of AI-enhanced metadata. The DSpace APIs already provide a range of functions for querying metadata, but we will investigate whether these are sufficient or if additional APIs need to be implemented.
5.3. Development of a trusted storage environment for estray data, which is also available as additional storage capacity for member repositories.
5.4. Development of a training framework for the arts and humanities based on the FAIR Principles, to include: data preparation and ingest, metadata generation, data protection legislation, intellectual property rights and open licensing, repository assessment frameworks, and AI functionality and ethics.
5.5. Establishment of an ethics board to develop an ethical governance framework and to oversee organisational and technical developments, including a framework for human involvement in training AI algorithms.
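As an illustration of recommendation 5.2, the following is a minimal sketch of how a client might query metadata and prepare AI-enhanced metadata for re-ingest. It assumes a DSpace 7-style REST API (a discovery search endpoint and JSON Patch updates on items); the base URL, function names, and example values are hypothetical and are not taken from the project. The sketch only builds the requests and sends nothing over the network.

```python
from urllib.parse import urlencode

# Hypothetical base URL; a real deployment would supply its own.
BASE_URL = "https://repository.example.wales/server/api"


def build_search_url(query, page=0, size=20):
    """Build a discovery search URL (assumes a DSpace 7-style endpoint)."""
    params = urlencode(
        {"query": query, "dsoType": "ITEM", "page": page, "size": size}
    )
    return f"{BASE_URL}/discover/search/objects?{params}"


def item_patch_url(item_uuid):
    """URL to which a metadata patch for one item would be sent."""
    return f"{BASE_URL}/core/items/{item_uuid}"


def build_metadata_patch(field, values):
    """Build a JSON Patch body that appends values to a metadata field.

    `values` is a list of (text, language) pairs, so that English and
    Welsh terms suggested by an AI tool can be added side by side.
    """
    return [
        {
            "op": "add",
            "path": f"/metadata/{field}/-",
            "value": {"value": text, "language": language},
        }
        for text, language in values
    ]


if __name__ == "__main__":
    print(build_search_url("castell harlech"))
    print(build_metadata_patch("dc.subject", [("castle", "en"), ("castell", "cy")]))
```

Keeping request construction separate from transport makes it easier to assess whether the existing APIs are sufficient: the same patch body could be sent to an existing endpoint or to an additional API if one proves necessary.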
Exploitation Route These findings could be used for specific AI developments for Trusted Digital Repositories.
These findings could be used to further develop a National Trusted Digital Repository for Wales.
Sectors Communities and Social Services/Policy, Digital/Communication/Information Technologies (including Software), Government, Democracy and Justice, Culture, Heritage, Museums and Collections

 
Description The project aimed to identify, through partner and stakeholder co-creation, a) the organisational requirements, b) the technical requirements, and c) the costs of developing an AI-enabled pan-Wales approach to the curation, access and discovery of digital data of interest to arts and humanities research. A single point of access to data will demolish existing silos, while AI-enabled discovery tools will open up data access, in line with Welsh Government legislation and policy, and kickstart a collaborative network of stakeholders and researchers. Through stakeholder co-creation, we established data priorities, data capture and interoperability requirements, and the governance and potential application of AI to enhance knowledge discovery. We used the results to iteratively develop a prototype digital repository demonstrating bilingual AI-enabled discovery across a) images, b) typed and handwritten text, and c) metadata enhancement and translation.
First Year Of Impact 2022
Sector Communities and Social Services/Policy, Digital/Communication/Information Technologies (including Software), Government, Democracy and Justice, Culture, Heritage, Museums and Collections
Impact Types Cultural, Societal, Policy & public services