iDAH- Consolidating the Museum Data Service as research infrastructure

Lead Research Organisation: University of Leicester
Department Name: Museum Studies

Abstract

Of the estimated 80 million object records held by UK museums, only 25 million or so are currently online. Of these, fewer still meet the requirements of the FAIR principles. November 2023 will see the beta launch of the Museum Data Service (MDS) that will, uniquely, address both problems, providing the raw material for anyone wanting to work at scale with collections data from UK museums.
Within five years, the MDS aims to bring together online, in one place, the object records of at least half the country's c.1.7K accredited museums, and almost all of them within a decade. Between them, the three MDS partners (Art UK, Collections Trust, and the University of Leicester) have the connections and community trust needed to achieve this goal and truly unite UK museums large and small online.
Moreover, the MDS (and the future Art UK 2.0) will ensure for researchers that this data becomes: Findable (ingesting tens of millions of object records across the UK museum sector, and transforming these into web-ready datasets); Accessible (for the first time providing the museum sector with a trusted 'back-up of last resort'); Interoperable (for the first time enabling cross-searching across the UK's digital cultural record); and Re-usable (for the first time a unique persistent identifier being assigned to every object - essential for reference and long-term research).
The MDS collaboration has start-up funding (via Art UK) from Bloomberg Philanthropies to the end of May 2024. The Bloomberg grant is building the core technical infrastructure and providing staff time from Art UK and CT to manage the project, and ingest collections data from the first 100 institutions. By the end of the start-up phase, the founding organisations will have jointly established a new non-profit company capable of taking long-term responsibility for the MDS.
The core MDS provides the raw material for researchers wanting to work at scale with collection data. WPs 1-6 of this iDAH proposal focus on consolidating the start-up service: more than doubling the number of museum datasets available; ensuring the needs of researchers are met; and refining the low-cost operational model key to long-term sustainability. WPs 7-10 focus on the greater, and costlier, challenge of transforming raw MDS data into an enhanced resource offering image-rich records and compelling related content.
In particular, building on the success of Art UK's present platform, the planning and design of Art UK 2.0 in WP 9 will show the potential for powerfully combining HE-designed tools and research outputs into a compelling audience-facing offering, of great value to members of the public and researchers alike, and one that underpins and demonstrates research impact.
The present proposal seeks AHRC support (via UoL) initially for the consolidation phase of the MDS to March 2025, taking it from start-up project through the first year of operation and outlining a roadmap towards steady-state and, ultimately, long-term sustainability for the MDS. The proposal underpins the success of the first major use case of the (raw) art data transformation service; enables Art UK to convert this service into a more generic tool allowing others to transform and enhance raw data from MDS for their own uses; whilst also laying the foundations for Art UK 2.0 and a framework for measuring ROI. While there is a timing overlap between the Bloomberg start-up project and the proposed AHRC-funded consolidation phase, there is no duplication or double-counting of funded activity; Appendix 1 delineates the Bloomberg investment and value added by the AHRC's funding.

Description The project 'iDAH- Consolidating the Museum Data Service as research infrastructure' is still live and ongoing, running under its current funding settlement to 31 August 2025, at which point the full impact of the Service can be reported.
Exploitation Route MDS meets specific needs identified by researchers who want to work with museum data. Practically, it does so by: bringing together all object records from all UK museums; mapping diverse data structures to a usable standard; establishing persistent identifiers for all records; enabling re-use of data through APIs; and connecting non-public data.

1. Bringing together all the object records from all UK museums

Our target is half of Accredited museums on board within five years, and all within ten. MDS went live in September 2024 and currently has more than 3 million object records, from 29 institutions (representing 71 Accredited museums including branches of multi-site services), at museumdata.uk. Another two million records have been ingested and are being processed; a further 150 organisations (6 million records) have already expressed interest.
We are also unlocking the enormous research potential of the collection summaries that all Accredited museums are required to write (but mostly do not publish), bringing them together as a single, searchable dataset. To date our website has summaries covering nearly 500 Accredited museums and we continue to gather more. To plug gaps in the meantime, we have reviewed and posted nearly 500 more legacy summaries from the Cornucopia initiative (Turner, 2004).

2. Mapping diverse data structures to a usable standard

MDS has no prescribed metadata schema for incoming data. This is partly to lower a known barrier to entry, but also avoids a key deficiency of 'traditional' cultural heritage aggregators that standardise diverse source data, making assumptions that might suit one end purpose, but not others. In MDS, the raw source data remains available to those who need it that way.
For those who want to search across datasets, we have taken the pragmatic decision to map the diverse field names of our source records to the 500 or so 'units of information' described in Spectrum, the UK museum collections management standard (Collections Trust, 2017).
Our Spectrum mappings are intended to be a stepping stone to other metadata schemas. There is a mapping from Spectrum to the widely-used LIDO schema (McKenna, 2018a), while Art UK is already using a similar mapping to bring MDS records onto its platform. As further mappings are completed in collaboration with data users (e.g. to Darwin Core, the Linked Art Data Model, and the Europeana Data Model) we will publish these.
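As a rough illustration of the mapping approach described above, the following Python sketch re-keys one source record by Spectrum unit of information while leaving the raw record intact. The source field names, the mapping table and the function name are hypothetical, chosen only for illustration; they are not taken from MDS itself.

```python
# Hypothetical mapping from one museum's field names to Spectrum units of
# information. The left-hand names are invented for this example.
FIELD_TO_SPECTRUM = {
    "acc_no": "Object number",
    "item_name": "Object name",
    "desc": "Brief description",
}


def map_record(raw: dict) -> dict:
    """Return the raw record alongside a view keyed by Spectrum unit.

    Fields with no mapping are not discarded: they remain in 'raw'.
    """
    mapped = {
        FIELD_TO_SPECTRUM[name]: value
        for name, value in raw.items()
        if name in FIELD_TO_SPECTRUM
    }
    return {"raw": dict(raw), "spectrum": mapped}


record = {"acc_no": "1999.12", "item_name": "teapot", "shelf": "B4"}
result = map_record(record)
```

Keeping both views side by side reflects the point made above: cross-searching uses the mapped fields, while anyone who needs the source data as supplied can still reach it.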

3. Persistent identifiers

MDS mints a PID that meets the FAIR data requirement for each object record held. PIDs for those records published via the public website follow the format shown in this example:
https://museumdata.uk/objects/926fbbc9-7f7d-393d-8b2a-f947800d5c2e
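A minimal Python sketch of how a client might check that a string has the same shape as the example PID above. It assumes that every public PID uses the `/objects/` path and that the final segment is formatted as a UUID; both are inferences from the single example, not documented guarantees. The function name `parse_pid` is hypothetical.

```python
from urllib.parse import urlparse
from uuid import UUID

PID_HOST = "museumdata.uk"   # host taken from the example PID
PID_PREFIX = "/objects/"     # path segment taken from the example PID


def parse_pid(pid: str) -> UUID:
    """Extract the identifier from a PID URL; raise ValueError if malformed."""
    parts = urlparse(pid)
    if parts.netloc != PID_HOST or not parts.path.startswith(PID_PREFIX):
        raise ValueError(f"not an MDS object PID: {pid}")
    # UUID() itself raises ValueError when the segment is not UUID-shaped.
    return UUID(parts.path[len(PID_PREFIX):])


example = parse_pid(
    "https://museumdata.uk/objects/926fbbc9-7f7d-393d-8b2a-f947800d5c2e"
)
```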

4. Enabling re-use of data through APIs

The MDS website offers researchers and other users two ways to work with collections data. If an object search produces fewer than 1,000 results, key fields from the relevant records can be downloaded as a convenient CSV reference file. Each full record (which might be anything from a handful of fields to several hundred) can be consulted online via its PID.
For more records, and all the publicly available fields of those records, an API token can be requested via a simple form and used as documented on Swagger (MDS 2024a). As with the CSV downloads, the search interface is used to define the dataset returned by the token.
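The choice between the two routes described above can be summarised in a few lines of Python. This is only a sketch of the decision rule as stated in the text; the function and parameter names are hypothetical, and it does not call the MDS API.

```python
CSV_RESULT_LIMIT = 1_000  # threshold stated in the text above


def access_route(result_count: int, need_all_fields: bool = False) -> str:
    """Suggest how to retrieve a search result set of the given size."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    # The CSV download covers key fields only, and only smaller result sets;
    # anything larger, or any request for every public field, needs a token.
    if need_all_fields or result_count >= CSV_RESULT_LIMIT:
        return "api"
    return "csv"
```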

5. Connecting non-public data

The data viewable via the MDS website is just the tip of the iceberg. Already, many of the participating museums have taken advantage of an innovative tool (scoped with help of the Open Data Institute) that allows each individual field of an ingested dataset to be tagged as 'public' or locked down to one of three access permission levels: 'restricted non-confidential', 'restricted confidential' or 'administrator'.
Museums benefit because they can upload non-public data to share securely with their own staff and volunteers, mitigating internal access problems (Gosling et al., 2022, p. 58) or as a backup. Researchers benefit because museums can easily give them access to information not generally available. This is currently dealt with case-by-case, but museums will soon have the option to give researchers blanket access to 'restricted non-confidential' data.
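The field-level tagging described above can be sketched in Python as follows. The four level names come from the text; treating them as a strict ordering from least to most restricted is an assumption made for this example, as are the record fields and the function name.

```python
from enum import IntEnum


class Access(IntEnum):
    # Level names are from the text; the numeric ordering is assumed.
    PUBLIC = 0
    RESTRICTED_NON_CONFIDENTIAL = 1
    RESTRICTED_CONFIDENTIAL = 2
    ADMINISTRATOR = 3


def visible_fields(record: dict, tags: dict, clearance: Access) -> dict:
    """Keep only the fields whose tag is at or below the viewer's clearance.

    Untagged fields default to the most restrictive level.
    """
    return {
        name: value
        for name, value in record.items()
        if tags.get(name, Access.ADMINISTRATOR) <= clearance
    }


# Invented record and tags, for illustration only.
record = {"title": "Teapot", "donor": "A. Smith", "valuation": "200"}
tags = {
    "title": Access.PUBLIC,
    "donor": Access.RESTRICTED_NON_CONFIDENTIAL,
    "valuation": Access.RESTRICTED_CONFIDENTIAL,
}
public_view = visible_fields(record, tags, Access.PUBLIC)
researcher_view = visible_fields(record, tags, Access.RESTRICTED_NON_CONFIDENTIAL)
```

Defaulting untagged fields to the strictest level is a design choice in this sketch, so that a missing tag never exposes data by accident.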

6. How MDS supports research

MDS supports three main kinds of research.

i. There is a long tradition of object-based research in many disciplines, but its scope has often been limited by the practical challenges of tracking down potentially relevant material, particularly from the hundreds of smaller museums without online collections; 'I started with the big organisations that I already knew' (Bailey-Ross et al., 2024, p. 7). As we work towards gathering all object records from all UK museums, MDS will become increasingly useful for these researchers - transformative for the research questions that can be posed, for the efficiencies made in the field, and for the opportunities to reach and discover new evidence.

ii. Beyond the specific record and object level, MDS can also transform our ability to survey museum collections as a whole, to notice trends in collecting and patterns in documentation practice, as well as the situated language of curatorship. At a time when museums and the academy are engaging with vital questions of decolonisation (particularly in the context of the museum), researchers will now be able to look across, understand, and critique the entirety of the UK's cultural record in its accredited museums, for the first time.

iii. The mass of data at the end of our API also offers unprecedented opportunities as well as research challenges to information science. Cohesive and structured, and yet diverse and complex, MDS also stands (at this moment of AI revolution) as a dataset with unique characteristics to drive the next generation of data science and digital humanities research. Early users of our data for AI research will be the RAIUK project Participatory Harm Auditing Workbenches and Methodologies, which will feed our data into a use case, audited for potential harms to inform a certification framework for such AI use (PHAWM, 2024).

7. MDS supports innovation

When assessing the recent and current research projects in our space, we see many prototypes and demonstrators with the potential to become useful tools in a living ecosystem clustered around MDS. The tools piloted by the Heritage Connector project are just one example. However, as the report's literature review ruefully notes: 'It is rare for promising experimental projects to move beyond the prototype stage' (Winters et al., 2022, p. 9). We offer MDS as an infrastructure through which the prototypes developed by fixed-term research projects can be of wider and longer-lasting use than if they had simply been parked in GitHub. This strategy will provide enhancements to our data and our service that we could not afford to develop ourselves, while increasing AHRC's return on investment. Our Innovation Co-I will be involved in the initial planning of such projects, but we also want to explore how MDS can give imminent and legacy outputs an additional lease of life.

8. MDS and the wider digital research infrastructure ecosystem

MDS is an engaged member of iDAH's emerging community of practice, and seeks to be of use to related research infrastructures. As well as HSDS (see Section 3.5, above) a further collaboration is already planned with the Distributed System of Scientific Collections UK (DiSSCo UK): 'DiSSCo UK will collaborate with MDS on user-driven and FAIR data approaches including best practices in institutional-level collections records and linking or sharing data where relevant' (Hardy, 2024).

9. Contributing to the AHRC's draft digital and data strategy

AHRC's digital and data strategy is being developed in the wider context of UKRI's strategic ambition of creating 'a step-change in the next generation of infrastructure capability' with respect to, '[u]sing new technologies to enhance access to our unique museum collections, to harness our heritage for future generations' (UKRI, 2019, p. 3). Whatever else that vision might entail, anyone wanting to use data from more than one collection - for any reason - needs: (1) the relevant records to be brought together in the first place; and (2) mapping to navigate the varying data structures and cataloguing practice of different museums. MDS does both, as well as contributing to the following specific points of the draft AHRC strategy.

10. Discovery and linking of data across services

Representatives from TaNC's Discovery projects recently noted that '[e]stablishing and maintaining the necessary infrastructure to store, analyse and share data is resource intensive, and a lack of centralised infrastructure leads to wasted resources and unnecessary environmental impacts' (Hawkins & Sichani, 2024, p. 1). Their recommendations include 'invest in sustainable data infrastructure', noting how '[c]entralised data infrastructure solutions will ensure sustainability, security, maintenance and accessibility of data and data-related outputs beyond a project's end date and thus impact positively on the wider research and cultural heritage communities' (ibid.). For museum data, at least, MDS will ease many of the infrastructure-related challenges that the TaNC projects had to address on an ad hoc basis. We also share AHRC's ambition to develop common data standard frameworks. There is already a mapping between Spectrum and the archival standard ISAD(G) (McKenna, 2018b).

11. Trusted Research Environments

To meet the need for researchers to access data that museums do not want to make public, we are currently working on a second, restricted API. Next year, museums will be offered the option to allow researchers access to their 'restricted non-confidential' level of data.

12. Long-tail of un-curated content

The fourth level of our data hierarchy is new content linked to level 3 object records. Building on the approach piloted in the TaNC project Making it FAIR (Cooper et al., 2022), CT has developed a simple tool and schema to bring enriched content, such as exhibition text, into MDS as FAIR data. In its museum support role, CT will encourage take-up of this approach.
Sectors Education, Culture, Heritage, Museums and Collections

URL https://museumdata.uk/
 
Description 1) By 13 March 2025, 5,709,102 objects had been ingested into the MDS platform.
2) An API has been launched, providing tokens for search results or all records. Documentation at https://mds-data-1.ciim.k-int.com/api/swagger-ui/index.html
3) A video guide to joining MDS is available (https://museumdata.uk/sharing-data/how-sharing-data-works/ and https://youtu.be/Gv06CA8TB_4?si=E3y7tjFLALync1E6).
4) The mass of data at the end of the MDS API offers unprecedented opportunities as well as research challenges to information science. Cohesive and structured, and yet diverse and complex, MDS also stands (at this moment of AI revolution) as a dataset with unique characteristics to drive the next generation of data science and digital humanities research. Early users of MDS data for AI research will be the RAIUK project Participatory Harm Auditing Workbenches and Methodologies, which will feed our data into a use case, audited for potential harms to inform a certification framework for such AI use. https://phawm.org/
5) The recent Total Economic Value case study by the DCMS of MDS Partner Art UK showed an aggregate annual value of over £71m based on just 40% of its audience (the UK audience). This compares to just £1.5m of annual costs. MDS is supporting this economic contribution by allowing Art UK to sharply grow the content it shares with its audience. In December MDS allowed Art UK to increase the number of records on Art UK by over 20%. https://assets.publishing.service.gov.uk/media/600b02c78fa8f5655299d204/GOV.UK_-_Framework_Accessible_v2.pdf
First Year Of Impact 2024
Sector Education, Culture, Heritage, Museums and Collections
Impact Types Cultural, Economic, Policy & public services

 
Description MDS Hackathon 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Our aim was to bring together researchers, developers, and museum practitioners to explore the innovative use of, and potential for, the new Museum Data Service.

The Museum Data Service (MDS) is a transformative new digital infrastructure that aims to connect all the object records of all UK museums, providing the raw material for researchers, curators and anyone else wanting to work with this wealth of knowledge.

Across the two days, our objective was to imagine (and, where possible, rapidly prototype) a set of demonstrators for using the unprecedented breadth and depth of collections data made available by the ever-growing MDS.

This event is the first of an ongoing series of 'design residentials'. These will help us to discover the use cases and potential innovations that can be built in, around, and alongside the MDS.
Year(s) Of Engagement Activity 2025