Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMatter)

Lead Research Organisation: University of Dundee
Department Name: UNLISTED

Abstract

Trusted Research Environments (TREs) provide a secure location for researchers to analyse data for projects in the public interest, e.g. providing information to SAGE to fight the COVID-19 pandemic. TRE staff check outputs to prevent disclosure of individuals’ confidential data.

TREs have historically supported only classical statistical data analysis. There is an increasing need to also facilitate the training of Artificial Intelligence (AI) models. AI has many valuable applications, e.g. spotting human errors, streamlining processes, helping with repetitive tasks and supporting clinical decision making. The trained models then need to be exported from TREs for use. The size and complexity of AI models present significant challenges for the disclosure-checking process. Models may be susceptible to external hacking: complicated methods that reverse engineer the learning process to find out about the data used for training, with more potential to lead to re-identification than conventional statistical methods.

With input from public representatives, GRAIMatter will assess a range of tools and methods to support TREs to assess output from AI methods for potentially identifiable information, investigate the legal and ethical implications and controls, and produce a set of guidelines and recommendations to support all TREs with export controls of AI algorithms.

Technical Summary

TREs are widely and increasingly used to support statistical analysis of sensitive data across a range of sectors (e.g. education, police, tax and health), as they enable secure and transparent research whilst protecting data confidentiality.

There is increasing desire from academia and industry to train AI models in TREs. The field of AI is developing quickly, with applications including spotting human errors, streamlining processes, task automation and decision support. These more complex AI models require more information to describe and reproduce, increasing the possibility that sensitive information regarding secure data can be inferred from such descriptions. TREs do not have mature processes and controls against these risks. This is a complex topic, and it is unreasonable to expect all researchers to be aware of all risks, or to assume that TRE researchers have addressed these risks in AI-specific training.

We aim to address this problem by developing a set of usable recommendations for TREs to use to guard against the additional risks when disclosing trained AI models from TREs. We will draw upon our internationally recognised expertise in TREs, AI, data governance, disclosure control, data security and confidentiality, law and ethics.

WP1: Quantitative Assessment of Risk: a detailed empirical study to evaluate vulnerabilities of a selection of machine learning models. We will explore different models, hyper-parameter settings and training algorithms over common data types.
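As an illustration of the kind of vulnerability WP1 sets out to measure, the sketch below shows a toy membership inference test: an attacker queries a model and guesses that a record was in the training data whenever the model is unusually confident. This is a minimal sketch under stated assumptions, not the project's method; the record does not say which attacks were evaluated, and the function names, the memorising "model" and the threshold are all hypothetical.

```python
import random


def train_memorising_model(records):
    """Toy 'model' that stores its training records (an extreme case of overfitting)."""
    stored = list(records)

    def confidence(x):
        # Confidence decays with distance to the closest stored training record.
        nearest = min(abs(x - r) for r in stored)
        return 1.0 / (1.0 + nearest)

    return confidence


def membership_attack(confidence, candidates, threshold=0.99):
    """Guess 'member' whenever the model is suspiciously confident about a record."""
    return [confidence(x) >= threshold for x in candidates]


def attack_accuracy(seed=0, n=200):
    """Fraction of correct member / non-member guesses on synthetic data."""
    rng = random.Random(seed)
    population = [rng.uniform(0, 1000) for _ in range(2 * n)]
    members, non_members = population[:n], population[n:]
    model = train_memorising_model(members)
    guesses = membership_attack(model, members + non_members)
    truth = [True] * n + [False] * n
    correct = sum(g == t for g, t in zip(guesses, truth))
    return correct / (2 * n)


if __name__ == "__main__":
    print(f"attack accuracy: {attack_accuracy():.3f}")
```

An accuracy well above 0.5 (the rate expected from random guessing) would indicate that the exported model leaks information about who was in its training data. Real models leak far less obviously than this memorising toy, which is why an empirical study across model types and hyper-parameter settings is needed.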

WP2: Controls and Evaluation of Tools: evaluation of effectiveness of a range of tools for addressing vulnerabilities identified in WP1 technically (do they accurately quantify disclosure risks) and organisationally (what is their impact on TRE output checking, and in assisting researchers to produce checkable ‘safe’ outputs).

WP3: Legal and Ethical Implications: a legal and ethical analysis of information from WP1/2 to develop a framework for regulation of AI models developed in a protected environment, and identification of aspects of existing legal and regulatory frameworks requiring reform to facilitate the use and export of AI models from protected environments.

WP4: PPIE: 4 workshops run by lay Co-Is seeking input from public/patients on our approach. We will produce lay summaries of project outputs and support and train our researcher team on how best to work with the public.

WP5: Green Paper: all WPs will collaborate on drafting a green paper and seek input from the wider community through a consultancy period. Our international collaborator will provide a perspective external to the UK.

Publications

Caldwell J (2022) Scottish Medical Imaging Service - Technical and Governance controls. in International Journal of Population Data Science

Gao C (2022) A National Network of Safe Havens: Scottish Perspective. in Journal of Medical Internet Research

Kavianpour S (2022) Next-Generation Capabilities in Trusted Research Environments: Interview Study. in Journal of Medical Internet Research

Ritchie F (2023) Machine learning models in trusted research environments -- understanding operational risks. in International Journal of Population Data Science

 
Description Application of AI SDC in Scotland
Geographic Reach National 
Policy Influence Type Contribution to new or improved professional practice
Impact Information governance and data teams managing access to patient data are now better informed regarding the potential additional risks posed by AI/ML.
 
Description Industry access to public sector data: Review of current operational practice
Geographic Reach National 
Policy Influence Type Citation in other policy documents
URL https://www.researchdata.scot/news-and-insights/new-report-examines-safeguards-for-businesses-to-acc...
 
Description Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMatter)
Amount £315,488 (GBP)
Funding ID MC_PC_21033 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 01/2022 
End 08/2022
 
Description SATRE - Standardised Architecture for Trusted Research Environments
Amount £614,112 (GBP)
Organisation United Kingdom Research and Innovation 
Sector Public
Country United Kingdom
Start 01/2023 
End 10/2023
 
Description Semi-Automated Checking of Research Outputs (SACRO)
Amount £637,821 (GBP)
Organisation United Kingdom Research and Innovation 
Sector Public
Country United Kingdom
Start 01/2023 
End 10/2023
 
Description TRE-FX: Delivering a federated network of TREs to enable safe analytics
Amount £562,457 (GBP)
Organisation United Kingdom Research and Innovation 
Sector Public
Country United Kingdom
Start 01/2023 
End 10/2023
 
Title Collection of tools and resources for managing the statistical disclosure control of trained machine learning models 
Description Tools for the Automatic Checking of Research Outputs 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact Software to support TREs to check for disclosure of trained ML models 
URL https://github.com/ai-sdc
 
Description Building a legacy for UK health data research infrastructure (speaker) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Presentation on 'Alleviate: the Advanced Pain Discovery Platform (APDP) Data Hub' for the DIH Programme Showcase Event: Building a legacy for UK health data research infrastructure.
Year(s) Of Engagement Activity 2022
URL https://www.youtube.com/watch?v=DLRU_35dfYQ&list=PLBI5k9SgYrItfzjZ17c1b20GUp6V2wDRH&index=15
 
Description Cambridge Spark Lecture Series 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Presentation: Overcoming the Challenges of Providing Access to Population Scale, Routinely Collected Health and Imaging Data for AI Development whilst Protecting Patient Confidentiality.
Year(s) Of Engagement Activity 2022
 
Description DARE UK Sprint Exemplar Event 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Event focussed on DARE UK's Sprint Exemplar Projects, in this case: GRAIMATTER: Guidelines and Resources for Artificial Intelligence Model Access from Trusted Research Environments
Year(s) Of Engagement Activity 2022
 
Description ELSI Webinar on AI for Health 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented lecture on the ethical and privacy concerns around the use of AI/ML with health data within the context of a European project - HT-Advance. Initiated discussion around the project needs for data privacy and also better awareness of the issue for legal experts at INSERM, France.
Year(s) Of Engagement Activity 2023
 
Description GRAIMATTER Recommendations Workshop (organiser) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact GRAIMATTER Recommendations Workshop.
Year(s) Of Engagement Activity 2022,2023
 
Description HDR UK Multi-omics Cohorts Consortium NIP Insight Sharing Day 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Invited speaker to HDR UK Multi-omics Cohorts Consortium National Implementation Project Insight Sharing Day. Presentation: Experiences of developing a TRE to support multi-omic data within the AWS cloud.
Year(s) Of Engagement Activity 2023
 
Description HDR UK Technology Ecosystem Conference/Workshop (organiser and speaker) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact The purpose of the meeting was to kick off the Technology Ecosystem work as part of HDR UK 23-28 strategy, bringing together various overlapping initiatives to share knowledge and plan for how we will work together going forward.
Year(s) Of Engagement Activity 2023
 
Description HPC-AI Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was an invited speaker at the 5th Annual HPC-AI Advisory Council UK Conference. Presentation: TREs at Scale.
Year(s) Of Engagement Activity 2023
URL https://www.hpcwire.com/off-the-wire/5th-annual-hpc-ai-advisory-council-uk-conference-set-for-octobe...
 
Description Invited speaker: HDR Technology Ecosystem and the Gateway: UKRI Data Infrastructure Club Show and Tell 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Emily Jefferson was an invited speaker, leading a presentation on the HDR UK Technology Ecosystem and the Gateway at the UKRI Data Infrastructure Club Show and Tell: 31st Jan 2023.
Year(s) Of Engagement Activity 2023
 
Description Invited speaker: HDR Technology Ecosystem. UK DRI Informatics Scoping Event - London. 8th and 9th March 2023 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson, CTO of HDR UK, was an invited speaker leading a presentation on the HDR Technology Ecosystem at the UK DRI Informatics Scoping Event, London, 8th and 9th March 2023.
Year(s) Of Engagement Activity 2023
 
Description Invited speaker: Technology Ecosystem - Launch. Technology Ecosystem Conference/Workshop. Birmingham. 6th Feb 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact The Technology Ecosystem Conference (6th February 2023) brought together different technology groups from across the community to strengthen relationships and generate ideas to deliver trustworthy infrastructure and services across the health data research ecosystem.
Year(s) Of Engagement Activity 2023
 
Description Invited speaker: The power of DRI: A health data perspective. UKRI Digital Research Infrastructure (DRI) Congress. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson was an invited speaker to present on: The power of DRI: A health data perspective at the UKRI Digital Research Infrastructure (DRI) Congress. 6th and 7th March 2023.
Year(s) Of Engagement Activity 2023
 
Description Japan Association for Medical Informatics Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was a keynote speaker at the 43rd Joint Conference on Medical Informatics. Presentation: The UK's progress towards enabling secure, researcher access to sensitive health data at a UK population scale.
Year(s) Of Engagement Activity 2023
URL https://confit-atlas-jp.translate.goog/guide/event/jcmi2023/session/3A11-13/detail?_x_tr_sl=ja&_x_tr...
 
Description Keynote speaker: Towards Federated Analytics for Population Data. International Data Science Conference - Tokyo, Japan, 22/05/23 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson was invited as a keynote speaker to present 'Towards Federated Analytics for Population Data' at the International Data Science Conference, Tokyo, Japan, on 22/05/23.
Year(s) Of Engagement Activity 2023
 
Description Keynote speaker: Towards Federated Analytics for Population Data. Precision Medicine & Real-World Data Conference - Singapore, 23/05/23 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Keynote speaker, presenting on experiences enabling a new UK infrastructure for finding and accessing population-wide data for research and public health analysis.
Year(s) Of Engagement Activity 2023
URL https://info.bcplatforms.com/precision-medicine-and-rwd-conference-singapore-2023
 
Description Overcoming the Challenges of Providing Access to Population Scale, Routinely Collected Health and Imaging Data for AI Development whilst Protecting Patient Confidentiality 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Presentation to the AI summit in London covering the work of PICTURES, CO-CONNECT and GRAIMATTER
Year(s) Of Engagement Activity 2022
 
Description PPIE Workshops: GRAIMATTER 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Patients, carers and/or patient groups
Results and Impact PPIE 5 Workshops: GRAIMATTER - What is a TRE? What is ML? What are the GRAIMATTER project aims?
Year(s) Of Engagement Activity 2022
 
Description Pistoia Alliance Christmas Lecture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was a keynote speaker at Pistoia Alliance UK Life Science Informatics Forum. Presentation: Guidelines and Resources for Artificial Intelligence Model Access from Trusted Research Environments (GRAIMATTER).
Year(s) Of Engagement Activity 2023
URL https://www.pistoiaalliance.org/events/
 
Description Research Software Engineers (RSE) Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was an invited speaker to the Seventh Annual Research Software Engineering Conference. Presentation: Can convening a Technology Ecosystem help TREs to work together?
Year(s) Of Engagement Activity 2023
URL https://rsecon23.society-rse.org/
 
Description The AI Summit London (panel) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Expert panel speaker: The Landscape of AI Adoption in Medical Imaging - Challenges & Opportunities.
Year(s) Of Engagement Activity 2022
URL https://london.theaisummit.com/
 
Description UK DRI Informatics Scoping Event (speaker) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Invited speaker to UK Dementia Research Institute Informatics Scoping Event. Presentation on HDR UK's Technology Ecosystem Workstream.
Year(s) Of Engagement Activity 2023
 
Description UK TRE Community Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was the keynote speaker at the UK TRE Community Meeting that was part of the RSE Conference. Presentation: Call to action!
Year(s) Of Engagement Activity 2023
URL https://www.eventbrite.com/e/uk-tre-community-september-meeting-tickets-676066472017
 
Description UKRI DRI Community Congress (speaker) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact The UKRI Digital Research Infrastructure (DRI) Community Congress brought together stakeholders of the UKRI DRI strategy. Presentation: The power of DRI: A health data perspective.
Year(s) Of Engagement Activity 2023
URL https://web.cvent.com/event/fc0032b7-0b22-4dd0-8c4c-38f3155df75f/summary