Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMatter)
Lead Research Organisation:
University of Dundee
Department Name: UNLISTED
Abstract
Trusted Research Environments (TREs) provide a secure location for researchers to analyse data for projects in the public interest e.g. providing information to SAGE to fight the COVID-19 pandemic. TRE staff check outputs to prevent disclosure of individuals’ confidential data.
TREs have historically supported only classical statistical data analysis. There is an increasing need to also facilitate the training of Artificial Intelligence (AI) models. AI has many valuable applications e.g., spotting human errors, streamlining processes, helping with repetitive tasks and supporting clinical decision making. The trained models then need to be exported from TREs for use. The size and complexity of AI models presents significant challenges for the disclosure-checking process. Models may be susceptible to external hacking: complicated methods to reverse engineer the learning process to find out about the data used for training, with more potential to lead to re-identification than conventional statistical methods.
With input from public representatives, GRAIMatter will assess a range of tools and methods to support TREs to assess output from AI methods for potentially identifiable information, investigate the legal and ethical implications and controls, and produce a set of guidelines and recommendations to support all TREs with export controls of AI algorithms.
TREs have historically supported only classical statistical data analysis. There is an increasing need to also facilitate the training of Artificial Intelligence (AI) models. AI has many valuable applications e.g., spotting human errors, streamlining processes, helping with repetitive tasks and supporting clinical decision making. The trained models then need to be exported from TREs for use. The size and complexity of AI models presents significant challenges for the disclosure-checking process. Models may be susceptible to external hacking: complicated methods to reverse engineer the learning process to find out about the data used for training, with more potential to lead to re-identification than conventional statistical methods.
With input from public representatives, GRAIMatter will assess a range of tools and methods to support TREs to assess output from AI methods for potentially identifiable information, investigate the legal and ethical implications and controls, and produce a set of guidelines and recommendations to support all TREs with export controls of AI algorithms.
Technical Summary
TREs are widely, and increasingly being used to support statistical analysis of sensitive data across a range of sectors (e.g., education, police, tax and health) as they enable secure and transparent research whilst protecting data confidentiality.
There is increasing desire from academia and industry to train AI models in TREs. The field of AI is developing quickly with applications including spotting human errors, streamlining processes, task automation and decision support. These more complex AI models require more information to describe and reproduce, increasing the possibility that sensitive information regarding secure data can be inferred from such descriptions. TREs do not have mature processes and controls against these risks. This is a complex topic, and it is unreasonable to expect all researchers to be aware of all risks or that TRE researchers have addressed these risks in AI-specific training.
We aim to address this problem by developing a set of usable recommendations for TREs to use to guard against the additional risks when disclosing trained AI models from TREs. We will draw upon our internationally recognised expertise in TREs, AI, data governance, disclosure control, data security and confidentiality, law and ethics.
WP1: Quantitative Assessment of Risk: a detailed empirical study to evaluate vulnerabilities of a selection of machine learning models. We will explore different models, hyper-parameter settings and training algorithms over common data types.
WP2: Controls and Evaluation of Tools: evaluation of effectiveness of a range of tools for addressing vulnerabilities identified in WP1 technically (do they accurately quantify disclosure risks) and organisationally (what is their impact on TRE output checking, and in assisting researchers to produce checkable ‘safe’ outputs).
WP3: Legal and Ethical Implications: a legal and ethical analysis of information from WP1/2 to develop a framework for regulation of AI models developed in a protected environment, and identification of aspects of existing legal and regulatory frameworks requiring reform to facilitate the use and export of AI models from protected environments.
WP4: PPIE: 4 workshops run by lay Co-Is seeking input from public/patients on our approach. We will produce lay summaries of project outputs and support and train our researcher team onhow best to work with the public.
WP5: Green Paper: all WPs will collaborate on drafting a green paper and seek input from the wider community though a consultancy period. Our international collaborator will provide a perspective external to the UK.
There is increasing desire from academia and industry to train AI models in TREs. The field of AI is developing quickly with applications including spotting human errors, streamlining processes, task automation and decision support. These more complex AI models require more information to describe and reproduce, increasing the possibility that sensitive information regarding secure data can be inferred from such descriptions. TREs do not have mature processes and controls against these risks. This is a complex topic, and it is unreasonable to expect all researchers to be aware of all risks or that TRE researchers have addressed these risks in AI-specific training.
We aim to address this problem by developing a set of usable recommendations for TREs to use to guard against the additional risks when disclosing trained AI models from TREs. We will draw upon our internationally recognised expertise in TREs, AI, data governance, disclosure control, data security and confidentiality, law and ethics.
WP1: Quantitative Assessment of Risk: a detailed empirical study to evaluate vulnerabilities of a selection of machine learning models. We will explore different models, hyper-parameter settings and training algorithms over common data types.
WP2: Controls and Evaluation of Tools: evaluation of effectiveness of a range of tools for addressing vulnerabilities identified in WP1 technically (do they accurately quantify disclosure risks) and organisationally (what is their impact on TRE output checking, and in assisting researchers to produce checkable ‘safe’ outputs).
WP3: Legal and Ethical Implications: a legal and ethical analysis of information from WP1/2 to develop a framework for regulation of AI models developed in a protected environment, and identification of aspects of existing legal and regulatory frameworks requiring reform to facilitate the use and export of AI models from protected environments.
WP4: PPIE: 4 workshops run by lay Co-Is seeking input from public/patients on our approach. We will produce lay summaries of project outputs and support and train our researcher team onhow best to work with the public.
WP5: Green Paper: all WPs will collaborate on drafting a green paper and seek input from the wider community though a consultancy period. Our international collaborator will provide a perspective external to the UK.
Organisations
Publications
Caldwell J
(2022)
Scottish Medical Imaging Service - Technical and Governance controls.
in International Journal of Population Data Science
Gao C
(2022)
A National Network of Safe Havens: Scottish Perspective.
in Journal of medical Internet research
Kavianpour S
(2022)
Next-Generation Capabilities in Trusted Research Environments: Interview Study.
in Journal of medical Internet research
Kerasidou C
(2023)
Machine learning models, trusted research environments and UK health data: ensuring a safe and beneficial future for AI development in healthcare
in Journal of Medical Ethics
Mansouri-Benssassi E
(2023)
Disclosure control of machine learning models from trusted research environments (TRE): New challenges and opportunities.
in Heliyon
Ritchie F
(2023)
Machine learning models in trusted research environments -- understanding operational risks
in International Journal of Population Data Science
Description | Application of AI SDC in Scotland |
Geographic Reach | National |
Policy Influence Type | Contribution to new or improved professional practice |
Impact | Information governance and data teams managing access to patient data are now better informed regarding the potential additional risks posed by AI/ML. |
Description | Industry access to public sector data: Review of current operational practice |
Geographic Reach | National |
Policy Influence Type | Citation in other policy documents |
URL | https://www.researchdata.scot/news-and-insights/new-report-examines-safeguards-for-businesses-to-acc... |
Description | Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMatter) |
Amount | £315,488 (GBP) |
Funding ID | MC_PC_21033 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2022 |
End | 08/2022 |
Description | SATRE - Standardised Architecture for Trusted Research Environments |
Amount | £614,112 (GBP) |
Organisation | United Kingdom Research and Innovation |
Sector | Public |
Country | United Kingdom |
Start | 01/2023 |
End | 10/2023 |
Description | Semi-Automated Checking of Research Outputs (SACRO) |
Amount | £637,821 (GBP) |
Organisation | United Kingdom Research and Innovation |
Sector | Public |
Country | United Kingdom |
Start | 01/2023 |
End | 10/2023 |
Description | TRE-FX: Delivering a federated network of TREs to enable safe analytics |
Amount | £562,457 (GBP) |
Organisation | United Kingdom Research and Innovation |
Sector | Public |
Country | United Kingdom |
Start | 01/2023 |
End | 10/2023 |
Title | Collection of tools and resources for managing the statistical disclosure control of trained machine learning models |
Description | Tools for the Automatic Checking of Research Outputs |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Software to support TREs to check for disclosure of trained ML models |
URL | https://github.com/ai-sdc |
Description | Building a legacy for UK health data research infrastructure (speaker) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | Presentation on 'Alleviate: the Advanced Pain Discovery Platform (APDP) Data Hub' for the DIH Programme Showcase Event: Building a legacy for UK health data research infrastructure. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.youtube.com/watch?v=DLRU_35dfYQ&list=PLBI5k9SgYrItfzjZ17c1b20GUp6V2wDRH&index=15 |
Description | Cambridge Spark Lecture Series |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Presentation: Overcoming the Challenges of Providing Access to Population Scale, Routinely Collected Health and Imaging Data for AI Development whilst Protecting Patient Confidentiality. |
Year(s) Of Engagement Activity | 2022 |
Description | DARE UK Sprint Exemplar Event |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | Event focussed on DARE UK's Sprint Exemplar Projects, in this case: GRAIMATTER: Guidelines and Resources for Artificial Intelligence Model Access from Trusted Research Environments |
Year(s) Of Engagement Activity | 2022 |
Description | ELSI Webinar on AI for Health |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presented lecture on the ethical and privacy concerns around the use of AI/ML with health data within the context of a European project - HT-Advance. Initiated discussion around the project needs for data privacy and also better awareness of the issue for legal experts at INSERM, France. |
Year(s) Of Engagement Activity | 2023 |
Description | GRAIMATTER Recommendations Workshop (organiser) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | GRAIMATTER Recommendations Workshop. |
Year(s) Of Engagement Activity | 2022,2023 |
Description | HDR UK Multi-omics Cohorts Consortium NIP Insight Sharing Day |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | Invited speaker to HDR UK Multi-omics Cohorts Consortium National Implementation Project Insight Sharing Day. Presentation: Experiences of developing a TRE to support multi-omic data within the AWS cloud. |
Year(s) Of Engagement Activity | 2023 |
Description | HDR UK Technology Ecosystem Conference/Workshop (organiser and speaker) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | The purpose of the meeting was to kick off the Technology Ecosystem work as part of HDR UK 23-28 strategy, bringing together various overlapping initiatives to share knowledge and plan for how we will work together going forward. |
Year(s) Of Engagement Activity | 2023 |
Description | HPC-AI Conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was an invited speaker at the 5th Annual HPC-AI Advisory Council UK Conference. Presentation: TREs at Scale. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.hpcwire.com/off-the-wire/5th-annual-hpc-ai-advisory-council-uk-conference-set-for-octobe... |
Description | Invited speaker: HDR Technology Ecosystem and the Gateway: UKRI Data Infrastructure Club Show and Tell |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | Emily Jefferson was an invited speaker, leading a presentation on the HDR UK Technology Ecosystem and the Gateway at the UKRI Data Infrastructure Club Show and Tell: 31st Jan 2023. |
Year(s) Of Engagement Activity | 2023 |
Description | Invited speaker: HDR Technology Ecosystem. UK DRI Informatics Scoping Event - London. 8th and 9th March 2023 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Emily Jefferson, CTO of HDR UK was an invited speaker to lead a presentation on the HDR Technology Ecosystem at the UK DRI Informatics Scoping Event - London. 8th and 9th March 2023 |
Year(s) Of Engagement Activity | 2023 |
Description | Invited speaker: Technology Ecosystem - Launch. Technology Ecosystem Conference/Workshop. Birmingham. 6th Feb 2023 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | Technology Ecosystem Conference (6th February 2023) brought together different technology groups from across the community to strengthen relationships and generate ideas to deliver trustworthy infrastructure and services across the health data research ecosystem |
Year(s) Of Engagement Activity | 2023 |
Description | Invited speaker: The power of DRI: A health data perspective. UKRI Digital Research Infrastructure (DRI) Congress. |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Emily Jefferson was an invited speaker to present on: The power of DRI: A health data perspective at the UKRI Digital Research Infrastructure (DRI) Congress. 6th and 7th March 2023. |
Year(s) Of Engagement Activity | 2023 |
Description | Japan Association for Medical Informatics Conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was a keynote speaker at the 43rd Joint Conference on Medical Informatics. Presentation: The UK's progress towards enabling secure, researcher access to sensitive health data at a UK population scale. |
Year(s) Of Engagement Activity | 2023 |
URL | https://confit-atlas-jp.translate.goog/guide/event/jcmi2023/session/3A11-13/detail?_x_tr_sl=ja&_x_tr... |
Description | Keynote speaker: Towards Federated Analytics for Population Data. International Data Science Conference - Tokyo, Japan, 22/05/23 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Emily Jefferson was invited as a keynote speaker to present 'Towards Federated Analytics for Population Data. International Data Science Conference - Tokyo, Japan' on 22/05/23 |
Year(s) Of Engagement Activity | 2023 |
Description | Keynote speaker: Towards Federated Analytics for Population Data. Precision Medicine & Real-World Data Conference - Singapore, 23/05/23 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Key note speaker, presented on experiences enabling a new UK infrastructure for finding and accessing population-wide data for research and public health analysis |
Year(s) Of Engagement Activity | 2023 |
URL | https://info.bcplatforms.com/precision-medicine-and-rwd-conference-singapore-2023 |
Description | Overcoming the Challenges of Providing Access to Population Scale, Routinely Collected Health and Imaging Data for AI Development whilst Protecting Patient Confidentiality |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Presentation to the AI summit in London covering the work of PICTURES, CO-CONNECT and GRAIMATTER |
Year(s) Of Engagement Activity | 2022 |
Description | PPIE Workshops: GRAIMATTER |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Patients, carers and/or patient groups |
Results and Impact | PPIE 5 Workshops: GRAIMATTER - What is a TRE? What is ML? What are the GRAIMATTER project aims? |
Year(s) Of Engagement Activity | 2022 |
Description | Pistoia Alliance Christmas Lecture |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was a keynote speaker at Pistoia Alliance UK Life Science Informatics Forum. Presentation: Guidelines and Resources for Artificial Intelligence Model Access from Trusted Research Environments (GRAIMATTER). |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.pistoiaalliance.org/events/ |
Description | Research Software Engineers (RSE) Conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was an invited speaker to the Seventh Annual Research Software Engineering Conference. Presentation: Can convening a Technology Ecosystem help TREs to work together? |
Year(s) Of Engagement Activity | 2023 |
URL | https://rsecon23.society-rse.org/ |
Description | The AI Summit London (panel) |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Expert panel speaker: The Landscape of AI Adoption in Medical Imaging - Challenges & Opportunities. |
Year(s) Of Engagement Activity | 2022 |
URL | https://london.theaisummit.com/ |
Description | UK DRI Informatics Scoping Event (speaker) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | Invited speaker to UK Dementia Research Institute Informatics Scoping Event. Presentation on HDR UK's Technology Ecosystem Workstream. |
Year(s) Of Engagement Activity | 2023 |
Description | UK TRE Community Meeting |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was the keynote speaker at the UK TRE Community Meeting that was part of the RSE Conference. Presentation: Call to action! |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.eventbrite.com/e/uk-tre-community-september-meeting-tickets-676066472017 |
Description | UKRI DRI Community Congress (speaker) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | The UKRI Digital Research Infrastructure (DRI) Community Congress brought together stakeholders of the UKRI DRI strategy. Presentation: The power of DRI: A health data perspective. |
Year(s) Of Engagement Activity | 2023 |
URL | https://web.cvent.com/event/fc0032b7-0b22-4dd0-8c4c-38f3155df75f/summary |