Semi-Automated Checking of Research Outputs

Lead Research Organisation: University of the West of England
Department Name: UNLISTED

Abstract

Trusted Research Environments (TREs) play a vital role in enabling researchers to analyse confidential data, such as health records, and then report findings. The Five-Safes Framework is used to ensure data confidentiality and includes Safe Outputs. Outputs are typically checked by two expert staff before release, which is a significant expense for TRE operators and can cause a bottleneck for researchers.
Meanwhile, the parallel development of TREs and of the understanding of disclosure risk has created a need to consolidate theory and practice to minimise inconsistent behaviour between TREs. Addressing both these issues, this project seeks to reduce the operating costs of TREs and the time taken to release research results. It will:
• Produce a consolidated framework with a rigorous statistical basis that provides guidance for TREs to agree consistent, standard processes to assist in Quality Assurance.
• Design and implement a semi-automated system for checks on common research outputs, with increasing levels of support for other types such as AI.
• Work with a range of different types of TRE in different sectors (health, social data) and organisations (academia, government, private sector) to ensure wide applicability.
• Work with the public and patients to explore what is needed for public trust that any automation is acting as “an extra pair of eyes”: supporting not supplanting TRE staff, helping them to make easy decisions more rapidly and therefore focus on more complex or nuanced cases.

Technical Summary

Trusted Research Environments (TREs) cannot function without appropriate output
statistical disclosure control, and DARE has rightly recognised that “There are already
significant issues with staffing resources to support statistical disclosure control (safe outputs). This staffing issue is acting as a barrier to scaling up the use of TREs for
research” [3, p59]. This proposal builds on existing work in this area, the vast majority by
the proposers [1, 2, 6, 7, 9 - 14], to investigate and advance a framework for automating components of TREs’ disclosure control process to support a consistent, efficient, and trustworthy approach that lets them “focus … on the areas of most significant risk” [3].

To achieve consistency, within and between TREs, we will consider processes as well as
algorithms - recognising that TREs differ in risk appetites, but these need to be expressed in a consistent fashion.
• WP1 will bring together the latest theoretical advances to provide a coherent intellectual framework and resources including practical guidelines for disclosure control in different technical and procedural environments

To achieve efficiency and transparency we will create resources that help researchers
make fewer disclosive requests and TREs spot these faster. The design framework and tools will:
• Support researchers to use the major analytical languages (R, Python and Stata), with minimal changes, by exploiting the ‘wrapper’ approach we have successfully
trialled elsewhere [7, 10] (WP2)
• Support TREs with different operating models and output checking workflows,
through a process of co-design to maximise useability [6] (WP3)
• Automate checking of the most common statistics, using best-practice principles-based modelling [15]
• Build on the GRAIMatter project [9] to create practical guidelines and prototype tools
for triaging AI models for clearance (WP5)
To achieve trustworthiness, WP4 builds on our existing PPIE arrangements to explore public attitudes to how disclosure control is currently managed and how conceptual and technical developments, especially around automated decision making, need to be framed to ensure and maintain trust.

To achieve impact, WP6 will build on our extensive UK and global networks of TRE practitioners, to conduct a series of workshops and ‘hands-on’ evaluations to ensure the design frameworks support buy-in from a wide range of prospective users across health and social sciences, and from the public and private sectors.
 
Description Application of AI SDC in Scotland
Geographic Reach National 
Policy Influence Type Contribution to new or improved professional practice
Impact Information governance and data teams managing access to patient data are now better informed regarding the potential additional risks posed by AI/ML.
 
Description Consensus statement on public attitudes to how TREs should implement semi-automated output checking safely and securely in the future
Geographic Reach Multiple continents/international 
Policy Influence Type Contribution to new or improved professional practice
Impact The consensus statement outlines a number of principles required to maintain public confidence in the safety of their confidential data, ranging from processes to training requirements. Recognition of and adherence to these principles by organisations will affect public willingness to allow the use of their data for research purposes for the public good.
 
Description First comprehensive conceptual framework for risk assessment and statistical outcome checking
Geographic Reach Multiple continents/international 
Policy Influence Type Contribution to new or improved professional practice
Impact This report has been widely welcomed by practitioners in the field for bringing together standard processes and guidance into a clear, easily accessible framework, thus enabling more consistent behaviour both by individual output checkers and across the 'Secure Data Landscape' as a whole. The impact is a greater understanding of the risks (or not) associated with different types of outputs, and hence a more consistent application of disclosure control principles both within and between TREs and other organisations providing access to confidential data. At a national level this includes important new services being established such as the NHS Secure Data Environments, CPRD, and the Government's Integrated Data Services (IDS).
 
Description Formation of a community of interest for mutual support in disclosure risk assessment, especially of AI
Geographic Reach Multiple continents/international 
Policy Influence Type Contribution to new or improved professional practice
 
Description Improvements in Disclosure Control of Research Outputs from Confidential Data
Geographic Reach National 
Policy Influence Type Contribution to new or improved professional practice
Impact Streamlining, standardising and better informing the workflows around disclosure risk assessment leads directly to improved flow of research outputs, and consequently (depending on the nature of the research being conducted) to improved public outcomes in health or other domains. Moreover, the approach taken by SACRO of explicitly including the researchers in the process, so that disclosure control is something that is done 'with them' not 'to them', leads to a richer understanding of the shared ethical and legal responsibilities. This is already being seen in a number of projects where the SACRO tools have led to the successful release of AI models trained on private data.
 
Description DARE UK Community Groups
Amount £37,374 (GBP)
Funding ID HDRUK2023.0444 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 11/2023 
End 03/2024
 
Title ACRO: Automatic Checking of Research Outputs 
Description Statistical agencies and other custodians of secure facilities such as Trusted Research Environments (TREs) routinely require the checking of research outputs for disclosure risk. This can be a time-consuming and costly task, requiring skilled staff. ACRO (Automatic Checking of Research Outputs) is an open source tool for automating the statistical disclosure control (SDC) of research outputs. ACRO assists researchers and output checkers by distinguishing between research output that is safe to publish, output that requires further analysis, and output that cannot be published because of substantial disclosure risk. It does this by providing a light-weight 'skin' that sits over well-known analysis tools, in a variety of languages researchers might use. This adds functionality to: identify potentially disclosive outputs against a range of commonly used disclosure tests; suppress outputs where required; report reasons for suppression; produce simple summary documents TRE staff can use to streamline their workflow. 
Type Of Material Data analysis technique 
Year Produced 2023 
Provided To Others? Yes  
Impact The tool suite, comprising the acro library for researchers and the sacro-viewer for output checkers, greatly reduces the bottleneck of output disclosure control, the final stage before research findings can be released from Trusted Research Environments. The design ethos of transparency engages the researcher more, so that disclosure control becomes something that is done 'with' the researchers, not 'to' them. These tools are now deployed (or in the process of being deployed) at a range of organisations including several Scottish Safe Havens and Eurostat. Following outreach activities we have been told that many other UK and international TREs intend to deploy them, including: Office for National Statistics / Integrated Data Services, SAIL Databank, UKLLC, CPRD, many of the NHS Secure Data Environments currently being set up, NIHR-BioResource, Genomics England.
URL https://github.com/AI-SDC/ACRO
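The kind of disclosure test described above can be illustrated with a minimal sketch of a minimum-cell-count (threshold) check on a frequency table, with suppression and a reported reason. This is not the ACRO API: the function name `check_frequency_table`, the threshold of 10, and the status labels are assumptions chosen purely for illustration.

```python
from collections import Counter

# Hypothetical threshold; each TRE would set its own value
# according to its risk appetite.
THRESHOLD = 10


def check_frequency_table(rows, cols, threshold=THRESHOLD):
    """Cross-tabulate two categorical variables and suppress small cells.

    Returns (table, report): table maps (row, col) -> count, or None
    for a suppressed cell; report summarises the outcome of the check.
    """
    counts = Counter(zip(rows, cols))
    table = {}
    suppressed = []
    for cell, n in sorted(counts.items()):
        if n < threshold:
            table[cell] = None  # suppress the potentially disclosive cell
            suppressed.append(cell)
        else:
            table[cell] = n
    report = {
        "status": "review" if suppressed else "pass",
        "suppressed_cells": suppressed,
        "reason": (
            f"cell count below threshold of {threshold}" if suppressed else ""
        ),
    }
    return table, report


if __name__ == "__main__":
    # Invented example data: 40 individuals, two categorical variables.
    sex = ["F"] * 25 + ["M"] * 15
    outcome = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 12 + ["no"] * 3
    table, report = check_frequency_table(sex, outcome)
    print(table)
    print(report)
```

In this sketch the researcher sees which cells were suppressed and why, while the report gives an output checker a short summary to review, mirroring the 'with them, not to them' ethos described in the record.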
 
Title AISDC: A collection of tools and resources for managing the statistical disclosure control of trained machine learning models 
Description AISDC is a resource intended to support Trusted Research Environments in making decisions about whether it is safe to allow the release of Machine Learning models trained on confidential data. It has the following components: (i) A variety of privacy attacks on machine learning models, including membership inference, attribute inference, and 'structural' attacks that relate to 'standard' disclosure control notions such as k-anonymity and class disclosure. (ii) Preprocessing modules for test datasets that can be used for further research or development. (iii) The safemodel package - an open source wrapper for common machine learning models. It is designed for use by researchers in Trusted Research Environments (TREs) where disclosure control methods must be implemented. Safemodel aims to give researchers greater confidence that their models are more compliant with disclosure control. (iv) User Stories: A flowchart and range of accompanying scripts to help TRE staff run different suites of attacks depending on how the researcher has presented their model (i.e. what details of preprocessing, train/test set splits etc.).
Type Of Material Computer model/algorithm 
Year Produced 2023 
Provided To Others? Yes  
Impact AISDC is currently deployed at a number of Trusted Research Environments, providing new capability to allow and support researchers who wish to use Machine Learning to gain insights from confidential data. 
URL https://github.com/AI-SDC/AI-SDC
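The 'standard' disclosure control notion of k-anonymity mentioned above can be sketched as follows. This is a generic illustration on tabular records, not the AISDC implementation of structural attacks; the function names `k_anonymity` and `violating_groups` and the example field names are hypothetical.

```python
from collections import Counter


def _group_sizes(records, quasi_identifiers):
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest quasi-identifier group."""
    groups = _group_sizes(records, quasi_identifiers)
    return min(groups.values()) if groups else 0


def violating_groups(records, quasi_identifiers, k):
    """List quasi-identifier combinations shared by fewer than k records."""
    groups = _group_sizes(records, quasi_identifiers)
    return sorted(g for g, n in groups.items() if n < k)


if __name__ == "__main__":
    # Invented example records; field names are illustrative only.
    records = [
        {"age_band": "30-39", "area": "BS"},
        {"age_band": "30-39", "area": "BS"},
        {"age_band": "30-39", "area": "BS"},
        {"age_band": "40-49", "area": "BS"},
    ]
    qi = ["age_band", "area"]
    print("k =", k_anonymity(records, qi))
    print("groups below k=2:", violating_groups(records, qi, k=2))
```

A dataset (or, by analogy, the set of records a model component depends on) is k-anonymous when every combination of quasi-identifiers is shared by at least k records; small groups are the ones a checker would want to look at more closely.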
 
Title SACRO Outputs Viewer 
Description A viewer for research outputs and accompanying disclosure risk analysis produced using the ACRO tools. Open-source installers are available for Windows, macOS and Linux. It can load the JSON metadata output by the tool, and displays the outputs for an output checker to review. The reviewer can see each file, researcher comments, and the outcomes of any statistical analysis performed by ACRO tools. It also supports the viewing (with appropriate syntax highlighting) and decision making about 'supplementary' files, including the researcher's code. It allows the output checker to approve or reject the outputs, and can generate a zip file with approved outputs for releasing.
Type Of Material Computer model/algorithm 
Year Produced 2023 
Provided To Others? Yes  
Impact The tool suite, comprising the acro library for researchers and the sacro-viewer for output checkers, greatly reduces the bottleneck of output disclosure control, the final stage before research findings can be released from Trusted Research Environments. The design ethos of transparency engages the researcher more, so that disclosure control becomes something that is done 'with' the researchers, not 'to' them. These tools are now deployed (or in the process of being deployed) at a range of organisations including several Scottish Safe Havens and Eurostat. Following outreach activities we have been told that many other UK and international TREs intend to deploy them, including: Office for National Statistics / Integrated Data Services, SAIL Databank, UKLLC, CPRD, many of the NHS Secure Data Environments currently being set up, NIHR-BioResource, Genomics England.
URL https://github.com/AI-SDC/SACRO-Viewer
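The review-and-release step described above can be sketched in a few lines. This is not the SACRO Viewer's code, and the JSON layout shown in the docstring is a hypothetical stand-in rather than the real ACRO metadata schema; the function name `release_approved` is likewise an assumption.

```python
import json
import zipfile
from pathlib import Path


def release_approved(metadata_path, decisions, zip_path):
    """Write a zip containing only the files of approved outputs.

    metadata_path: JSON file assumed (hypothetically) to look like
        {"outputs": {"<name>": {"files": ["<relative path>", ...]}}}
    decisions: dict mapping output name -> "approved" or "rejected"
    Returns the list of relative paths that were released.
    """
    metadata_path = Path(metadata_path)
    base = metadata_path.parent
    metadata = json.loads(metadata_path.read_text())
    released = []
    with zipfile.ZipFile(zip_path, "w") as zf:
        for name, info in metadata["outputs"].items():
            if decisions.get(name) != "approved":
                continue  # anything not explicitly approved is withheld
            for rel in info["files"]:
                zf.write(base / rel, arcname=rel)
                released.append(rel)
    return released
```

The design choice worth noting in this sketch is the default: an output with no recorded decision is treated the same as a rejected one, so nothing leaves the environment without an explicit approval.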
 
Description AI Model security round-table with Information Commissioner's Office
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Professors Smith and Ritchie were invited to a meeting with the Information Commissioner's Office to provide input on the development of their 'AI and Data Protection Risk Assessment Toolkit' and advice for organisations on measuring and reporting risk.
Year(s) Of Engagement Activity 2023
 
Description Attend OpenSAFELY Digital Critical Friends Meeting 26th April 2023
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Public Involvement and Engagement meeting to get initial thoughts on the concept of semi-automated output checking
Year(s) Of Engagement Activity 2023
 
Description Attend OpenSAFELY Digital Critical Friends Meeting 19th July
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Get feedback on, and input to, development of Consensus Statement describing principles for organisations intending to adopt automation in their process of evaluating privacy disclosure risk.
Year(s) Of Engagement Activity 2023
 
Description Attending SCADR/RDS Public Panel Meeting 29th June 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Attended Public Panel to introduce the SACRO concept, get feedback on the explanation, and get input to guide development of the Consensus Statement
Year(s) Of Engagement Activity 2023
 
Description DARE-UK 'meet the driver projects' 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presented SACRO project as part of DARE-organised initiative. As a result ~5 new contacts made with requests for future involvement.
Year(s) Of Engagement Activity 2023
 
Description ELSI Webinar on AI for Health 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented lecture on the ethical and privacy concerns around the use of AI/ML with health data within the context of a European project - HT-Advance. Initiated discussion around the project needs for data privacy and also better awareness of the issue for legal experts at INSERM, France.
Year(s) Of Engagement Activity 2023
 
Description Festival of the Future Lunch Club - Healthcare 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact As part of the University of Dundee's Festival of the Future, the SATRE team took part in a "Healthcare" panel discussion.
Year(s) Of Engagement Activity 2023
URL https://www.dundee.ac.uk/festival-future
 
Description HPC-AI Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was an invited speaker at the 5th Annual HPC-AI Advisory Council UK Conference. Presentation: TREs at Scale.
Year(s) Of Engagement Activity 2023
URL https://www.hpcwire.com/off-the-wire/5th-annual-hpc-ai-advisory-council-uk-conference-set-for-octobe...
 
Description Invited speaker at Hartree internal lecture series 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Presented a talk discussing TREs and ML disclosure control at the STFC Hartree Centre
Year(s) Of Engagement Activity 2023
 
Description Invited speaker: HDR Technology Ecosystem and the Gateway: UKRI Data Infrastructure Club Show and Tell 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Emily Jefferson was an invited speaker, leading a presentation on the HDR UK Technology Ecosystem and the Gateway at the UKRI Data Infrastructure Club Show and Tell: 31st Jan 2023.
Year(s) Of Engagement Activity 2023
 
Description Invited speaker: HDR Technology Ecosystem. UK DRI Informatics Scoping Event - London. 8th and 9th March 2023 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson, CTO of HDR UK was an invited speaker to lead a presentation on the HDR Technology Ecosystem at the UK DRI Informatics Scoping Event - London. 8th and 9th March 2023
Year(s) Of Engagement Activity 2023
 
Description Invited speaker: Technology Ecosystem - Launch. Technology Ecosystem Conference/Workshop. Birmingham. 6th Feb 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Technology Ecosystem Conference (6th February 2023) brought together different technology groups from across the community to strengthen relationships and generate ideas to deliver trustworthy infrastructure and services across the health data research ecosystem
Year(s) Of Engagement Activity 2023
 
Description Invited speaker: The power of DRI: A health data perspective. UKRI Digital Research Infrastructure (DRI) Congress. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson was an invited speaker to present on: The power of DRI: A health data perspective at the UKRI Digital Research Infrastructure (DRI) Congress. 6th and 7th March 2023.
Year(s) Of Engagement Activity 2023
 
Description Japan Association for Medical Informatics Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was a keynote speaker at the 43rd Joint Conference on Medical Informatics. Presentation: The UK's progress towards enabling secure, researcher access to sensitive health data at a UK population scale.
Year(s) Of Engagement Activity 2023
URL https://confit-atlas-jp.translate.goog/guide/event/jcmi2023/session/3A11-13/detail?_x_tr_sl=ja&_x_tr...
 
Description Keynote speaker: Towards Federated Analytics for Population Data. International Data Science Conference - Tokyo, Japan, 22/05/23 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson was invited as a keynote speaker to present 'Towards Federated Analytics for Population Data' at the International Data Science Conference, Tokyo, Japan, on 22/05/23
Year(s) Of Engagement Activity 2023
 
Description Pistoia Alliance Christmas Lecture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was a keynote speaker at Pistoia Alliance UK Life Science Informatics Forum. Presentation: Guidelines and Resources for Artificial Intelligence Model Access from Trusted Research Environments (GRAIMATTER).
Year(s) Of Engagement Activity 2023
URL https://www.pistoiaalliance.org/events/
 
Description Presentation at HDR UK Technology Workshop - Manchester 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presented on the SATRE and SACRO projects to a group of 40-50 influential leaders and stakeholders in the health technology community.
Year(s) Of Engagement Activity 2023
URL https://www.eventbrite.co.uk/e/hdr-uk-technology-ecosystem-autumn-workshop-tickets-743465674847
 
Description Presentation at HDR UK conference on Grand challenges in Health Data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentations of the SACRO findings and the essential future role of semi-automated disclosure control for AI models
Year(s) Of Engagement Activity 2024
URL https://www.hdruk.ac.uk/hdruk-conference-2024/
 
Description Presentation at UK TRE Community September 2023 Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Professor Smith gave a talk entitled "The SACRO project and how it can help TREs work together" at the UK-TRE community hybrid meeting, with over 100 participants in person and around the same number online.

The theme of the meeting was how TREs can work together through the development of open standards, codebases and federation to enable sharing of technology and workflows, and encompassed a range of topics including the development of technical and governance standards for TREs, the major challenges of operating TREs today, and how the community can work together to address these and other common challenges.

The meeting was open to everyone, and was attended by a range of stakeholders including those involved in the day-to-day development and operations of TREs, those responsible for commissioning and funding TREs, and users of TREs from the health, administrative and industry sectors.
Year(s) Of Engagement Activity 2023
URL https://www.uktre.org/en/latest/events/wg_workshops/2023-09-04-september-meeting/index.html
 
Description Presentation to HDR-UK Alliance 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation to the UK Health Data Research Alliance to describe the project and the resources created, get feedback, and build bridges for future engagement.
Year(s) Of Engagement Activity 2023
URL https://ukhealthdata.org
 
Description Presentation to Scottish Centre for Administrative Data Research monthly research meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation of the project and results to a mix of researchers, Government and Public Health Scotland colleagues. Several contacts were made by people interested in deploying the tools within their TREs, which are being followed up for future deployment.
Year(s) Of Engagement Activity 2023
 
Description Research Software Engineers (RSE) Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was an invited speaker to the Seventh Annual Research Software Engineering Conference. Presentation: Can convening a Technology Ecosystem help TREs to work together?
Year(s) Of Engagement Activity 2023
URL https://rsecon23.society-rse.org/
 
Description UK TRE Community Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was the keynote speaker at the UK TRE Community Meeting that was part of the RSE Conference. Presentation: Call to action!
Year(s) Of Engagement Activity 2023
URL https://www.eventbrite.com/e/uk-tre-community-september-meeting-tickets-676066472017
 
Description Workshop at University of Aberdeen: "getting SACRO up and running with researchers" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Professor Ritchie delivered a talk and workshop to researchers at University of Aberdeen, including staff from the Grampian Safe Haven (DaSH) to encourage and inform on-boarding of researchers to use the SACRO toolkits that have been deployed within DaSH.
Year(s) Of Engagement Activity 2023