
Establishing a common federated infrastructure for secure API-driven multi-party federation on clinico-genomic cohorts

Lead Research Organisation: University of Cambridge
Department Name: UNLISTED

Abstract

Trusted Research Environments (TREs) are secure spaces for researchers to access and analyse sensitive data. They help prevent unauthorised access and/or re-identification of individuals from de-identified data. Many research institutions and data providers have their own TREs, but these TREs currently cannot “talk” to each other. The ability for TREs to talk is known as federation. Even where researchers are allowed to use data held in two separate TREs, analysing the data together would still require combining them within a single TRE. This is challenging and costly with large datasets, like whole genome sequences, and can delay new discoveries. We propose a UK-first demonstration of federation of genomic data by bridging the TREs of the NIHR Cambridge Biomedical Research Centre and Genomics England. Both contain rich, secure, governed sources of fully consented clinical-genomic data from patients.
After querying the data within the two separate TREs to find individuals with certain characteristics, a joint analysis will be run within both environments, and the results combined in a separate secure cloud environment. This means that no original data will move, only results.
New standards for federated TRE systems will be developed and shared. Learning from the project will unlock unprecedented possibilities for collaborations with clinical-genomic data across the research councils, potentially leading to new discoveries with long-term public benefit.
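
The federated workflow described in the abstract (query each TRE locally, run the analysis inside each environment, and combine only the results) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration rather than the project's implementation: the record fields (`diagnosis`, `has_variant`), the `SiteSummary` type and the function names are assumptions, and the toy records are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]  # hypothetical row-level record held inside one TRE


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate-only result released by a TRE; it carries no row-level data."""
    site: str
    n_matched: int
    n_with_variant: int


def run_local_analysis(site: str,
                       records: List[Record],
                       cohort_filter: Callable[[Record], bool]) -> SiteSummary:
    """Runs inside a single TRE: select the cohort, then reduce it to counts."""
    matched = [r for r in records if cohort_filter(r)]
    carriers = sum(1 for r in matched if r["has_variant"])
    return SiteSummary(site, len(matched), carriers)


def combine(summaries: List[SiteSummary]) -> Dict[str, float]:
    """Runs in the separate secure environment: it only ever sees summaries."""
    total = sum(s.n_matched for s in summaries)
    carriers = sum(s.n_with_variant for s in summaries)
    fraction = carriers / total if total else 0.0
    return {"n_matched": total, "n_with_variant": carriers,
            "carrier_fraction": fraction}


# Invented toy data standing in for the contents of two separate TREs.
site_a = [
    {"diagnosis": "colorectal", "has_variant": True},
    {"diagnosis": "colorectal", "has_variant": False},
    {"diagnosis": "breast", "has_variant": True},
]
site_b = [
    {"diagnosis": "colorectal", "has_variant": True},
    {"diagnosis": "colorectal", "has_variant": True},
]


def is_colorectal(record: Record) -> bool:
    return record["diagnosis"] == "colorectal"


summary_a = run_local_analysis("TRE-A", site_a, is_colorectal)
summary_b = run_local_analysis("TRE-B", site_b, is_colorectal)
combined = combine([summary_a, summary_b])
print(combined)
```

Only the two `SiteSummary` objects cross the boundary between environments in this sketch, which mirrors the statement that no original data will move, only results.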

Technical Summary

The most significant challenge for advancements in precision medicine lies in accessing and analysing large-scale distributed biomedical datasets. Due to their size and sensitive nature, biomedical datasets are primarily stored in siloed, inaccessible locations. The World Economic Forum states that 97% of all hospital data goes untouched. Critically, researchers are hampered by the inability to combine datasets at sufficient scale to maximise translational medical insights. Bringing precision medicine to life requires research that uses as much biomedical data as possible: it is reported that a 10-fold increase in data yields a corresponding 100-fold increase in findings. An effective way to achieve secure, widespread data interoperability and access is through standing up Federated Trusted Research Environments (TREs).

Members of this consortium have implemented a new, live TRE, capable of federation, for the national genomics endeavour, Genomics England (GEL). With its TRE linking genomic data, NHS clinical data and various national registries, GEL has demonstrated that diagnostic yield can be increased 4- to 5-fold for rare disease patients, and that clinically actionable variants can be identified in 65% of cancer patients, resulting in transformational healthcare benefits across the NHS. The GEL TRE is thus an extraordinary resource, holding enormous potential going forward. As such, it would be valuable to the scientific and clinical communities across HEIs in the UK to improve connectivity with GEL, to reap the benefits of this national resource.
UKRI has recently championed the enhancement of data connectivity. To help UKRI achieve a common federated data infrastructure, we propose a technology demonstration to establish the UK’s first live multi-party federation between the TREs of a leading Higher Education Institution (HEI) for medical research, NIHR Cambridge BRC (UCAM-BRC), and a public-sector clinical research endeavour, GEL. This initial prototype will serve as a blueprint defining technical methodologies and information governance matters, ultimately enabling the wide adoption of federated TREs.
As outputs, this project will contribute open-source Application Programming Interfaces (APIs) informing TRE communication, aligned to the data standardisation activities of the Global Alliance for Genomics and Health (GA4GH) and others. It will seek solutions to data governance and security matters and develop a Federated Airlock process that permits distributed computation over disparate datasets. We will present a use case demonstrating the value of joint analyses achieved through secure, scalable role-based access control (RBAC). We have Patient and Public Involvement and Engagement (PPIE) embedded into this project to ensure that our priorities place patients’ best interests first and foremost. This project will seed a framework from which the research communities in other HEIs can grow to develop common data processes and infrastructure that can advance research impacts in a secure way.
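
To make the combination of role-based access control and an airlock step more concrete, here is a minimal Python sketch. The source does not specify how the Federated Airlock or the RBAC model work, so every detail below is an assumption made for illustration: the role names, the permission strings, the `ExportRequest` type, and the small-count threshold of 5 (a common style of disclosure control, not a rule stated by the project).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

# Hypothetical role-to-permission mapping; not taken from the project.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "analyst": frozenset({"run_query"}),
    "lead_researcher": frozenset({"run_query", "request_export"}),
    "airlock_reviewer": frozenset({"run_query", "approve_export"}),
}

# Assumed disclosure-control threshold: block any count smaller than this.
MIN_CELL_COUNT = 5


@dataclass(frozen=True)
class ExportRequest:
    """A request to release aggregate counts from a TRE (illustrative only)."""
    requester_role: str
    counts: Dict[str, int]


def has_permission(role: str, permission: str) -> bool:
    """RBAC check: unknown roles are granted no permissions."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


def airlock_check(request: ExportRequest) -> Tuple[bool, str]:
    """Return (allowed, reason) for an export request."""
    if not has_permission(request.requester_role, "request_export"):
        return False, "role may not request export"
    small = sorted(k for k, v in request.counts.items() if v < MIN_CELL_COUNT)
    if small:
        return False, "counts below threshold: " + ", ".join(small)
    return True, "ok"


denied_role = airlock_check(ExportRequest("analyst", {"cohort": 10}))
denied_small = airlock_check(
    ExportRequest("lead_researcher", {"cohort": 10, "carriers": 3}))
allowed = airlock_check(
    ExportRequest("lead_researcher", {"cohort": 10, "carriers": 7}))
print(denied_role, denied_small, allowed)
```

Keeping the permission check and the output check as separate steps means either rule can be tightened independently, which is one plausible way to keep such a process scalable across several TREs.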
 
Description: RNAlater clinical pathway pilot in the NHS Genomic Medicine Services
Geographic Reach: National
Policy Influence Type: Contribution to a national consultation/review
 
Description: Bench, Bytes, Bedside
Amount: £500,000 (GBP)
Organisation: NHS England
Sector: Public
Country: United Kingdom
Start: 01/2023
End: 12/2023
 
Title: KOs of mismatch repair deficiency and polymerase genes, single and double KOs
Description: Isogenic KOs in RPE1 lines. Paper in review at Nature Genetics
Type Of Material: Cell line
Year Produced: 2023
Provided To Others?: No
Impact: In review; will be available to other researchers and will be deposited at ECACC
 
Title: Nextflow mutational signatures pipeline
Description: Nextflow mutational signatures pipeline
Type Of Material: Improvements to research infrastructure
Year Produced: 2022
Provided To Others?: Yes
Impact: Can perform distributed analysis using federated analytics
 
Title: CYNAPSE
Description: Trusted Research Environment for multiomic data in Cambridge
Type Of Material: Database/Collection of data
Year Produced: 2022
Provided To Others?: Yes
Impact: Increased use of genomic data on campus
 
Title: Federation capability between CYNAPSE and Genomics England - open APIs
Description: Federation capability
Type Of Technology: New/Improved Technique/Technology
Year Produced: 2022
Open Source License?: Yes
Impact: First demonstration of federation between an HEI and a public-sector clinico-genomic cohort
 
Title: Update on Signal
Description: Signal web tool. Regular updates. Latest publication in review is in pre-release stage
Type Of Technology: Webtool/Application
Year Produced: 2023
Impact: Widely used, including by clinical colleagues. Received a specific mention at the Festival of Genomics from Dr Patrick Tarpey
 
Description: Breakfast Morning Run : BFM Malaysia
Form Of Engagement Activity: A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme?: No
Geographic Reach: International
Primary Audience: Public/other audiences
Results and Impact: Morning run interview about women in STEM and academic medicine
Year(s) Of Engagement Activity: 2024
URL: https://www.bfm.my/podcast/morning-run/the-breakfast-grille/serena-nik-zainal-cambridge-cancer-genet...
 
Description: Video
Form Of Engagement Activity: Engagement focused website, blog or social media channel
Part Of Official Scheme?: No
Geographic Reach: National
Primary Audience: Public/other audiences
Results and Impact: https://dareuk.org.uk/sprint-exemplar-project-multi-party-trusted-research-environment-federation/
Year(s) Of Engagement Activity: 2022
URL: https://dareuk.org.uk/sprint-exemplar-project-multi-party-trusted-research-environment-federation/