Establishing a common federated infrastructure for secure API-driven multi-party federation on clinico-genomic cohorts
Lead Research Organisation:
University of Cambridge
Department Name: UNLISTED
Abstract
Trusted Research Environments (TREs) are secure spaces for researchers to access and analyse sensitive data. They help prevent unauthorised access to data and re-identification of individuals from de-identified data. Many research institutions and data providers have their own TREs, but these TREs currently cannot “talk” to each other. The ability of TREs to talk to each other is known as federation. Even where researchers are allowed to use data held in two separate TREs, analysing the data together would still require combining them within a single TRE. This is challenging and costly for large datasets, such as whole genome sequences, and can delay new discoveries. We propose a UK-first demonstration of genomic data federation by bridging the TREs of the NIHR Cambridge Biomedical Research Centre and Genomics England. Both contain rich, secure, governed sources of fully consented clinical-genomic data from patients.
After the data in each of the two TREs have been queried to identify individuals with particular characteristics, a joint analysis will be run within both environments, and the results will be combined in a separate secure cloud environment. This means that no original data will move; only results will.
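The query-then-analyse pattern described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each TRE runs the same cohort query locally and releases just three aggregates (count, sum and sum of squares) of one numeric variable, and that the separate secure environment pools these into an overall mean and sample variance without ever seeing an individual record. The record fields (`cancer`, `mutations`) and all function names are hypothetical; this is not the project's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates one TRE releases: a count, a sum and a sum of squares."""
    n: int
    total: float
    total_sq: float


def local_summary(
    records: Iterable[dict],
    include: Callable[[dict], bool],
    value: Callable[[dict], float],
) -> SiteSummary:
    # Runs inside one TRE: select the cohort, then reduce it to aggregates.
    values = [value(r) for r in records if include(r)]
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))


def combine(summaries: Iterable[SiteSummary]) -> tuple[int, float, float]:
    # Runs in the separate secure environment, which only sees summaries.
    summaries = list(summaries)
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two individuals across all sites")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # pooled sample variance
    return n, mean, variance


# Toy synthetic records standing in for two TREs (hypothetical field names).
site_a = [
    {"cancer": True, "mutations": 120.0},
    {"cancer": False, "mutations": 40.0},
    {"cancer": True, "mutations": 80.0},
]
site_b = [
    {"cancer": True, "mutations": 100.0},
    {"cancer": True, "mutations": 60.0},
]

summary_a = local_summary(site_a, lambda r: r["cancer"], lambda r: r["mutations"])
summary_b = local_summary(site_b, lambda r: r["cancer"], lambda r: r["mutations"])
pooled = combine([summary_a, summary_b])
print(pooled)
```

Because counts, sums and sums of squares add up across sites, the pooled estimate matches what a single analysis of the combined cohort would give, which is what lets the data stay where they are. A real deployment would still need disclosure checks on the released aggregates and a numerically more robust variance formula for large values.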
New standards for federated TRE systems will be developed and shared. Learning from the project will unlock unprecedented possibilities for collaborations with clinical-genomic data across the research councils, potentially leading to new discoveries with long-term public benefit.
Technical Summary
The most significant challenge for advances in precision medicine lies in accessing and analysing large-scale, distributed biomedical datasets. Because of their size and sensitive nature, biomedical datasets are primarily stored in siloed, inaccessible locations. The World Economic Forum states that 97% of all hospital data goes untouched. Critically, researchers are hampered by the inability to combine datasets at sufficient scale to maximise translational medical insights. Bringing precision medicine to life requires research that uses as much biomedical data as possible: it has been reported that a 10-fold increase in data yields a corresponding 100-fold increase in findings. An effective way to achieve secure, widespread data interoperability and access is to stand up federated Trusted Research Environments (TREs).
Members of this consortium have implemented a new, live, federation-capable TRE for the national genomics endeavour, Genomics England (GEL). Using its TRE, which links genomic data, NHS clinical data and various national registries, GEL has demonstrated that diagnostic yield can be increased four- to five-fold for patients with rare diseases and that clinically actionable variants can be identified in 65% of patients with cancer, resulting in transformational healthcare benefits across the NHS. The GEL TRE is therefore an extraordinary resource with enormous future potential. Improving connectivity with GEL would help scientific and clinical communities across UK HEIs reap the benefits of this national resource.
UKRI has recently promoted greater data connectivity. To help UKRI achieve a common federated data infrastructure, we propose a technology demonstration establishing the UK’s first live multi-party federation between the TREs of NIHR Cambridge BRC (UCAM-BRC), part of a leading Higher Education Institution (HEI) for medical research, and GEL, a public-sector clinical research endeavour. This initial prototype will serve as a blueprint defining technical methodologies and information governance processes, ultimately enabling the wide adoption of federated TREs.
As outputs, this project will contribute open-source Application Programming Interfaces (APIs) for communication between TREs, aligned with the data standardisation activities of the Global Alliance for Genomics and Health (GA4GH) and others. It will seek solutions to data governance and security issues and develop a Federated Airlock process that permits distributed computation over disparate datasets. We will present a use case demonstrating the value of joint analyses achieved through secure, scalable role-based access control (RBAC). Patient and Public Involvement and Engagement (PPIE) is embedded in this project to ensure that our priorities put patients’ best interests first and foremost. This project will seed a framework from which research communities in other HEIs can develop common data processes and infrastructure that advance research impact securely.
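The Federated Airlock and role-based access control (RBAC) mentioned above could, under assumptions of our own, be sketched as an export check: aggregate results leave a TRE only when the submitting role and a separate approving role hold the right permissions, and no reported count falls below a minimum threshold. The role names, permission strings and the threshold of 10 are hypothetical placeholders, not values defined by this project or by GA4GH.

```python
from dataclasses import dataclass, field

# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "researcher": {"submit_analysis"},
    "airlock_reviewer": {"approve_export"},
}

# Assumed minimum cell count for disclosure control; not a project value.
MIN_CELL_COUNT = 10


@dataclass
class ExportRequest:
    requester_role: str
    reviewer_role: str
    counts: dict[str, int] = field(default_factory=dict)


def has_permission(role: str, permission: str) -> bool:
    # RBAC: permissions attach to roles, not directly to individual users.
    return permission in ROLE_PERMISSIONS.get(role, set())


def airlock_check(request: ExportRequest) -> tuple[bool, list[str]]:
    """Return (approved, reasons) for a proposed release of aggregate results."""
    reasons: list[str] = []
    if not has_permission(request.requester_role, "submit_analysis"):
        reasons.append(f"role {request.requester_role!r} cannot submit analyses")
    if not has_permission(request.reviewer_role, "approve_export"):
        reasons.append(f"role {request.reviewer_role!r} cannot approve exports")
    for cell, count in request.counts.items():
        # Small non-zero counts risk re-identification, so they are blocked.
        if 0 < count < MIN_CELL_COUNT:
            reasons.append(f"cell {cell!r} count {count} is below {MIN_CELL_COUNT}")
    return (not reasons, reasons)


approved = airlock_check(
    ExportRequest("researcher", "airlock_reviewer", {"cases": 42, "controls": 120})
)
rejected = airlock_check(
    ExportRequest("researcher", "researcher", {"cases": 3, "controls": 120})
)
print(approved)
print(rejected)
```

Keeping the approving role separate from the submitting role gives a simple separation of duties; a production airlock would typically add human review, audit logging and richer output-checking rules.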
Organisations
Publications
Krumm N (2023). Diagnosis of Ovarian Carcinoma Homologous Recombination DNA Repair Deficiency From Targeted Gene Capture Oncology Assays. JCO Precision Oncology.
Luen SJ (2023). Genomic characterisation of hormone receptor-positive breast cancer arising in very young women. Annals of Oncology: Official Journal of the European Society for Medical Oncology.
| Description | RNAlater clinical pathway pilot in the NHS Genomic Medicine Service |
| Geographic Reach | National |
| Policy Influence Type | Contribution to a national consultation/review |
| Description | Bench, Bytes, Bedside |
| Amount | £500,000 (GBP) |
| Organisation | NHS England |
| Sector | Public |
| Country | United Kingdom |
| Start | 01/2023 |
| End | 12/2023 |
| Title | Knockouts (KOs) of mismatch repair and polymerase genes, single and double KOs |
| Description | Isogenic KOs in RPE1 cell lines. Paper in review at Nature Genetics |
| Type Of Material | Cell line |
| Year Produced | 2023 |
| Provided To Others? | No |
| Impact | In review; will be made available to other researchers and deposited at ECACC |
| Title | Nextflow mutational signatures pipeline |
| Description | Nextflow mutational signatures pipeline |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2022 |
| Provided To Others? | Yes |
| Impact | Can perform distributed analysis using federated analytics |
| Title | CYNAPSE |
| Description | Trusted Research Environment for multiomic data in Cambridge |
| Type Of Material | Database/Collection of data |
| Year Produced | 2022 |
| Provided To Others? | Yes |
| Impact | Increased use of genomic data on campus |
| Title | Federation capability between CYNAPSE and Genomics England - open APIs |
| Description | Federation capability |
| Type Of Technology | New/Improved Technique/Technology |
| Year Produced | 2022 |
| Open Source License? | Yes |
| Impact | First demonstration of federation between an HEI and a public-sector clinico-genomic cohort |
| Title | Update on Signal |
| Description | Signal web tool, updated regularly. The latest publication, currently in review, is at the pre-release stage |
| Type Of Technology | Webtool/Application |
| Year Produced | 2023 |
| Impact | Widely used, including by clinical colleagues. Specifically mentioned by Dr Patrick Tarpey at the Festival of Genomics |
| Description | Breakfast Morning Run: BFM Malaysia |
| Form Of Engagement Activity | A broadcast e.g. TV/radio/film/podcast (other than news/press) |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Public/other audiences |
| Results and Impact | Morning run interview about women in STEM and academic medicine |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.bfm.my/podcast/morning-run/the-breakfast-grille/serena-nik-zainal-cambridge-cancer-genet... |
| Description | Video |
| Form Of Engagement Activity | Engagement focused website, blog or social media channel |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Public/other audiences |
| Results and Impact | https://dareuk.org.uk/sprint-exemplar-project-multi-party-trusted-research-environment-federation/ |
| Year(s) Of Engagement Activity | 2022 |
| URL | https://dareuk.org.uk/sprint-exemplar-project-multi-party-trusted-research-environment-federation/ |