DARE: Creating the blueprint for a federated network of next generation, cross-council Trusted Research Environments.
Lead Research Organisation:
University Hospitals Birmingham NHS Foundation Trust
Department Name: UNLISTED
Abstract
Solving society’s complex challenges requires experts working together, studying data collected for different purposes & from different sources & locations. However, combining data is challenging. There are public concerns about data security & access, especially for health data. Data governance (legal & ethical frameworks for data sharing) is critical. There are technical challenges in combining data collected in different “data languages” & in building secure computer networks which enable collaborative work, but protect privacy.
FED-NET builds on our operational system, providing a scalable solution to the technical & governance challenges of analysing datasets separated by geography & data language.
Working with patients, the public, analysts & clinicians, we have co-designed a secure way to combine sensitive health data with other data, working across 5 NHS hospitals. We have co-built a transparent governance process, ensuring data access is legal, with full public oversight.
We will scale our existing Trusted Research Environments (secure environments that ensure data privacy but enable large scale analytics) using “federated analytics” where the data stays put & the analysis moves.
We will test how different data languages can be translated into a common standard, focusing on data highly valued in research (laboratory science, meteorological data) but rarely available, using a study of asthma. We will test our governance solution, through public and expert workshops.
Technical Summary
Tackling societal challenges requires data & partnerships which span traditional funder silos. Data collected for specific purposes have distinct structures & ontologies. There are different common data models; none are comprehensive for cross-council research. Comprehensive datasets increase the risk of reidentification. Workshops with >400 lay members confirmed support for data access for public good, with data exposure limited to “where necessary” & “NHS proximity” as a gold standard.
FED-NET will test:
1. Whether data of differing modalities/languages can be combined using a standardised framework.
2. How open standards map diverse data for cross-council projects.
3. Whether a federated analytics model (including governance) can be deployed.
4. Whether this model serves analytical need & enhances public trust.
This DARE sprint will implement & test an innovative, scalable, industry-aligned Trusted Research Environment (TRE) & governance model which facilitates enhanced federated data discovery, focusing on a test case of asthma, including clinical, meteorological, pollution & translational data.
Councils served by the test case include MRC, EPSRC, InnovateUK and NERC.
Methods
The technical architecture is built & operational (HDR-UK PIONEER data haven/TRE). PIONEER’s tested governance model will be piloted across federated TREs, to determine scalability.
We will automate elements of the HDR-UK Five Safes, providing a metadata interchange, expanding equitable access to high-quality research data assets, reducing health inequalities.
Data solutions will be built around open standards including REST, HTTP, OMOP & FHIR-UK, reducing proprietary/commercial constraints. Both NUH & UHB have experience in this. Research metadata will be queried following W3C international standards for data management & system interoperability.
We will adopt the Resource Description Framework (RDF) to support metadata exchange, using the query language SPARQL to facilitate expressive queries across diverse linked data sources. The approach will scale from basic statistical work to advanced machine learning. To allow contemporaneous metadata to be pulled or pushed, a secure standards-based RESTful API will be specified & implemented, allowing equitable access over the open HTTP protocol.
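As a rough illustration of the metadata model described above, RDF represents every statement as a (subject, predicate, object) triple, and a SPARQL query is built from triple patterns in which some positions are left as variables. The sketch below mimics that idea in plain Python rather than using a real RDF store or SPARQL engine. The dataset names, the `meta:` predicates and the `site:A`/`site:B` identifiers are all hypothetical and do not describe the project's actual metadata.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# Hypothetical metadata triples (subject, predicate, object).
# Every identifier here is illustrative only.
TRIPLES: list[Triple] = [
    ("dataset:asthma_admissions", "meta:heldBy", "site:A"),
    ("dataset:asthma_admissions", "meta:modality", "clinical"),
    ("dataset:air_quality", "meta:heldBy", "site:B"),
    ("dataset:air_quality", "meta:modality", "pollution"),
    ("dataset:weather", "meta:heldBy", "site:B"),
    ("dataset:weather", "meta:modality", "meteorological"),
]


def match(
    triples: list[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching a pattern; None plays the role of a variable."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def datasets_held_by(triples: list[Triple], site: str) -> list[str]:
    """Single pattern: ?dataset meta:heldBy <site>."""
    return sorted(t[0] for t in match(triples, p="meta:heldBy", o=site))


def modalities_at(triples: list[Triple], site: str) -> dict[str, str]:
    """Join of two patterns: ?d meta:heldBy <site> . ?d meta:modality ?m."""
    result: dict[str, str] = {}
    for dataset in datasets_held_by(triples, site):
        for _, _, modality in match(triples, s=dataset, p="meta:modality"):
            result[dataset] = modality
    return result
```

Calling `modalities_at(TRIPLES, "site:B")` lists the datasets recorded against the hypothetical second site with their modality. In a production system the same question would be put to a SPARQL endpoint rather than answered by a Python loop.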
Data will be extracted to, staged in, & queried from an RDF-compatible meta-database preserving the original granularity, context, semantics, & encoding.
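One way to picture staging that preserves native encoding is to store each record exactly as the source system coded it, and derive any common-model view only when asked. The sketch below assumes a hypothetical source system (`lab_system_x`), a hypothetical concept map and placeholder URIs. The output dictionary is only loosely shaped like a FHIR Observation and is not validated against any FHIR or FHIR UK profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagedRecord:
    """A record kept exactly as the source system encoded it."""
    source_system: str
    native_code: str
    native_label: str
    value: float
    unit: str


# Hypothetical mapping from (source system, native code) to a shared concept.
CONCEPT_MAP = {
    ("lab_system_x", "IL6"): {
        "system": "http://example.org/concepts",  # placeholder URI
        "code": "interleukin-6",
        "display": "Interleukin-6",
    },
}


def to_observation_like(rec: StagedRecord) -> dict:
    """Derive a simplified Observation-shaped view without altering the staged record."""
    concept = CONCEPT_MAP.get((rec.source_system, rec.native_code))
    if concept is None:
        raise KeyError(f"no mapping for {rec.source_system}/{rec.native_code}")
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                concept,
                # The native code travels with the translated one, so context is not lost.
                {
                    "system": f"urn:example:native:{rec.source_system}",
                    "code": rec.native_code,
                    "display": rec.native_label,
                },
            ]
        },
        "valueQuantity": {"value": rec.value, "unit": rec.unit},
    }
```

Because `StagedRecord` is frozen, translation cannot overwrite the original values. The mapped and native codings sit side by side in the derived view.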
On request, the API will translate metadata to other popular research models such as OMOP or FHIR for enhanced onwards transportation & federation. Query results can be aggregated or used for statistical analysis, with results sent back to the client.
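The pattern of running a query where the data live and returning only aggregates to the client can be sketched as follows. This is a minimal illustration, not the project's implementation. The record layout, the `season` field and the small-number suppression threshold `MIN_CELL` are assumptions introduced for the example.

```python
from collections import Counter

# Hypothetical disclosure-control threshold: cells smaller than this are withheld.
MIN_CELL = 5


def local_counts(records: list[dict], key: str) -> dict[str, int]:
    """Run inside one site's TRE: count records per category and
    suppress small cells, so no row-level data leaves the site."""
    counts = Counter(r[key] for r in records)
    return {category: n for category, n in counts.items() if n >= MIN_CELL}


def combine(site_results: list[dict[str, int]]) -> dict[str, int]:
    """Run by the client: sum the per-site aggregates it received."""
    total: Counter = Counter()
    for result in site_results:
        total.update(result)
    return dict(total)


# Synthetic rows standing in for two sites' local data.
site_a = [{"season": "winter"}] * 7 + [{"season": "summer"}] * 3
site_b = [{"season": "winter"}] * 6 + [{"season": "summer"}] * 9

pooled = combine([local_counts(site_a, "season"), local_counts(site_b, "season")])
```

Here the first site's three summer records fall below the threshold and are withheld. The pooled summer count therefore reflects the second site only, a trade-off between disclosure control and completeness that any real deployment would need to document.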
Data controller, analyst & public involvement events will assess whether stakeholder and user needs are met and whether public trust is enhanced.
Test case data assets are in hand, but in their native data languages.
Impacts include:
• Blueprints & code templates for federated TRE networks.
• A map of limitations of common data models versus native language for diverse data assets.
• An understanding of more readily extensible data models than the current CDMs in widespread use.
• Production of deeply phenotyped cross-council research assets covering two large acute trusts and BRCs without direct exposure of sensitive data to researchers or transferring data between data controllers.
• The expansion of a publicly co-produced information governance framework.
Phase 2 will test the wider scalability & commercial offer of this model.
Organisations
- University Hospitals Birmingham NHS Foundation Trust (Lead Research Organisation)
- University of Manchester (Collaboration)
- UNIVERSITY OF NOTTINGHAM (Collaboration)
- UNIVERSITY OF LEICESTER (Collaboration)
- University of Warwick (Collaboration)
- UNIVERSITY HOSPITALS BIRMINGHAM NHS FOUNDATION TRUST (Collaboration)
Publications
Aiyegbusi OL
(2023)
Considerations for patient and public involvement and engagement in health research.
in Nature medicine
Atkin C
(2022)
How do we identify acute medical admissions that are suitable for same day emergency care?
in Clinical medicine (London, England)
Atkin C
(2022)
The impact of changes in coding on mortality reports using the example of sepsis.
in BMC medical informatics and decision making
Bangash MN
(2022)
Impact of ethnicity on the accuracy of measurements of oxygen saturations: A retrospective observational cohort study.
in EClinicalMedicine
Title | Animation looking at the use of metagenomic diagnostic pathways in infections and how this can rationalise antibiotic use |
Description | An animation co-created with members of the public |
Type Of Art | Film/Video/Animation |
Year Produced | 2024 |
Impact | Being used in public health campaigns locally and to explain concepts |
Title | Your health data could save lives |
Description | An animation, co-written with members of the public, to highlight how health data can be used for research and what people's choices are |
Type Of Art | Film/Video/Animation |
Year Produced | 2022 |
Impact | Very good feedback and wide usage |
Description | MHRA consultation about safe medicines use |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Impact | Move away from Valproate use in emergency medicine |
Description | Met with patient advisory group to discuss use of health data by industry |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Impact | Build commercial model which is being tested nationally |
Description | NICE technology appraisal for remdesivir in COVID-19 |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Impact | Data used to discuss role for this treatment in COVID with guidelines now reflecting this expert testimonial |
Description | Workshop with 50 stakeholders |
Geographic Reach | National |
Policy Influence Type | Influenced training of practitioners or researchers |
Impact | Helped build protocol for NHSE SDE programme |
Description | Workshop with Members of the Public and Patients to |
Geographic Reach | National |
Policy Influence Type | Contribution to a national consultation/review |
Impact | Survey conducted before and after event showed a change in attitudes and enhanced knowledge |
Description | Biomedical Research Centre. Infections in Acute Care |
Amount | £1,600,000 (GBP) |
Organisation | National Institute for Health Research |
Sector | Public |
Country | United Kingdom |
Start | 12/2022 |
End | 11/2027 |
Description | Medicines in Acute Care Driver programme |
Amount | £5,000,000 (GBP) |
Organisation | Health Data Research UK |
Sector | Private |
Country | United Kingdom |
Start | 03/2023 |
End | 03/2028 |
Description | Patient Safety Research Centre Digital Clinical Support Tools in Acute Care |
Amount | £3,600,000 (GBP) |
Organisation | National Institute for Health Research |
Sector | Public |
Country | United Kingdom |
Start | 03/2023 |
End | 03/2028 |
Description | Winter Pressures |
Amount | £75,000 (GBP) |
Organisation | Health Data Research UK |
Sector | Private |
Country | United Kingdom |
Start | 01/2023 |
End | 03/2023 |
Title | A blueprint for a TRE which meets international security standards |
Description | This enables a safe and secure TRE to be spun up in a matter of hours |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | We are just publishing this |
Title | The West Midlands NHSE Phase 1 SDE based on PIONEER build |
Description | PIONEER has formed the blueprint for the NHSE West Midlands Secure Data Environment, and the PIONEER protocol and learnings from federation have formed the protocol blueprint and commercial model. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2024 |
Provided To Others? | Yes |
Impact | Standardised and secure data platform which meets ISO standards and a protocol which is freely available on request |
Title | An NIHR Birmingham Biomedical Research Centre dataset of 21,581 intensive care admissions including demographic data, severity scores (APACHE, SAPS, SOFA) with investigations, serial physiology, treatments, and outcomes up to one year post admission. |
Description | A highly granular dataset of 21,581 critical care admissions, curated by the NIHR Birmingham Biomedical Research Centre Infection and Acute Care Theme in collaboration with PIONEER. The data includes initial presentation, presenting symptoms, and several pre-calculated severity scoring systems including Simple Acute Physiology Score (SAPS), the Acute Physiology and Chronic Health Evaluation (APACHE) and the Sequential Organ Failure Assessment (SOFA) score. Data includes demography, serial physiology, ventilatory parameters, investigations, treatments (drug, dose, route), diagnostic codes (ICD-10 & SNOMED-CT) and outcomes, following patients for one year. This can be supplemented with imaging (results and images) and linked to ambulance conveyance and longer-term outcomes in the community. The current dataset includes admissions from 2017 to 2023 but can be expanded to assess other timelines of interest. |
Type Of Material | Database/Collection of data |
Year Produced | 2024 |
Provided To Others? | Yes |
Impact | Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details. Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can build synthetic data to meet bespoke requirements. Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and "off the shelf" Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and "fast screen" services to assess population size. |
URL | https://web.www.healthdatagateway.org/dataset/ea03d4e1-73e8-4d84-b93a-a41febf73fb4 |
Title | Hospitalised patients with diabetic emergencies & acute diabetic health concerns |
Description | A dataset of 168,706 diabetic emergencies and acute admissions associated with diabetes-related health concerns, including demographic data with investigations, serial physiology and outcomes. |
Type Of Material | Database/Collection of data |
Year Produced | 2024 |
Provided To Others? | Yes |
Impact | All patients admitted to hospital from year 2002 and onwards, curated to focus on Diabetes. Longitudinal & individually linked, so that the preceding & subsequent health journey can be mapped & healthcare utilisation prior to & after admission understood. The dataset includes highly granular patient demographics & co-morbidities taken from ICD-10 & SNOMED-CT codes. It includes serial, structured data pertaining to acute care process (timings, staff grades, specialty review, wards and triage), along with presenting complaints, outpatient admissions, microbiology results, referrals, procedures, therapies, all physiology readings (pulse, blood pressure, respiratory rate, oxygen saturations and others) and all blood results (urea, albumin, platelets, white blood cells and others). Includes all prescribed & administered treatments and all outcomes. Linked images are also available (radiographs, CT scans, MRI). Available supplementary data: Matched controls; ambulance, OMOP data, synthetic data. Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, "fast screen" services. |
URL | https://web.www.healthdatagateway.org/dataset/0d556d7e-be27-4979-a09e-d419b2e838f3 |
Title | Synthetic data replicating 20,000 ethnically diverse hypertrophic cardiomyopathy patients: This includes clinical and biological phenotyping, co-morbidities, investigations (including ECG, ECHO), any procedures undertaken and outcomes. |
Description | A PIONEER synthetic dataset of 20,000 ethnically diverse hypertrophic cardiomyopathy patients created using CT-GAN generative AI. Data includes clinical & biological phenotyping, co-morbidities, investigations (ECG, ECHO), procedures & outcomes. Well-created synthetic data establishes a governance risk-free environment for algorithm development & experimentation. This includes evaluating new treatment models, care management systems, clinical decision support, and more. Synthetic data is of particular use in rare diseases, where real data may be in short supply, or to replicate disease in less common patient demographics (e.g. ethnicities). Familial hypertrophic cardiomyopathy (HCM) is a rare genetic condition characterised by thickening (hypertrophy) of the cardiac muscle, usually of the interventricular septum. Arrhythmias can be life threatening and HCM is associated with an increased risk of sudden death. Some affected individuals develop potentially fatal heart failure, which may require heart transplantation. Approximately 130,000 people have HCM in the UK, but there is a significant burden of undiagnosed disease and diagnostic delay. |
Type Of Material | Database/Collection of data |
Year Produced | 2024 |
Provided To Others? | Yes |
Impact | Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details. Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can provide real world data to meet bespoke requirements. Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and "off the shelf" Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and "fast screen" services to assess population size. |
URL | https://www.pioneerdatahub.co.uk/wp-content/uploads/Patients-at-Risk-of-Sudden-Death-Hypertrophic-Ca... |
Title | Synthetic dataset of cross council data for asthma exacerbations, cytokines, air pollution and weather |
Description | A synthetic dataset including data fields replicating an EHR, geographical location, air quality, IL-6 levels and ambient temperature for > 20,000 records |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Enabled delivery of FED-NET |
Description | DARE Sprints 1b - DARE-FX: delivering a federated network of TREs to enable safe analytics |
Organisation | University of Manchester |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | This is a new collaboration which has arisen due to the DARE sprint 1a work, seeking to expand on our work within the initial DARE sprint |
Collaborator Contribution | We are contributing technical expertise and synthetic data |
Impact | The project started 2 months ago, so it is too early for outputs as yet |
Start Year | 2023 |
Description | DARE Sprints 1b - DARE-FX: delivering a federated network of TREs to enable safe analytics |
Organisation | University of Nottingham |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | This is a new collaboration which has arisen due to the DARE sprint 1a work, seeking to expand on our work within the initial DARE sprint |
Collaborator Contribution | We are contributing technical expertise and synthetic data |
Impact | The project started 2 months ago, so it is too early for outputs as yet |
Start Year | 2023 |
Description | Data and Enabling Technologies Group |
Organisation | University of Leicester |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | The group are leading an initiative to construct a national medicines data map. Reflecting both national and international populations, this data map is set to become an invaluable asset for informing future medicines-related research. This expert working group is ongoing from 2023-2028 |
Collaborator Contribution | The group is exploring the adoption of innovative technology developed by Leicester: the 'LeHMR' online platform, which allows researchers to submit metadata about their datasets. Partners involved: University Hospitals Birmingham, Leicester University and University of Leeds. The group is expected to expand in 2024. |
Impact | Work ongoing |
Start Year | 2023 |
Description | NIHR Patient Safety Research Collaboration Theme - Clinical Decision Support Tools |
Organisation | University of Warwick |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We will lead on building and testing of clinical decision support tools for use in acute and emergency medicine |
Collaborator Contribution | They will help provide input into user acceptability |
Impact | Starting April 2023 - so no impacts as yet |
Start Year | 2023 |
Description | Winter Pressures NHSE Funding - Improving patient selection to same day emergency care |
Organisation | University Hospitals Birmingham NHS Foundation Trust |
Department | Acute Medicine |
Country | United Kingdom |
Sector | Hospitals |
PI Contribution | We are running this project, funded by NHSE, to see if we can build better selection tools for SDEC care pathways - to reduce avoidable admissions to hospitals via acute medical units. |
Collaborator Contribution | N/A |
Impact | We have developed a patient facing leaflet about SDEC, held community workshops about our tool, and have developed two potential tools for further assessment. |
Start Year | 2023 |
Title | Blueprint for NHSE West Midlands Secure Data Environment |
Description | A blueprint for an SDE which can be used by NHS organisations |
Type | Support Tool - For Fundamental Research |
Current Stage Of Development | Initial development |
Year Development Stage Completed | 2023 |
Development Status | Under active development/distribution |
Impact | Effective and efficient model which has been adopted widely |
Title | TRE for federated analytics now being used widely |
Description | A TRE which is currently used across a small number of Data Controllers |
Type | Health and Social Care Services |
Current Stage Of Development | Refinement. Non-clinical |
Year Development Stage Completed | 2022 |
Development Status | Under active development/distribution |
Impact | A cost effective, secure and deployable TRE |
Title | Blueprint for NHSE West Midlands SDE |
Description | This is a blueprint for a cybersecurity-tested SDE including data ingress and egress and data warehousing, tested and meeting ISO standards |
Type Of Technology | New/Improved Technique/Technology |
Year Produced | 2024 |
Impact | Being used across West Midlands |
Title | TRE for PIONEER for federated analytics |
Description | This is a blueprint for a TRE |
Type Of Technology | Software |
Year Produced | 2022 |
Impact | Adopted across a number of Data controllers |
Description | HDR UK Driver Programmes Priorities Meeting |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Study participants or study members |
Results and Impact | HDR UK convened this meeting to discuss workplans across the national driver programmes. Liz Sapey presented to the group on the Medicines in Acute and Chronic Care Driver Programme ambitions and workplan. This facilitated discussion around opportunities for integration across programmes and informed the group. There was also a deep dive into data and infrastructure priorities, discussion around access/integration and support from HDR UK Pillars - e.g. Trust and Transparency Capacity building plans. |
Year(s) Of Engagement Activity | 2023 |
Description | Medicines in Acute and Chronic Care Driver Programme, Drug-Drug Interactions Workshop |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Study participants or study members |
Results and Impact | Workshop purpose: to facilitate dialogue on developing a Medicines in Acute and Chronic Care Programme policy on the standardisation of drug-drug interactions. This standardisation will serve as a unified way of working across the Programme and will also be extended to other HDR UK driver programmes. The workshop also provided the opportunity to evaluate existing drug-drug interaction resources and explore the feasibility of developing a dedicated resource for multi-way interactions or a gene interaction resource. Further discussion took place on the possibilities for owning and maintaining this type of resource and on identifying potential funding sources to support it. Munir Pirmohamed and Tjeerd Van Staa presented talks at this workshop on the above topics. The workshop took place on 27/09/2023. |
Year(s) Of Engagement Activity | 2023 |
Description | Medicines in Acute and Chronic Care Programme Meetings (Primary Care/Secondary Care, All Programme) |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Study participants or study members |
Results and Impact | The Medicines in Acute and Chronic Care Programme hosts monthly and quarterly programme meetings to discuss the primary care and secondary care medicines innovation workstream, as well as all other workstreams within the programme. These meetings bring together the programme partners across 10 research organisations. The meetings provide the opportunity to report on updates and progress, and encourage collaborative dialogue across the programme. Munir Pirmohamed and Liz Sapey primarily chair and present at these meetings, and the future direction of the programme is coordinated through them. |
Year(s) Of Engagement Activity | 2023,2024 |
Description | Stakeholder workshop of CIOs, CMOs from data controllers, and data scientists |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | We held a series of workshops to discuss the implications of the Goldacre Review and how data egress could be prevented through the use of federated analytics and learning through TREs |
Year(s) Of Engagement Activity | 2022 |
Description | Workshop with members of the public about their views on data egress versus federated approaches to consented health data |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Patients, carers and/or patient groups |
Results and Impact | A workshop and follow-on series of working groups to agree on knowledge sharing and to produce a leaflet for members of the public describing what federated analysis is and what its benefits and limitations are |
Year(s) Of Engagement Activity | 2022 |