DARE: Creating the blueprint for a federated network of next generation, cross-council Trusted Research Environments.

Lead Research Organisation: University Hospitals Birmingham NHS Foundation Trust
Department Name: UNLISTED

Abstract

Solving society’s complex challenges requires experts working together, studying data collected for different purposes & from different sources & locations. However, combining data is challenging. There are public concerns about data security & access, especially for health data. Data governance (legal & ethical frameworks for data sharing) is critical. There are technical challenges in combining data collected in different “data languages” & in building secure computer networks which enable collaborative work, but protect privacy.

FED-NET builds on our operational system, providing a scalable solution to the technical & governance challenges of analysing datasets separated by geography & data language.

Working with patients, the public, analysts & clinicians, we have co-designed a secure way to combine sensitive health data with other data, working across 5 NHS hospitals. We have co-built a transparent governance process, ensuring data access is legal, with full public oversight.

We will scale our existing Trusted Research Environments (secure environments that ensure data privacy but enable large scale analytics) using “federated analytics” where the data stays put & the analysis moves.
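As an illustration of "the data stays put & the analysis moves", the sketch below computes a pooled mean across two sites without moving any row-level values. It is a minimal example under stated assumptions, not the project's implementation: the site names, the measurements, and the helper names (`LocalSummary`, `summarise_locally`, `pooled_mean`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LocalSummary:
    """Aggregate returned by a site; it carries no row-level data."""
    count: int
    total: float


def summarise_locally(values: List[float]) -> LocalSummary:
    """The 'analysis' that travels to a site and runs next to its data."""
    return LocalSummary(count=len(values), total=sum(values))


def pooled_mean(summaries: List[LocalSummary]) -> float:
    """Combine the per-site aggregates at the coordinating node."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no observations reported by any site")
    return sum(s.total for s in summaries) / n


# Hypothetical per-site measurements; in a federated network these
# lists would stay inside each site's own environment.
site_data = {
    "site_a": [310.0, 295.0, 402.0],
    "site_b": [350.0, 280.0],
}

# Only the (count, total) pairs cross the site boundary.
summaries = [summarise_locally(values) for values in site_data.values()]
result = pooled_mean(summaries)
print(result)
```

Only counts and totals leave each site, so the coordinator never sees individual records. A production system would also need disclosure controls on the aggregates themselves, which this sketch omits.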

Using a study of asthma, we will test how different data languages can be translated into a common standard, focusing on data that are highly valued in research but rarely available (laboratory science, meteorological data). We will test our governance solution through public and expert workshops.

Technical Summary

Tackling societal challenges requires data & partnerships which span traditional funder silos. Data collected for specific purposes have distinct structures & ontologies. There are different common data models; none are comprehensive for cross-council research. Comprehensive datasets increase the risk of reidentification. Workshops with >400 lay members confirmed support for data access for public good, with data exposure limited to “where necessary” & “NHS proximity” as a gold standard.

FED-NET will test:
1. Whether data of differing modalities/languages can be combined using a standardised framework.
2. How open standards map diverse data for cross-council projects.
3. Whether a federated analytics model (including governance) can be deployed.
4. Whether this model serves analytical need & enhances public trust.

This DARE sprint will implement & test an innovative, scalable, industry-aligned Trusted Research Environment (TRE) & governance model which facilitates enhanced federated data discovery, focusing on a test case of asthma, including clinical, meteorological, pollution & translational data.

Councils served by the test case include MRC, EPSRC, InnovateUK and NERC.

Methods
The technical architecture is built & operational (HDR-UK PIONEER data haven/TRE). PIONEER’s tested governance model will be piloted across federated TREs, to determine scalability.

We will automate elements of the HDR-UK Five Safes & provide a metadata interchange, to expand equitable access to high-quality research data assets & help reduce health inequalities.

Data solutions will be built around open standards including REST, HTTP, OMOP, & FHIR-UK, reducing proprietary/commercial constraints. Both NUH & UHB have experience in this. Research metadata will be queried following W3C international standards for data management & system interoperability.

We will adopt the Resource Description Framework (RDF) to support metadata exchange, using the query language SPARQL to express queries across diverse linked data sources. The approach is intended to scale from basic statistical work to advanced machine learning. To allow contemporaneous metadata to be pulled or pushed, a secure standards-based RESTful API will be specified & implemented, allowing equitable access over the open HTTP protocol.
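To make the RDF idea concrete, the sketch below stores metadata as subject-predicate-object triples and answers a question by matching triple patterns, which is the basic operation underlying a SPARQL query. It is a plain-Python stand-in rather than a real RDF store or SPARQL engine, and every identifier in it (the `dataset:`, `meta:` and `site:` names, `match`, `holders_of`) is a hypothetical chosen for illustration.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical metadata triples: (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("dataset:asthma_admissions", "meta:modality", "clinical"),
    ("dataset:asthma_admissions", "meta:heldBy", "site:a"),
    ("dataset:air_quality", "meta:modality", "pollution"),
    ("dataset:air_quality", "meta:heldBy", "site:b"),
    ("dataset:weather", "meta:modality", "meteorological"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def holders_of(triples: List[Triple], modality: str) -> List[str]:
    """Join two patterns on the dataset, roughly equivalent to the SPARQL
    SELECT ?site WHERE { ?d meta:modality "<modality>" . ?d meta:heldBy ?site }
    (prefix declarations omitted).
    """
    sites = set()
    for dataset, _, _ in match(triples, p="meta:modality", o=modality):
        for _, _, site in match(triples, s=dataset, p="meta:heldBy"):
            sites.add(site)
    return sorted(sites)


pollution_sites = holders_of(TRIPLES, "pollution")
print(pollution_sites)
```

Because the query touches only metadata triples, a researcher can discover which site holds a given kind of data without any sensitive records being exposed.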

Data will be extracted to, staged in, & queried from an RDF-compatible meta-database preserving the original granularity, context, semantics, & encoding.

On request, the API will translate metadata to other popular research models such as OMOP or FHIR for enhanced onward transport & federation. Query results can be aggregated or used for statistical analysis, with results sent back to the client.
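The translation step can be pictured as a pair of mapping functions from a native record to target shapes. The sketch below is illustrative only: the native record layout and function names are hypothetical, the output covers only a small subset of fields from a FHIR Observation resource and an OMOP MEASUREMENT row, and no vocabulary mapping is attempted (concept IDs are left at 0, the OMOP convention for "no matching concept").

```python
from typing import Any, Dict

# Hypothetical record in a site's native format.
native = {
    "patient_id": 1001,
    "test": "Peak expiratory flow",
    "value": 310.0,
    "unit": "L/min",
    "taken": "2022-01-15",
}


def to_fhir_observation(rec: Dict[str, Any]) -> Dict[str, Any]:
    """Map a native record onto a partial FHIR Observation (not validated)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": rec["test"]},
        "subject": {"reference": f"Patient/{rec['patient_id']}"},
        "effectiveDateTime": rec["taken"],
        "valueQuantity": {"value": rec["value"], "unit": rec["unit"]},
    }


def to_omop_measurement(rec: Dict[str, Any]) -> Dict[str, Any]:
    """Map a native record onto a partial OMOP MEASUREMENT row.

    Concept IDs are 0 (unmapped); the source strings are preserved so the
    original encoding is not lost.
    """
    return {
        "person_id": rec["patient_id"],
        "measurement_concept_id": 0,
        "measurement_date": rec["taken"],
        "value_as_number": rec["value"],
        "unit_concept_id": 0,
        "measurement_source_value": rec["test"],
        "unit_source_value": rec["unit"],
    }


fhir_obs = to_fhir_observation(native)
omop_row = to_omop_measurement(native)
print(fhir_obs["valueQuantity"], omop_row["value_as_number"])
```

Keeping the native record as the source of truth and translating on request means the original granularity and encoding are preserved, while clients can still ask for whichever model they work in.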

Data controller, analyst & public involvement events will assess whether stakeholder & user needs are met & whether public trust is enhanced.

Test case data assets are in hand, but in their native data languages.

Impacts include:
• Blueprints & code templates for federated TRE networks.
• A map of limitations of common data models versus native language for diverse data assets.
• An understanding of data models that are more readily extensible than the CDMs currently in widespread use.
• Production of deeply phenotyped cross-council research assets covering two large acute trusts and BRCs without direct exposure of sensitive data to researchers or transferring data between data controllers.
• The expansion of a publicly co-produced information governance framework.

Phase 2 will test the wider scalability & commercial offer of this model.

Publications

Atkin C (2022) The impact of changes in coding on mortality reports using the example of sepsis. in BMC medical informatics and decision making