Creating a federated cloud-based TRE to facilitate consortium-based research and interoperability between existing institutions/TREs

Lead Research Organisation: The Francis Crick Institute
Department Name: UNLISTED

Abstract

We propose to create a technology demonstrator using the Snowflake™ multi-cloud data fabric to show how a multi-institute, multi-council ecosystem can use the same platform to accelerate collaborative research whilst still observing rigorous access and security management.

Our vision for the next generation of Trusted Research Environments (TREs) is to create a general-purpose capability which works for the majority of researchers, the majority of the time. This expressly moves away from previous concepts of discrete, monolithic environments towards a distributed ‘data fabric’ which provides the ‘glue’ between existing institutional investments.

The key innovations we see as necessary are:
• Moving to a ‘compute in situ’ model, away from the current paradigm of ‘download and compute elsewhere’ which dominates many TRE infrastructures
• Creating a modular architecture that provides generic services for processes such as research access management
• Providing ‘out of the box’ workflows to expedite the creation and management of international, multi-party consortia, guaranteeing compliance with in-country data sovereignty and DPR legislation
• Providing a bridge between academia, public sector bodies, especially the NHS, and industry
• Resolving the asymmetry of access to computing and storage environments across the research ecosystem, allowing contribution by talent, not by wealth of institution

Technical Summary

Current TREs are generally centred on a thematically consistent group of datasets owned by a single institution. Each has its own computational infrastructure, security arrangements, access management and so on. Researchers need to a) know that the TRE exists, b) know what data is held in it, c) apply individually, per project, for access and d) have access to their own local, secure environment in which to work with the data. In some cases, d) is not viable because data sensitivity limits work to the tools provided in the TRE. In all cases, working with data in a TRE is bureaucratically laborious and often scientifically limiting, as working with datasets across TREs or, most importantly, in conjunction with locally collected experimental data is difficult and sometimes impossible.

Rather than building another TRE ‘silo’, we will demonstrate a series of components interworking to form a scalable ‘virtual TRE’ (vTRE) architecture. This is centred on a multi-cloud ‘data fabric’, built using the commercial Snowflake™ platform, which offers a powerful toolset for managing security, data sharing and access in a very flexible way. It also has the advantage of extreme scalability, as it is built on the huge resource pools available on the AWS, Google and Azure public clouds, allowing TREs to be defined easily in ‘Infrastructure as Code’ terms rather than as capital-intensive physical environments.

Unlike other cloud-based technologies, Snowflake™ is unique in being a single data fabric shared between customers, rather than discrete databases which necessitate local access and security controls. This enables tightly controlled, specific data sharing while making ‘immediate’ access possible for any approved Snowflake™ account, with a marketplace to aid discovery and public sharing of appropriate datasets.
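As an illustration of what defining a TRE in ‘Infrastructure as Code’ terms could look like, the sketch below describes a vTRE declaratively and computes the changes needed to reach that description from the current state. It is a minimal sketch only: the `TreDefinition` fields, the `plan` function and the action strings are assumptions made for this example and do not correspond to any Snowflake™ API or to the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TreDefinition:
    """Hypothetical declarative description of one virtual TRE."""
    name: str
    cloud: str                  # e.g. "aws", "azure" or "gcp"
    region: str
    datasets: frozenset[str]    # datasets shared into the environment
    members: frozenset[str]     # researchers allowed to use it


def plan(current: Optional[TreDefinition], desired: TreDefinition) -> list[str]:
    """Return the ordered actions needed to move from current to desired."""
    if current is None:
        actions = [
            f"create environment {desired.name} on {desired.cloud}/{desired.region}"
        ]
        old_datasets: frozenset[str] = frozenset()
        old_members: frozenset[str] = frozenset()
    else:
        if (current.cloud, current.region) != (desired.cloud, desired.region):
            # Relocation is deliberately out of scope for this sketch.
            raise ValueError("moving a TRE between locations is not handled here")
        actions = []
        old_datasets, old_members = current.datasets, current.members

    # Only the differences are acted on, so re-applying a definition is a no-op.
    actions += [f"share dataset {d}" for d in sorted(desired.datasets - old_datasets)]
    actions += [f"revoke dataset {d}" for d in sorted(old_datasets - desired.datasets)]
    actions += [f"grant access to {m}" for m in sorted(desired.members - old_members)]
    actions += [f"revoke access from {m}" for m in sorted(old_members - desired.members)]
    return actions
```

Because the definition is plain data, it can be version-controlled and reviewed like any other code, and applying an unchanged definition produces no actions.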

This work will build an illustrative data environment that connects the Crick and its partners using sample data. Around this data fabric, we will build modular components to demonstrate the principles of a stand-alone ‘access-as-a-service’ platform, which any TRE could use to centralise access workflows and requests. This will be connected to an illustrative policy library and provisioning engine that automatically ‘spin up’ pre-defined TRE environments, removing administration whilst ensuring rigour of definition, security and access control.
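The interaction between the access service, the policy library and the provisioning engine described above could be sketched as follows. This is a hedged illustration, not the project's design: the `Policy` and `AccessRequest` fields, the specific checks (compute location and training) and the returned environment name are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy-library entry for one dataset."""
    dataset: str
    allowed_countries: frozenset[str]   # where compute may run (data sovereignty)
    requires_training: bool


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical access request submitted by a researcher."""
    researcher: str
    institution: str
    dataset: str
    compute_country: str
    training_completed: bool


def evaluate(request: AccessRequest, policies: dict[str, Policy]) -> tuple[Decision, str]:
    """Check one request against the policy registered for its dataset."""
    policy = policies.get(request.dataset)
    if policy is None:
        return Decision.REJECTED, "no policy registered for dataset"
    if request.compute_country not in policy.allowed_countries:
        return Decision.REJECTED, "compute location not permitted by policy"
    if policy.requires_training and not request.training_completed:
        return Decision.REJECTED, "required training not completed"
    return Decision.APPROVED, "all policy checks passed"


def provision(request: AccessRequest, policies: dict[str, Policy]) -> dict[str, str]:
    """Describe the environment to create, but only for approved requests."""
    decision, reason = evaluate(request, policies)
    result = {"status": decision.value, "reason": reason}
    if decision is Decision.APPROVED:
        # A real engine would create resources here; this sketch only names them.
        result["environment"] = f"vtre-{request.dataset}-{request.researcher}"
    return result
```

Keeping the policy check separate from provisioning means the same library of rules could serve several TREs, with each environment created only after the checks pass.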
 
Title: TREllis
Description: An integrated platform of SaaS solutions allowing the rapid configuration and deployment of secure Trusted Research Environments on the Cloud.
Type Of Technology: Webtool/Application
Year Produced: 2022
Impact: Enables Crick researchers to work with sensitive or patient identifiable data in clinical research, in collaboration with researchers from other institutes.