Trusted Research Environment and Enclave for Hosting Open Original Science Exploration (TREEHOOSE)
Lead Research Organisation:
University of Dundee
Department Name: UNLISTED
Abstract
Trusted Research Environments (TREs) are used by many organisations to securely manage sensitive data for research. Despite the commonality of need there is currently little standardisation of infrastructure or deployment between operators, leading to duplication of effort and impeding service improvement. TREEHOOSE aims to build on our experience of migrating a TRE hosting NHS data to public cloud for the benefit of other operators. We will also include a new capability, Enclave computing, to achieve mutually trust-free computation, protecting users’ IP and code as well as the data.
We will release open-source tooling to streamline building and operating TREs on public cloud infrastructure whilst maintaining security and trust. Secure Enclaves go beyond the traditional TRE infrastructure by adding further barriers to prevent software algorithms from leaking data. A kit will be developed into which researchers can add their AI or other analytical code for execution within a cloud TRE, safe in the knowledge that their code is protected from reverse-engineering or unauthorised sharing even by the TRE operators, with cryptographically-verified repeatable dependencies.
TREEHOOSE will enable wider adoption of TREs and more secure modes of working, thereby making research data analysis possible at scale while maintaining maximum data protections.
Technical Summary
TREs are necessary for the ethical and secure management of sensitive data. Traditionally, they require large up-front capital investment in specialist infrastructure and can incur procurement delays. The need for increased power and flexibility in TREs has driven increased adoption of cloud services in place of on-premise equipment. Cloud has matured in provision and security to the point where several security-conscious sectors (e.g. Police, Government, Defence, Health) are employing it to gain scaling efficiencies and on-demand access to leading-edge hardware.
Designing and operating a TRE in the cloud requires considerable custom work, with a challenging learning curve for operations staff. The implementation, maintenance and development of TREs require specialist skillsets that are currently in short supply. Upskilling of new and existing staff is crucial for the continued growth and development of Data Science in the UK.
Recent technical advances in Trusted Computing and cloud equipment, such as AWS Nitro Enclaves, allow the creation of a cryptographic ‘enclave’ within which code can execute free from tampering or inspection by the host environment. The enclave acts as a ‘black box’ into which sensitive data can be passed; its output is captured and scrutinised before release to the user, without exposing the code itself to analysis or potential reverse-engineering by the TRE operators. This also provides enhanced, cryptographically-verified reproducibility of the container environment in which the code was executed.
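As a rough illustration of the verification idea described above, the following minimal Python sketch hashes a packaged environment and checks it against a previously agreed measurement before a run is accepted. It is not the AWS Nitro Enclaves attestation mechanism or TREEHOOSE code: the function names `measure` and `verify_before_release`, the use of SHA-384, and the sample image bytes are all assumptions made for this example.

```python
import hashlib
import hmac


def measure(artifact: bytes) -> str:
    """Return a SHA-384 hex digest standing in for an enclave image measurement.

    Hypothetical helper: a real enclave platform produces its own signed
    measurements; this only illustrates the hashing step.
    """
    return hashlib.sha384(artifact).hexdigest()


def verify_before_release(artifact: bytes, expected_measurement: str) -> bool:
    """Accept a run only if the artifact matches the agreed measurement."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(measure(artifact), expected_measurement)


if __name__ == "__main__":
    image = b"hypothetical enclave image contents"  # placeholder bytes
    agreed = measure(image)  # measurement recorded when the image was built
    print(verify_before_release(image, agreed))
    print(verify_before_release(image + b" tampered", agreed))
```

Because any change to the packaged environment changes its digest, a matching measurement gives both the researcher and the operator evidence that the same environment was executed, without either party needing to inspect its contents.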
We have been running a TRE for over a decade and have recently collaborated with AWS to develop a Cloud TRE which is now in production. TREEHOOSE will share open-source code and documentation for both operating and performing research within a modern cloud TRE with enclave facilities. This will:
• Reduce the learning curve for TRE operators and users alike.
• Reduce the cost of migration to the cloud for other TREs.
• Aid portability of code between TREs.
• Make federation more straightforward through the use of common cloud deployments.
TREEHOOSE will also upskill project team members for the benefit of future research projects.
None of this work is possible or sensible without engagement with the public and patients, as they are the final arbiters of acceptable reuse of their data. We will embed workshops on the trustworthiness of cloud computing with research data and report on our findings.
Publications
Cole C
(2022)
Health Data in Research Workshop - TREEHOOSE Project
| Description | Influence on UK and NHS policy on use of patient data for research |
| Geographic Reach | National |
| Policy Influence Type | Citation in other policy documents |
| URL | https://zenodo.org/records/13353747 |
| Description | Use of Cloud TRE in MSc Precision Medicine degree programme |
| Geographic Reach | Multiple continents/international |
| Policy Influence Type | Influenced training of practitioners or researchers |
| Impact | Students have been trained in handling real world patient data including governance, data cleaning/preparation, genomics and machine learning which they are able to use in their future careers in health data science, academic research and other careers. |
| URL | https://www.dundee.ac.uk/postgraduate/health-data-science-applied-precision-medicine |
| Description | DARE Transformational Programme Core Component: TREvolution |
| Amount | £4,940,092 (GBP) |
| Funding ID | MC_PC_24038 |
| Organisation | Medical Research Council (MRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 03/2025 |
| End | 03/2027 |
| Description | EOSC-ENTRUST: A European Network of TRUSTed research environments |
| Amount | £3,500,000 (GBP) |
| Funding ID | 10088076 (Innovate UK) & 101131056 (Horizon Europe) |
| Organisation | Innovate UK |
| Sector | Public |
| Country | United Kingdom |
| Start | 03/2024 |
| End | 02/2027 |
| Description | SATRE - Standardised Architecture for Trusted Research Environments |
| Amount | £614,112 (GBP) |
| Funding ID | MC_PC_23008 |
| Organisation | Medical Research Council (MRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 02/2023 |
| End | 10/2023 |
| Description | UK Smart Factory Data Innovation Hub (SMDIH) |
| Amount | £15,000,000 (GBP) |
| Funding ID | 10017032 |
| Organisation | Innovate UK |
| Sector | Public |
| Country | United Kingdom |
| Start | 09/2022 |
| End | 03/2025 |
| Description | Collaboration with The Alan Turing Institute |
| Organisation | Alan Turing Institute |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | We have contributed our knowledge and experience of TREs and Safe Havens from the Scottish context. We also shared our software/infrastructure for testing with the collaborator. |
| Collaborator Contribution | They contributed their experience and knowledge from their more industrial or local government background of projects. |
| Impact | Received joint funding to support the SATRE project plus on-going interest in TRE specification and open source. |
| Start Year | 2023 |
| Description | Research Data Scotland |
| Organisation | Research Data Scotland |
| Country | United Kingdom |
| Sector | Charity/Non Profit |
| PI Contribution | We have provided RDS with knowledge and expertise of the TRE landscape within the UK which is of relevance to Scotland. |
| Collaborator Contribution | RDS have supported us and the wider Scottish Safe Haven Network with funding, promotion and staff time as part of our working relationship. They directly funded a collaborative project to align the Scottish Safe Havens to SATRE in 2024. |
| Impact | We developed research proposals together, provided feedback on public output and involvement in Scotland, and cross-promoted the importance of research data for public benefit. |
| Start Year | 2022 |
| Title | SATRE specification source |
| Description | First stable release of the SATRE specification. This release is the output of 8 months of work with the Trusted Research Community and represents a baseline for the community to review and contribute to. Please see our contributing guide. This release includes evaluations against the SATRE standard of TREs deployed at the Alan Turing Institute and the Health Informatics Centre at the University of Dundee and we would especially welcome other organisations contributing evaluations for their own TRE deployments. Read the blog post for this release for more information. |
| Type Of Technology | Software |
| Year Produced | 2023 |
| Impact | Promoting change within the UK TRE/SDE community for working together on solving common issues around data access of sensitive datasets for research. |
| URL | https://zenodo.org/doi/10.5281/zenodo.8017044 |
| Title | TREEHOOSE v1.0.0-beta1 |
| Description | Initial public release of TREEHOOSE beta. |
| Type Of Technology | Software |
| Year Produced | 2022 |
| Open Source License? | Yes |
| Impact | The TREEHOOSE open source infrastructure as code was the encapsulation of 10 years' experience of managing a Trusted Research Environment (TRE) by the Health Informatics Centre (HIC). Since its release it has been implemented within the Smart Manufacturing Data Hub (SMDH) Innovate UK funded project to securely host sensitive manufacturing data from SMEs. The TREEHOOSE and Turing Data Safe Haven codebases have been included in a new collaboration project, Standardised Architecture for TREs (SATRE), funded by DARE. |
| URL | https://zenodo.org/record/6908253 |
| Description | ELIXIR Human Data Communities Event |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Talk on HDR and Gateway |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://docs.google.com/document/d/1lF7wiSKzvJrFsfEJ6NAtZNlpDd6Vqjc-UCcEYZcEX2E/edit?tab=t.0#heading... |
| Description | HPC-AI Conference |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was an invited speaker at the 5th Annual HPC-AI Advisory Council UK Conference. Presentation: TREs at Scale. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.hpcwire.com/off-the-wire/5th-annual-hpc-ai-advisory-council-uk-conference-set-for-octobe... |
| Description | Health Data Research UK Conference 2024: The Grand Challenges in Health Data |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | • Health Data Research UK's annual conference will be a free two-day, hybrid event to celebrate progress and bring people together to accelerate the trustworthy use of health data for public benefit. Day 1 tackles the grand challenges in health data. Talks and panel discussions will cover everything from molecules to AI, GP data to wearables, maintaining public confidence to gaining an international perspective. The sessions are designed to inspire, enthuse and bring together members of the scientific community, industry, patients and the public. • Day 2 aims to promote the technology ecosystem and the contributions of colleagues across the sector addressing the grand challenges in health data research. This means bringing together the community developing technical solutions, encouraging adoption of standards for interoperability and pushing the science of data infrastructure, capacity building and training. Join scientists, research software engineers and technologists from across the UK and around the world in sessions to share learning and technical solutions and build new networks. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.hdruk.ac.uk/hdruk-conference-2024/ |
| Description | Invited speaker at Hartree internal lecture series |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented a talk discussing TREs and ML disclosure control at the STFC Hartree Centre |
| Year(s) Of Engagement Activity | 2023 |
| Description | Invited speaker: HDR Technology Ecosystem and the Gateway: UKRI Data Infrastructure Club Show and Tell |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Other audiences |
| Results and Impact | Emily Jefferson was an invited speaker, leading a presentation on the HDR UK Technology Ecosystem and the Gateway at the UKRI Data Infrastructure Club Show and Tell: 31st Jan 2023. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Invited speaker: HDR Technology Ecosystem. UK DRI Informatics Scoping Event - London. 8th and 9th March 2023 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Emily Jefferson, CTO of HDR UK, was invited to lead a presentation on the HDR Technology Ecosystem at the UK DRI Informatics Scoping Event in London on 8th and 9th March 2023. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Invited speaker: Technology Ecosystem - Launch. Technology Ecosystem Conference/Workshop. Birmingham. 6th Feb 2023 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Other audiences |
| Results and Impact | Technology Ecosystem Conference (6th February 2023) brought together different technology groups from across the community to strengthen relationships and generate ideas to deliver trustworthy infrastructure and services across the health data research ecosystem |
| Year(s) Of Engagement Activity | 2023 |
| Description | Invited speaker: The power of DRI: A health data perspective. UKRI Digital Research Infrastructure (DRI) Congress. |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Emily Jefferson was an invited speaker to present on: The power of DRI: A health data perspective at the UKRI Digital Research Infrastructure (DRI) Congress. 6th and 7th March 2023. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Invited talk at UKRI Cloud Workshop at Crick Institute London |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | A TREEHOOSE team member was invited to the Cloud workshop organised by a UKRI working group. There was much discussion of cloud infrastructure in research, along with requests for more information regarding the TREEHOOSE project and HIC TRE expertise in general. |
| Year(s) Of Engagement Activity | 2022 |
| URL | https://cloud.ac.uk/2022/02/27/programme-for-ukri-cloud-workshop-2022/ |
| Description | Japan Association for Medical Informatics Conference |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was a keynote speaker at the 43rd Joint Conference on Medical Informatics. Presentation: The UK's progress towards enabling secure, researcher access to sensitive health data at a UK population scale. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://confit-atlas-jp.translate.goog/guide/event/jcmi2023/session/3A11-13/detail?_x_tr_sl=ja&_x_tr... |
| Description | Keynote speaker: Towards Federated Analytics for Population Data. International Data Science Conference - Tokyo, Japan, 22/05/23 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Emily Jefferson was invited as a keynote speaker to present 'Towards Federated Analytics for Population Data' at the International Data Science Conference in Tokyo, Japan, on 22/05/23. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Pankhurst/HDR Collaboration Meeting |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Professional Practitioners |
| Results and Impact | Delivered a presentation on delivering a sustainable technology ecosystem and Gateway development, providing an update on HDR UK's work to accelerate trustworthy data use. |
| Year(s) Of Engagement Activity | 2024 |
| Description | Presentation at AWS North East Scotland User Group meeting |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presentation on the HIC TRE given to the AWS North East Scotland user group |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.manicstreetpreacher.co.uk/hic-presentations-public/20230524-aws-nescotland-tre/ |
| Description | Presentation at HDR UK Tech meeting 6th Feb |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented the Alleviate and TRE technology developments to an HDR UK technology workshop. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Presentation at Research Software Engineers |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Gave presentation to RSEcon TRE workshop which included international participants from Denmark and the US, plus industrial members from Microsoft and HP. Developed a TRE working group and agreement which led to a DARE UK collaboration. |
| Year(s) Of Engagement Activity | 2022 |
| URL | https://rsecon2022.society-rse.org/ |
| Description | Presentation to ELIXIR-UK - Human Data Community |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presentation covering the Gateway, Researcher Passports, Federated Analytics and the DARE Programme. Outputs were new collaborations and understanding of what we are doing in this field. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://elixiruknode.org/activities/uk-human-data-community/ |
| Description | Public engagement workshop on health data in research 11th May 2022 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Patients, carers and/or patient groups |
| Results and Impact | An online workshop in which interested members of the public contributed to the understanding of how health data are used in research for the benefit of the public and patients. This was an interactive session with participants providing feedback and contributing via in-meeting resources such as Mentimeter. |
| Year(s) Of Engagement Activity | 2022 |
| Description | Research Software Engineers (RSE) Conference |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was an invited speaker to the Seventh Annual Research Software Engineering Conference. Presentation: Can convening a Technology Ecosystem help TREs to work together? |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://rsecon23.society-rse.org/ |
| Description | SATRE - A National Specification for Trusted Research Environments |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented at the Annual HDR UK conference in Leeds |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.hdruk.ac.uk/hdruk-conference-2024/ |
| Description | TRE Quarterly Community Meetings |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Professional Practitioners |
| Results and Impact | Quarterly meeting to discuss current workstreams and planning for further work |
| Year(s) Of Engagement Activity | 2024 |
| Description | TRE demo webinar |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Public/other audiences |
| Results and Impact | Demonstration on the operation of the regional Trusted Research Environment (TRE), managed by Health Informatics Centre (HIC) of Dundee, followed by a brief talk on the Data Linkage services offered by Health Informatics Centre. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.eventbrite.co.uk/e/trusted-research-environment-demo-and-intro-to-hic-data-linkage-servi... |
| Description | Talk to European Genomic Data Infrastructure |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presentation covering the work of HDR UK in the technology ecosystem, the Gateway, the Phenotype Library, the Disease Atlas, Federated Analytics and cohort discovery along with DARE UK projects. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://gdi.onemilliongenomes.eu/ |
| Description | Technology Session at HDR UK Conference |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | One day of the HDR UK conference focused on technology, with a range of talks covering many different projects and promoting their adoption. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.hdruk.ac.uk/hdruk-conference-2024/ |
| Description | UK TRE Community Meeting |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Emily Jefferson (CTO, HDR UK and Interim Director of DARE UK) was the keynote speaker at the UK TRE Community Meeting that was part of the RSE Conference. Presentation: Call to action! |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.eventbrite.com/e/uk-tre-community-september-meeting-tickets-676066472017 |
| Description | Workshop on terminology used in health data research 23rd June 2022 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Patients, carers and/or patient groups |
| Results and Impact | This was a follow-up session from the 11th May event where the same participants were involved in providing feedback and contributing to better definitions of terms used in health data research for public use. |
| Year(s) Of Engagement Activity | 2022 |
