FAIR TREATMENT: Federated analytics and AI Research across TREs for AdolescenT MENTal health
Lead Research Organisation:
University of Cambridge
Department Name: UNLISTED
Abstract
Negative aspects of a young person's life can lead to poor mental health (MH). However, services are stretched and so often intervene late, leaving young people to suffer with longer-lasting or more severe problems. It is possible to spot patterns showing who needs professional help early. However, this is difficult because the information needed is held securely in different places (e.g. health, education and social care records) and falls under the remit of different research councils (MRC, ESRC). The main problems are:
1) predictive models aren’t accurate enough: difficulties linking the above data together probably result in many factors being missed;
2) models built in one place may not be effective in others: we need a way to securely analyse data from different places;
3) there is no agreement on how to make sure data are managed safely, fairly and transparently.
To solve these problems we will:
1) combine two new technologies to demonstrate it is possible to analyse data across trusted research environments in different places and preserve individuals’ privacy;
2) consult with patients, the public, organisations contributing data, and legal/ethics experts to agree the best way to oversee data use, ensuring it’s managed safely and fairly.
We can start quickly as we have been working together for three years and have already been funded to bring data together from education, social care and health services in Cambridgeshire and Peterborough, and the necessary ethical permissions are in place.
Technical Summary
Artificial intelligence research initiatives are supported by the NHS but current practical barriers prevent researchers making use of the substantial datasets potentially available. The barriers are both technical (e.g. secure data federation) and legal (e.g. lack of an appropriate/acceptable governance model). We will address these issues by a) combining existing technologies to create a federated trusted research environment (TRE) based on the Five Safes principles and b) developing a governance package to support federated data analysis.
Our motivation is the need to improve the effectiveness of mental health services for young people, which, against a background of increasing demand, are overstretched. Mental health problems can manifest in ways that are hard to detect from the perspective of a single agency (e.g. health service, school, social service) but we hypothesise they will be apparent when combining data from these services.
There are distinct challenges in combining such data at scale. We propose to: 1) provide a technical demonstration of approaches for federation across Trusted Research Environments (TREs) as a model for cross-council digital research environments; 2) provide a Use Case that requires such cross section integration and 3) examine the unique governance issues that arise, co-creating an aligned governance model that is acceptable to public, patients and data contributors.
The technology demonstrator will combine the BBSRC-funded InterMine platform with federation technology from Bitfount within the AIMES TRE. InterMine makes use of automatic code generation to create, from the underlying data model, the required database infrastructure, including APIs and UIs, and is designed to enable flexible and high performance querying. These features are useful in an environment with complex and evolving metadata. It is easy to configure interfaces that allow flexible querying while constraining access to data: we do this to implement governance rules with access controls provided by Bitfount, as an example of best practice. Using APIs from multiple InterMine instances, Bitfount technology will demonstrate secure privacy-preserving federation of queries to address the Use Case, for cohort identification, as well as for vertical federated learning protocols across the TREs.
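The federated cohort-identification idea described above can be illustrated with a minimal sketch. It assumes each TRE answers a query with an aggregate count only, and withholds counts below a disclosure-control threshold. It does not use the InterMine or Bitfount APIs: the `Site` class, the `federated_count` function, the field names and the threshold of 10 are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Hypothetical disclosure-control threshold: counts below this are withheld.
MIN_CELL_COUNT = 10

Record = Dict[str, Any]


@dataclass
class Site:
    """Stand-in for one TRE; row-level records never leave this object."""
    name: str
    records: List[Record]

    def local_count(self, predicate: Callable[[Record], bool]) -> Optional[int]:
        """Run the query locally and release only a sufficiently large count."""
        n = sum(1 for record in self.records if predicate(record))
        return n if n >= MIN_CELL_COUNT else None


def is_adolescent_with_referral(record: Record) -> bool:
    """Hypothetical cohort definition: aged 11-15 with a CAMHS referral."""
    return 11 <= record["age_years"] <= 15 and bool(record["camhs_referral"])


def federated_count(sites: List[Site],
                    predicate: Callable[[Record], bool]) -> Dict[str, Any]:
    """Combine per-site aggregates; the coordinator sees counts, not records."""
    released: Dict[str, int] = {}
    suppressed: List[str] = []
    for site in sites:
        n = site.local_count(predicate)
        if n is None:
            suppressed.append(site.name)
        else:
            released[site.name] = n
    return {
        "total": sum(released.values()),
        "per_site": released,
        "suppressed": suppressed,
    }
```

In this sketch the coordinator reports which sites were suppressed, so an analyst can tell that the total is a lower bound rather than an exact cohort size.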
We will use synthetic data generated from real-world data dictionaries and have the necessary permission to do this. This will allow much of the work to be done outside controlled environments and will generate freely available non-sensitive datasets.
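As a rough illustration of generating synthetic data from a data dictionary, the sketch below draws random values that respect each field's declared type and range. The dictionary contents, field names and the `synthesise` function are hypothetical and are not taken from the project's real data dictionaries. Independent uniform sampling is also a simplifying assumption: it reproduces the schema, not the statistical relationships found in real records.

```python
import random
from typing import Any, Dict, List

# Hypothetical data dictionary: field name -> type and permitted values.
DATA_DICTIONARY: Dict[str, Dict[str, Any]] = {
    "age_years": {"type": "int", "min": 11, "max": 15},
    "school_absence_pct": {"type": "float", "min": 0.0, "max": 100.0},
    "referral_source": {"type": "category",
                        "values": ["GP", "school", "social care"]},
}


def synthesise(dictionary: Dict[str, Dict[str, Any]],
               n_rows: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Generate n_rows synthetic records that conform to the dictionary."""
    rng = random.Random(seed)  # seeded so that outputs are reproducible
    rows: List[Dict[str, Any]] = []
    for _ in range(n_rows):
        row: Dict[str, Any] = {}
        for field, spec in dictionary.items():
            if spec["type"] == "int":
                row[field] = rng.randint(spec["min"], spec["max"])
            elif spec["type"] == "float":
                row[field] = round(rng.uniform(spec["min"], spec["max"]), 1)
            elif spec["type"] == "category":
                row[field] = rng.choice(spec["values"])
            else:
                raise ValueError(f"Unsupported field type: {spec['type']}")
        rows.append(row)
    return rows
```

Because no real record is read at any point, output of this kind could be shared and used outside a controlled environment, which is the property the paragraph above relies on.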
Importantly, we have the ethical approval to integrate the necessary data in the Cambridge & Peterborough region as part of the Cam-CHILD project and, with Turing funding, we will work with Birmingham and Essex to establish analogous ethical approval in preparation for a possible DARE Phase 2 bid. Involving three different localities (Cambridge, Essex, and Birmingham) will allow us to demonstrate the generalisability of the project outcomes to different TREs and databases.
Linking and exploiting data from such diverse sources presents unique challenges. The governance work we propose will undertake extensive engagement with the public, patients, practitioners, data controllers and legal experts to examine all aspects of the public acceptability and legal framework for supporting federated analysis across multiple TREs. This will produce freely available public communications documents and legal templates.
Organisations
- University of Cambridge (Lead Research Organisation)
- Kaleidoscope (Collaboration)
- Microsoft Research (Collaboration)
- National Institute for Health Research (Collaboration)
- UNIVERSITY OF CAMBRIDGE (Collaboration)
- BITFOUNT LTD (Collaboration)
- UNIVERSITY OF BIRMINGHAM (Collaboration)
- Cambridgeshire Community Services NHS Trust (Collaboration)
- Illumina Inc. (Collaboration)
- CAMBRIDGE UNIVERSITY HOSPITALS NHS FOUNDATION TRUST (Collaboration)
- Cambridgeshire County Council (Collaboration)
- Cambridgeshire and Peterborough NHS Foundation Trust (Collaboration)
- UNIVERSITY OF ESSEX (Collaboration)
- Anna Freud Centre (Collaboration)
Publications
Astle DE
(2023)
We need timely access to mental health data: implications of the Goldacre review.
in The lancet. Psychiatry
Cardinal RN
(2023)
De-identified Bayesian personal identity matching for privacy-preserving record linkage despite errors: development and validation.
in BMC medical informatics and decision making
Description | Presented work to shadow minister for innovation and technology |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Description | Timely: towards early identification of child and adolescent mental health problems |
Amount | £300,000 (GBP) |
Funding ID | T2-15 |
Organisation | Alan Turing Institute |
Sector | Academic/University |
Country | United Kingdom |
Start | 09/2021 |
End | 03/2022 |
Description | Towards early identification of child and adolescent mental health problems |
Amount | £297,000 (GBP) |
Funding ID | T2-15 |
Organisation | Alan Turing Institute |
Sector | Academic/University |
Country | United Kingdom |
Start | 09/2021 |
End | 06/2022 |
Description | Transforming child mental health: co-designing, building and evaluating a digitally enabled, personalised, prevention pathway |
Amount | £3,080,011 (GBP) |
Funding ID | MR/X034917/1 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 04/2024 |
End | 04/2031 |
Title | Building an infrastructure able to integrate health, education and social data relating to children for research purposes. |
Description | 1. We have developed a successful model to enable the governance and IG to be put in place to support multi-agency working. This is accompanied by a toolkit describing the steps to securing ethics for a research programme. 2. We brought together two technologies to enable the build of a trusted research environment (TRE) including multi-agency data. 3. We created the data architecture to enable the build of a TRE including multi-agency children's data. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2021 |
Provided To Others? | No |
Impact | The approach will be published in 2022. |
Title | Created federated informatics network for research purposes for paediatrics |
Description | Integrated regional data and developed software to enable its safe access. It will be available to others in the future. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2022 |
Provided To Others? | No |
Impact | We have been adopted by the mental health mission to provide digital infrastructure to enable paeds research for the UK |
Title | CADRE |
Description | Linked database including paeds data from health, education and social care. Will be available to others in the future. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | None yet; it is a work in progress. |
Title | Child mental health services database |
Description | The database is currently being finalised. It includes de-identified data relating to four years of patient-level child & adolescent MH services (CAMHS) data, relating to 20 sites. We are building this using InterMine, which is enabling us to translate a genetics informatics platform into one that can be used for NHS service data. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | The database is in its final stages of being completed. However, it is already being used to enable a 20-site case-control study of the effectiveness of a new model of care for CAMHS (THRIVE). We are using this work to support the translation of InterMine into an informatics platform for health services data, as part of the MRC grant. |
Title | Linking data relating to health, education and social care for all children in Wales within the SAIL/ADP databank. |
Description | We linked 17 databases relating to children in Wales for the first time. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | No |
Impact | We are able to carry out epidemiological research on this database, and build early identification models for child health. |
Description | Building capacity for federated AI for adolescent mental health. |
Organisation | University of Birmingham |
Department | School of Psychology Birmingham |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We are supporting Birmingham to build a TRE locally, using the methods developed by the MRC Timely grant. We will work together to federate our TREs, creating a mechanism for external validation of our early identification models. We are also expanding recruitment of our child and adolescent cohort (11-15y) to include Birmingham, so these data will be included in our models as well. |
Collaborator Contribution | Birmingham supported us in drafting an application to the Turing Institute which was successful. We are currently drafting an HDRUK/UKRI application for the sprints. |
Impact | Successful application to the Turing Institute for £300,000 funding. Application to HDRUK/UKRI sprints. Started recruitment of a cohort of adolescents into our genetic cohort for inclusion in the database. We are doing a lot of work on inequalities and on reducing these in datasets, to make them more representative. |
Start Year | 2021 |
Description | Collaboration to build capacity for federated AI - partnership with Essex University |
Organisation | University of Essex |
Department | Department of Psychology |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Collaboration to support Essex in creating a TRE based on the model created by Cam-CHILD during the MRC Engagement Award. Collaboration with us on a successful grant. |
Collaborator Contribution | Will create another TRE, enabling us to federate and build capacity for adolescent MH research. |
Impact | Successful application for funding to the Turing Institute. Application to the HDR UK/UKRI DARE sprints. |
Start Year | 2021 |
Description | Collaboration with charity to develop integrated data resource |
Organisation | Anna Freud Centre |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | AFC are partnering with the research team to build an informatics platform integrating health and social care data. This involves a workstream involving members of the public and service users to explore the acceptability of the use of electronic health and care records data. We are also collaborating to create the first general population cohort of children and adolescents for the NIHR BioResource. The PPI team is partially funded by the MRC Adolescent Engagement Award I hold. We provide a link into the research including communication and training. |
Collaborator Contribution | Helped to recruit a Young Champion. Academic leadership of PPI workstream. Schools team is helping to identify and liaise with schools to support recruitment. |
Impact | - active PPI group contributing to research - secured funding for study co-ordinator from NIHR BioResource - it is multidisciplinary (psychiatry, genetics, BRC, health services research, informatics, PPI) |
Start Year | 2020 |
Description | Illumina |
Organisation | Illumina Inc. |
Department | Illumina |
Country | United Kingdom |
Sector | Private |
PI Contribution | Contact with Illumina to discuss with them the value of research in child MH genetics. |
Collaborator Contribution | They are contributing £100k of whole genome sequencing. |
Impact | UKRI Future Leaders Fellowship |
Start Year | 2023 |
Description | Microsoft research |
Organisation | Microsoft Research |
Country | Global |
Sector | Private |
PI Contribution | Collaboration to design digital tools |
Collaborator Contribution | They are providing me with mentorship, access to training for my team and me, and direct input to project work. |
Impact | UKRI Future Leaders Fellowship |
Start Year | 2023 |
Description | Partnership with NIHR BioResource |
Organisation | National Institute for Health Research |
Department | National Institute for Health Research (NIHR) BioResource |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | I have been appointed Clinical Lead of the NIHR Children and Young People's BioResource. I am helping to establish a novel cohort of children and young people to join the BioResource. I have also secured the partnership of a leading national charity (Anna Freud Centre) to support this work. We will be taking a novel approach and recruiting children via schools. |
Collaborator Contribution | The aim of my work with the BioR is to include genetic data into the linked data resource we are currently building. We will work with them to determine how best to link this data to the linked platform, addressing information governance, technical, security and legal issues. |
Impact | Creation of a national Expert Working Group, partnership with a leading national charity, commissioning of a schools PPI group and young people's PPI group. |
Start Year | 2020 |
Description | Partnership with Bitfount - start up company that specialises in federated AI |
Organisation | Bitfount Ltd |
Country | United Kingdom |
Sector | Private |
PI Contribution | We have supported Bitfount to understand what is required to build a Trusted Research Environment and why federated analytics is important. We have supported them to understand the 'Five Safes' of research data. They also have an additional clinical example to include in their portfolio. |
Collaborator Contribution | Bitfount will provide the capability to carry out privacy preserving federated analytics across a range of TREs. This is a critical functional requirement to enable the external validation of the early identification AI models we are building, as well as providing larger sample sizes. |
Impact | We have drafted an application to the UKRI/HDRUK DARE sprint program. |
Start Year | 2021 |
Description | Partnership with CPFT Mental Health trust to create linked health & social care database |
Organisation | Cambridgeshire and Peterborough NHS Foundation Trust |
Country | United Kingdom |
Sector | Public |
PI Contribution | Provide academic input into PPI group, providing access to linked database. |
Collaborator Contribution | Access to CPFT data, secure data bank, support with PPI |
Impact | Publication on digital working. Secured an MRC grant together. |
Start Year | 2019 |
Description | Partnership with CUH acute hospital to create linked health and social care database |
Organisation | Cambridge University Hospitals NHS Foundation Trust |
Country | United Kingdom |
Sector | Public |
PI Contribution | Building a linked dataset enabling CUH to use its data. Analysis of their children's A&E data to support novel pathways. |
Collaborator Contribution | Access to EPIC data |
Impact | MRC grant secured |
Start Year | 2020 |
Description | Partnership with Community Health Services to create linked database |
Organisation | Cambridgeshire Community Services NHS Trust |
Country | United Kingdom |
Sector | Public |
PI Contribution | We have provided them with training to de-identify their data using validated software (CRATE). We have provided facilitated workshops to support the identification of the data that is required. |
Collaborator Contribution | The clinical and informatics teams are working with us to develop a linked dataset. The clinical lead and informatics leads are working closely with us to map the databases and identify the datasets that we require. They are undertaking training to enable them to de-identify the data locally, and the data will be transferred to us periodically. They are contributing to the work to develop the live linked database. |
Impact | I will be completing a clinical training post in the service as a direct result of this collaboration. I also hope to build a clinical service in their organisation as a direct result of this work. |
Start Year | 2019 |
Description | Partnership with department of Genetics |
Organisation | University of Cambridge |
Department | Department of Genetics |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We have secured a grant that has led to funding for the department. Providing education in health services structure and related informatics. |
Collaborator Contribution | They are providing access to a Wellcome Trust-funded informatics platform that we are adapting for use with healthcare data. |
Impact | We are building the Cambridge child health informatics and linked data platform (Cam-CHILD). |
Start Year | 2020 |
Description | Partnership with leading governance and data security consultancy |
Organisation | Kaleidoscope |
Country | United Kingdom |
Sector | Private |
PI Contribution | We have built a partnership providing the consultancy with a novel challenge and collaboration with Uni of Cambridge to solve some of the most challenging data IG problems - the access, sharing, linkage and use of sensitive children's data for research purposes. |
Collaborator Contribution | They are supporting us as we work with partners to develop a suitable IG model. |
Impact | Application to HDRUK/UKRI DARE sprints. Data flow diagrams. We are working towards developing a governance model. |
Start Year | 2021 |
Description | Partnership with local authority to create linked database |
Organisation | Cambridgeshire County Council |
Department | Public Health Service; Cambridgeshire County Council |
Country | United Kingdom |
Sector | Public |
PI Contribution | We have provided training and support to develop a method of mapping out data required for the linked dataset. |
Collaborator Contribution | - service, IT and information systems leads are working with us to map out the data requirements for the database. - will pseudonymise data - will extract data for the database and update this periodically - will contribute to governance of subsequent dataset |
Impact | We have submitted an NIHR application to the 'Unlocking Local Authority Data' call |
Start Year | 2019 |
Description | Partnership with the Department of Engineering |
Organisation | University of Cambridge |
Department | Department of Engineering |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We have led the application of the 'Systems Thinking' approach to the development of early identification tools for child MH. |
Collaborator Contribution | Senior Academic attends all meetings and is leading a workstream on how best to take a systems approach to early identification in adolescent MH. |
Impact | Submitted and secured an MRC adolescent engagement award. |
Start Year | 2020 |
Title | Federated trusted research environment for linked data |
Description | This is a trusted research environment that can securely host multiagency data and make it available for research purposes. It is able to federate with other TREs housing similar data to carry out privacy preserving federated analytics. |
Type Of Technology | New/Improved Technique/Technology |
Year Produced | 2022 |
Impact | It will be used as part of the Cambridge Children's Hospital informatics research infrastructure. |
Description | BBC news coverage for fellowship |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | When the award was announced it got interest from the BBC, and it was featured in a national article, as well as on the regional news. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.bbc.co.uk/news/uk-england-cambridgeshire-67624048 |
Description | Presentation about MH & genomics for the Cambridge Children's and Illumina meeting about future direction |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Presented to Illumina & CCH the role of genetics in child MH, and opportunities for use of data in digital early identification tools. |
Year(s) Of Engagement Activity | 2022 |
Description | Presentation at Cambridge Children's Hospital Digital Board |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | We presented on how linked data will be used within the children's hospital, and the role of the infrastructure we built with the grants in the research unit. We also influenced the development of their digital strategy. |
Year(s) Of Engagement Activity | 2022 |
Description | Presentation to Lucy Chappell about digital work taking place in Cambridge |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | Presentation to Lucy Chappell about the issues we face in informatics and digital research, what the University of Cambridge is doing and what we feel are the key issues that currently need to be addressed to advance the field. |
Year(s) Of Engagement Activity | 2023 |
Description | Public engagement with DARE UK sprint program |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | We participated in the DARE UK Sprint public launch and presented the aim and purpose of our project to the public and other audiences. |
Year(s) Of Engagement Activity | 2022 |
Description | Recruitment of 200 members of the public to participate in our community of interest, contributing to supporting child health research |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | We used social media and support from charities to recruit over 200 members of the public willing to participate in PPI activities relating to children's health research. |
Year(s) Of Engagement Activity | 2022,2023 |
Description | School network presentation |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Schools |
Results and Impact | We attended the Birmingham local authority school wellbeing network to present on the NIHR Young People's BioResource and encourage schools to participate. |
Year(s) Of Engagement Activity | 2022 |
Description | TikTok video presenting the outcomes of patient and public involvement work with parents and young people about the use of linked data and AI for MH |
Form Of Engagement Activity | A broadcast e.g. TV/radio/film/podcast (other than news/press) |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | The TikTok video presented the findings of the PPI process: the acceptability of using linked data, what IG should be put in place, and the recommendations for communications with the public. |
Year(s) Of Engagement Activity | 2023 |