UNICORN (Unified Cohorts Research Network): Disaggregating asthma

Lead Research Organisation: Imperial College London
Department Name: National Heart and Lung Institute

Abstract

Asthma and allergies are the most common chronic diseases in childhood and adolescence. They usually start before school age and are responsible for a heavy burden of ill health, including premature death. It is increasingly recognised that asthma is an umbrella term covering several different diseases, which creates a barrier to delivering personalised treatments (i.e., treatments tailored to the individual patient). We propose an innovative scientific research programme (UNICORN, Unified Cohorts Research Network), which embraces a team-science approach to understanding the heterogeneity of asthma and allergies.

We have many ways of researching illnesses: (1) studies of children from their birth (birth cohorts); (2) studies of patients with severe disease; (3) randomised controlled trials (RCTs, where patients are allocated by chance to receive one of several treatments). We propose that we can begin to understand the variation seen in common diseases such as asthma and allergies if we look at all these together. This will help us to predict who will respond best to different treatments. Such a collaborative approach is currently prevented by the lack of a system to jointly manage and analyse the data from different studies.

We will form an alliance between the STELAR consortium of 5 UK birth cohort studies aimed at studying asthma and allergic diseases (in total more than 15,000 participants who have been followed from before birth to adulthood) and clinical studies which recruited large numbers of patients with severe asthma (more than 1000). Our birth cohorts measured environmental exposures before the onset of the disease and contain detailed information on the development of asthma and allergies from early childhood to adulthood. In UNICORN, these will be supplemented by the information collected in studies of patients with severe asthma. These studies measure additional clinical markers (for example, more detailed lung function), and collect biological samples which are not available in birth cohorts (such as sputum, nasal secretions, and airway biopsies). These samples are needed to understand the mechanisms underlying different types of asthma. RCTs provide further important and accurate information about responses to treatment. Thus, birth cohorts, patient cohorts, and RCTs are complementary, and combining them by linking the data appropriately will provide invaluable insights into the mechanisms of different asthma subtypes, markers to predict future risk, and individual responses to treatment.

UNICORN builds on substantial prior investments in the science and infrastructure underpinning asthma research. We will pull together and build upon several earlier investments in data management platforms and in tools created to help data harmonisation and joint analysis. In Workstream 1, we will develop efficient software solutions to integrate, manage, harmonise and analyse different types of studies together. Combining detailed research observations from cohort studies with less thorough, but more frequent, information from routine clinical records holds huge potential. In Workstream 2, we will enrich the detailed information collected from before birth to early adulthood in the STELAR birth cohorts with data from primary care and hospital records. Our programme of work will create conditions that enable collaborative research. The shared digital environment will provide our team of scientists with tools to analyse existing and newly collected data efficiently, to help interpret findings, and to support rapid implementation for patient benefit. In Workstream 3, we will use asthma as an exemplar to develop and apply methods to jointly analyse data from different settings.

Our findings will underpin new trials of asthma and allergy prevention and treatment, personalised for specific subtypes, and may help identify novel targets for the discovery of subtype-specific treatments required for personalised medicine.

Technical Summary

The development of new methodologies for improving causal inference in epidemiological studies creates an opportunity for a step change in understanding the mechanisms underlying asthma development. We propose that the best way to scale up research in asthma is to integrate unselected birth cohorts with patient cohorts and randomised controlled trials (RCTs) for joint analyses, as these different settings provide complementary windows on distinct aspects of disease aetiology. UNICORN will form an alliance between the STELAR consortium of 5 birth cohorts aimed at studying asthma and allergies (in total >15,000 participants) and patient cohorts with large numbers of carefully phenotyped patients with asthma (Breathing Together consortium, U-BIOPRED, RBH Severe Asthma cohort). In Workstream 1, we will build on earlier investments (eLab, tranSMART/eTRIKS) to develop efficient, scalable informatics solutions enabling the integration, management, harmonisation and secure co-analysis of birth cohorts, patient cohorts, and RCTs. The integrated data management and analysis platform at the heart of the UNICORN research engine will be a unique resource for UK health science and will provide a template for implementation in other complex non-communicable disease areas, where data integration provides the only realistic prospect of unravelling the complex and heterogeneous biology of these conditions. Workstream 2 will extend the detailed information collected from the antenatal period to adulthood in the STELAR cohorts with routinely acquired data from primary care and hospital records, facilitating more sophisticated analyses. In Workstream 3, in an iterative discovery process, we will capitalise on a unique combination of expertise, well-characterised birth and patient cohorts, and our novel research engine to promote the discovery of asthma endotypes and to identify and understand the mechanisms underpinning them, thereby advancing stratified medicine.

Planned Impact

Who might benefit from this research?

We will develop the capacity to handle increasing quantities of complex data across different types of studies and ensure that these rich and unique data sets are used to their maximum potential. The project will multiply the effects of previous investments, thereby having an overall scientific impact much greater than its level of requested funding. We will provide an 'engine' for large-scale transdisciplinary collaborations to conduct cutting-edge science, using existing data resources, to produce health benefits for the UK population and beyond.

Our approach will support greater reproducibility and transparency of research, and enable researchers to explore the sets of variables and analytical methods selected by their colleagues in other disciplines. There are clear and immediate economic benefits in allowing researchers to build on the methods and expertise of others.

Our results will identify risk factors and mechanisms that influence the onset and progression of asthma-related diseases from infancy to early adulthood, and associated adverse lung function trajectories. The discovery of mechanisms underpinning different asthma endotypes may form the basis for identification of novel therapeutic targets, and biomarkers which are predictive of health or disease, or the response to treatment. This will be of great value to patients, society, health-care professionals and industry.

The impact of the programme will include conceptual, methodological and analytical contributions towards data integration and their efficient exploitation. It can lead to advances in artificial intelligence and machine learning, with widespread applications for technology companies (see letter of support from Prof C Bishop, Director of Microsoft Research Cambridge). UNICORN's science-focused approach, nurturing innovation in computational epidemiology while advancing asthma research, represents pull-through of data science from leading, cognate biomedical research, and complements infrastructural approaches such as HDRUK.

How might they benefit from this research?

The ability to access shared analysis resources will be of great value for training and development of researchers, and the ability to access example analyses and expert advice will reduce their learning curve. Enabling the networking of datasets, expertise and methods for data preparation and analysis can help drive greater value from existing investments.

The outputs from our research will support the UK Industrial Strategy by enabling pharmaceutical and biotechnology companies to identify novel, endotype-specific therapeutic targets. Diagnostic companies may develop biomarkers and/or algorithms which can be used as tools to assess future risk and the response to currently available or novel treatments. Such biomarkers and associated algorithms could form the basis of diagnostic or prognostic tests, which may be used to make informed lifestyle choices to prevent or attenuate disease development and influence long-term health. The discovered biomarkers may also be used to stratify patients in clinical trials and to facilitate the selection of the most appropriate therapies in a stratified manner.

Technology companies will be well-placed to abstract the underpinning methodology, and help MRC translate it to other domains of research, de-risking the investment for MRC, and enhancing methodological advances for industry.

Our findings may represent potentially valuable intellectual property, which we will seek to commercialise in collaboration with companies invested in diagnostics and/or therapeutics. Participating universities have mechanisms and structures in place for exploring industrial applications. Partnerships such as the one described in this application help to make the UK an attractive location to retain research activities, and help expose academics to the process of translating science into products.

Publications

Akar-Ghibril N (2020) Allergic Endotypes and Phenotypes of Asthma. in The journal of allergy and clinical immunology. In practice

Custovic A (2020) Atopic phenotypes and their implication in the atopic march. in Expert review of clinical immunology

Custovic A (2020) "Asthma" or "Asthma Spectrum Disorder"? in The journal of allergy and clinical immunology. In practice

Custovic A (2022) Considering biomarkers in asthma disease severity. in The Journal of allergy and clinical immunology

Deliu M (2020) Longitudinal trajectories of severe wheeze exacerbations from infancy to school age and their association with early-life risk factors and late asthma outcomes. in Clinical and experimental allergy : journal of the British Society for Allergy and Clinical Immunology

Fontanella S (2021) Machine learning in asthma research: moving toward a more integrated approach. in Expert review of respiratory medicine

Frainay C (2021) Atopic dermatitis or eczema? Consequences of ambiguity in disease name for biomedical literature mining. in Clinical and experimental allergy : journal of the British Society for Allergy and Clinical Immunology

 
Description ERS/EAACI Statement on severe exacerbations in asthma
Geographic Reach Europe 
Policy Influence Type Membership of a guideline committee
 
Description European Respiratory Society/American Thoracic Society Guideline on Management of Severe Asthma
Geographic Reach Multiple continents/international 
Policy Influence Type Membership of a guideline committee
Impact Improvements in clinical service delivery
URL https://www.ersnet.org/news-and-features/news/latest-ers-ats-severe-asthma-guidelines-now-available/
 
Description Pediatric Asthma in Real Life (PeARL) Think Tank
Geographic Reach Multiple continents/international 
Policy Influence Type Influenced training of practitioners or researchers
Impact Identification of Research Priorities in Pediatric Asthma
 
Description Automated evaluation and prediction of eczema severity by machine learning methods
Amount £79,264 (GBP)
Organisation Imperial Innovations 
Sector Private
Country United Kingdom
Start 03/2021 
End 04/2022
 
Description Automated evaluation and prediction of eczema severity by machine learning methods
Amount £79,264 (GBP)
Organisation Imperial College London 
Sector Academic/University
Country United Kingdom
Start 03/2021 
End 03/2022
 
Description Remote monitoring to predict and prevent asthma attacks in preschool children
Amount £500,000 (GBP)
Funding ID EP/W002280/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2022 
End 12/2025
 
Title UNICORN Data FAIRification Approach 
Description A two-stage process. First, UNICORN-sourced data are consolidated into semantically annotated FAIR datasets, which are the target state for UNICORN data: a mapper file is created for each data source, and software uses it to FAIRify the data. These FAIR datasets are interoperable and richly annotated, allowing future users to discover and re-use them for different purposes. Second, data are integrated and harmonised across these semantically annotated FAIR datasets and loaded into an integrated data model that allows cross-study data exploration and analysis. 
Type Of Material Improvements to research infrastructure 
Year Produced 2022 
Provided To Others? Yes  
Impact The shift towards FAIRification of UNICORN data
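The mapper-driven first stage of this process can be illustrated with a minimal Python sketch. It assumes a mapper file has already been parsed into a dictionary; the mapper layout, the function name `fairify`, and the `example.org` annotation IRIs are hypothetical placeholders, not the project's actual file format or ontology terms. The second stage (cross-study integration) is not shown.

```python
from typing import Any

# Hypothetical mapper for one source study. Keys are source column names;
# values give the standard variable name and a placeholder semantic annotation.
MAPPER: dict[str, dict[str, str]] = {
    "pid": {"variable": "participant_id", "annotation": "https://example.org/terms/ParticipantID"},
    "fev1": {"variable": "fev1_litres", "annotation": "https://example.org/terms/FEV1"},
    "wheeze": {"variable": "wheeze_ever", "annotation": "https://example.org/terms/EverWheeze"},
}


def fairify(rows: list[dict[str, Any]],
            mapper: dict[str, dict[str, str]]) -> tuple[dict[str, Any], set[str]]:
    """Stage 1 sketch: rename and annotate source columns using a mapper.

    Returns the annotated dataset plus the source columns the mapper did not
    cover, so a data manager can extend the mapper before re-running.
    """
    unmapped: set[str] = set()
    records = []
    for row in rows:
        record = {}
        for column, value in row.items():
            spec = mapper.get(column)
            if spec is None:
                unmapped.add(column)
            else:
                record[spec["variable"]] = value
        records.append(record)
    # Variable-level annotations travel with the data, so later users can
    # tell what each variable means without the source study's codebook.
    metadata = {spec["variable"]: spec["annotation"] for spec in mapper.values()}
    return {"metadata": metadata, "records": records}, unmapped


raw = [{"pid": "A1", "fev1": 2.9, "wheeze": True, "clinic_room": "3B"}]
dataset, unmapped = fairify(raw, MAPPER)
```

Reporting unmapped columns, rather than silently dropping them, keeps the mapper file as the single place where decisions about each source variable are recorded.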
 
Title UNICORN FAIR Data Platform 
Description The ICL-DSI UNICORN Data Repository (now the UNICORN FAIR Data Platform) was designed and developed as a full-stack web application, with a server-based (back-end) application exposing an API layer that communicates with a client-based (front-end) application. The back-end is a .NET WebAPI application designed according to the multi-layered onion architecture. The front-end is a web application built on the Angular framework, giving end users access to the platform. 
Type Of Material Improvements to research infrastructure 
Year Produced 2022 
Provided To Others? Yes  
Impact Research data management according to the FAIR (Findability, Accessibility, Interoperability, and Reusability) data principles is a data-science-driven approach to data management that aims to enable efficient and error-free analysis of data from multiple sources. Since the FAIR principles were introduced in 2016, FAIR metrics, FAIR infrastructure, and FAIR tools have been developed to help make data FAIR (the "FAIRification" process). Importantly, data management according to the FAIR principles is becoming an expectation of the major funding bodies and publishers. The FAIR approach means that research data are well described, preserved, and enabled for long-term use and re-purposing. One of the key advantages of FAIR data is a major increase in reusability beyond the first, original purpose. 
 
Title UNICORN eLab 
Description The UNICORN eLab has been established as part of this project. This involved developing a new FHIR database that is used to manage the STELAR FHIR data. This has allowed the migration of data from a proprietary system to one based on open standards that are strongly supported by an international community. The pre-existing STELAR eLab has been re-architected to offer significant improvements. 
Type Of Material Improvements to research infrastructure 
Year Produced 2022 
Provided To Others? Yes  
Impact Ease of deployment, upgrades and maintenance; extensibility; auditability; confidentiality/security; availability
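Because the eLab manages STELAR data as HL7 FHIR resources, a minimal sketch of how one measurement might be represented as a FHIR R4 `Observation` may help. The top-level field names (`resourceType`, `status`, `code`, `subject`, `effectiveDateTime`, `valueQuantity`) follow the FHIR specification, and `http://unitsofmeasure.org` is the standard UCUM system URL; the coding system URL, the code `fev1`, the function name and the participant identifier are hypothetical placeholders, not taken from the eLab's actual resources.

```python
import json
from typing import Any


def fev1_observation(participant_id: str, litres: float, measured_on: str) -> dict[str, Any]:
    """Build a minimal FHIR R4 Observation for one spirometry value.

    The coding system and code are hypothetical placeholders; a production
    resource would use an agreed terminology such as LOINC.
    """
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "https://example.org/stelar-variables",  # hypothetical
                "code": "fev1",
                "display": "Forced expiratory volume in 1 second",
            }]
        },
        "subject": {"reference": f"Patient/{participant_id}"},
        "effectiveDateTime": measured_on,
        "valueQuantity": {
            "value": litres,
            "unit": "L",
            "system": "http://unitsofmeasure.org",  # UCUM units
            "code": "L",
        },
    }


resource = fev1_observation("stelar-0001", 2.85, "2019-06-14")
print(json.dumps(resource, indent=2))
```

Holding data in a documented, openly specified shape like this, rather than in a proprietary schema, is what lets other FHIR-aware tools read the same records.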
 
Title UNICORN's IT infrastructure 
Description At the start of the project, the two teams at ICL-DSI and Manchester worked together on the overall design and architecture of UNICORN's IT infrastructure. This involved the design of the FHIR-compliant eLab, the UNICORN Integrated Data Repository (IDR), and the ETL pipelines that manage the transfer of data and its subsequent integration and harmonisation. The Integrated Data Repository comprises a central data warehouse and the components that feed data into it. The UNICORN eLab is used to ingest and then manage the STELAR data in FHIR format, and to provide the central IDR with access to these data. 
Type Of Material Improvements to research infrastructure 
Year Produced 2022 
Provided To Others? Yes  
Impact The informatics solutions allowing the management, integration and harmonisation of UNICORN's heterogeneous data sources
 
Title D11 ISO/IEC 11179-compliant MDR web application for metadata management 
Description A sub-module in the UNICORN Data Platform that stores and serves the standard dataset templates into which the different UNICORN datasets are mapped and transformed. Shifting to the FAIRification of datasets meant that we had to manage dataset templates as a whole, rather than as individually managed Common Data Elements, which required a Metadata Registry (MDR) to store and manage them. We therefore implemented this feature in the Metadata Governance Module, which stores the standard dataset templates and provides a user interface that enables UNICORN data managers to associate the various datasets imported into the platform with their respective dataset templates for validation and quality checking. 
Type Of Material Data handling & control 
Year Produced 2022 
Provided To Others? Yes  
Impact Enables UNICORN data managers to associate the various datasets imported into the platform with their respective dataset templates for validation and quality checking. 
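Template-based validation and quality checking can be illustrated with a short Python sketch that checks records against a standard dataset template. The `DataElement` fields (name, type, required flag, plausible range), the function `validate`, and the FEV1 range used here are hypothetical assumptions for illustration; they are not the structure of the UNICORN MDR, nor anything prescribed by ISO/IEC 11179.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class DataElement:
    """One data element in a hypothetical standard dataset template."""
    name: str
    dtype: type
    required: bool = True
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def validate(records: list[dict[str, Any]], template: list[DataElement]) -> list[str]:
    """Check records against a template and return readable problems."""
    problems: list[str] = []
    known = {element.name for element in template}
    for i, record in enumerate(records):
        # Columns present in the data but not defined by the template.
        for column in sorted(record.keys() - known):
            problems.append(f"record {i}: '{column}' is not in the template")
        for element in template:
            value = record.get(element.name)
            if value is None:
                if element.required:
                    problems.append(f"record {i}: missing required '{element.name}'")
                continue
            if not isinstance(value, element.dtype):
                problems.append(f"record {i}: '{element.name}' should be {element.dtype.__name__}")
                continue
            if element.minimum is not None and value < element.minimum:
                problems.append(f"record {i}: '{element.name}'={value} is below {element.minimum}")
            if element.maximum is not None and value > element.maximum:
                problems.append(f"record {i}: '{element.name}'={value} is above {element.maximum}")
    return problems


TEMPLATE = [
    DataElement("participant_id", str),
    DataElement("fev1_litres", float, minimum=0.0, maximum=8.0),  # hypothetical range
    DataElement("wheeze_ever", bool, required=False),
]

records = [
    {"participant_id": "A1", "fev1_litres": 2.9},
    {"participant_id": "A2", "fev1_litres": 12.0, "site": "X"},
    {"fev1_litres": 3.1},
]
for problem in validate(records, TEMPLATE):
    print(problem)
```

Keeping the template as data, separate from the checking code, mirrors the idea of a registry: the same `validate` routine can be reused for every dataset once its template is associated with it.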
 
Title UNICORN dataset governance module 
Description This is a 'metadata' governance module that allows data managers to manage UNICORN's data against the metadata specifications 
Type Of Material Data handling & control 
Year Produced 2022 
Provided To Others? Yes  
Impact Information about studies, clinical assessments and biomarker assays is entered into the database using the Data Governance Module user interface. For each assessment, the data manager associates a standard dataset template that is relevant to the data generated by that assessment, as decided during the specification stage. 
 
Description CADSET: Chronic Airway DiSeases Early sTratification 
Organisation European Respiratory Society (ERS)
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution UNICORN is an integral part of CADSET, a pan-European network committed to promoting clinical research in chronic airway diseases. The overarching working hypothesis of CADSET is that asthma and chronic obstructive pulmonary disease (COPD) represent a continuum of heterogeneous chronic airway diseases that share clinical, functional, imaging and/or biological mechanisms (i.e., endotypes) that can be identified by appropriately validated biomarkers and that may constitute novel therapeutic targets. Multi-level (clinical, functional, imaging and molecular) profiling of well-characterised patients with chronic airway disease, spanning the spectrum of asthma and COPD and considering both the peak lung function achieved in early adulthood and the rate of lung function decline, may lead to the identification of distinct endotypes (and appropriate biomarkers). These may, in turn, inform a mechanism-based disease classification and more personalised treatment of patients with chronic airway diseases.
Collaborator Contribution Scientific board; joint analyses; joint publications
Impact Spirometric phenotypes from early childhood to young adulthood: a Chronic Airway Disease Early Stratification study. Wang G, Hallberg J, Charalampopoulos D, Sanahuja MC, Breyer-Kohansal R, Langhammer A, Granell R, Vonk JM, Mian A, Olvera N, Laustsen LM, Rönmark E, Abellan A, Agusti A, Arshad SH, Bergström A, Boezen HM, Breyer MK, Burghuber O, Bolund AC, Custovic A, Devereux G, Donaldson GC, Duijts L, Esplugues A, Faner R, Ballester F, Garcia-Aymerich J, Gehring U, Haider S, Hartl S, Backman H, Holloway JW, Koppelman GH, Lertxundi A, Holmen TL, Lowe L, Mensink-Bout SM, Murray CS, Roberts G, Hedman L, Schlünssen V, Sigsgaard T, Simpson A, Sunyer J, Torrent M, Turner S, Van den Berge M, Vermeulen RCH, Vikjord SAA, Wedzicha JA, Maitland van der Zee AH, Melén E. ERJ Open Res. 2021 Dec 6;7(4):00457-2021. doi: 10.1183/23120541.00457-2021. eCollection 2021 Oct. PMID: 34881328
Start Year 2021
 
Description Drakenstein Child Health Study 
Organisation University of Cape Town
Country South Africa 
Sector Academic/University 
PI Contribution Joint MRC grant to investigate lung function trajectories in African children from birth to 8 years of age, and to identify early-life risk factors associated with low lung function trajectories.
Collaborator Contribution Collection of data in DCHS
Impact Ongoing surveillance for lower respiratory tract infection (LRTI), using well-established detection systems, has provided very accurate measurements of LRTI incidence and aetiology in early childhood. Intensive microbiological investigation (33-plex PCR for viruses and bacteria) is done at each LRTI episode.
Start Year 2020
 
Description The Children's Respiratory and Environmental Workgroup (CREW) birth cohort consortium 
Organisation University of Wisconsin-Madison
Country United States 
Sector Academic/University 
PI Contribution Provision of Asthma eLab as an open source software to our US collaborators.
Collaborator Contribution Upgraded eLab (to the FHIR standard) made available to the UNICORN consortium
Impact Joint GWASs currently under way
Start Year 2022
 
Title FHIR eLab 
Description The HL7 FHIR enabled eLab has been extended to support the UNICORN project and deployed for production use at: https://unicorn.eLabhub.org 
Type Of Technology Webtool/Application 
Year Produced 2022 
Open Source License? Yes  
Impact The UNICORN eLab has been established as part of this project. This involved developing a new FHIR database that is used to manage the STELAR FHIR data. This has allowed the migration of data from a proprietary system to one based on open standards that are strongly supported by an international community. The pre-existing STELAR eLab has been re-architected to offer significant improvements in the following areas.
Ease of deployment, upgrades and maintenance: Developer Operations tools, including Docker and Ansible, have been employed so that the deployment, upgrade and maintenance of the UNICORN eLab can be automated. All the eLab components run as Docker Swarm services and are based on images curated by the University of Manchester.
Extensibility: Services are included to allow additional tools to be added to the system. Two key areas considered were support for single sign-on and an extensible user interface that allows a seamless experience when switching between tools. The eLab now offers single sign-on through support for OpenID Connect. The graphical interface is dashboard-based, with customisable dashboard components that can accommodate additional tools.
Auditability: Information security policies require IT systems to capture information about various events, such as user accounts being created or users logging in to a system. The eLab includes Elasticsearch, Logstash and Kibana services, which are used to capture, process, monitor and visualise system information, including log file content.
Confidentiality/security: The eLab is used to manage anonymised health data, and confidentiality must be maintained. The system has been developed to ensure that only specific users can access it, and only from specific computers. Two-factor authentication is used to authenticate users, and IP filtering ensures that only specific machines can be used. A flexible role-based permissions model restricts access to specific datasets.
Availability: The eLab automates a backup process for all the different system components, such as databases and file stores. Backup files are stored on a replicated remote file system so that disaster recovery can be carried out in the event of a serious issue. 
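The confidentiality controls listed for the eLab (two-factor authentication, IP filtering and role-based dataset permissions) can be combined into a single access check. This Python sketch is hypothetical throughout: the `AccessPolicy` class, the role, user and dataset names, and the network range are invented for illustration, two-factor authentication is reduced to a flag assumed to have been verified elsewhere, and none of it reflects how the eLab itself implements these controls.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class AccessPolicy:
    """Hypothetical policy combining IP filtering with role-based dataset permissions."""
    allowed_networks: list[str]              # CIDR ranges of approved machines
    role_datasets: dict[str, set[str]]       # role -> datasets that role may read
    user_roles: dict[str, set[str]] = field(default_factory=dict)

    def can_read(self, user: str, dataset: str, client_ip: str, second_factor_ok: bool) -> bool:
        # Two-factor authentication is assumed to be verified elsewhere and is
        # represented here only by a boolean flag.
        if not second_factor_ok:
            return False
        # IP filtering: the request must come from an approved network.
        address = ip_address(client_ip)
        if not any(address in ip_network(net) for net in self.allowed_networks):
            return False
        # Role-based permissions: at least one of the user's roles must grant the dataset.
        return any(dataset in self.role_datasets.get(role, set())
                   for role in self.user_roles.get(user, set()))


policy = AccessPolicy(
    allowed_networks=["10.20.0.0/16"],
    role_datasets={"cohort_analyst": {"cohort_a", "cohort_b"}, "clinic_analyst": {"patient_cohort"}},
    user_roles={"alice": {"cohort_analyst"}},
)
print(policy.can_read("alice", "cohort_a", "10.20.3.4", True))        # True
print(policy.can_read("alice", "patient_cohort", "10.20.3.4", True))  # False: role lacks dataset
print(policy.can_read("alice", "cohort_a", "192.168.1.5", True))      # False: IP not allowed
```

The check denies by default: a user with no roles, an unknown dataset, or a request from outside the approved networks is refused without any special-case code.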
 
Title UNICORN integrated data repository (IDR) 
Description The UNICORN FAIR data platform was designed to support two models for data storage: a dataset-based model suitable for storing UNICORN FAIRified datasets, and an observation-based model suitable for storing integrated data based on the Biomedical Observation semantic model. Based on these two models, the team at ICL-DSI implemented a two-schema storage solution that avoids the inefficiencies of moving data between different database implementations, thus removing the need for a separate ETL process. The two storage solutions are: (1) the UNICORN FAIR Dataset Repository, to support long-term data discovery and re-use of UNICORN datasets; and (2) the UNICORN Integrated Data Commons, to support querying and exploring cross-study data. The current version includes limited user interface features, as development focused first on the two database solutions and the ETL pipelines laid out in the architecture. A development instance is currently running on the ICL-DSI cloud infrastructure. Future releases will include search capabilities and visual data exploration as more UNICORN data become ready to import into the platform. 
Type Of Technology Webtool/Application 
Year Produced 2022 
Open Source License? Yes  
Impact Information about studies, clinical assessments and biomarker assays is entered into the database using the Data Governance Module user interface.
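The difference between the dataset-based and observation-based storage models can be sketched in a few lines of Python. In-memory lists stand in for the two database schemas, and each observation is reduced to study, subject, variable and value; the real Biomedical Observation semantic model is considerably richer. The functions `to_observations` and `values_by_study`, and all study and variable names, are hypothetical.

```python
from collections import defaultdict
from typing import Any


def to_observations(study: str, rows: list[dict[str, Any]],
                    id_column: str = "participant_id") -> list[dict[str, Any]]:
    """Dataset model -> observation model: one record per (study, subject, variable)."""
    observations = []
    for row in rows:
        subject = row[id_column]
        for variable, value in row.items():
            if variable == id_column or value is None:
                continue  # missing values produce no observation
            observations.append({"study": study, "subject": subject,
                                 "variable": variable, "value": value})
    return observations


def values_by_study(observations: list[dict[str, Any]], variable: str) -> dict[str, list[Any]]:
    """Cross-study query: all values of one variable, grouped by study."""
    grouped: dict[str, list[Any]] = defaultdict(list)
    for obs in observations:
        if obs["variable"] == variable:
            grouped[obs["study"]].append(obs["value"])
    return dict(grouped)


birth_cohort = [
    {"participant_id": "A1", "fev1_litres": 2.9, "wheeze_ever": True},
    {"participant_id": "A2", "fev1_litres": 3.4, "wheeze_ever": None},
]
patient_cohort = [{"participant_id": "P7", "fev1_litres": 1.8, "sputum_eos_pct": 4.5}]

store = to_observations("birth_cohort_a", birth_cohort) + to_observations("severe_asthma", patient_cohort)
print(values_by_study(store, "fev1_litres"))
# {'birth_cohort_a': [2.9, 3.4], 'severe_asthma': [1.8]}
```

The wide, per-study tables suit discovery and re-use of individual datasets, while the long observation form lets variables that exist in only some studies (here, sputum eosinophils) sit alongside shared ones and be queried across studies without schema changes.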
 
Description Country Ambassador for the United Kingdom, European Centre for Allergy Research Foundation (ECARF) 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact The mission of the non-profit European Centre for Allergy Research Foundation (ECARF) is to ensure that people with allergies receive the best possible guidance in everyday matters and treatment options. Since allergies are a very complex subject, we have made it our mission to provide all those whose lives are affected by allergies - parents, children, educators and caregivers - with the information they need. Our aim is to build specific knowledge, eliminate any doubts, and enable allergy sufferers to take charge and lead active lives.
Year(s) Of Engagement Activity 2015,2016,2017,2018,2019,2020,2021
URL https://www.ecarf.org/en/