Unified probabilistic latent variable modelling strategies to accelerate endotype discovery in longitudinal studies

Lead Research Organisation: Imperial College London
Department Name: Dept of Medicine

Abstract

The past 5 years have seen an exponential increase in the amount of clinical, genetic and biological data available. However, this data explosion has not been accompanied by a parallel capacity to analyse such data and understand how it can be translated into solutions for curing diseases. We therefore need to explore statistical models which allow us to identify meaningful patterns in this 'big data' so that we can understand the nature of human diseases. Recent news headlines have identified the need for a "global medical revolution" to rethink the way we understand disease so that we move towards more personalised medicine strategies which incorporate knowledge from a patient's genetic and biological profile. This is an exciting era of medical research in which we are moving towards using statistical modelling techniques to translate scientific understanding of human genetics and immunology into a better understanding of disease aetiology and, as a consequence, improve the health of individuals within the global community. Achieving this requires a truly multi-disciplinary approach that combines clinical knowledge and mathematical-computational expertise, so that we can understand the underlying causes of different diseases and thereby improve our strategies for identifying the correct treatment.

The abundance of genetic and biological information has led to the recognition that there needs to be a reclassification of diseases as we understand them today. One such disease is asthma. Asthma has featured in the recent news as it is the most common chronic disease of childhood, causing many hospital admissions each year and many preventable deaths. The number of children suffering from asthma has increased dramatically over the past 30 years. It is unclear why some people get asthma and others do not. Many factors in the environment may contribute to the development of asthma (for example diet, immunizations, antibiotics, pets and tobacco smoke) but we don't know how to modify the environment to reduce the risks. One reason for the difficulty in understanding causes of asthma is that what we once thought to be a single condition may be a collection of different diseases which cause similar symptoms. A promising approach to investigating these different "asthmas" and allergic diseases is the use of statistical machine learning, which allows us to create models that recognise patterns of how disease changes over time. Since most asthma begins in childhood, the data I would use for this project covers various clinical and biological measurements from children starting at birth. These children are in different parts of the UK and data is collected from them approximately every 2-3 years to see how their patterns change over time and whether differences in patterns among children who would generally be classified as "asthmatics" allow us to identify different asthma subtypes or "endotypes". Building models which are appropriately complex for endotype discovery allows us to incorporate more complex datasets, including molecular and biological data which tell us how these children respond to viruses and bacteria over time. The complexity of the data and these models needs to reflect the complexity of the diseases.
If we construct optimal models which are able to capture and predict change over time, these techniques could also be applied on a global scale to better identify children who would benefit from different treatment regimens. It is hoped that more refined endotype discovery will lead to a better understanding of the causes and nature of disease and therefore lead to more targeted treatment and management strategies. This approach of using statistical science to inform our understanding of disease by incorporating large-scale data collected over time is applicable not just to asthma and allergy, but is generalizable to other diseases.

Technical Summary

Aim: To determine a unified probabilistic latent variable modelling strategy for integrating immunological, molecular, biological and clinical phenotypes for endotype discovery in longitudinal studies.

Objectives:
1. To build a unified graphical model that represents a broad range of important variables associated with asthma and allergic disease.
2. To use innovative computational statistical latent variable modelling methods to discover novel subtypes of childhood asthma and allergy across multiple cohorts.
3. To extend these graphical models to generate novel insights into a systems modelling framework that is able to upscale in order to integrate clinical data, immunological data, genetic data and epigenetic data.
5. To extend this unified graphical modelling framework to explore principled methods for using probabilistic latent variable models to deal with missing variables in the context where not all variables are available at every time-point.
6. To formally assess the strengths and weaknesses of Bayesian and Frequentist methods within a longitudinal birth cohort setting.

Methods: The proposal will use data from the STELAR consortium for clinical endotype discovery and data from MAAS for extending graphical models to a systems biology framework. Current Bayesian machine learning and classical-statistical probabilistic modelling approaches will be extended to identify latent disease subtypes. Dimensionality reduction techniques will be developed and explored to understand the latent space which best describes high-dimensional clinical and immunological data - considering viral and bacterial stimuli-cytokine responses separately.
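As a purely illustrative sketch (not the project's actual models or data), the idea of identifying latent disease subtypes can be shown with a small latent class model fitted by expectation-maximisation. The function name `fit_latent_classes`, the synthetic binary "wheeze at each follow-up" records and the choice of two classes are all assumptions made for this example.

```python
import random


def fit_latent_classes(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to binary indicators with EM.

    Hypothetical helper for illustration only.
    data: list of equal-length lists of 0/1 values (one row per child).
    Returns (class weights, per-class item probabilities, responsibilities).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting probabilities break the symmetry between classes.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each row.
        resp = []
        for row in data:
            joint = []
            for k in range(n_classes):
                like = weights[k]
                for x, p in zip(row, probs[k]):
                    like *= p if x == 1 else (1.0 - p)
                joint.append(like)
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: update class weights and item probabilities.
        for k in range(n_classes):
            mass = sum(r[k] for r in resp)
            weights[k] = mass / len(data)
            safe_mass = max(mass, 1e-12)  # guard against an empty class
            for j in range(n_items):
                hits = sum(r[k] * row[j] for r, row in zip(resp, data))
                # Clamp away from 0 and 1 so likelihoods never vanish.
                probs[k][j] = min(max(hits / safe_mass, 1e-6), 1.0 - 1e-6)
    return weights, probs, resp


# Synthetic records: 1 = wheeze reported at that follow-up (assumed data).
data = (
    [[1, 1, 1, 1]] * 6 + [[1, 1, 1, 0]] * 2      # mostly persistent wheeze
    + [[0, 0, 0, 0]] * 10 + [[0, 1, 0, 0]] * 2   # mostly no wheeze
)

weights, probs, resp = fit_latent_classes(data, n_classes=2)

# Assign each child to the class with the highest posterior probability.
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
print("class weights:", [round(w, 2) for w in weights])
print("labels:", labels)
```

In practice the number of classes would be chosen by model comparison, and the indicators would be richer than binary symptoms; this sketch only shows how subgroup membership is inferred as a latent variable rather than observed directly.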

Scientific and Medical Opportunities: It is hoped that more refined endotype discovery will lead to an understanding of the biological mechanisms underlying these endotypes and therefore to more targeted treatment and management strategies. This is applicable not just to asthma and allergy, but is generalizable to other diseases.

Planned Impact

This research will have a direct impact on the scientific community by evaluating novel asthma endotypes or pathways. Understanding these should clarify the underlying biological, molecular, immunological and genetic mechanisms which need to be targeted in order to better control asthma and allergic disease.
Patients, health professionals, taxpayers, policy makers and UK businesses will benefit from the applications of this research, which is well aligned to the MRC's Strategic plan to deliver research that changes lives. Understanding the underlying mechanisms of different asthma and allergy endotypes would help to identify groups of patients who may respond to different asthma and allergy management strategies.

The Pharmaceutical industry would be encouraged towards more targeted medication strategies and better stratification in randomised controlled trials. A unified statistical graphical modelling framework may also be applied to research into other disease areas in an attempt to move towards stratified treatment.
This will hopefully lead to new approaches to medicine which, as well as benefiting the groups identified, have a global impact.

Understanding the existence of distinct subtypes of disease in this project is aligned with the Department of Health's NHS Outcomes Strategy for COPD and Asthma (2012) to help to "provide proactive chronic disease management appropriate for the severity level assessed - mild, moderate or severe".

This project is also directly aligned to the UK Trade and Investment which outlines the UK's commitment to stratified medicine. It states that "this smarter model of medicine, in which tools are used to stratify cohorts of patients by subclass of disease or the likelihood of responding to a particular therapy, intervention, or disease management strategy. A more stratified approach to medicine has the potential to increase patient benefit and at the same time unlock business and economic benefits. Effective development and delivery of stratified medicine will require collaboration across sectors. There is scope for innovators in drug discovery, research tools, diagnostics, devices, informatics, clinical decision making and health systems to pull together in this refocused approach to medicine."

This project is an extension of the government's commitment to scientific research through its capital and infrastructure investments from 2015 highlighted in the HM Treasury's document "Investing in Britain's Future". This document outlines the government's commitment to "ensuring that UK science and research stays at the cutting edge of developments and ensure that the UK is a major competitor in the global race. Investment in science and research capital and infrastructure projects will be made on the basis of scientific excellence so there are no further details to announce at the current time." Two of the "Great technologies" highlighted in this document were "Big Data" and "Regenerative Medicine" of which this proposal forms a part.

At a higher level, the multidisciplinary nature of this work is aligned to the WHO report 'Working together for Health' which states that '.....working alone with no regular exchanges of experience for mutual improvement can no longer be considered professionally satisfactory'. Working in a team enables the professions to solve 'complex health problems that cannot be adequately dealt with by one profession alone'. In order to understand disease aetiology, investment is needed in young researchers who will have the skills to train others in seeking solutions to healthcare problems.

As stated in the communications plan, dissemination of the research within this proposal through public engagement, participation in conferences and publication will have a significant impact on all levels of society: the local community, academic interdisciplinary research and the private sector.

Publications

Belgrave D (2017) Disaggregating asthma: Big investigation versus big data. in The Journal of allergy and clinical immunology

Belgrave D (2016) The importance of being earnest in epidemiology. in Acta paediatrica (Oslo, Norway : 1992)

Belgrave D. C. M. (2017) Latent Profile Analysis To Identify Heterogeneous Subgroups Of Lung Function For Personalised And Targeted Early Intervention. in American Journal of Respiratory and Critical Care Medicine

Belgrave DC (2015) Atopic Dermatitis and Respiratory Allergy: What is the Link. in Current dermatology reports

Deliu M (2017) Asthma phenotypes in childhood. in Expert review of clinical immunology

Deliu M (2018) Features of asthma which provide meaningful insights for understanding the disease heterogeneity. in Clinical and experimental allergy : journal of the British Society for Allergy and Clinical Immunology

Holt PG (2016) Distinguishing benign from pathologic TH2 immunity in atopic children. in The Journal of allergy and clinical immunology

 
Description Invited to Speak at Course on Predictive Modelling in Mental Health
Geographic Reach Europe 
Policy Influence Type Influenced training of practitioners or researchers
Impact More robust statistical methodology, translated from my current research in asthma and allergy to another important area for improving the quality of healthcare: mental health
 
Description European Academy of Allergy and Clinical Immunology Travel Award
Amount € 500 (EUR)
Organisation European Academy of Allergy and Clinical Immunology (EAACI) 
Sector Learned Society
Country European Union (EU)
Start 06/2016 
End 06/2016
 
Description Women in Machine Learning
Amount $900 (USD)
Organisation Women In Machine Learning 
Start 12/2016 
End 12/2016
 
Title Contribution to the development of the Asthma eLab 
Description Giving a statistical-analytical perspective to the STELAR Asthma eLab. Also, as part of my fellowship, I am fostering international collaborations with the US and Australia to work on research ethics and governance to obtain data from different geographical locations, ensuring study replication on not just a national but an international scale
Type Of Material Data handling & control 
Year Produced 2015 
Provided To Others? Yes  
Impact Researchers are able to replicate results in other study populations and statistical power is increased 
 
Title Development of statistical methodology for merging and analysing data from different geographical locations 
Description This work presents several statistical and data management challenges. With this research, we are able to up-scale current models by incorporating data from different cohorts on a local, national and international scale. Such a process is essential if we are to refine current tools for disaggregating complex phenotypes, and find optimal models to describe the underlying latent structure in the data which will enable the discovery of clinically meaningful disease subgroups. The identification of such subgroups should lead to novel therapeutic target discovery and enable a genuinely personalized approach to disease management. This process, rather than being hypothesis-driven, is an iterative hypothesis-generating approach which is informed by clinical knowledge.
Type Of Material Data analysis technique 
Year Produced 2015 
Provided To Others? Yes  
Impact More data sharing and communication between researchers with similar data for a greater understanding of the aetiology of asthma and allergic diseases 
 
Description Causal Modelling framework and Dimensionality Reduction Techniques to understand developmental profiles of microbiome data in infants and children 
Organisation University of Melbourne
Department Centre of Systems Genomics
Country Australia 
Sector Academic/University 
PI Contribution Investigating a causal modelling framework as a novel approach for understanding how the microbiome profile may affect the subsequent development of asthma and allergic diseases, and working on validating these models with the partner's data. Sharing ideas on a latent variable modelling framework for understanding microbiome profiles.
Collaborator Contribution Understanding of the latest techniques for analysis of microbiome data. The partners hosted my research for a period of 3 months, during which I was a Visiting Researcher at the University of Melbourne with full access to their facilities.
Impact This project is still a work in progress and will lead to publication
Start Year 2016
 
Description Collaboration with Study Team for Early Life Asthma Research (STELAR), a consortium of birth cohorts across the UK 
Organisation Study Team for Early Life Asthma Research (STELAR)
Country United Kingdom 
Sector Academic/University 
PI Contribution The aims of this project are to develop a web-based Asthma e-Lab which combines rich phenotypic data across these birth cohorts and to develop innovative computational statistical methods to identify novel endotypes of childhood asthma, enabling investigation of endotype-specific environmental and genetic associates and discovery of endotype-specific pathophysiological mechanism. As part of my fellowship, I collaborate with this consortium in trying to upscale the current models I have developed in the MAAS cohort to the other longitudinal birth cohorts which form a part of this consortium. Collaboration with this consortium has revolved around integrating expert scientific knowledge to develop customised supervised and unsupervised statistical machine learning models to understand the longitudinal progression of disease symptoms and comorbidities. In understanding asthma and allergic disease, this collaboration is of major importance for replication of findings and for validating the generalisability of models which I have previously developed to work towards more targeted treatment and management strategies.
Collaborator Contribution An important aspect of this opportunity is broadening my research horizons through collaboration with leading scientists, clinicians and other statisticians allowing me to develop and explore novel ways of approaching statistical machine learning with complex data.
Impact This research has focused on the development of probabilistic and graphical models in the context of asthma and allergic disease during childhood, but is generalizable to profiling patients with greater accuracy to allow us to move towards more personalised disease management strategies through understanding the underlying latent manifestations of disease and their distinct genetic and environmental characteristics. This required effective communication as a statistician within an interdisciplinary research setting where expert clinical knowledge is used to inform optimal modelling strategies. In terms of outputs of publications, there are currently 5 manuscripts which are work in progress with the aim to publish at the end of the year.
Start Year 2015
 
Description Collaboration with The Childhood Origins of Asthma (COAST) Birth Cohort 
Organisation University of Wisconsin-Madison
Country United States 
Sector Academic/University 
PI Contribution This is a birth cohort based at the University of Wisconsin. We are currently in the process of obtaining ethics approval to replicate published work I have done previously using data from the UK to identify different subgroups of asthmatics. The aim is that by identifying heterogeneous subgroups which come under the umbrella term "asthma", we can identify more targeted treatment and management strategies. We are particularly interested in a group of "Persistent Troublesome Wheezers" who do not respond to treatment. Identifying biomarkers for this group and being able to replicate genetic markers across the different studies will facilitate early detection of this group of patients. Incorporating data from this cohort would allow for model validation to extend a current published probabilistic graphical modelling framework by incorporating data from a longitudinal birth cohort in another geographical location. This collaboration is also geared towards up/multi-scaling subtype discovery based on clinical data to incorporate genetic, biological and molecular data using an integrative systems biology approach to endotype discovery.
Collaborator Contribution Ethics, data, clinical hypotheses and clinical definition of the problem
Impact Manuscript in progress
Start Year 2016
 
Description Genetic Associations for a Multinomial Phenotype 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution Collaboration to investigate the external validity of phenotypes identified jointly with this group by investigating predictive models based on demographic, environmental and genetic characteristics associated with different subgroups of children with distinct longitudinal profiles of asthma and allergic disease. The aim is to investigate endotype-specific pathophysiological mechanisms using longitudinal regression models and hierarchical dimensionality reduction techniques. I am also investigating associations between different subtypes to investigate whether they are truly independent or whether they can be used to infer a single subtype giving rise to multiple longitudinal latent class-defined phenotypes.
Collaborator Contribution Data, collaboration in developing statistical methodology for investigating genetic associations with multinomial outcomes
Impact Ongoing publication
Start Year 2013
 
Description The Importance of Allergen Molecules in early life for understanding profiles of allergic diseases in later life
Organisation Karolinska Institute
Country Sweden 
Sector Academic/University 
PI Contribution Statistical analysis and intellectual input; collaborated on manuscript submission
Collaborator Contribution Conceptualising the project; manuscript preparation and submission
Impact Publication
Start Year 2015
 
Description Understanding Developmental profiles of FEV during childhood 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution Creation of manuscript, statistical analysis framework, intellectual input
Collaborator Contribution Statistical analysis, intellectual input and writing of manuscript
Impact Publication; policy impact
Start Year 2015
 
Description Understanding Longitudinal Profiles of Lung Function over Time 
Organisation University of Melbourne
Department Department of Physiotherapy
Country Australia 
Sector Academic/University 
PI Contribution Presentation of research to the research group; generation of ideas for understanding longitudinal disease progression
Collaborator Contribution Sharing of data and publication manuscripts
Impact We will submit 2 joint publications
Start Year 2016
 
Description Understanding Subtypes of Childhood Asthma 
Organisation University of Melbourne
Department Centre for Epidemiology & Biostatistics
Country Australia 
Sector Academic/University 
PI Contribution Replicating findings from the Childhood Asthma Study in the Manchester Asthma and Allergy Study; intellectual input in the modelling framework
Collaborator Contribution Intellectual input in the modelling framework
Impact Publication in progress
Start Year 2016
 
Description Cultural Talk (London) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact A talk on the role of Artificial Intelligence in Society. These are cultural sessions for women and the purpose of this talk was to encourage women in science and research and also to tell the public about how artificial intelligence can be used in medicine
Year(s) Of Engagement Activity 2017
 
Description Data Analysis Learning and Inference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited to be a panel member in a discussion on Machine Learning in Society and its economic and ethical impact
Year(s) Of Engagement Activity 2016
 
Description Gave a Tutorial in Machine Learning for Health at the Deep Learning Indaba in South Africa. The purpose of this summer school was to broaden participation in machine learning among people on the African continent
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Gave a Tutorial to 300 students from across Africa in Machine Learning for Health at the Deep Learning Indaba in South Africa. The purpose of this summer school was to broaden participation in machine learning among people on the African continent. I am now on the advisory committee of this program
Year(s) Of Engagement Activity 2017
URL http://www.deeplearningindaba.com/
 
Description Invited Keynote at Machine Learning for Health Conference
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I've been invited to give a keynote at this conference. This is the biggest conference in Machine Learning for Health
Year(s) Of Engagement Activity 2018
URL http://www.mucmd.org/
 
Description Invited Speaker 1st UK Prediction Modelling in Psychiatric Research (UK-PMPR) Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This talk explored the use of Bayesian Machine Learning for understanding causal mechanisms in asthma and allergic diseases. The result was increased awareness in understanding disease heterogeneity within the context of understanding disease mechanisms and causality. This talk led to further collaborations as well as generation of ideas for understanding different disease areas
Year(s) Of Engagement Activity 2016
 
Description Invited Speaker Royal Statistical Society (Lancashire Group) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact A lecture entitled "The Asthma E-Lab: Discovering Subtypes of Disease with Model-based Machine Learning". More than 100 students and statisticians attended.
Year(s) Of Engagement Activity 2016
 
Description Invited Speaker to STEM at the Laurels School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Invited to the Laurels, an all-girls independent school, by their STEM representative to talk about statistics and a career as a statistician. The school is interested in encouraging women to enter careers in science, engineering and maths.
Year(s) Of Engagement Activity 2016
 
Description Mentor (Women in Machine Learning, Barcelona) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Provided mentorship to 2 small groups at the Women in Machine Learning workshop at the NIPS conference. Women are an underrepresented group in machine learning: at NIPS, there were approximately 6000 participants and fewer than 600 women. The sessions were on (1) considering research careers and how to apply for funding and (2) graphical models. The groups were limited to 8 people in each session, thus facilitating open discussion
Year(s) Of Engagement Activity 2016
 
Description Organised Women in Machine Learning 2017 (I was the Senior Program and Mentorship Chair)
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I co-organized Women in Machine Learning at NIPS 2017. NIPS is the largest Machine Learning Conference globally (more than 8000 participants) and WiML attracted >900 participants. I was the Senior Program and Mentorship Chair. I organized the program, recruited speakers and recruited more than 50 mentors, all of whom are at the top of the field in Machine Learning
Year(s) Of Engagement Activity 2017
URL http://wimlworkshop.org/2017/
 
Description Strategies for Compensating for Missing Data (Melbourne) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We generated ideas on how to handle missing data in studies to ensure accurate model inference
Year(s) Of Engagement Activity 2016
 
Description Talk at Imperial on Applying for MRC Fellowships 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Gave a talk on the MRC application process
Year(s) Of Engagement Activity 2016
 
Description Tutorial Chair for Largest Machine Learning Conference (NeurIPS) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This activity takes place this year. NeurIPS is the largest Machine Learning conference and attracts over 8000 participants and influencers in the fields of AI and Machine Learning. I have been appointed as 1 of 2 tutorial chairs
Year(s) Of Engagement Activity 2018
 
Description Tutorial at International Conference on Machine Learning on Machine Learning for Personalised Healthcare
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Led and delivered a tutorial at the 2nd largest Machine Learning Conference (5000 people attend the conference, about 2000 attended the tutorial).
Year(s) Of Engagement Activity 2018
URL https://mlhealthtutorial.com/
 
Description What influenced you when you were thinking about applying for that fellowship position 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Participated in a focus group of BME and women participants, which aimed to:
- Identify and examine potential drivers which can encourage you to apply for the next career stage or, conversely, barriers that may inhibit your application
- Generate evidence-based recommendations for developing initiatives to support applicants and increase the number of successful applications
- Compile existing data to build a picture of the applicant pool for both fellowships and lectureships
Year(s) Of Engagement Activity 2017