Application of machine learning to discover new multimorbidity phenotypes associated with poorer outcomes

Lead Research Organisation: Swansea University
Department Name: Institute of Life Science Medical School

Abstract

Multi-morbidity is a poorly defined concept in which people suffer from more than one ongoing condition at the same time. The true extent of multi-morbidity is difficult to assess as there is no agreed definition for reporting. However, analysis of prescribing for chronic conditions and simple counts of different illnesses show that multimorbidity is becoming more common and is associated with poorer outcomes, such as how long people stay in hospital or premature mortality. It would be helpful to identify factors that predate the development of different morbidities to help understand how morbidities develop, which ones are commonly associated with others, to better understand the effectiveness of health services and individual treatments and to identify opportunities to prevent or delay the onset of these conditions.

Because we know so little about the development of these conditions we propose to use new analytical approaches from computer science, known as machine learning, to identify previously hidden or unknown relationships between different conditions. We will use detailed information from the medical records of the 3 million people of Wales held in the Secure Anonymised Information Linkage (SAIL) system. SAIL is a privacy protecting system in which records that have been stripped of all personal identifiers can be used to understand the development of diseases.

We will use the availability of new data on the results of laboratory investigations, such as changes in blood chemistry, to see if these predict the onset of conditions. If we do find useful patterns we will provide this knowledge back to NHS organisations to allow them to improve their services and intervene earlier to protect people's health.

By bringing together routinely collected and epidemiologic data at scale, this proposal exploits the potential of the fast-developing UK health informatics environment. Our team includes a mixture of health service researchers, computer scientists, clinical doctors and members of the public who have helped develop this proposal and will continue to be involved in the research and its dissemination.

Technical Summary

We will exploit the most deeply phenotyped population e-cohort in the UK, created by HDRUK investment, containing detailed multi-sourced data on a 2.5M population with GP records in Wales from 2000-2020, augmented with demographic, multiple disease registry, hospital inpatient, outpatient data and laboratory results from 2007. No other part of the UK has this depth of records in a stable population with low levels of migration and loss to follow up.
Useful algorithms will be adopted by the NHS with tracking of intervention and subsequent impact.
Objectives
A. complete the most deeply phenotyped population e-cohort in the UK using existing data from the Secure Anonymised Information Linkage (SAIL) system augmented with Census data
B. apply innovative machine learning approaches to validate and refine clusters of conditions detected across the adult life course
C. use cohort data to identify mechanistic pathways underlying disease combinations
D. report on prevalence, social patterning and health inequalities using small area, census, taxation, and household composition data
E. identify potential biomarkers predicting individual and multiple morbidities through longitudinal trajectories of values in routinely collected laboratory data
F. undertake a comprehensive analysis of variables used in established morbidity/comorbidity indices with multiple correspondence analysis and factor analysis of mixed data to identify clusters
G. provide new variables for linkage to the 20,000 Welsh participants in UK Biobank, 7,000 in Airwave and 15,000+ in Healthwise Wales and use algorithms for 40+ cohorts in DPUK for further studies into the genetics of shared mechanistic pathways
H. contribute data on incidence, prevalence and burden to the Global Burden of Diseases
I. contribute validated algorithms into NHS systems to allow for early NHS adoption, supporting precision medicine and impact measurement
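
As an illustration only of what "clusters of conditions" in objectives B and F can mean in practice, the minimal Python sketch below scores how often pairs of conditions occur in the same people and then groups conditions that are strongly linked. The patient records, condition names, threshold and function names are all hypothetical, and the Jaccard similarity with single-linkage grouping used here is a simple stand-in, not the multiple correspondence analysis or machine learning methods named in the objectives.

```python
from itertools import combinations

# Toy, made-up data: each hypothetical patient maps to their recorded conditions.
patients = {
    "p1": {"diabetes", "hypertension", "ckd"},
    "p2": {"diabetes", "hypertension"},
    "p3": {"asthma", "copd"},
    "p4": {"hypertension", "ckd"},
    "p5": {"asthma", "copd", "hypertension"},
}


def jaccard_by_pair(records):
    """Return {(cond_a, cond_b): Jaccard similarity} for every pair of conditions."""
    # Invert the records: condition -> set of patients who have it.
    have = {}
    for pid, conds in records.items():
        for cond in conds:
            have.setdefault(cond, set()).add(pid)
    scores = {}
    for a, b in combinations(sorted(have), 2):
        shared = have[a] & have[b]
        either = have[a] | have[b]
        scores[(a, b)] = len(shared) / len(either)
    return scores


def linked_groups(scores, threshold):
    """Group conditions joined by any pair scoring at or above the threshold."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for cond in parent:
        groups.setdefault(find(cond), set()).add(cond)
    return sorted(sorted(group) for group in groups.values())


scores = jaccard_by_pair(patients)
groups = linked_groups(scores, threshold=0.5)  # threshold chosen arbitrarily
print(groups)
```

On this toy data the respiratory conditions fall into one group and the cardiometabolic and renal conditions into another. A real analysis on linked records would also need to handle condition prevalence, time ordering and confounding, which this sketch ignores.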

Planned Impact

Impact summary

The beneficiaries of this research are:

1. People with multiple medical conditions that are being sub-optimally treated by partially effective therapies, or those that cause significant side effects
2. General population
3. Clinicians
4. NHS management
5. Research community
6. Government and NHS policy community
7. UK industry and competitiveness

Specific benefits to these groups are listed below:

Better insight into clustering of morbidities, their causes and the development of better targeted therapies will provide significant benefits to a wide range of beneficiaries through:
A. Fewer people will have their lives blighted by the impact of their morbidities and the side effects of sub-optimal therapies. This will improve quality of life and creativity, reduce demands on the NHS and on long-term social care, and increase the proportion of the population able to continue as active contributors to wealth generation (through income tax, reduced call on early pensions, enhanced economic productivity).

B. As the causes of multimorbidity become better understood and preventive strategies are put in place, fewer people should develop these conditions at earlier ages, leading to improved population health, wellbeing and productivity. Fewer members of the public will be required to undertake premature carer roles.

C. Improved sustainability and productivity of the NHS through reduced demand, freeing resources to be used for anticipatory and elective care with proven benefits (e.g. earlier identification of cancer and joint replacements that prolong independence and economic activity) and the ability of the NHS to cope better with surges in demand (e.g. influenza). The creation of a total population platform for understanding the development of multimorbidity will also provide the basis for testing novel service redesign and innovative policy approaches to improving NHS efficiency and effectiveness.

D. The development of deeply phenotyped cohorts of patients available to the wider research community through trusted privacy protecting environments will stimulate further research into the aetiology, biological and social determinants of illnesses and support and enhance UK Life Sciences global competitiveness in developing new therapies and approaches to management using artificial intelligence and machine learning. This project will lead to significant advances in multidisciplinary approaches in methodology and application to challenging 'Big data' research questions.

E. The sharing of data and methodological approaches across the two principal sites and NHS organisations (and subsequently through HDR UK and other initiatives) will increase the provision of skill sets and skilled people to the UK workforce.

Publications

 
Description Strategic Coordination of Health of the Public Research Committee (SCHOPR) to develop a set of public health research principles and goals to guide funding decisions
Geographic Reach Local/Municipal/Regional 
Policy Influence Type Participation in a national consultation
 
Description Data phenotyping longitudinal multimorbidity trajectories in cardiovascular disease: a statistical machine learning approach using nationwide electronic healthcare records
Amount £301,553 (GBP)
Organisation Alan Turing Institute 
Sector Academic/University
Country United Kingdom
Start 12/2019 
End 11/2022
 
Title Creation of the Wales multi-morbidity e-cohort 
Description Development of a population-wide e-cohort, derived utilising data linkage techniques and including multi-sourced anonymised routine health and demographic data held within the SAIL Databank. The e-cohorts will be used to characterise multi-morbidity and its clustering, determinants and outcomes (association with mortality and healthcare utilisation). Building the e-cohort has involved multiple disciplines across organisations within the UK, and will be harmonised and compared with data on individuals across the UK. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact The e-cohort allows researchers to apply statistical analyses and machine learning methods to evaluate multi-morbidity clustering and determinants at a population level. 
 
Description Alzheimer's Disease Data Initiative Working Group 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Washington and London
Year(s) Of Engagement Activity 2018,2019
 
Description Big data and public health 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Sandy Macara BMA Memorial Lecture, Cardiff
Year(s) Of Engagement Activity 2020
 
Description Consumer Panel for Data Linkage, Development of Wales Multi-morbidity Cohort 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Swansea
Year(s) Of Engagement Activity 2019
 
Description Creating a national approach: the SAIL Databank 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Joint UK-Switzerland Research Symposium, Zurich
Year(s) Of Engagement Activity 2019
 
Description Dementias Platform UK: Data Portal 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation to MRC Oversight Board, London
Year(s) Of Engagement Activity 2019
 
Description International Collaborative Efforts on Injury Statistics and Methods 2016 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Research officer attended the biennial International Collaborative Efforts on Injury Statistics and Methods meeting, which brings together expert injury epidemiologists from around the world to present and discuss new research and findings, and provides a platform to receive feedback from leading experts in the field. The meeting provides an opportunity for individuals to collaborate and identify key areas for future research to improve survival and quality of life following injury.
Year(s) Of Engagement Activity 2016
 
Description International Collaborative Efforts on Injury Statistics and Methods 2018 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Research officer attended the biennial International Collaborative Efforts on Injury Statistics and Methods meeting, which brings together expert injury epidemiologists from around the world to present and discuss new research and findings, and provides a platform to receive feedback from leading experts in the field. The meeting provides an opportunity for individuals to collaborate and identify key areas for future research to improve survival and quality of life following injury.
Year(s) Of Engagement Activity 2018
 
Description Making game changing improvements in the health of patients and populations 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Adelaide University, Australia
Year(s) Of Engagement Activity 2018
 
Description Meeting with CMO Wales, Dr Frank Atherton, on big data and public health evaluation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Swansea
Year(s) Of Engagement Activity 2019
 
Description Population Research Resources Workshop (ESRC, MRC, Wellcome) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact London
Year(s) Of Engagement Activity 2019
 
Description Presented at the ADRN Conference 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Research Officer presented at the ADRN Conference 2017 to promote new research carried out by the funding organisation and to provide a platform to highlight research to individuals from various backgrounds (academia, clinicians, students, various organising bodies).
Year(s) Of Engagement Activity 2017
 
Description Presented at the ADRN Conference 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Research Officer presented at the ADRN Conference 2018 to promote new research carried out by the funding organisation and to provide a platform to highlight research to individuals from various backgrounds (academia, clinicians, students, various organising bodies).
Year(s) Of Engagement Activity 2018
 
Description Presented at the World Safety Conference 2016 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Research officer attended and presented at World Safety Conference 2016 in Tampere, Finland, which included thousands of attendees from all over the world and from varying backgrounds (academia, clinicians, students etc.). Research was presented on a new healthcare service for Wales, and the conference provided a platform to promote the research and gain feedback and discussion from individuals working in similar industries or with similar expertise.
Year(s) Of Engagement Activity 2016
 
Description Royal College of Physicians. Cardiovascular, metabolic and kidney disease: crosscutting science and best practice in multimorbidity 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact London
Year(s) Of Engagement Activity 2019
 
Description Saving the NHS through Data Science 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Health and Care Research Wales conference, Cardiff
Year(s) Of Engagement Activity 2018
 
Description Symposium on Big Data in Healthcare. Distributed team science: real world and cohort data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Partnership between Weizmann Institute of Science and Nature Medicine, Tel Aviv
Year(s) Of Engagement Activity 2019
 
Description Technology Sector Symposium, Developing a Health Data Research Platform 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact London
Year(s) Of Engagement Activity 2019
 
Description Weizmann Institute of Science, Big Data Meeting, Tel-Aviv 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Tel-Aviv
Year(s) Of Engagement Activity 2019