Application of machine learning to discover new multimorbidity phenotypes associated with poorer outcomes

Lead Research Organisation: Swansea University
Department Name: Institute of Life Science Medical School

Abstract

Multi-morbidity is a poorly defined concept in which people suffer from more than one ongoing condition at the same time. The true extent of multi-morbidity is difficult to assess as there is no agreed definition for reporting. However, analysis of prescribing for chronic conditions and simple counts of different illnesses show that multimorbidity is becoming more common and is associated with poorer outcomes, such as longer hospital stays or premature mortality. It would be helpful to identify factors that predate the development of different morbidities. This would help us understand how morbidities develop and which ones are commonly associated with others, better understand the effectiveness of health services and individual treatments, and identify opportunities to prevent or delay the onset of these conditions.

Because we know so little about the development of these conditions we propose to use new analytical approaches from computer science, known as machine learning, to identify previously hidden or unknown relationships between different conditions. We will use detailed information from the medical records of the 3 million people of Wales held in the Secure Anonymised Information Linkage (SAIL) system. SAIL is a privacy protecting system in which records that have been stripped of all personal identifiers can be used to understand the development of diseases.

We will use the availability of new data on the results of laboratory investigations, such as changes in blood chemistry, to see if these predict the onset of conditions. If we do find useful patterns we will provide this knowledge back to NHS organisations to allow them to improve their services and intervene earlier to protect people's health.

By bringing together routinely collected and epidemiologic data at scale, this proposal exploits the potential of the fast-developing UK health informatics environment. Our team includes a mixture of health service researchers, computer scientists, clinical doctors and members of the public who have helped develop this proposal and will continue to be involved in the research and its dissemination.

Technical Summary

We will exploit the most deeply phenotyped population e-cohort in the UK, created by HDRUK investment, containing detailed multi-sourced data on a 2.5M population with GP records in Wales from 2000-2020, augmented with demographic, multiple disease registry, hospital inpatient, outpatient data and laboratory results from 2007. No other part of the UK has this depth of records in a stable population with low levels of migration and loss to follow up.
Useful algorithms will be adopted by the NHS with tracking of intervention and subsequent impact.
Objectives
A. complete the most deeply phenotyped population e-cohort in the UK using existing data from the Secure Anonymised Information Linkage (SAIL) system augmented with Census data
B. apply innovative machine learning approaches to validate and refine clusters of conditions detected across the adult life course
C. use cohort data to identify mechanistic pathways underlying disease combinations
D. report on prevalence, social patterning and health inequalities using small area, census, taxation, and household composition data
E. identify potential biomarkers predicting individual and multiple morbidities through longitudinal trajectories of values in routinely collected laboratory data
F. undertake a comprehensive analysis of variables used in established morbidity/comorbidity indices with multiple correspondence analysis and factor analysis of mixed data to identify clusters
G. provide new variables for linkage to the 20,000 Welsh participants in UK Biobank, 7,000 in Airwave and 15,000+ in Healthwise Wales and use algorithms for 40+ cohorts in DPUK for further studies into the genetics of shared mechanistic pathways
H. contribute data on incidence, prevalence and burden to the Global Burden of Diseases
I. contribute validated algorithms into NHS systems to allow for early NHS adoption, supporting precision medicine and impact measurement
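
As a rough illustration of what "clusters of conditions" means in objectives B and F, the sketch below counts how often pairs of conditions occur in the same person and compares that with what independence would predict (the lift). It is not the project's method: the records, the condition names and the function pair_lift are all hypothetical, and the real analyses (machine learning, multiple correspondence analysis, factor analysis of mixed data) are far richer than a pairwise count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each set holds the conditions recorded for one person.
records = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "ckd"},
    {"asthma"},
    {"hypertension", "ckd"},
    {"diabetes", "hypertension"},
    {"asthma", "depression"},
]


def pair_lift(records):
    """Return {(a, b): lift} for every pair of conditions seen together.

    lift = P(a and b) / (P(a) * P(b)). Values above 1 mean the pair
    co-occurs more often than it would if the conditions were independent.
    """
    n = len(records)
    single = Counter()  # number of people with each condition
    pair = Counter()    # number of people with each pair of conditions
    for conditions in records:
        single.update(conditions)
        # Sort so each pair has one canonical (alphabetical) key.
        pair.update(combinations(sorted(conditions), 2))
    return {
        (a, b): (count / n) / ((single[a] / n) * (single[b] / n))
        for (a, b), count in pair.items()
    }


lift = pair_lift(records)
# List pairs from the strongest association to the weakest.
for (a, b), value in sorted(lift.items(), key=lambda item: -item[1]):
    print(f"{a} + {b}: lift {value:.2f}")
```

Pairs with a lift well above 1 are candidates for closer study, but a pairwise measure like this ignores age, time ordering and groups of three or more conditions, which is part of why the proposal turns to more sophisticated approaches.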

Planned Impact

Impact summary

The beneficiaries of this research are:

1. People with multiple medical conditions that are being sub-optimally treated by partially effective therapies, or those that cause significant side effects
2. General population
3. Clinicians
4. NHS management
5. Research community
6. Government and NHS policy community
7. UK industry and competitiveness

Specific benefits to these groups are listed below:

Better insight into clustering of morbidities, their causes and the development of better targeted therapies will provide significant benefits to a wide range of beneficiaries through:
A. Fewer people will have their lives blighted by the impact of their morbidities and the side effects of sub-optimal therapies. This will improve quality of life and creativity, reduce demands on the NHS and on long-term social care, and increase the proportion of the population able to continue as active contributors to wealth generation (through income tax, reduced call on early pensions and enhanced economic productivity).

B. As the causes of multimorbidity become better understood and preventive strategies are put in place, fewer people should develop these conditions at earlier ages, leading to improved population health, wellbeing and productivity. Fewer members of the public will be required to undertake premature carer roles.

C. Improved sustainability and productivity of the NHS through reduced demand, freeing resources to be used for anticipatory and elective care with proven benefits (e.g. earlier identification of cancer and joint replacements that prolong independence and economic activity) and the ability of the NHS to cope better with surges in demand (e.g. influenza). The creation of a total population platform for understanding the development of multimorbidity will also provide the basis for testing novel service redesign and innovative policy approaches to improving NHS efficiency and effectiveness.

D. The development of deeply phenotyped cohorts of patients available to the wider research community through trusted privacy protecting environments will stimulate further research into the aetiology, biological and social determinants of illnesses and support and enhance UK Life Sciences global competitiveness in developing new therapies and approaches to management using artificial intelligence and machine learning. This project will lead to significant advances in multidisciplinary approaches in methodology and application to challenging 'Big data' research questions.

E. The sharing of data and methodological approaches across the two principal sites and NHS organisations (and subsequently through HDR UK and other initiatives) will increase the provision of skill sets and skilled people to the UK workforce.

Publications

Rafferty J (2021) Ranking sets of morbidities using hypergraph centrality. in Journal of biomedical informatics


 
Description Member of the Bevan Commission Working Group
Geographic Reach Local/Municipal/Regional 
Policy Influence Type Participation in a guidance/advisory committee
 
Description Member of the Welsh Government COVID-19 Technical Advisory Group
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
 
Description A multidisciplinary research network to tackle multimorbidity
Amount £37,000 (GBP)
Funding ID P123221 
Organisation University of Manchester 
Sector Academic/University
Country United Kingdom
Start 01/2019 
End 12/2020
 
Description Assembling the data jigsaw: powering robust population research in MSK disease
Amount £1,300,000 (GBP)
Organisation Nuffield Foundation 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2020 
End 12/2023
 
Description Controlling COVID-19 through enhanced population surveillance and intervention (Con-COV): a platform approach
Amount £833,046 (GBP)
Funding ID MR/V028367/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 08/2020 
End 07/2021
 
Description Data phenotyping longitudinal multimorbidity trajectories in cardiovascular disease: a statistical machine learning approach using nationwide electronic healthcare records
Amount £301,553 (GBP)
Organisation Alan Turing Institute 
Sector Academic/University
Country United Kingdom
Start 12/2019 
End 11/2022
 
Description Senior Research Leaders Discretionary Award
Amount £60,000 (GBP)
Funding ID SRL2022-25-09 
Organisation Health and Care Research Wales 
Sector Public
Country United Kingdom
Start 04/2022 
End 03/2023
 
Title Creation of the Wales multi-morbidity e-cohort 
Description Development of a population-wide e-cohort, derived utilising data linkage techniques and including multi-sourced anonymised routine health and demographic data held within the SAIL Databank. The e-cohorts will be used to characterise multi-morbidity and its clustering, determinants and outcomes (association with mortality and healthcare utilisation). Building the e-cohort has involved multiple disciplines across organisations within the UK, and will be harmonised and compared with data on individuals across the UK. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact The e-cohort allows researchers to apply statistical analyses and machine learning methods to evaluate multi-morbidity clustering and determinants at a population level. 
 
Description Novo Nordisk collaboration for advancing understanding of multimorbidity in metabolic disease 
Organisation Novo Nordisk
Country Denmark 
Sector Private 
PI Contribution Novo Nordisk is a Danish multinational pharmaceutical company headquartered in Bagsværd, Denmark, with production facilities in eight countries, and affiliates or offices in 5 countries. Novo Nordisk manufactures and markets pharmaceutical products and services, specifically diabetes care medications and devices. Novo Nordisk is also involved with hemostasis management, growth hormone therapy and hormone replacement therapy. The company makes several drugs under various brand names, including Levemir, Tresiba, NovoLog, Novolin R, NovoSeven, NovoEight and Victoza. Novo Nordisk employs more than 40,000 people globally, and markets its products in 180 countries.
Collaborator Contribution Novo Nordisk works with the University of Manchester and the University of Oxford on innovation in statistical machine learning for advancing understanding of multimorbidity and polypharmacy in metabolic disease.
Impact None so far.
Start Year 2020
 
Description 28/01/2020 Big Data and Public Health. Sandra Macara BMA Memorial Lecture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact Presentation to Big Data and Public Health, Sandra Macara Memorial Lecture in Cardiff
Year(s) Of Engagement Activity 2020
 
Description Alzheimer's Disease Data Initiative Working Group 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Washington and London
Year(s) Of Engagement Activity 2018,2019
 
Description Big data and public health 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Sandra Macara BMA Memorial Lecture, Cardiff
Year(s) Of Engagement Activity 2020
 
Description Conducting research through globally accessible trusted research environments, University of Sydney 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Conducting research through globally accessible trusted research environments, University of Sydney, 11/11/22
Year(s) Of Engagement Activity 2022
 
Description Consumer Panel for Data Linkage, Development of Wales Multi-morbidity Cohort 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Swansea
Year(s) Of Engagement Activity 2019
 
Description Consumer Panel meeting - January 2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Patients, carers and/or patient groups
Results and Impact Topics covered:
• UCL - AIM project proposal
• Public Health Wales - The mental health in shielded children and children living in shielded households
• HDR UK - How has COVID-19 impacted non-COVID-19 healthcare service use and provision in Wales?
• Examining associations between complications of pregnancy and incident cardiovascular disease
Year(s) Of Engagement Activity 2021
 
Description Consumer Panel meeting - June 2020 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Patients, carers and/or patient groups
Results and Impact 16 members of the public who meet on a quarterly basis to provide a public perspective on health data research projects. This meeting covered the following topics:
PPI/E impact pathway for the BREATHE Hub
Contact Tracing App deliberation project PPI/E plan
Year(s) Of Engagement Activity 2020
 
Description Consumer Panel meeting - September 2020 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Patients, carers and/or patient groups
Results and Impact 16 members of the public who meet on a quarterly basis to provide a public perspective on health data research projects. This meeting covered the following topics:
Agriculture community wellbeing study
Summary of work carried out by HDR UK during the pandemic
Health Foundation; Networked Data Labs Wales - discussion on topic selection
Year(s) Of Engagement Activity 2020
 
Description Creating a national approach: the SAIL Databank 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Joint UK-Switzerland Research Symposium, Zurich
Year(s) Of Engagement Activity 2019
 
Description Data linkage research- opportunities for enhanced UK/Australia collaboration 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Data linkage research- opportunities for enhanced UK/Australia collaboration, University of New South Wales, 08/11/22
Year(s) Of Engagement Activity 2022
 
Description Dementias Platform UK: Data Portal 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation to MRC Oversight Board, London
Year(s) Of Engagement Activity 2019
 
Description Development of a collaboration with NHS Wales organisations and Chicago Medical School to develop a system to evaluate the utility of artificial intelligence/machine learning to derive additional information from medical images to support research and ultimately efficient health service delivery and patient outcomes. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Development of a collaboration with NHS Wales organisations and Chicago Medical School to develop a system to evaluate the utility of artificial intelligence/machine learning to derive additional information from medical images to support research and ultimately efficient health service delivery and patient outcomes. Presentation to SAIL Consumer Panel.
Year(s) Of Engagement Activity 2022
 
Description International Collaborative Efforts on Injury Statistics and Methods 2016 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Research officer attended the biennial International Collaborative Efforts on Injury Statistics and Methods meeting, which brings together expert injury epidemiologists from around the world to present and discuss new research and findings, and provides a platform to receive feedback from leading experts in the field. The meeting provides an opportunity for individuals to collaborate and identify key areas for future research to improve survival and quality of life following injury.
Year(s) Of Engagement Activity 2016
 
Description International Collaborative Efforts on Injury Statistics and Methods 2018 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Research officer attended the biennial International Collaborative Efforts on Injury Statistics and Methods meeting, which brings together expert injury epidemiologists from around the world to present and discuss new research and findings, and provides a platform to receive feedback from leading experts in the field. The meeting provides an opportunity for individuals to collaborate and identify key areas for future research to improve survival and quality of life following injury.
Year(s) Of Engagement Activity 2018
 
Description Making game changing improvements in the health of patients and populations 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Adelaide University, Australia
Year(s) Of Engagement Activity 2018
 
Description Meeting with CMO Wales, Dr Frank Atherton, on big data and public health evaluation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Swansea
Year(s) Of Engagement Activity 2019
 
Description NIHR AI for Multiple Long Term Conditions Research Support Facility. Panel Discussion on Public and Patient, Involvement and Engagement 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact NIHR AI for Multiple Long Term Conditions Research Support Facility. Panel Discussion on Public and Patient, Involvement and Engagement. 16/09/22
Year(s) Of Engagement Activity 2022
 
Description Overview of Trusted Research Environment globally, Dementias Platform Australia, University of New South Wales 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Overview of Trusted Research Environment globally, Dementias Platform Australia, University of New South Wales, 08/11/22
Year(s) Of Engagement Activity 2022
 
Description PPI/E MuRMUR bid - public working group 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact A regular public working group which has been established as part of the MuRMUR (Multi-morbidity) project. The group meet on a regular basis to provide public perspective on the projects under the MuRMUR banner.
Year(s) Of Engagement Activity 2020,2021
 
Description Population Research Resources Workshop (ESRC, MRC, Wellcome) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact London
Year(s) Of Engagement Activity 2019
 
Description Presented at the ADRN Conference 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Research Officer presented at the ADRN Conference 2017 to promote new research carried out by the funding organisation and to provide a platform to highlight research to individuals from various backgrounds (academia, clinicians, students, various organising bodies).
Year(s) Of Engagement Activity 2017
 
Description Presented at the ADRN Conference 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Research Officer presented at the ADRN Conference 2018 to promote new research carried out by the funding organisation and to provide a platform to highlight research to individuals from various backgrounds (academia, clinicians, students, various organising bodies).
Year(s) Of Engagement Activity 2018
 
Description Presented at the World Safety Conference 2016 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Research officer attended and presented at World Safety Conference 2016 in Tampere, Finland, which included thousands of attendees from all over the world and from varying backgrounds (academia, clinicians, students etc.). Research was presented on a new healthcare service for Wales, and the conference provided a platform to promote the research and gain feedback and discussion from individuals working in similar industries or with similar expertise.
Year(s) Of Engagement Activity 2016
 
Description Proposal for a Population Data Science Research Institute (PDRSI) - presentation to Swansea University faculties 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact Proposal for a Population Data Science Research Institute (PDRSI) - presentation to Swansea University faculties 13/12/22
Year(s) Of Engagement Activity 2022
 
Description Royal College of Physicians. Cardiovascular, metabolic and kidney disease: crosscutting science and best practice in multimorbidity 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact London
Year(s) Of Engagement Activity 2019
 
Description Saving the NHS through Data Science 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Health and Care Research Wales conference, Cardiff
Year(s) Of Engagement Activity 2018
 
Description Statistical Methods for Covid-19: Mortality Statistics - Wales (Royal Statistical Society) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Statistical Methods for Covid-19: Mortality Statistics - Wales. Presentation to Royal Statistical Society four nation meeting on COVID-19 mortality, 23/03/22
Year(s) Of Engagement Activity 2022
 
Description Statistical Methods for Covid-19: Mortality Statistics - Wales (Welsh Government) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Policymakers/politicians
Results and Impact Statistical Methods for Covid-19: Mortality Statistics - Wales. Presentation to Welsh Government COVID-19 Technical Advisory Group, 25/03/22
Year(s) Of Engagement Activity 2022
 
Description Swansea Volunteer Management Network Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Third sector organisations
Results and Impact Around 30 volunteer managers from a wide range of third sector organisations from across the Swansea area meet on a regular basis to support one another and share good practice; this is a recurring bimonthly meeting. In the October 2020 meeting, Lynsey Cross delivered a presentation on the PPI/E work which is carried out by HDR UK. From this presentation a number of working relationships have been forged and a new member of the public has applied to become a member of the Consumer Panel.
Year(s) Of Engagement Activity 2020,2021
 
Description Swansea Volunteer Management Network Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Third sector organisations
Results and Impact Around 30 volunteer managers from a wide range of third sector organisations from across the Swansea area meet on a regular basis to support one another and share good practice; this is a recurring bimonthly meeting. In the October 2020 meeting, Lynsey Cross delivered a presentation on the PPI/E work which is carried out by HDR UK. From this presentation a number of working relationships have been forged and a new member of the public has applied to become a member of the Consumer Panel.
Year(s) Of Engagement Activity 2021
 
Description Symposium on Big Data in Healthcare. Distributed team science: real world and cohort data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Partnership between Weizmann Institute of Science and Nature Medicine, Tel Aviv
Year(s) Of Engagement Activity 2019
 
Description Technology Sector Symposium, Developing a Health Data Research Platform 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact London
Year(s) Of Engagement Activity 2019
 
Description Update for SAIL Consumer Panel 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact Update for SAIL Consumer Panel. 29/06/22
Year(s) Of Engagement Activity 2022
 
Description Using an advanced routine healthcare data systems to improve population health, clinical care and inform policy: the COVID-19 pandemic and beyond 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Using an advanced routine healthcare data systems to improve population health, clinical care and inform policy: the COVID-19 pandemic and beyond. Maurice Bloch Lecture, Glasgow University, 26/01/22. https://youtu.be/2B4Ak0YuOng
Year(s) Of Engagement Activity 2022
URL https://youtu.be/2B4Ak0YuOng
 
Description Weizmann Institute of Science, Big Data Meeting, Tel-Aviv 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Tel-Aviv
Year(s) Of Engagement Activity 2019
 
Description Working with Swansea University on the ARDC Secure eResearch Platform, Monash University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Working with Swansea University on the ARDC Secure eResearch Platform, Monash University, 15/11/22
Year(s) Of Engagement Activity 2022