Bringing Innovative Research Methods to Clustering Analysis of Multimorbidity (BIRM-CAM)

Lead Research Organisation: University of Birmingham
Department Name: Institute of Applied Health Research


Multimorbidity is when people suffer from more than one long-term illness. It is increasingly common as people live longer. It is important because individual illnesses have knock-on effects on others, managing multiple illnesses is more complex than managing a single illness, and multimorbid patients are heavy users of medications and health services.

To understand multimorbidity we need to know which illnesses tend to occur together and which illness combinations most affect health. To adapt health services we need to know which types of people develop multimorbidity: their age, sex, ethnicity, socio-economic status and whether they tend to live in the same households. To learn how to prevent it we need to identify lifestyle factors (physical activity, diet, smoking, alcohol) linked to multimorbidity and the measurements (laboratory test results, weight, blood pressure) that might be early signs.

Electronic health records are a good source of information on multimorbidity because they follow the same patient over many years. They include information on illnesses, medications and hospital admissions; measurements (laboratory tests, weight, blood pressure); and lifestyle (smoking, alcohol).

Previous research has studied multimorbidity using a variety of statistical methods. It finds that some illnesses, such as diabetes and heart disease, tend to occur together. But different statistical methods often find different groups of illnesses. We need a single, consistent approach to this type of analysis to ensure we are researching the same groups of illnesses.

Previous research has generally not made best use of all the available information. For example, patients were considered either to have or not to have diabetes, but the research did not use laboratory measurements (such as blood glucose) that identify some people as likely to develop diabetes. Previous research also grouped illnesses according to how commonly they occur together, without giving any special significance to combinations of illnesses linked to risk of death or hospital admission. Such combinations are clearly of greater importance. More advanced analysis methods can address these and other shortcomings.

The first part of our research will develop methods of data analysis. We will review research on different statistical methods for grouping illnesses together. We will hold a workshop involving leading UK researchers in the field to try to agree on the best approach to this type of analysis. Informed by this we will analyse two large databases of electronic health records, each including several million patients. In each database we will identify the groups of illnesses that co-occur and check our findings in the other database. This is considered good practice in analysis. At the end of this step we will produce software to analyse and find groups of illnesses in electronic health records and make this freely available for other researchers to use.

The next part of our research will use additional information from two large surveys. Both surveys include details not always available in health records, e.g. occupation, diet, lifestyle and measures of frailty. One includes 500,000 people; the other has information on the same people over a period of 14 years. We will describe the consequences for patients of different combinations of illnesses: their levels of frailty (because frailty is linked to need for social care); development of further illnesses; medications; use of health services; and death. We will work with patient advisors to help guide analysis of patients' journeys through health services. We will investigate possible causes of multimorbidity including people's social circumstances, the environment, lifestyle (smoking, alcohol, diet and exercise) and laboratory test results that might help indicate causes. This step will point to the areas of environment and lifestyle which should be investigated further as possible causes.

Technical Summary

In this programme, to increase our understanding of multimorbidity (MM), we will develop and implement state-of-the-art statistical methods for the analysis of electronic health records data. We will focus on identifying MM clusters and then investigating their consequences and causes. We will hold a stakeholder workshop to seek consensus on methods for investigating MM clusters. As well as critically reviewing existing literature, we will construct new outcome-guided probabilistic clustering techniques to identify cross-sectional and longitudinal patterns of MM associated with clinically relevant outcomes. For the longitudinal patterns, we will produce new time-sequence kernel approaches for grouping sequences of acquired conditions, and multi-state models that will additionally model time-to-condition (new comorbidity, hospitalisation or mortality) data. Landmarking approaches will be used to include trajectories of continuous biomarker data within these models to improve their prognostic utility. These methods will serve as the basis of novel clinical prediction tools that can be used to guide decision making. All our methods will be documented and made freely available via a "methodological commons" that will provide reproducible code notebooks and visualisation of findings in ways that aid clinical interpretation. The utility of our methods will be demonstrated and validated using two large national primary care databases (THIN and CPRD), and we will conduct a detailed exploration of the longitudinal implications of clusters for polypharmacy, prognosis and frailty. We will examine the aetiological mechanisms of MM clusters by referencing data within the UK Biobank and ELSA using sociodemographic, environmental, behavioural and biomarker variables. By identifying the characteristics of patients within these clusters, we will identify candidates for development and evaluation of interventions to improve outcomes for multimorbid patients by better tailoring care to their MM profile.
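
As a rough illustration of what identifying MM clusters and then examining their outcomes can look like, the sketch below groups patients by their binary condition profiles with a basic K-modes procedure and then compares hospital admission rates between the resulting clusters. It is a simplified, unsupervised stand-in and not the outcome-guided probabilistic clustering proposed in this programme. The condition names, patient records, admission flags and starting modes are all invented for illustration.

```python
from collections import Counter

# Hypothetical condition order: diabetes, heart disease, depression, chronic pain.
# Each row is one invented patient; 1 = condition recorded, 0 = not recorded.
patients = [
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 1),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
]
# Invented outcome: 1 = at least one hospital admission during follow-up.
admitted = [1, 1, 0, 0, 0, 1]


def hamming(a, b):
    """Number of conditions on which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(rows, init_modes, max_iter=20):
    """Basic K-modes: assign each row to its nearest mode, then recompute modes."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda k: hamming(row, modes[k]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        for k in range(len(modes)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old mode if a cluster becomes empty
                modes[k] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return labels, modes


def outcome_rate_by_cluster(labels, outcomes):
    """Proportion of patients with the outcome in each cluster."""
    totals, events = Counter(labels), Counter()
    for lab, outcome in zip(labels, outcomes):
        events[lab] += outcome
    return {lab: events[lab] / totals[lab] for lab in totals}


# Start from two patient profiles chosen by hand (an assumption of this sketch).
labels, modes = k_modes(patients, init_modes=[patients[0], patients[3]])
rates = outcome_rate_by_cluster(labels, admitted)
print(labels, modes, rates)
```

In this sketch the outcome is only examined after clustering. An outcome-guided method would instead let the outcome influence which clusters are formed, so that condition combinations linked to admission or death carry more weight.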

Planned Impact

In the short to medium term this research will benefit those responsible for clinical education, authors of clinical guidelines and those responsible for planning services for patients with multimorbidity. By identifying disease clusters most associated with hospital admission and mortality we will identify groups of patients whose clinical care could be more usefully coordinated. Within existing services, a first step towards improving care of multimorbid patients is to provide appropriate cross-speciality training for clinicians, equipping them with skills to deal with commonly encountered, important comorbidities. A further step is to inform guidelines so that they take account of important comorbidities. A more comprehensive solution is to reorganise services to reduce the burden of treatment experienced by patients (e.g. duplicate attendances) and to develop tailored services for patients with important multimorbidities. This means organising services around patients' needs, not around pre-existing clinical specialities and patterns of service delivery. Ultimately, more rationally organised care also has the potential to reduce treatment burden on patients, to reduce carer burden and to improve health.

In the medium to long term a better understanding of which multimorbidity clusters have the greatest impact on health may lead to development of better care and management for these conditions. This has the potential to improve patient quality of life, clinical outcomes and to reduce health service costs.

In the longer term we hope to develop an understanding of how socioeconomic, environmental and family factors affect health-related behaviour and the subsequent development of multimorbidity. A better understanding of the causes and progress of multimorbidity may lead to interventions that help move individuals away from an adverse trajectory to a more favourable health trajectory. This means identifying individuals likely to develop multimorbidity clusters and developing and evaluating potential social, behavioural or clinical interventions for these individuals: primary prevention. It also means identifying individuals in the early stages of a multimorbidity cluster (with one or two morbidities), then developing and evaluating potential social, behavioural or clinical interventions to alter the time sequence of conditions: secondary prevention.


Description ADMISSION UK Multimorbidity Research Collaborative on Multiple Long-Term Conditions in Hospital: from burden and inequalities to underlying mechanisms
Amount £4,818,158 (GBP)
Funding ID MR/V033654/1
Organisation Newcastle University
Sector Academic/University
Country United Kingdom
Start 03/2021
End 02/2025

Description OPTIMising therapies, disease trajectories, and AI assisted clinical management for patients Living with complex multimorbidity
Amount £2,495,158 (GBP)
Funding ID NIHR202632
Organisation University of Birmingham
Sector Academic/University
Country United Kingdom
Start 07/2021
End 08/2024

Description Therapies for long COVID in non-hospitalised individuals: from symptoms, patient-reported outcomes and immunology to targeted therapies (The TLC Study)
Amount £225,715,700 (GBP)
Organisation University of Birmingham
Sector Academic/University
Country United Kingdom
Start 03/2021
End 02/2023

Title Clustering analysis code for R
Description R software to undertake clustering using five different methods: Hierarchical Clustering Analysis, K-means, K-modes, Latent Class Analysis, and Multiple Correspondence Analysis followed by K-means. R software to create a simulated dataset.
Type Of Material Data analysis technique
Year Produced 2022
Provided To Others? Yes
Impact The software can be used by others. It has contributed to our paper on clustering methodology.

Title Tutorial on profile regression clustering
Description Tutorial on profile regression clustering of multimorbidity data, including the creation of a heatmap comparing outcome-guided and unsupervised clustering.
Type Of Material Data analysis technique
Year Produced 2019
Provided To Others? Yes
Impact None