Efficient AI tools for equitable handling of missing values in population-wide e-health records to advance prevention of chronic diseases
Lead Research Organisation:
UNIVERSITY OF CAMBRIDGE
Department Name: Public Health and Primary Care
Abstract
WHAT IS THE HEALTH CHALLENGE?
Chronic diseases are conditions that last a long time - possibly for a person's whole lifetime. These diseases, which range from diabetes and heart disease through to dementia and cancer, are the leading cause of disability and death worldwide. Around 25% of people in the UK have been diagnosed with at least one chronic disease, and numbers are rising with our ageing population. Often people have more than one chronic disease. This is because some of the causes for chronic diseases - such as smoking and blood pressure - are the same across different diseases.
It is far better for people and cheaper for health services to prevent chronic diseases than to treat unwell patients. Knowing about the future risk of chronic disease can help people and the health professionals caring for them make decisions about how best to manage risk through lifestyle changes and/or medication.
At present, "risk prediction tools" are used to predict a person's future risk of single chronic diseases based on particular risk factors (eg, age, cholesterol, blood pressure) measured at a single point in time. Two opportunities are being missed. First, because these tools target single diseases, the opportunity to identify people who are at risk of multiple chronic diseases is being missed. Second, decisions based on measures at a single time point do not take into account important fluctuations and changes in risk over time.
There is an urgent need to develop risk prediction tools for multiple chronic diseases together, and to extend them for monitoring risk of disease over a patient's lifetime. Using them in combination with a person's available electronic health records may support:
1. general practices to prioritise chronic disease risk assessments for people at greatest need;
2. patient and clinician shared decision-making;
3. people to self-monitor their chronic disease risk and their risk factors over time.
This could lead to better patient engagement and health for patients and reduce strains on health services.
WHY ARE INNOVATIVE AI TECHNOLOGIES NEEDED?
We are tackling this challenge through analyses of de-identified, linked, nationally collated electronic healthcare datasets across the UK. The datasets include a person's medical history, diagnoses, medications, hospital admissions, hospital procedures, COVID tests and vaccination dates for ~67 million people. However, the amount of information available for different people varies substantially, depending on:
evolving standards of care
evolving methods for recording the data
the person's past and current health
factors that influence a person's access to healthcare.
For example, people with diabetes are more likely to have regular cholesterol measurements as part of their routine health care than people without diabetes. Another example is that people living in rural areas generally have fewer GP consultations than people living in urban areas.
It is important to handle these differences; otherwise, some patient groups are over- or under-represented in data analyses. This is especially important when developing risk prediction models and decision-making strategies, to avoid disproportionately benefitting more advantaged social groups and exacerbating health inequalities.
Existing AI technologies can tackle this problem, but it is impossible to directly apply them to data of this scale (i.e. 67 million people) and it is not known how to make best use of all the available data to optimise their performance. Therefore, essential adaptation is required to ensure they are fit-for-purpose.
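The over- and under-representation described above can be illustrated with a small arithmetic sketch. All numbers here are hypothetical and chosen only for illustration (they are not taken from the datasets in this project): suppose 10% of a population has diabetes, and a cholesterol measurement is recorded for 90% of people with diabetes but only 30% of people without.

```python
# Hypothetical illustrative numbers, NOT taken from the project's datasets.
population = {
    "diabetes": {"n": 1_000, "p_measured": 0.9},
    "no_diabetes": {"n": 9_000, "p_measured": 0.3},
}


def group_shares(pop):
    """Return, per group, (share of whole population, share of people
    who have the measurement recorded)."""
    total = sum(g["n"] for g in pop.values())
    measured = {name: g["n"] * g["p_measured"] for name, g in pop.items()}
    total_measured = sum(measured.values())
    return {
        name: (pop[name]["n"] / total, measured[name] / total_measured)
        for name in pop
    }


shares = group_shares(population)
for name, (in_population, in_measured) in shares.items():
    print(f"{name}: {in_population:.0%} of population, "
          f"{in_measured:.0%} of people with a recorded measurement")
```

Under these assumed numbers, an analysis restricted to people with a recorded measurement would contain 25% people with diabetes, although they make up only 10% of the population. This is the kind of imbalance that methods for handling missing values aim to correct.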
WHAT ARE THE AIMS OF THIS RESEARCH?
The overarching goal of this research is to mobilise AI tools for handling missing values in electronic healthcare datasets of ~67 million people. We will adapt the tools so they are computationally efficient and optimise them to ensure all people are represented fairly in data analyses.
Description | Key findings from Oct 2023-March 2024: (1) It is possible to apply multiple imputation to linked electronic health records from approximately 500,000 individuals; this takes around 8 hours in the NHS England Secure Data Environment. (2) Run time scales approximately in proportion to the number of individuals (ie, doubling the number of individuals doubles the time taken to run analyses). (3) We are exploring different multiple imputation methods to see whether they scale differently. |
Exploitation Route | There are other researchers working in the NHS England Secure Data Environment who will use the methods we proposed. |
Sectors | Healthcare |
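As a rough illustration of what the reported scaling would imply, the sketch below extrapolates the run time linearly from the reported figures (about 500,000 individuals in about 8 hours). It assumes that proportional scaling continues to hold far beyond the tested range and that the analysis runs as a single job on the same resources; neither assumption is confirmed by the findings above, and the function name is hypothetical.

```python
def extrapolate_runtime_hours(n_target, n_ref=500_000, hours_ref=8.0):
    """Linear extrapolation, assuming run time is proportional to the
    number of individuals (reference point: n_ref people in hours_ref)."""
    return hours_ref * n_target / n_ref


# Assumption: the full population of ~67 million analysed in one run.
hours_full = extrapolate_runtime_hours(67_000_000)
days_full = hours_full / 24
print(f"{hours_full:.0f} hours (~{days_full:.0f} days)")
# prints "1072 hours (~45 days)"
```

Under these assumptions a single run on the full population would take on the order of weeks, which is consistent with the stated need to make the tools computationally efficient.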
Description | Collaboration with Jonathan Sterne and Kate Tilling at the University of Bristol |
Organisation | University of Bristol |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Collaboration to conduct team science with Kate Tilling, Jonathan Sterne and Paul Madley-Dowd from Bristol University. |
Collaborator Contribution | Contributing scientific expertise |
Impact | Not yet |
Start Year | 2023 |