Identifying subtypes of Alzheimer's disease in Electronic Health Records
Lead Research Organisation:
University College London
Department Name: Institute of Health Informatics
Abstract
Patients with Alzheimer's disease (AD) show large variation in symptom presentation, rate of progression and comorbidities. As the number of disease factors involved increases, it becomes harder to ascertain the specific role each one plays in affecting the disease and how it should be treated, especially when the interactions among all the other disease factors must also be considered. Examining hidden patterns in these disease factors can help untangle their relationship with AD progression and treatment. One type of pattern that can be identified is distinct clusters of patients with similar patterns of these disease factors. Representing AD heterogeneity in this way offers the opportunity for unique insights into the disease.
This research uses several different cluster analysis methods to examine and validate subtypes of AD using electronic health records (EHR). Using EHR means that a variety of clinical attributes about the patient can be used in the analysis. The outcomes are taken directly from the records and are therefore relatable to patients' experience in the NHS. This research will first find AD subtypes based on symptoms, then expand to include a variety of comorbidities, and examine how the subtypes differ in rate of progression and other factors.
Organisations
People | ORCID iD |
---|---|
Spiros Denaxas (Primary Supervisor) | |
Publications
Uijl A (2019) Risk factors for incident heart failure in age- and sex-specific strata: a population-based cohort using linked electronic health records. in European journal of heart failure
Shah S (2020) Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure. in Nature communications
Shah AD (2019) Natural language processing for disease phenotyping in UK primary care records for research: a pilot study in myocardial infarction and death. in Journal of biomedical semantics
Schmidt AF (2019) Phenome-wide association analysis of LDL-cholesterol lowering genetic variants in PCSK9. in BMC cardiovascular disorders
Rafiq M (2020) Allergic disease, corticosteroid use, and risk of Hodgkin lymphoma: A United Kingdom nationwide case-control study. in The Journal of allergy and clinical immunology
Pikoula M (2019) Identifying clinically important COPD sub-types using data-driven approaches in primary care population based electronic health records. in BMC medical informatics and decision making
McMahon C (2019) A novel metadata management model to capture consent for record linkage in longitudinal research studies. in Informatics for health & social care
Hopkins C (2019) Antibiotic usage in chronic rhinosinusitis: analysis of national primary care electronic health records. in Rhinology
Hingorani AD (2019) Improving the odds of drug development success through human genomics: modelling study. in Scientific reports
Hingorani A (2017) Flipping the odds of drug development success through human genomics
Henry A (2019) The relationship between sleep duration, cognition and dementia: a Mendelian randomization study. in International journal of epidemiology
Hemingway H (2017) Using nationwide 'big data' from linked electronic health records to help improve outcomes in cardiovascular diseases: 33 studies using methods from epidemiology, informatics, economics and social science in the ClinicAl disease research using LInked Bespoke studies and Electronic health Records (CALIBER) programme in Programme Grants for Applied Research
Farmer RE (2019) Associations Between Measures of Sarcopenic Obesity and Risk of Cardiovascular Disease and Mortality: A Cohort Study and Mendelian Randomization Analysis Using the UK Biobank. in Journal of the American Heart Association
Dickerman BA (2019) Avoidable flaws in observational analyses: an application to statins and cancer. in Nature medicine
Denaxas S (2019) Phenotyping UK Electronic Health Records from 15 Million Individuals for Precision Medicine: The CALIBER Resource. in Studies in health technology and informatics
Denaxas S (2019) UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER. in Journal of the American Medical Informatics Association : JAMIA
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
MR/R502248/1 | | | 01/10/2017 | 30/09/2021 | |
1940103 | Studentship | MR/R502248/1 | 01/10/2017 | 30/12/2021 | |
Description | Defining and redefining human disease at scale - the human phenome project (GSK) |
Amount | £851,000 (GBP) |
Organisation | GlaxoSmithKline (GSK) |
Sector | Private |
Country | Global |
Start | 01/2020 |
End | 01/2021 |
Title | Monte Carlo Method for Cluster Evaluation for EHR |
Description | This is a tool for evaluating the structure of clusters found by unsupervised machine learning methods in electronic health records, based on comparison against a Monte Carlo-generated null distribution. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2020 |
Provided To Others? | No |
Impact | This tool will help improve the validity of patient subtypes found in clustering studies in EHR, thus increasing the likelihood that these subtypes could be clinically useful. |
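
The record does not describe how the tool builds its null distribution, so the following is only a minimal sketch of the general idea, not the tool itself. It assumes a binary patient-by-feature matrix, a basic k-means clustering, within-cluster sum of squares (WSS) as the cluster-quality statistic, and a null model obtained by shuffling each feature column independently across patients. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wss(rows, k, rng, n_iter=20):
    """Run a basic Lloyd's k-means; return the within-cluster sum of squares."""
    centres = [list(r) for r in rng.sample(rows, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for r in rows:
            dists = [sq_dist(r, c) for c in centres]
            groups[dists.index(min(dists))].append(r)
        for j, g in enumerate(groups):
            if g:  # keep the old centre if a cluster ends up empty
                centres[j] = [sum(col) / len(g) for col in zip(*g)]
    return sum(min(sq_dist(r, c) for c in centres) for r in rows)


def shuffle_columns(rows, rng):
    """Assumed null model: permute each feature independently across patients.

    This keeps each feature's prevalence but removes co-occurrence structure.
    """
    cols = [list(c) for c in zip(*rows)]
    for c in cols:
        rng.shuffle(c)
    return [list(r) for r in zip(*cols)]


def monte_carlo_p_value(rows, k, n_null=99, seed=0):
    """Compare the observed WSS with WSS values from shuffled copies of the data."""
    rng = random.Random(seed)
    observed = kmeans_wss(rows, k, rng)
    null = [kmeans_wss(shuffle_columns(rows, rng), k, rng) for _ in range(n_null)]
    # Lower WSS means tighter clusters, so count null draws at least as tight.
    p = (1 + sum(w <= observed for w in null)) / (n_null + 1)
    return observed, null, p


# Toy cohort (made-up data): two groups with opposite feature profiles.
group_a = [[1, 1, 1, 0, 0, 0] for _ in range(20)]
group_b = [[0, 0, 0, 1, 1, 1] for _ in range(20)]
observed, null, p = monte_carlo_p_value(group_a + group_b, k=2)
```

A small `p` here suggests the clusters are tighter than would be expected from feature prevalence alone. Other statistics (for example silhouette width) or other null models could be substituted without changing the overall scheme.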
Title | Phenome-wide phenotyping algorithms |
Description | Machine-readable versions (CSV files) of electronic health record phenotyping algorithms for Kuan V., Denaxas S., Gonzalez-Izquierdo A. et al. A chronological map of 308 physical and mental health conditions from 4 million individuals in the National Health Service published in the Lancet Digital Health - DOI 10.1016/S2589-7500(19)30012-3 |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | Algorithms are being used in two additional projects: 1) Pathak N. et al Migrant EHR and 2) Denaxas S. et al GSK/phenomics |
URL | https://github.com/spiros/chronological-map-phenotypes |
Title | Synthetic EHR for Cluster Benchmarking |
Description | This tool is a wrapper for the synthetic health record generator SYNTHEA. It transforms the generated data into usable and realistic health records with distinct and modifiable parameters, such as cluster number and cluster separation, as well as realistic patient outcomes, which researchers can use to benchmark methods for finding patient subgroups. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2020 |
Provided To Others? | No |
Impact | This method will help researchers validate future tools for patient subtyping, and allow potential users of those methods to understand in greater detail the benefits of those methods. |
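
To illustrate what tunable cluster number and cluster separation can mean for a benchmarking cohort, here is a minimal sketch. It does not use SYNTHEA and is not the wrapper described above: the function name, the binary-feature representation and the definition of `separation` as a shift in feature probability are all assumptions made for illustration.

```python
import random


def make_synthetic_cohort(n_patients, n_clusters, n_features, separation, seed=0):
    """Generate binary feature rows with a known cluster label per patient.

    separation (assumed to lie in [0, 1]) controls how distinct clusters are:
    0 gives every feature probability 0.5 in every cluster; 1 makes each
    cluster's signature features always present and all others always absent.
    """
    if not 0.0 <= separation <= 1.0:
        raise ValueError("separation must be between 0 and 1")
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_patients):
        cluster = i % n_clusters  # assign patients to clusters in rotation
        row = []
        for f in range(n_features):
            # Feature f is a signature feature of cluster (f mod n_clusters).
            is_signature = f % n_clusters == cluster
            prob = 0.5 + 0.5 * separation if is_signature else 0.5 - 0.5 * separation
            row.append(1 if rng.random() < prob else 0)
        rows.append(row)
        labels.append(cluster)
    return rows, labels


# Fully separated example: each cluster shows only its own signature features.
rows, labels = make_synthetic_cohort(
    n_patients=6, n_clusters=3, n_features=6, separation=1.0
)
```

Because the true labels are returned alongside the data, a clustering method can be scored against them while `separation` is lowered to make the task progressively harder.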
Title | tofu |
Description | Tofu is a Python library for generating synthetic UK Biobank data. The UK Biobank is a large open-access prospective research cohort study of 500,000 middle-aged participants recruited in England, Scotland and Wales. The study has collected and continues to collect extensive phenotypic and genotypic detail about its participants, including data from questionnaires, physical measures, sample assays, accelerometry, multimodal imaging, genome-wide genotyping and longitudinal follow-up for a wide range of health-related outcomes. Tofu will generate synthetic data which conform to the structure of the baseline data UK Biobank sends researchers by generating random values: For categorical variables (single or multiple choices), a random value will be picked from the UK Biobank data dictionary for that field. For continuous variables, a random value will be generated based on the distribution of values reported for that field on the UK Biobank showcase. For date and date/time fields, a random date will be generated. For all other fields, such as polymorphic fields, no data will be generated. Some general observations: The lookups directory contains lookups downloaded from the UK Biobank showcase - they might need to be updated when new fields become available. Data conform to the structure and schema of the baseline file but are otherwise nonsensical: no checks have been implemented across fields. All eids (patient identifiers) generated by this tool are prefixed with 'fake' in order to avoid confusion with legitimate datasets. Randomly generated dates fall between 1910 and 1990, again to avoid confusion with real data. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | Data has been used for training purposes at a postgraduate and postdoc level. |
URL | https://github.com/spiros/tofu |
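
The generation rules in the description above can be sketched as follows. This is not tofu's API or its code: the field specifications, function names, the use of a normal distribution for continuous fields and the exact `fake<N>` identifier format are assumptions made for illustration, and the example values are invented.

```python
import random
from datetime import date, timedelta

# Hypothetical field specifications with made-up values; the real library
# reads lookups downloaded from the UK Biobank showcase instead.
FIELDS = {
    "sex": {"type": "categorical", "choices": ["Female", "Male"]},
    "standing_height": {"type": "continuous", "mean": 168.0, "sd": 9.0},
    "assessment_date": {"type": "date"},
    "polymorphic_field": {"type": "other"},
}


def random_date(rng, start=date(1910, 1, 1), end=date(1990, 12, 31)):
    """Pick a uniformly random date between 1910 and 1990 inclusive."""
    return start + timedelta(days=rng.randrange((end - start).days + 1))


def synth_value(spec, rng):
    """Generate one value according to the field type."""
    kind = spec["type"]
    if kind == "categorical":
        return rng.choice(spec["choices"])  # pick from the dictionary values
    if kind == "continuous":
        # Assumption: approximate the reported distribution with a normal.
        return rng.gauss(spec["mean"], spec["sd"])
    if kind == "date":
        return random_date(rng)
    return None  # all other field types: no data generated


def synth_participant(index, fields, rng):
    """Build one synthetic record; fields are generated independently."""
    record = {"eid": f"fake{index}"}  # identifier format is assumed
    for name, spec in fields.items():
        record[name] = synth_value(spec, rng)
    return record


rng = random.Random(1)
record = synth_participant(7, FIELDS, rng)
```

As in the description, each field is drawn independently, so the resulting records match a schema but carry no meaningful relationships across fields.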
Description | MRC Methodology Research Panel |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Invited guest member on MRC Methodology Research Panel |
Year(s) Of Engagement Activity | 2018,2019,2020 |
Description | Organisation of a precision medicine panel discussion in conjunction with HDRUK |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | I led the organisation of a panel discussion, hosted at the Wellcome Trust in conjunction with HDRUK, where industry professionals, academics and policy makers discussed how to properly harness the potential of precision medicine in the NHS. Aimed at PhD students. |
Year(s) Of Engagement Activity | 2020 |
Description | Organised a discussion panel on AI in healthcare in conjunction with HDRUK |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | I led the organisation of a panel discussion, hosted at the Wellcome Trust, where industry professionals, academics and policy makers discussed what is holding back AI in the NHS. Aimed at PhD students. |
Year(s) Of Engagement Activity | 2019 |
Description | UK Science & Innovation Network & NIH Maternal Health & AI Research Symposium |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Invited to speak at the Maternal Health & AI Research Symposium in Boston, MA, USA. |
Year(s) Of Engagement Activity | 2020 |
Description | Wellcome Innovations Flagships |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Member of the Innovations Flagships panel. Innovations Flagships support the development of exciting new products, technologies and other interventions to prevent or treat disease. |
Year(s) Of Engagement Activity | 2019,2020 |