Identifying subtypes of Alzheimer's disease in Electronic Health Records

Lead Research Organisation: University College London
Department Name: Institute of Health Informatics

Abstract

Patients with Alzheimer's disease (AD) vary widely in symptom presentation, rate of progression and comorbidities. As the number of disease factors increases, it becomes harder to ascertain the specific role each has in affecting the disease and how it should be treated, especially when interactions among all the other factors must be considered. Examining hidden patterns in these disease factors can help untangle their relationship with AD progression and treatment. One such pattern is distinct clusters of patients who share similar profiles of these factors. Representing AD heterogeneity in this way offers the opportunity for unique insights into the disease.

This research uses several cluster analysis methods to identify and validate subtypes of AD using electronic health records (EHR). Using EHR means that a variety of clinical attributes about each patient can be included in the analysis. The outcomes can be taken directly from the records, and are therefore relatable to patients' experience in the NHS. The research will first find AD subtypes based on symptoms, then expand to include a variety of comorbidities, and examine how the subtypes differ in rate of progression and other factors.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
MR/R502248/1                                   01/10/2017  30/09/2021
1940103            Studentship   MR/R502248/1  01/10/2017  30/12/2021  Nonie Alexander
 
Description Defining and redefining human disease at scale - the human phenome project (GSK)
Amount £851,000 (GBP)
Organisation GlaxoSmithKline (GSK) 
Sector Private
Country Global
Start 01/2020 
End 01/2021
 
Title Monte Carlo Method for Cluster Evaluation for EHR 
Description This tool evaluates the structure of clusters found by unsupervised machine learning methods in electronic health records by comparing them against a Monte Carlo-generated null distribution.
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? No  
Impact This tool will help improve the validity of patient subtypes found in clustering studies in EHR, thus increasing the likelihood that these subtypes could be clinically useful. 
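
The general idea of comparing cluster structure against a Monte Carlo null distribution can be sketched as follows. This is a minimal illustration and not the tool described in this record: the choice of k-means, the within-cluster sum of squares as the statistic, the column-shuffling null model, and all function names are assumptions made for the example.

```python
import random


def kmeans_wss(rows, k, rng, n_iter=20):
    """Run a basic k-means and return the within-cluster sum of squares."""
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for row in rows:
            dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(row)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(row, c)) for c in centroids)
        for row in rows
    )


def best_wss(rows, k, rng, restarts=5):
    """Lowest within-cluster sum of squares over several random restarts."""
    return min(kmeans_wss(rows, k, rng) for _ in range(restarts))


def shuffle_columns(rows, rng):
    """Null data (assumed model): permute each feature independently.

    This keeps each feature's prevalence but breaks co-occurrence
    between features, so any cluster structure is due to chance.
    """
    cols = [list(c) for c in zip(*rows)]
    for col in cols:
        rng.shuffle(col)
    return [list(r) for r in zip(*cols)]


def monte_carlo_p_value(rows, k, n_null=50, seed=0):
    """Compare the observed statistic with a Monte Carlo null distribution."""
    rng = random.Random(seed)
    observed = best_wss(rows, k, rng)
    null = [best_wss(shuffle_columns(rows, rng), k, rng) for _ in range(n_null)]
    # Lower values mean tighter clusters, so count null runs at least as tight.
    n_as_tight = sum(w <= observed for w in null)
    p_value = (n_as_tight + 1) / (n_null + 1)
    return observed, null, p_value


# Toy binary "patient x condition" matrix with two obvious groups.
patients = [[1, 1, 1, 0, 0, 0] for _ in range(20)] + [
    [0, 0, 0, 1, 1, 1] for _ in range(20)
]
observed, null, p_value = monte_carlo_p_value(patients, k=2)
print(observed, min(null), p_value)
```

A small p-value here suggests that the observed clusters are tighter than those found in data with the same feature prevalences but no shared structure. Other statistics and null models could be substituted without changing the overall procedure.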
 
Title Phenome-wide phenotyping algorithms 
Description Machine-readable versions (CSV files) of electronic health record phenotyping algorithms for Kuan V., Denaxas S., Gonzalez-Izquierdo A. et al. A chronological map of 308 physical and mental health conditions from 4 million individuals in the National Health Service published in the Lancet Digital Health - DOI 10.1016/S2589-7500(19)30012-3 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact Algorithms are being used in two additional projects: 1) Pathak N. et al Migrant EHR and 2) Denaxas S. et al GSK/phenomics 
URL https://github.com/spiros/chronological-map-phenotypes
 
Title Synthetic EHR for Cluster Benchmarking 
Description This tool is a wrapper for the synthetic health record generator SYNTHEA. It transforms the generated data into usable and realistic health records with distinct, modifiable parameters such as cluster number and cluster separation, as well as realistic patient outcomes, which researchers can use to benchmark methods for finding patient subgroups.
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? No  
Impact This method will help researchers validate future tools for patient subtyping, and allow potential users of those methods to understand in greater detail the benefits of those methods. 
 
Title tofu 
Description Tofu is a Python library for generating synthetic UK Biobank data. The UK Biobank is a large open-access prospective research cohort study of 500,000 middle-aged participants recruited in England, Scotland and Wales. The study has collected and continues to collect extensive phenotypic and genotypic detail about its participants, including data from questionnaires, physical measures, sample assays, accelerometry, multimodal imaging, genome-wide genotyping and longitudinal follow-up for a wide range of health-related outcomes. Tofu generates synthetic data that conform to the structure of the baseline data UK Biobank sends researchers by generating random values. For categorical variables (single or multiple choice), a random value is picked from the UK Biobank data dictionary for that field. For continuous variables, a random value is generated based on the distribution of values reported for that field on the UK Biobank showcase. For date and date/time fields, a random date is generated. For all other fields, such as polymorphic fields, no data are generated. Some general observations: The lookups directory contains lookups downloaded from the UK Biobank showcase; they might need to be updated when new fields become available. Data conform to the structure and schema of the baseline file but are otherwise nonsensical: no checks have been implemented across fields. All eids (patient identifiers) generated by this tool are prefixed with 'fake' to avoid confusion with legitimate datasets. Randomly generated dates fall between 1910 and 1990, again to avoid confusion with real data.
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact Data has been used for training purposes at a postgraduate and postdoc level. 
URL https://github.com/spiros/tofu
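
The per-field generation rules in the description above can be illustrated with a short sketch. It does not use or reproduce the tofu API: the field names, the choice lists, the normal distribution with made-up mean and standard deviation, and the inclusive 1910 to 1990 date range are all assumptions chosen for illustration.

```python
import random
from datetime import date, timedelta

# Hypothetical field specifications; real UK Biobank field IDs, codings
# and value distributions are not reproduced here.
FIELDS = {
    "sex": {"type": "categorical", "choices": ["Female", "Male"]},
    "height_cm": {"type": "continuous", "mean": 168.0, "sd": 9.0},
    "assessment_date": {"type": "date"},
    "polymorphic_field": {"type": "other"},
}

# Assumed inclusive bounds for "between 1910 and 1990".
DATE_START = date(1910, 1, 1)
DATE_END = date(1990, 12, 31)


def random_value(spec, rng):
    """Generate one random value according to the field's type."""
    kind = spec["type"]
    if kind == "categorical":
        return rng.choice(spec["choices"])
    if kind == "continuous":
        # Assumption: a normal distribution stands in for the reported one.
        return round(rng.gauss(spec["mean"], spec["sd"]), 1)
    if kind == "date":
        span_days = (DATE_END - DATE_START).days
        return DATE_START + timedelta(days=rng.randint(0, span_days))
    return None  # all other field types: no data generated


def synthetic_rows(n, fields=FIELDS, seed=0):
    """Build n synthetic participants with 'fake'-prefixed identifiers."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        row = {"eid": f"fake{i + 1}"}
        for name, spec in fields.items():
            row[name] = random_value(spec, rng)
        rows.append(row)
    return rows


for row in synthetic_rows(3):
    print(row)
```

Because each field is drawn independently, the rows match the schema but carry no relationships between fields, which mirrors the "otherwise nonsensical" caveat in the description.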
 
Description MRC Methodology Research Panel 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited guest member on MRC Methodology Research Panel
Year(s) Of Engagement Activity 2018,2019,2020
 
Description Organisation of precision medicine panel discussion in conjunction with HDRUK
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact I led the organisation of a panel discussion hosted at the Wellcome Trust in conjunction with HDRUK, where industry professionals, academics and policy makers discussed how to properly harness the potential of precision medicine in the NHS. Aimed at PhD students.
Year(s) Of Engagement Activity 2020
 
Description Organised a discussion panel on AI in healthcare in conjunction with HDRUK
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact I led the organisation of a panel discussion hosted at the Wellcome Trust, where industry professionals, academics and policy makers discussed what is holding back AI in the NHS. Aimed at PhD students.
Year(s) Of Engagement Activity 2019
 
Description UK Science & Innovation Network & NIH Maternal Health & AI Research Symposium 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited to speak at the Maternal Health & AI Research Symposium in Boston, MA, USA.
Year(s) Of Engagement Activity 2020
 
Description Wellcome Innovations Flagships 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Member of the Innovations Flagships panel. Innovations Flagships support the development of exciting new products, technologies and other interventions to prevent or treat disease.
Year(s) Of Engagement Activity 2019,2020