Combining quantitative estimates of phenotype likelihood from EHR with genomic information for genetic discovery and clinical risk prediction

Lead Research Organisation: University College London
Department Name: Primary Care and Population Sciences

Abstract

Readily available phenotypic data from electronic health records (EHR) are an underused resource with significant potential to inform diagnostic and therapeutic decisions. Using data science methods, we can aggregate, condense and refine vast amounts of health data to maximise their predictive value in support of medical decision-making.

This project seeks to develop clinical-grade phenotype definitions from electronic health records that yield quantitative measures of phenotype likelihood or diagnostic certainty. These quantitative measures have applications in both research and clinical practice; for example, they could inform the prioritisation and interpretation of specialist tests, such as genetic or imaging tests.

The initial focus of this project is on cardiovascular diseases for which both complex and familial inheritance are described, including heart failure and cardiomyopathies. However, the aim is to develop methods and tools (such as software packages or online applications) that can scale to the wider medical phenome.

The specific aims of the project are to derive EHR-based phenotypic risk scores in the UK Biobank, first for Mendelian disorders and later for common diseases, based on a method reported by Bastarache et al. Each disease-specific score is based on the number of relevant phenotypes found in a patient's health records.
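
As a rough illustration of a count-based score of this kind, the Python sketch below counts how many of a disease's relevant phenotypes appear in a patient's record. It is not the project's implementation: the function name, the phenotype labels and the prevalence values are hypothetical, and the optional weighting of each phenotype by the negative log of its population prevalence is an assumption that is not stated in this abstract.

```python
from math import log
from typing import Dict, Iterable, Optional, Set


def phenotype_risk_score(
    patient_phenotypes: Iterable[str],
    disease_phenotypes: Set[str],
    prevalence: Optional[Dict[str, float]] = None,
) -> float:
    """Score one patient against one disease (illustrative sketch only).

    Without `prevalence`, the score is the number of distinct relevant
    phenotypes present in the record. With `prevalence` (an assumed,
    hypothetical weighting), each present phenotype contributes
    -log(prevalence), so rarer phenotypes count for more.
    """
    # Keep only phenotypes relevant to the disease; repeats count once.
    present = set(patient_phenotypes) & disease_phenotypes
    if prevalence is None:
        return float(len(present))
    return sum(-log(prevalence[code]) for code in present)


# Hypothetical phenotype labels, for illustration only.
relevant = {"heart_failure", "arrhythmia", "syncope"}
record = ["arrhythmia", "syncope", "asthma", "syncope"]

unweighted = phenotype_risk_score(record, relevant)
weighted = phenotype_risk_score(
    record,
    relevant,
    prevalence={"heart_failure": 0.02, "arrhythmia": 0.05, "syncope": 0.01},
)
```

In this example the unweighted score is 2.0, because two distinct relevant phenotypes are present and the repeated and unrelated entries are ignored.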

Successful outcomes would include new genetic discoveries (for example, previously unknown variants related to the diseases under investigation), the identification of misdiagnosed patients, and changes to clinical practice that enable earlier diagnosis and improved treatment while supporting physicians' decision-making.
