Machine Learning for high-throughput Phenotyping and Comorbidity Mapping in EHR data

Lead Research Organisation: University College London
Department Name: Institute of Health Informatics


Project aim: To investigate signature methods as a feature extraction technique for improving the accuracy of early heart failure (HF) diagnosis from Electronic Healthcare Records (EHR). The project will use structured EHR data collected from primary and secondary care providers.

What: The signature transform operates on a stream of data (often a time series) and extracts information about order and area. In EHR data this translates to extracting information about the order in which diagnosis and procedure events occur.
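As an illustration of how the signature captures order and area, the following is a minimal sketch in plain NumPy. It is not the project's implementation, which is described as using Signatory. The function names signature_depth2 and levy_area, the two-channel encoding (cumulative counts of diagnosis and procedure events), and the two toy patients are all assumptions made for this example. It computes the signature up to depth 2 of a piecewise-linear path using Chen's identity, and shows that two patients with the same events in a different order share the same first-level terms but differ in the sign of the area term.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path.

    path: array of shape (length, channels).
    Returns (level1, level2) with shapes (channels,) and (channels, channels).
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)
    channels = path.shape[1]
    level1 = np.zeros(channels)
    level2 = np.zeros((channels, channels))
    for dx in increments:
        # Chen's identity for appending one linear segment:
        # new level 2 = old level 2 + (old level 1) x dx + (dx x dx) / 2
        level2 += np.outer(level1, dx) + 0.5 * np.outer(dx, dx)
        level1 += dx
    return level1, level2


def levy_area(level2, i=0, j=1):
    """Signed area between channels i and j, from the level-2 terms."""
    return 0.5 * (level2[i, j] - level2[j, i])


# Hypothetical encoding: columns are cumulative counts of
# (diagnosis events, procedure events) over time.
patient_a = [[0, 0], [1, 0], [1, 1]]  # diagnosis first, then procedure
patient_b = [[0, 0], [0, 1], [1, 1]]  # procedure first, then diagnosis

level1_a, level2_a = signature_depth2(patient_a)
level1_b, level2_b = signature_depth2(patient_b)

print(level1_a, level1_b)    # identical: both patients have the same total events
print(levy_area(level2_a))   # 0.5
print(levy_area(level2_b))   # -0.5
```

The first-level terms only record the total change in each channel, so they cannot distinguish the two patients. The antisymmetric part of the second level (the area) changes sign when the order of the events is swapped, which is the ordering information referred to above.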

Why: One desired outcome of improved feature extraction using signatures is to improve the efficiency of screening programmes by reducing the number of patients who need to be screened. More broadly, the signature transform has the potential to improve the performance of any EHR prediction task and has not yet been applied in this area. HF is a common benchmark disease for comparing EHR machine learning methodologies.

How: Andre will implement a predictive model using the PyTorch and Signatory Python packages (Kidger and Lyons, 2020). The signature transform model will be compared with a number of competitive benchmarks cited in the literature. This will involve creating a case-control cohort, a data pipeline, a modelling library and an evaluation framework. The code resulting from the project will be made available on GitHub after submission.

Kidger, P., Lyons, T., 2020. Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU. arXiv:2001.00706 [cs, stat].



Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S021612/1 | | | 01/04/2019 | 30/09/2027 |
2298376 | Studentship | EP/S021612/1 | 23/09/2019 | 30/09/2023 | Andre Vauvelle