Modelling the genetic variation in populations distributed across non-homogeneous spaces

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

In medical research one is often interested in the time of an event, for example the age at which people develop a certain disease. Typically, only a fraction of the participants in a study will develop the disease, and the majority will not. Furthermore, a study does not last forever, so some subjects may develop the disease after observation has stopped. Studies therefore produce datasets in which some individuals have an observed event time, while others have only a lower bound on that time: the event of interest had not occurred by the end of the observation period.
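The data structure described above is usually stored as a pair per subject: the observed time, which is the earlier of the event time and the censoring time, and an indicator of whether the event itself was seen. The sketch below is a minimal illustration of that encoding; the function name `censor` and the example ages are hypothetical and not taken from the project.

```python
import numpy as np

def censor(event_times, censoring_times):
    """Return observed times and event indicators for right-censored data.

    Each subject contributes the earlier of its event time and its censoring
    time (end of observation). The indicator is 1 if the event was observed
    and 0 if only a lower bound (the censoring time) is known.
    """
    event_times = np.asarray(event_times, dtype=float)
    censoring_times = np.asarray(censoring_times, dtype=float)
    observed = np.minimum(event_times, censoring_times)
    delta = (event_times <= censoring_times).astype(int)
    return observed, delta

# Hypothetical example: ages at disease onset and ages at the end of follow-up.
onset = [52.0, 70.0, 45.0, 81.0]
end_of_study = [60.0, 65.0, 60.0, 75.0]
times, delta = censor(onset, end_of_study)
print(times)  # [52. 65. 45. 75.]
print(delta)  # [1 0 1 0]
```

A subject with `delta == 0` contributes only the lower bound `times[i]` on its event time; in practice the true onset ages of such subjects are never seen.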

Studies may further ask whether the time until disease onset depends on characteristics (covariates) of the individuals in the study. For example, do certain genes affect the time until a disease develops? Does body mass index? Studies may also try to model this relationship between covariates and the time of an event.

Our key objectives are to develop statistical methods to study the relationship between covariates and right-censored times, that is, times which for some subjects are only known to exceed a lower bound. We aim to extend machine learning techniques that are available for uncensored data to allow for right-censored data.
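One kernel-based technique for uncensored data is the Hilbert-Schmidt Independence Criterion (HSIC), the dependence measure behind the "HSIC" in the Opt HSIC method named in this record. The following is a minimal sketch of the standard, uncensored HSIC permutation test, assuming Gaussian kernels with a fixed bandwidth of 1; the function names are hypothetical, and this is not the project's censored method, which is what the extension to right-censored data has to supply.

```python
import numpy as np

def gaussian_gram(x, bandwidth):
    """Gaussian (RBF) kernel matrix for a one-dimensional sample."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    sq_dists = (x - x.T) ** 2
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def hsic(x, y, bandwidth_x=1.0, bandwidth_y=1.0):
    """Biased HSIC estimate trace(K H L H) / n^2 for uncensored data."""
    n = len(x)
    K = gaussian_gram(x, bandwidth_x)
    L = gaussian_gram(y, bandwidth_y)
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return np.trace(K @ H @ L @ H) / n ** 2

def hsic_permutation_test(x, y, n_permutations=500, seed=0):
    """p-value of HSIC under the null of independence, obtained by permuting y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = hsic(x, y)
    null_stats = [hsic(x, rng.permutation(y)) for _ in range(n_permutations)]
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (1 + sum(s >= observed for s in null_stats)) / (1 + n_permutations)
```

With `y = x ** 2` plus a little noise and `x` symmetric around zero, the Pearson correlation between `x` and `y` is close to zero, yet this test should still return a small p-value; this is the sense in which a nonparametric test does not assume a linear relationship. The test requires fully observed `y`, which is exactly what right-censoring breaks.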

The main EPSRC research area to which we hope to contribute is 'Statistics and applied probability': we use mathematical methods relating to Hilbert spaces and probability theory to propose new statistical methods. Our work is motivated by data from medical research and biology, and is therefore also relevant to the research areas 'Biological informatics' and 'Mathematical biology'.

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509711/1                                   01/10/2016  30/09/2021
1929862            Studentship   EP/N509711/1  01/10/2017  30/06/2021  David Rindt
 
Description To help answer whether characteristics of individuals, such as their genes or body mass index, affect the time of an event such as the onset of a disease, we proposed the first two nonparametric ways to test for dependence between characteristics and event time. Nonparametric means that no specific form of the relationship is assumed; for example, the relationship is not assumed to be linear. The methods we propose extend frequently used machine learning methods to so-called right-censored data, in which some subjects have an observed event time and others have only a lower bound on it because the event had not occurred by the end of the observation period.

In addition, we are currently carrying out more general investigations into testing for dependence between characteristics.
Exploitation Route These methods may be used by those with longitudinal data to test for dependence between characteristics and the time of an event. They may also be extended to feature selection and regression methods (we are currently attempting this). We hope they will then be used in medical research.
Sectors Healthcare, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology

URL https://arxiv.org/abs/1912.03784
 
Title Kernel Logrank and Opt HSIC 
Description These are nonparametric tests of whether a covariate is related to a right-censored lifetime.
Type Of Material Data analysis technique 
Year Produced 2019 
Provided To Others? Yes  
Impact None yet. 
URL https://arxiv.org/abs/1912.03784
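As its name suggests, the Kernel Logrank test builds on the classical log-rank test for right-censored data. For orientation, the textbook two-sample log-rank test, which checks whether a binary covariate is related to a right-censored lifetime, can be sketched as follows. This is not the Kernel Logrank or Opt HSIC procedure from the paper; the function name, the 0/1 group encoding and the normal approximation for the p-value are assumptions of this sketch.

```python
import numpy as np
from math import erf, sqrt

def logrank_test(times, delta, group):
    """Classical two-sample log-rank test for right-censored data.

    times : observed times min(T, C)
    delta : 1 if the event was observed, 0 if censored
    group : 0/1 covariate defining the two samples
    Returns the standardised statistic and a two-sided normal p-value.
    """
    times = np.asarray(times, dtype=float)
    delta = np.asarray(delta, dtype=int)
    group = np.asarray(group, dtype=int)
    o_minus_e, variance = 0.0, 0.0
    for t in np.unique(times[delta == 1]):
        at_risk = times >= t                        # still under observation at t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((times == t) & (delta == 1)).sum()     # events at t, both groups
        d1 = ((times == t) & (delta == 1) & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n                # observed minus expected, group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times; statistic undefined")
    z = o_minus_e / sqrt(variance)
    p_value = 1 - erf(abs(z) / sqrt(2))             # two-sided, standard normal
    return z, p_value
```

The classical test only compares two groups defined by a binary covariate; continuous covariates such as body mass index are not covered directly by this form of the test.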
 
Description Collaboration with Tamara Fernandez and Arthur Gretton of the Gatsby Computational Neuroscience Unit at UCL
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Together we wrote the paper https://arxiv.org/abs/1912.03784, which is now under review.
Collaborator Contribution We did the research and mathematics together.
Impact Together we wrote the paper https://arxiv.org/abs/1912.03784.
Start Year 2019
 
Title Kernel Logrank and Opt HSIC code
Description This is code to run the algorithms we developed. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact The code has been online for two weeks; there are no impacts yet.
URL https://github.com/davidrindt/kernel_logrank_python_code