Modelling the genetic variation in populations distributed across non-homogeneous spaces
Lead Research Organisation: University of Oxford
Department Name: Statistics
Abstract
In medical research one is often interested in the time of an event, for example the age at which people develop a certain disease. Typically, only a fraction of the study participants develop the disease, while the majority do not. Furthermore, the study does not last forever, so some subjects may develop the disease after observation has stopped. Hence studies result in datasets in which some individuals have an associated event time, while others have only a lower bound on that time, indicating that the event of interest had not occurred by the end of the observation period.
Studies may further ask whether the time of getting a disease depends on characteristics of the individuals in the study. For example, do certain genes have an effect on the time of getting a disease? Does body mass index have an effect? They may also try to model this relationship between covariates and the time of an event.
Our key objectives are to develop statistical methods to study the relationship between covariates and right-censored times. We aim to extend machine learning techniques that are available for uncensored data to allow for right-censored data.
The main EPSRC research area to which we hope to contribute is 'Statistics and applied probability': we use mathematical methods relating to Hilbert spaces and probability theory to propose new statistical methods. Our work is motivated by data found in medical research and biology and is therefore also relevant to the research areas 'Biological informatics' and 'Mathematical biology'.
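The right-censored observations described in the abstract are commonly stored as a pair: the observed time and a flag saying whether the event was actually seen. The following is a minimal Python sketch of that encoding, not code from the project; the names `Observation` and `right_censor`, the example ages, and the fixed end of study at age 60 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """One study participant (hypothetical container, for illustration only)."""
    time: float   # event time if the event was seen, otherwise the censoring time
    event: bool   # True: event observed; False: right-censored (time is a lower bound)


def right_censor(event_times: List[float], study_end: float) -> List[Observation]:
    """Turn hypothetical true event times into right-censored observations.

    Assumes a single fixed end of observation shared by all participants.
    """
    return [
        Observation(time=min(t, study_end), event=(t <= study_end))
        for t in event_times
    ]


# Made-up ages at disease onset; in a real study the values beyond
# study_end would never be known to the analyst.
true_ages = [42.0, 67.5, 80.0, 55.0]
data = right_censor(true_ages, study_end=60.0)

for obs in data:
    status = "event observed" if obs.event else "censored (lower bound)"
    print(f"time={obs.time:5.1f}  {status}")
```

In this sketch the second and third participants are recorded only as "no event by age 60", which is the lower-bound information that methods for right-censored data have to handle.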
People | ORCID iD |
---|---|
Dino Sejdinovic (Primary Supervisor) | |
David Rindt (Student) | |
Publications
Fernandez, Tamara (2019) A kernel log-rank test of independence for right-censored data, in arXiv e-prints
Rindt, David (2019) A kernel- and optimal transport- based test of independence between covariates and right-censored lifetimes, in arXiv e-prints
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/N509711/1 | | | 30/09/2016 | 29/09/2021 | |
1929862 | Studentship | EP/N509711/1 | 30/09/2017 | 29/06/2021 | David Rindt |
Description | In medical research one is often interested in the time of an event, for example the age at which people develop a certain disease. Typically, only a fraction of the study participants develop the disease, while the majority do not. Furthermore, the study does not last forever, so some subjects may develop the disease after observation has stopped. Hence studies result in datasets in which some individuals have an associated event time, while others have only a lower bound on that time, indicating that the event of interest had not occurred by the end of the observation period. Studies may further ask whether the time of getting a disease depends on characteristics of the individuals in the study. For example, do certain genes have an effect on the time of getting a disease? Does body mass index have an effect? They may also try to model this relationship between covariates and the time of an event. To help answer these questions, we proposed the first two nonparametric ways to test for such dependence between characteristics and event time. Here nonparametric means that no specific form of the relationship is assumed; for example, it is not assumed to be linear. The methods we propose extend commonly used machine learning methods to this so-called right-censored data. In addition, we are currently carrying out more general investigations into testing the dependence between characteristics. |
Exploitation Route | These methods may be used by those with longitudinal data to test the dependence between characteristics and the observation time. They may also be extended to feature selection and regression methods (we are currently attempting this). We hope they will then be used in medical research. |
Sectors | Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
URL | https://arxiv.org/abs/1912.03784 |
Title | Kernel Logrank and Opt HSIC |
Description | These are nonparametric tests to see if a covariate relates to a right-censored lifetime. |
Type Of Material | Data analysis technique |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | None yet. |
URL | https://arxiv.org/abs/1912.03784 |
Description | Collaboration with Tamara Fernandez and Arthur Gretton from the Gatsby Computational Neuroscience Unit at UCL. |
Organisation | University College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Together we wrote the paper https://arxiv.org/abs/1912.03784, which is now under review. |
Collaborator Contribution | We did the research and mathematics together. |
Impact | We wrote this paper together: https://arxiv.org/abs/1912.03784 |
Start Year | 2019 |
Title | Kernel Logrank and Opthsic code |
Description | This is code to run the algorithms we developed. |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | The code has been online for two weeks. No impact yet. |
URL | https://github.com/davidrindt/kernel_logrank_python_code |