Developing phylogenetic inference methods using hybrid, continuous and discrete, data, based on single-cell sequencing technologies

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

This project lies in the field of bio-mathematics and is supervised by the bio-mathematics department of Imperial College London. The objective is to infer the posterior distribution of phylogenetic trees from single-cell sequencing data. Bayesian inference, among other methods, will be used to infer the parameters of the nucleotide substitution models, the tree topologies, and the branch lengths.
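
As a minimal sketch of what a posterior distribution over trees means here, Bayes' theorem can be written as below. The notation is an assumption introduced only for illustration and is not taken from the project: D denotes the observed single-cell sequences, \tau a tree topology, b the branch lengths, and \theta the substitution-model parameters. Treating the three priors as independent is also an assumption.

```latex
% Hypothetical notation: D = data, \tau = topology, b = branch lengths,
% \theta = substitution-model parameters; independent priors are assumed.
P(\tau, b, \theta \mid D)
  = \frac{P(D \mid \tau, b, \theta)\, P(\tau)\, P(b)\, P(\theta)}{P(D)},
\qquad
P(D) = \sum_{\tau} \int\!\!\int P(D \mid \tau, b, \theta)\, P(\tau)\, P(b)\, P(\theta)\, \mathrm{d}b\, \mathrm{d}\theta .
```

The normalising term P(D) sums over all topologies and is generally intractable, which is why sampling-based approaches such as the Markov chain Monte Carlo scheme used in BEAST [3] are commonly applied to problems of this kind.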

Many studies have linked DNA mutations and mtDNA heteroplasmies to the genetic defects underlying various diseases. Single-cell omics technologies developed in the past few years also allow the manipulation and analysis of larger datasets of DNA sequences [5]. The raw data in our project will be single-cell (sc) sequencing data (e.g. DNA, mRNA, mitochondrial DNA, ...). The ability to infer the evolution of genes from these observed single-cell sequences could substantially improve our understanding of somatic-DNA diseases and eventually contribute to putting targeted therapies in place. The impact of such work could therefore be felt along the whole chain from disease prevention to individual cure.

References
[1] Joseph H Camin and Robert R Sokal. "A method for deducing branching sequences in phylogeny". In: Evolution (1965), pp. 311-326.
[2] Luigi Luca Cavalli-Sforza, Italo Barrai, and Anthony WF Edwards. "Analysis of human evolution under random genetic drift". In: Cold Spring Harbor symposia on quantitative biology. Vol. 29. Cold Spring Harbor Laboratory Press. 1964, pp. 9-20.
[3] Alexei J Drummond and Andrew Rambaut. "BEAST: Bayesian evolutionary analysis by sampling trees". In: BMC evolutionary biology 7.1 (2007), pp. 1-8.
[4] Joseph Felsenstein. "Evolutionary trees from gene frequencies and quantitative characters: finding maximum likelihood estimates". In: Evolution (1981), pp. 1229-1242.
[5] Jeongwoo Lee, Daehee Hwang, et al. "Single-cell multiomics: technologies and data analysis methods". In: Experimental & Molecular Medicine 52.9 (2020), pp. 1428-1442.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high-calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting-edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships and placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. This research will be relevant, addressing the key questions behind real-world problems. It will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximise reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/S023151/1                                      01/04/2019   30/09/2027
2442432             Studentship    EP/S023151/1   03/10/2020   30/09/2024   Tresnia Berah