Deep-Learning Algorithms for Evolutionary Inferences from Genomic and Ecological Data

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

One of the most elusive questions in evolutionary biology is to what extent adaptation has shaped the genomes of extant species and populations. Exposure to novel environmental conditions imposed selective pressures, which led to genetic adaptations and differentiation between populations (Quach and Quintana-Murci, 2017). Identifying signatures of natural selection in the genome therefore has a twofold importance: (i) assessing the ability of endangered species to respond to climate change and (ii) localising functional variants. Because current methods have limited power to detect selection signatures, we are still far from a comprehensive view of how neutral and selective events have characterised the evolution of species and their genomes. Artificial intelligence, or machine learning (ML), algorithms maximise predictive accuracy by automatically and iteratively tuning their internal parameters while remaining relatively agnostic to the phenomenon they are trying to predict. A recently introduced class of supervised ML algorithms is deep learning, an inference framework based on artificial neural networks. Deep learning is a subject of intensive research and has provided impressive results in pattern (e.g. speech) recognition, computer vision, robotics and bioinformatics (e.g. identification of splice sites). Despite the predictive power of deep learning algorithms, their application in evolutionary genomics is still in its infancy (Sheehan and Song, 2016). In population genetics, variation of genomes within and between populations is used to infer historical events, including size changes, that are characteristic of the species of interest. Thanks to technological advances in RNA/DNA sequencing, we are now able to collect and analyse large amounts of genomic data. However, population genetics data are inherently noisy and multidimensional, and the models underlying them are similarly complex, which limits the discovery of novel insights.
Deep learning algorithms therefore have the potential to solve these problems and address some of the long-standing issues in this field. This project will explore the applicability of deep learning algorithms, specifically convolutional neural networks, to infer evolutionary parameters, such as historical changes of population size or sites targeted by natural selection, from large-scale genomic data of extant and (when available) ancient samples. While similar strategies have been applied to infer binary parameters (e.g. presence or absence of recombination hotspots, as in Chen et al. 2018), we will expand these methods to include multiclass classification and the estimation of continuous parameters, a task that is currently challenging in deep learning. The introduction of deep learning algorithms in population genetics is key for extracting meaningful information from eco-genomics data. The project has the potential to reap the benefits of artificial intelligence to understand how species evolved and adapted to their environments, with direct implications for conservation strategies. Given our unique application in evolutionary genomics, we foresee scope for introducing novel architectures or neural layers that can be applied to other fields.

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
NE/P012345/1                                   01/10/2017  30/09/2027
2366357            Studentship   NE/P012345/1  28/09/2019  31/05/2027  Calum Pennington
NE/W503198/1                                   01/04/2021  31/03/2022
2366357            Studentship   NE/W503198/1  28/09/2019  31/05/2027  Calum Pennington