Generative adversarial networks for demographic inferences of nonmodel species from genomic data

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Biological & Behavioural Sciences

Abstract

Understanding the temporal and geographic movement of populations is vital to address key questions in evolutionary and conservation biology. Whilst the generation of high-throughput genomic data has enabled the inference of population genomic parameters at an unprecedented rate, large-scale datasets have also prompted the development of novel computational techniques.

In recent years, the predictive power provided by machine learning algorithms, in particular deep learning, has led to breakthrough discoveries in many disciplines. Nevertheless, the application of deep learning in evolutionary genomics is still in its infancy. Deep learning algorithms exhibit several advantages over commonly used inferential approaches in population genomics, as they can handle large data sets with minimal compression and are theoretically universal approximators of arbitrarily complex models.

The intrinsic statistical uncertainty associated with genomic sequencing data, the lack of natural training data sets, and the computational resources needed have hampered the exploitation of these powerful techniques to generate novel findings in evolutionary biology. These challenges are particularly prominent in the study of nonmodel species, where prior knowledge of key parameters is typically missing.

A promising strategy to partly overcome such barriers is the recent application of Generative Adversarial Networks (GANs), a class of deep learning methods that has been successfully applied to generate artificial genomes and estimate cryptic evolutionary parameters. A GAN consists of two deep neural networks that are trained together: a generator that produces simulations and a discriminator that tries to tell them apart from real examples. At the end of training, the generator produces simulations that are indistinguishable from real examples (as in the case of "Deepfake" methods in Artificial Intelligence). The parameters of the final simulator thus provide estimates of the model parameters.
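
As a rough illustration of this adversarial loop, the sketch below is a toy and not the pg-gan method or the architecture proposed in this project. It rests on several assumptions: the "simulator" draws values from a Normal distribution with one unknown parameter theta instead of simulating genomes, the discriminator is a logistic regression rather than a deep neural network, and the generator is updated by a simple accept/reject search rather than by gradient-based training. All names (simulate, discriminator_loss, infer_theta) are hypothetical.

```python
import math
import random


def simulate(theta, n, rng):
    """Toy simulator (assumption): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def _sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so exp() and log() stay well behaved
    return 1.0 / (1.0 + math.exp(-z))


def discriminator_loss(real, fake, steps=50, lr=0.5):
    """Train a logistic-regression discriminator (real=1, simulated=0) and
    return its mean cross-entropy. A high loss means it is being fooled."""
    data = [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]
    centre = sum(x for x, _ in data) / len(data)
    w = b = 0.0
    for _ in range(steps):  # plain batch gradient descent
        grad_w = grad_b = 0.0
        for x, y in data:
            err = _sigmoid(w * (x - centre) + b) - y
            grad_w += err * (x - centre)
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    loss = 0.0
    for x, y in data:
        p = _sigmoid(w * (x - centre) + b)
        loss -= math.log(p) if y == 1.0 else math.log(1.0 - p)
    return loss / len(data)


def infer_theta(real, rng, start=0.0, iters=100, n_sim=100):
    """Accept/reject search: keep a proposed theta when its simulations
    confuse a freshly trained discriminator more than the current theta."""
    theta = start
    for i in range(iters):
        step = 0.05 + 0.5 * (1.0 - i / iters)  # proposal width shrinks over time
        proposal = theta + rng.gauss(0.0, step)
        current_loss = discriminator_loss(real, simulate(theta, n_sim, rng))
        proposal_loss = discriminator_loss(real, simulate(proposal, n_sim, rng))
        if proposal_loss > current_loss:
            theta = proposal
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    observed = simulate(3.0, 100, rng)  # stand-in for real data, true theta = 3
    print(infer_theta(observed, rng))
```

In this toy the discriminator only sees a single feature, so it can recover the location parameter and nothing else. The point is the structure: the parameter value at which simulated and observed data become hard to tell apart is taken as the estimate.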

In this project, we aim to pilot the design, implementation, and deployment of a novel GAN architecture for population genomic data. As an illustration, we will focus on the inference of demographic parameters, including temporal changes in population size and migration rate, that describe the recent evolution of Anopheles mosquito populations among three villages in Burkina Faso.

As the first objective, we will adapt a recently proposed GAN architecture for population genomic data to incorporate multiple populations with unequal sizes. As the second objective, we will train the algorithm by integrating simulations with extensive genomic data from Anopheles mosquito populations. We will include a significant technological advance by integrating a model selection step to discriminate among competing evolutionary scenarios.

By estimating the migration rate of mosquito populations among villages, we will be able to assist predictions on the spread of resistance mutations and support molecular surveillance and intervention strategies at the local scale. In fact, it is still unclear to what extent resistance mutations can spread across the entire continent, as different studies have led to contrasting findings on the extent of migration between Anopheles populations. Upon completion of this pilot study, we will be able to scale the deep learning algorithm to all available mosquito populations from sub-Saharan Africa and infer gene flow at the continental scale.

Additionally, the novel deep learning framework will be applicable to all mutations potentially associated with resistance or other notable phenotypes. It can be further extended to model complex modes of adaptation (e.g. via introgression or polygenic adaptation) and to other species of importance.

Publications

Description We demonstrated the applicability of generative AI for predicting evolutionary histories in non-model species from genomic data.
Exploitation Route We are in the process of releasing the software.
Sectors Digital/Communication/Information Technologies (including Software); Environment

 
Description Expand on the use of pg-gan software 
Organisation Haverford College
Country United States 
Sector Academic/University 
PI Contribution We expanded the potential use of pg-gan software.
Collaborator Contribution They provided us early access to prototype changes to the original code.
Impact Implementation of new version of pg-gan software.
Start Year 2023