Bayesian implementation of the multispecies-coalescent-with-introgression (MSci) model for analysis of population genomic data

Lead Research Organisation: University College London
Department Name: Genetics Evolution and Environment

Abstract

Genomes from different species contain rich information about the evolutionary history of the species. By comparing DNA sequences from different species or different individuals of the same species we can work out how the species are related, when they diverged from each other, and whether and when cross-species hybridisation occurred. However, extracting this information from genomes requires powerful statistical models and efficient computational algorithms. The multispecies-coalescent-with-introgression (MSci) model provides a natural framework for comparative analysis of genomic sequence data, accommodating the random fluctuations of biological reproduction as genetic material is passed from generation to generation, the random accumulation of genetic mutations, and possible cross-species hybridisation events. We will implement the MSci model in our Bayesian Markov chain Monte Carlo simulation program, so that it can be used to estimate species phylogenies and species divergence times, ancestral population sizes, and the time and rate of hybridisation. These parameters will provide important insights into the origin of species. We will apply our newly developed methods to analyse genomic datasets from Heliconius butterflies, Malagasy mouse lemurs, and lizards, generated by our collaborators.

Technical Summary

We will implement the multispecies-coalescent-with-introgression (MSci) model in our Bayesian Markov chain Monte Carlo (MCMC) program BPP, and improve the computational and mixing efficiency of the MCMC algorithms. The MSci model can be used to estimate species phylogenies and species divergence times, ancestral population sizes, and the time and magnitude of hybridisation events. These parameters will provide important insights into the process of species formation. Bayesian methods are superior to heuristic methods in that they accommodate ancestral polymorphism and incomplete lineage sorting, gene tree-species tree conflicts, and uncertainties and errors in the gene trees due to limited information in the sequence data. We will develop and evaluate novel MCMC proposals to improve the mixing efficiency of the trans-model MCMC algorithms. We will parallelize the program to make efficient use of modern multi-processor, multi-core computer hardware. We will design a user-friendly web-based graphical user interface (GUI). We will apply our newly developed methods to analyse genomic datasets from Heliconius butterflies, Malagasy mouse lemurs, and lizards, in collaboration with evolutionary biologists.

Planned Impact

Delimiting species boundaries and inferring species phylogenies are of vital importance to assessing current biodiversity, to understanding the impact of environmental and societal changes on species extinctions and loss of biodiversity, and to developing effective conservation policies. The methods to be developed in this project for inferring species phylogenies and cross-species introgression events will become powerful tools for analysis of genomic datasets, and results obtained from such analyses will be critical to effective decision making concerning biodiversity management and conservation. The methods can also be used to identify species, and are useful for tracking illegal wildlife trade.

Publications

Flouri T (2023) Efficient Bayesian inference under the multispecies coalescent with migration. in Proceedings of the National Academy of Sciences of the United States of America

Huang J (2022) Inference of Gene Flow between Species under Misspecified Models. in Molecular Biology and Evolution