Improving Bayesian methods for estimating divergence times integrating genomic and trait data

Lead Research Organisation: University College London
Department Name: Genetics Evolution and Environment

Abstract

After one species splits into two, their genes and genomes evolve independently. If the rate of evolution is constant over time, the genetic differences between species accumulate at a fixed pace, in proportion to the time since the species separated. Molecules can thus serve as a clock, recording the time of species separation through the accumulated changes. If fossil records or geological events can be used to assign an absolute geological time to one species divergence event on the evolutionary tree, all calculated genetic distances can be converted into absolute geological times. This rationale for molecular clock dating has recently been extended to deal with evolutionary rates that vary over time, in the so-called relaxed-clock models. For fast-evolving viral DNA, the different sampling times of the viral sequences similarly allow us to calibrate the molecular clock and to obtain estimates of absolute divergence times and evolutionary rates. In this project, we will develop statistical models of trait evolution, which will be used to analyse morphological trait data for both living and extinct species to generate fossil calibrations, which are crucially important to molecular dating analysis. Such models of trait evolution will also allow us to study the correlation between viral molecular sequence evolution and viral phenotypes such as antigenic drift. We will apply the new methods to real datasets to date major divergence events in the tree of life, such as the divergences of humans and apes, primates, animals, and flowering plants.
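
The strict-clock reasoning in the abstract can be illustrated with a minimal sketch. It assumes a constant rate and a single calibrated node; the function names and all numbers are hypothetical and are not taken from the project or from any software it describes.

```python
def clock_rate(calibration_distance, calibration_age):
    """Rate in substitutions per site per unit time under a strict clock.

    The pairwise distance accumulates along both lineages since the split,
    hence the factor of 2.
    """
    return calibration_distance / (2.0 * calibration_age)


def divergence_time(distance, rate):
    """Convert a pairwise genetic distance into an absolute divergence time."""
    return distance / (2.0 * rate)


# Hypothetical values for illustration only: a fossil dates one split to
# 30 Myr, and the two descendant species differ by 0.12 substitutions/site.
rate = clock_rate(calibration_distance=0.12, calibration_age=30.0)

# A second pair of species differs by 0.04 substitutions/site.
age = divergence_time(distance=0.04, rate=rate)

print(f"rate = {rate:.4f} substitutions/site/Myr, age = {age:.1f} Myr")
```

With these made-up inputs the second split is dated to about 10 Myr. Relaxed-clock models replace the single rate with rates that vary among branches, which this sketch does not attempt.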

Technical Summary

Molecular clock dating methods have been improved recently to accommodate violation of the clock through the use of relaxed-clock models, and to incorporate uncertainties in fossil calibrations through the use of soft bounds. Yet representing errors and uncertainties in the fossil record in a molecular dating analysis remains a challenging task. In this project, we will implement models of trait evolution to conduct Bayesian MCMC analysis of morphological traits in fossil and extant species. The resulting posterior for divergence times will be used as calibration densities for molecular clock dating. The new models and methods will be implemented in the MCMCtree program in the PAML package, and will be applied to large datasets to date divergence events in the metazoans, the hominoids and primates, and the flowering plants. We will also analyse skull measurements of fossil and modern species within the hominoids to generate posterior estimates of the hominoid divergence times, which will be used in a multispecies coalescent analysis of the hominoid genomic sequence data to generate estimates of the human-chimpanzee divergence time and of the mutation rate. Our mutation rate estimates will be vitally important to testing hypotheses concerning the origin and migration patterns of modern humans. We will use the same trait-evolution models to analyse viral phenotype (such as influenza virus epitopes) and its correlation with the evolutionary rate of the bird flu protein hemagglutinin.
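
As an illustration of what a soft-bound calibration density can look like, the sketch below uses one common construction: a uniform density between a minimum and a maximum age, with a small probability (here 2.5% on each side) placed in smooth tails so that the bounds can be violated. The function name, the tail shapes and the example ages are assumptions made for illustration, and this is not necessarily the exact form implemented in MCMCtree.

```python
import math


def soft_bound_density(t, t_low, t_high, tail=0.025):
    """Calibration density with soft minimum and maximum bounds.

    Probability 1 - 2*tail lies uniformly in [t_low, t_high]; each tail
    holds probability `tail` and joins the uniform part continuously.
    """
    if t <= 0.0:
        return 0.0
    width = t_high - t_low
    core = 1.0 - 2.0 * tail
    if t < t_low:
        # Power-law tail below the minimum bound.
        theta = core * t_low / (tail * width)
        return tail * theta / t_low * (t / t_low) ** (theta - 1.0)
    if t <= t_high:
        # Flat density between the two bounds.
        return core / width
    # Exponential tail above the maximum bound.
    theta = core / (tail * width)
    return tail * theta * math.exp(-theta * (t - t_high))


# Hypothetical calibration: node age between 6 and 10 Myr.
for age in (5.5, 6.0, 8.0, 10.0, 10.5):
    print(age, soft_bound_density(age, 6.0, 10.0))
```

The density is nonzero outside the interval, so strongly conflicting molecular data can still pull the posterior past a bound, which is the point of a soft bound as opposed to a hard one.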

Planned Impact

We will implement the methods and algorithms developed in this project in the MCMCtree program in the PAML software package, and distribute the software through the PAML website, free of charge to academics. We will also develop a project-specific website, including YouTube-hosted video manuals for the software.

We will attend local and international meetings to present our research results. Methodological advances will be disseminated in this way, as well as through teaching in the world-leading MSc Palaeobiology at Bristol, and the advanced workshop on Computational Molecular Evolution (funded by the Wellcome Trust and the EMBO) that is organized and co-instructed by Yang.

Publications

Donoghue PC (2016) The evolution of methods for establishing evolutionary timescales. in Philosophical transactions of the Royal Society of London. Series B, Biological sciences

Dos Reis M (2016) Notes on the birth-death prior with fossil calibrations for Bayesian estimation of species divergence times. in Philosophical transactions of the Royal Society of London. Series B, Biological sciences

 
Title Data from: Bayesian estimation of species divergence times using correlated quantitative characters 
Description Discrete morphological data have been widely used to study species evolution, but the use of quantitative (or continuous) morphological characters is less common. Here, we implement a Bayesian method to estimate species divergence times using quantitative characters. Quantitative character evolution is modelled using Brownian diffusion with character correlation and character variation within populations. Through simulations, we demonstrate that ignoring the population variation (or population "noise") and the correlation among characters leads to biased estimates of divergence times and rate, especially if the correlation and population noise are high. We apply our new method to the analysis of quantitative characters (cranium landmarks) and molecular data from carnivoran mammals. Our results show that time estimates are affected by whether the correlations and population noise are accounted for or ignored in the analysis. The estimates are also affected by the type of data analysed, with analyses of morphological characters only, molecular data only, or a combination of both; showing noticeable differences among the time estimates. Rate variation of morphological characters among the carnivoran species appears to be very high, with Bayesian model selection indicating that the independent-rates model fits the morphological data better than the autocorrelated-rates model. We suggest that using morphological continuous characters, together with molecular data, can bring a new perspective to the study of species evolution. Our new model is implemented in the MCMCtree computer program for Bayesian inference of divergence times. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
URL https://datadryad.org/stash/dataset/doi:10.5061/dryad.q7rf263
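
The bias from ignoring population noise, as reported in the dataset description above, can be seen in a simplified one-character calculation. The sketch assumes a single quantitative character evolving by Brownian diffusion with a known rate, one individual measured per species, and no character correlation; the function and its parameter names are hypothetical and this is not the model implemented in MCMCtree.

```python
def naive_time_estimate(true_time, bm_rate, population_variance):
    """Divergence time inferred when within-population noise is ignored.

    Under Brownian diffusion with rate `bm_rate` (variance per unit time),
    the difference between two species means has variance
    2 * bm_rate * true_time. Measuring one individual per species adds
    2 * population_variance. Attributing all of that variance to
    evolution inflates the time estimate.
    """
    expected_sq_diff = 2.0 * bm_rate * true_time + 2.0 * population_variance
    return expected_sq_diff / (2.0 * bm_rate)


# Hypothetical values: true split 10 time units ago, rate 0.5 per unit time.
for noise in (0.0, 0.5, 1.0):
    print(noise, naive_time_estimate(10.0, 0.5, noise))
```

In this toy setting the estimate exceeds the true time by the ratio of population variance to diffusion rate, so the bias grows as the noise grows.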
 
Title Estimation of species divergence times in presence of cross-species gene flow 
Description Cross-species introgression can have significant impacts on phylogenomic reconstruction of species divergence events. Here, we used simulations to show how the presence of even a small amount of introgression can bias divergence time estimates when gene flow is ignored in the analysis. Using advances in analytical methods under the multispecies coalescent (MSC) model, we demonstrate that by accounting for incomplete lineage sorting and introgression using large phylogenomic data sets this problem can be avoided. The multispecies-coalescent-with-introgression (MSci) model is capable of accurately estimating both divergence times and ancestral effective population sizes, even when only a single diploid individual per species is sampled. We characterize some general expectations for biases in divergence time estimation under three different scenarios: 1) introgression between sister species, 2) introgression between non-sister species, and 3) introgression from an unsampled (i.e., ghost) outgroup lineage. We also conducted simulations under the isolation-with-migration (IM) model, and found that the MSci model assuming episodic gene flow was able to accurately estimate species divergence times despite high levels of continuous gene flow. We estimated divergence times under the MSC and MSci models from two published empirical datasets with previous evidence of introgression, one of 372 target enrichment loci from baobabs (Adansonia), and another of 1,000 transcriptome loci from fourteen species of the tomato relative, Jaltomata. The empirical analyses not only confirm our findings from simulations, demonstrating that the MSci model can reliably estimate divergence times, but also show that divergence time estimation under the MSC can be robust to the presence of small amounts of introgression in empirical datasets with extensive taxon sampling. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL http://datadryad.org/stash/dataset/doi:10.5061/dryad.zs7h44j8x