Efficient Bayesian phylogenomic dating with new models of trait evolution and rich diversities of living and fossil species
Lead Research Organisation:
UNIVERSITY COLLEGE LONDON
Department Name: Genetics Evolution and Environment
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Technical Summary
Timetrees provide rich information about patterns of species diversification and their associations with past climate and major geological events. Bayesian relaxed-clock dating is the method of choice for deriving timetrees, as it naturally integrates information from molecules and fossils. However, the method relies on stochastic MCMC sampling of the posterior distribution of times and rates and is computationally too expensive for large datasets. Recently, several large-scale genome sequencing projects have been announced, such as the 66,000 UK eukaryotic genomes project. This genomic revolution is now accompanied by a computed tomography (CT) revolution that is generating vast amounts of scan data for thousands of museum specimens. Thus, methods that can integrate analysis of genomic data from high-throughput sequencing projects with trait data from CT scans are urgently needed. The main advantage of CT-scan data is that the rich diversities of fossil species in museum collections can now be integrated in the dating analysis, providing a more robust calibration of the molecular clock and increasing the amount of information in timetrees about past diversification events. The two main aims of this project are: (i) to improve the computational efficiency of MCMC sampling in timetree inference in large phylogenies and (ii) to develop new models of trait evolution for co-analysis of genomic and trait data. We will achieve (i) by developing new proposal algorithms to improve the mixing efficiency of the MCMC, and by improving the C code in the MCMCtree software through vectorization and parallelisation. Our preliminary data indicate that we can reduce computing time 2- to 5-fold, bringing analysis of thousands of species within reach. The new software and models will be tested in several high-profile real-data analyses.
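To illustrate why the choice of proposal matters for mixing efficiency, the following is a minimal sketch of a random-walk Metropolis sampler on a toy one-dimensional target. It is not the MCMCtree implementation or any of the proposal algorithms planned in the project: the target (a standard normal), the sliding-window proposal, and all function names are assumptions made for illustration only. The sketch reports the acceptance rate and the lag-1 autocorrelation of the chain for three window widths.

```python
import math
import random


def log_target(x):
    # Toy stand-in for a log-posterior: standard normal, up to a constant.
    return -0.5 * x * x


def metropolis(n_steps, window, seed=1):
    """Random-walk Metropolis with a symmetric uniform sliding window."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Propose uniformly on (x - window/2, x + window/2).
        y = x + window * (rng.random() - 0.5)
        log_ratio = log_target(y) - log_target(x)
        # Accept with probability min(1, target(y) / target(x)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def lag1_autocorrelation(xs):
    """Lag-1 sample autocorrelation; values near 1 indicate poor mixing."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((v - mean) ** 2 for v in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / n
    return cov / var


def summarize(windows=(0.1, 5.0, 50.0), n_steps=20000):
    """Map each window width to (acceptance rate, lag-1 autocorrelation)."""
    out = {}
    for w in windows:
        samples, acc = metropolis(n_steps, w)
        out[w] = (acc, lag1_autocorrelation(samples))
    return out


if __name__ == "__main__":
    for w, (acc, rho) in summarize().items():
        print(f"window={w:5.1f}  acceptance={acc:.3f}  lag1_autocorr={rho:.3f}")
```

In this toy setting, a very narrow window accepts almost every move but takes tiny steps, while a very wide window rejects most moves. Both give highly autocorrelated samples, and an intermediate window mixes better. Better-designed proposals aim to lower this autocorrelation, so that fewer MCMC iterations are needed for the same precision.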
Planned Impact
We will implement the methods and algorithms developed in this project in the MCMCTREE program in the PAML software package, and distribute the program through its website, free of charge to academics. We aim to disseminate our new models and software in accordance with the BBSRC priority areas Data Driven Biology and Systems Approaches to the Biosciences. In particular, our new software will allow the analysis of very large datasets from complex phylogenetic ensembles. We will champion the integration of rich fossil diversities with high-throughput sequencing data to infer evolutionary timelines in large phylogenies, thus providing the tools urgently required to analyse the rapidly growing volumes of sequence and phenotypic trait data now available.
We will attend national and international meetings to present our research results. Methodological advances will be disseminated in this way, as well as through teaching in the world-leading MSc Palaeobiology at Bristol and the advanced workshop on Computational Molecular Evolution (funded by the Wellcome Trust and EMBO), which is organized and co-instructed by Yang. These courses will provide much-needed training for our academic beneficiaries on how to use our software and models. We will apply for funds from the Royal Society to run a 2-day Discussion Meeting in London (open to scientists and the general public) and an associated satellite workshop at the Royal Society's Chicheley Hall. The focus will be on integrating biological and geological timescales to elucidate the co-evolution of Earth and Life. The Chicheley Hall workshop will train evolutionary biologists, bioinformaticians, palaeontologists and Earth System modellers to conduct molecular clock dating (and Earth System Modelling) using cutting-edge methods, showcasing the new models and algorithms to be developed in this project. We will research and design a schools outreach module on the tree of life, evolutionary timescales and evolutionary rates, to be delivered through the GeoBus and Bristol Dinosaur Project STEM engagement projects, and we will make the teaching materials freely available to science teachers. We will engage the broader public in our science and its deliverables through a science-art collaboration, achieved by hosting an artist in residence at Bristol University and a touring display of their work.
Organisations
| People | ORCID iD |
| Ziheng Yang (Principal Investigator) | |
Publications
Feng Y (2021) Functional and Adaptive Significance of Promoter Mutations That Affect Divergent Myocardial Expressions of TRIM72 in Primates. in Molecular Biology and Evolution
Flouri T (2022) Bayesian Phylogenetic Inference using Relaxed-clocks and the Multispecies Coalescent. in Molecular Biology and Evolution
Huang J (2021) The Asymptotic Behavior of Bootstrap Support Values in Molecular Phylogenetics. in Systematic Biology
Jiao X (2021) Multispecies coalescent and its applications to infer species phylogenies and cross-species gene flow. in National Science Review
Kapli P (2023) DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies. in Systematic Biology
Moody ERR (2024) The nature of the last universal common ancestor and its impact on the early Earth system. in Nature Ecology & Evolution
Thawornwattana Y (2022) Full-Likelihood Genomic Analysis Clarifies a Complex History of Species Divergence and Introgression: The Example of the erato-sara Group of Heliconius Butterflies. in Systematic Biology
Tiley GP (2020) Molecular Clocks without Rocks: New Solutions for Old Problems. in Trends in Genetics
Zhu T (2021) Complexity of the simplest species tree problem. in Molecular Biology and Evolution
Zhu T (2022) A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. in Molecular Ecology
| Title | Data from Moody et al. (2024) The nature of the Last Universal Common Ancestor and its impact on the early Earth system. Nature Ecology and Evolution |
| Description | This dataset contains ALE reconciliations output as .uml_rec files, as well as all data associated with the timetree inference (see below for a detailed explanation). |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| URL | https://data.bris.ac.uk/data/dataset/405xnm7ei36d2cj65nrirg3ip/ |
| Title | Supplementary data for: DNA sequences are as useful as protein sequences for inferring deep phylogenies |
| Description | Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences, based on the perception that protein sequences are less prone to homoplasy and saturation or to issues of compositional heterogeneity than DNA sequences. Here we analyze a model of codon evolution under an idealized genetic code and demonstrate that those perceptions may be misconceptions. We conduct a simulation study to assess the utility of protein versus DNA sequences for inferring deep phylogenies, with protein-coding data generated under models of heterogeneous substitution processes across sites in the sequence and among lineages on the tree, and then analyzed using nucleotide, amino acid, and codon models. Analysis of DNA sequences under nucleotide-substitution models (possibly with the third codon positions excluded) recovered the correct tree at least as often as analysis of the corresponding protein sequences under modern amino acid models. We also applied the different data-analysis strategies to an empirical dataset to infer the metazoan phylogeny. Our results from both simulated and real data suggest that DNA sequences may be as useful as proteins for inferring deep phylogenies and should not be excluded from such analyses. Analysis of DNA data under nucleotide models has a major computational advantage over protein-data analysis, potentially making it feasible to use advanced models that account for among-site and among-lineage heterogeneity in the nucleotide-substitution process in inference of deep phylogenies. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2022 |
| Provided To Others? | Yes |
| URL | https://datadryad.org/stash/dataset/doi:10.5061/dryad.sbcc2fr85 |