Addressing the problem of deep coalescence in ancient radiations: Resolving the explosive radiation of the Lophotrochozoa.
Lead Research Organisation:
University College London
Department Name: Genetics Evolution and Environment
Abstract
Using molecular data (sequences of genes and proteins) to resolve the relationships between living organisms has been a major international research goal for over 25 years. Our improved knowledge of the relationships between species has already radically affected our understanding of how the great diversity of life evolved. Recently, completing this project seemed within reach, thanks to plentiful molecular data from so-called next-generation sequencing. Despite the ready availability of data, some major parts of the tree of life have nevertheless proved impossible to resolve. While this is a general problem affecting many different groups of organisms, we will focus on resolving the relationships within a significant portion of the animal kingdom.
One of the most striking features of the animal tree is the short time frame over which the constituent major groups (phyla such as chordates, annelids, arthropods and molluscs) diverged from one another over half a billion years ago in so-called rapid, or even explosive, radiations. Rapid radiations can cause two problems. First is a lack of phylogenetic signal due to the limited opportunity to accumulate genetic changes that identify individual groups. Second is the insidious problem of Incomplete Lineage Sorting (ILS) or deep coalescence. Due to polymorphism in the ancestral species, gene sequences may not all coalesce within each species but earlier, in the ancestral population, causing gene trees to diverge from the actual species tree. Put more simply, ILS results in different genes telling us different things about the relationships of the organisms.
Recent efforts to solve these problems focussed on compiling very large genomic datasets that were concatenated into a single data set and analysed as a single supergene. This approach masks the incongruence between genes that we expect to exist and may lead to positively misleading estimates. To solve these final and most recalcitrant aspects of the animal phylogeny we must use methods designed to deal with data incongruence.
Within the animals we will focus specifically on resolving the relationships between the phyla within the Lophotrochozoa. This is one of three ancient groups of bilaterally symmetrical animals. Lophotrochozoa contains approximately half of animal phyla including flatworms, annelids, and molluscs.
Our ultimate goals are I. to use the example of the Lophotrochozoa to better understand the processes involved in rapid radiations and how best to resolve them. II. to use our understanding to reconstruct the phylogenetic relationships between the animal phyla in the Lophotrochozoa. These practical goals will result in the phylogenetic framework that is the essential foundation for understanding the evolution of this major portion of animal diversity.
Our work should give us insight into the processes at work during an ancient, explosive radiation. Such radiation events have repeatedly had a major impact on the evolution of life, meaning that developing methods to address such questions will be of broad use. The easy problems having been solved, many remaining phylogenetic questions across the tree of life will involve similar rapid radiations.
Technical Summary
Resolving ancient, rapid radiations is among the most challenging tasks in phylogenetics. Rapid divergences allow the accumulation of limited phylogenetic information on short internal branches. Long terminal branches mean multiple substitutions, eroding phylogenetic signal.
An even more problematic product of rapid and ancient divergences is the phenomenon of deep coalescence or incomplete lineage sorting (ILS). Because of polymorphism in the ancestral species, gene sequences may not all coalesce within each species but earlier, in the ancestral population. Deep coalescence means some gene trees do not match the species phylogeny.
The multispecies coalescent (MSC) model avoids the incorrect assumption that all markers/loci share the same genealogical history. Within this framework, ILS or incongruence among gene-trees can become a rich source of information regarding the duration of the radiation and the evolutionary history.
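As a rough illustration of why short internal branches make deep coalescence likely, the sketch below uses the standard three-species result under the MSC: for a species tree ((A,B),C) with internal branch length T in coalescent units, a gene tree matches the species tree with probability 1 - (2/3)e^(-T), and each of the two discordant topologies has probability (1/3)e^(-T). The function names and the example branch lengths are illustrative assumptions, not part of the project.

```python
import math


def p_gene_tree_matches(t_internal: float) -> float:
    """Probability that a three-taxon gene tree matches the species tree.

    t_internal is the internal branch length in coalescent units
    (an assumption of this sketch: a single, constant-size ancestral population).
    """
    if t_internal < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t_internal)


def p_each_discordant(t_internal: float) -> float:
    """Probability of each of the two discordant three-taxon topologies."""
    if t_internal < 0:
        raise ValueError("branch length must be non-negative")
    return (1.0 / 3.0) * math.exp(-t_internal)


if __name__ == "__main__":
    # Hypothetical branch lengths, from a very rapid to a slow divergence.
    for t in (0.01, 0.1, 0.5, 1.0, 3.0):
        print(f"T={t:<5} match={p_gene_tree_matches(t):.3f} "
              f"each discordant={p_each_discordant(t):.3f}")
```

As T approaches zero the three topologies become equally likely, which is the situation in which concatenating loci into one supergene hides the most disagreement.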
Focussing on the unresolved relationships between phyla of the major animal clade of Lophotrochozoa as an important example, we will gather a dataset of up to 1000 nuclear genes from existing and novel transcriptomic data sets from diverse Lophotrochozoans and outgroups.
We will investigate the subset of all possible topologies that are supported by each of these 1000 loci to gauge the level of discordance between genes. We will compare the results of reconstructing trees using standard analyses of concatenated data with those from coalescence-aware methods.
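One simple way to gauge discordance between loci is to tally how often each topology appears among the gene trees. The sketch below is a minimal illustration under stated assumptions: gene trees are supplied as rooted Newick strings with unquoted taxon names, topologies are compared as sets of rooted clades, and the helper names and example trees are hypothetical. A real analysis would more likely compare unrooted topologies with an established phylogenetics library.

```python
from collections import Counter


def rooted_clades(newick: str) -> frozenset:
    """Return the clades (sets of taxon names) of a rooted Newick tree.

    Branch lengths and internal node labels are ignored. The root clade,
    containing all taxa, is included in the result.
    """
    text = newick.strip().rstrip(";")
    stack = [[]]      # one list of child leaf-sets per currently open node
    clades = set()
    name = ""         # taxon name being read
    ignore = False    # True while skipping a branch length or node label

    def flush():
        nonlocal name
        if name:
            stack[-1].append(frozenset([name]))
        name = ""

    for ch in text:
        if ch == "(":
            stack.append([])
            ignore = False
        elif ch == ",":
            flush()
            ignore = False
        elif ch == ")":
            flush()
            children = stack.pop()
            clade = frozenset().union(*children)
            clades.add(clade)
            stack[-1].append(clade)
            ignore = True   # skip any support value or label after ')'
        elif ch == ":":
            ignore = True   # skip the branch length that follows
        elif not ignore and not ch.isspace():
            name += ch
    flush()
    return frozenset(clades)


def topology_counts(gene_trees):
    """Count how many gene trees share each rooted topology."""
    return Counter(rooted_clades(tree) for tree in gene_trees)


if __name__ == "__main__":
    # Hypothetical gene trees for three taxa.
    trees = ["((A,B),C);", "((B,A),C);", "((A,C),B);",
             "((A:0.1,B:0.2)90:0.05,C:0.3);"]
    for topology, n in topology_counts(trees).most_common():
        print(n, sorted(sorted(clade) for clade in topology))
```

Counting by clade sets treats trees that differ only in the order of children as the same topology, which is the property needed for a discordance tally.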
Using extensive simulation based on the MSC model parameters measured from our data, we will explore the robustness of different approaches to violations of model assumptions (variable substitution rates both globally and across branches; drifts in amino acid composition across sites and across branches; and non-homogeneous patterns of substitution across sites). Simulation will allow us to understand the problems of ILS and untangle the lophotrochozoan polytomy.
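To make the idea of simulating under the MSC concrete, the following sketch draws gene-tree topologies for a three-species tree ((A,B),C) and reports their frequencies. It is a toy under stated assumptions: one sampled sequence per species, a constant ancestral population size, branch lengths in coalescent units, and no sequence evolution, so it does not address the substitution-model violations listed above. All names and parameter values are hypothetical.

```python
import random
from collections import Counter

TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def simulate_gene_tree(t_internal: float, rng: random.Random) -> str:
    """Draw one gene-tree topology for the species tree ((A,B),C)."""
    # The A and B lineages coalesce at rate 1 in their shared ancestor.
    if rng.expovariate(1.0) < t_internal:
        return TOPOLOGIES[0]
    # Otherwise all three lineages reach the root population, where the
    # first pair to coalesce is chosen uniformly at random.
    return rng.choice(TOPOLOGIES)


def simulate_frequencies(t_internal: float, n_loci: int = 1000, seed: int = 1):
    """Simulate n_loci independent loci and return topology frequencies."""
    rng = random.Random(seed)
    counts = Counter(simulate_gene_tree(t_internal, rng) for _ in range(n_loci))
    return {topology: counts[topology] / n_loci for topology in TOPOLOGIES}


if __name__ == "__main__":
    # A short internal branch (hypothetical value) produces heavy discordance.
    for topology, freq in simulate_frequencies(0.1, n_loci=1000).items():
        print(topology, round(freq, 3))
```

With a short internal branch the matching topology is only slightly more frequent than the two alternatives, so the spread of topology frequencies across loci carries information about the duration of the radiation.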
Planned Impact
Our work will have the most immediate impact in two obvious areas of biology:
Specific major question in Zoology/Evolution/Palaeontology.
We are addressing an important and long-standing question in the evolution of the animal phyla. The Lophotrochozoan clade contains roughly half of all animal phyla but the relationships between these groups are entirely unclear. If we are able to solve this problem then an accurate phylogeny of the Lophotrochozoa will mean we have resolved the relationships within a significant portion of the animal tree.
The impact of this stems not simply from understanding this phylogenetic question. An accurate phylogeny is the essential underlying framework for studying all aspects of the evolution of this group. To understand the evolution of the varied morphologies that exist in this very diverse group, knowledge of the phylogenetic background is essential. The same is true for comparative analyses of genetics, genomes, genome structure etc. Finally, there are many Cambrian fossils currently recognised as stem lophotrochozoan species but impossible to interpret further due to the impossibility of relating them to modern groups. Understanding how living lophotrochozoans are related and the likely pathways of their evolution will help interpret the morphological characters of the fossil members of this group.
General problem in resolving the relationships across the tree of life.
Rapid radiations are a feature of many groups across the tree of life. The difficulty in resolving these stems from lack of informative characters and this may be seriously compounded by the problem of deep coalescence. We are studying this phenomenon in one ancient radiation and the lessons we learn and the approaches we develop will be much more broadly applicable. Familiar instances include the radiations of mammals, birds and land plants but the phenomenon of rapid radiations is very common across the tree of life.
The methods developed in this project, for understanding and resolving rapid radiations and deep coalescence, will provide powerful tools for phylogenetic analysis of genomic/transcriptomic datasets. The methods will be widely applicable, and results obtained from such analyses, applied across the tree of life, may be critical to effective decision making concerning any downstream application of an accurately resolved phylogeny: biodiversity studies, palaeontology, comparative genomics, comparative morphology.
Publications
Flouri T
(2022)
Bayesian Phylogenetic Inference using Relaxed-clocks and the Multispecies Coalescent.
in Molecular biology and evolution
Kapli P
(2020)
Topology-dependent asymmetry in systematic errors affects phylogenetic placement of Ctenophora and Xenacoelomorpha.
in Science advances
Kapli P
(2023)
DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies.
in Systematic biology
Kapli P
(2021)
Systematic errors in phylogenetic trees.
in Current biology : CB
Kapli P
(2021)
Lack of support for Deuterostomia prompts reinterpretation of the first Bilateria.
in Science advances
Kapli P
(2020)
Phylogenetic tree building in the genomic age.
in Nature reviews. Genetics
Natsidis P
(2021)
Systematic errors in orthology inference and their effects on evolutionary analyses.
in iScience
Álvarez-Carretero S
(2023)
Beginner's Guide on the Use of PAML to Detect Positive Selection.
in Molecular biology and evolution
Description | We have conducted an experiment using simulation to test a key idea. We have shown that inequalities of the evolutionary process, especially inequalities in rate of evolution, would be expected to result in specific artefacts as observed using real data. This has been published in Science Advances. We worked on the phylogeny of the deuterostomes as a prelude (a simpler problem) to the Lophotrochozoa problem. We have published a paper on this work in Science Advances. We also looked at the use of gene presence/absence as phylogenetic characters and showed, using simulation, problems with published papers. This is published. We have written a review paper outlining the use of large data sets for phylogenetics. We have written a primer giving a detailed explanation of the problem of systematic error in tree reconstruction. We have submitted a paper on the use of nucleotides versus amino acids for phylogeny reconstruction. We have assembled the data set required to answer the Lophotrochozoa problem. The data set is more extensive than originally envisioned. We have worked on the methods required to answer the Lophotrochozoa problem. We expect to publish papers on the Lophotrochozoa problem this year as well as additional papers on metazoan phylogeny supported by this grant (work by postdoc Paschalia Kapli). |
Exploitation Route | The papers published are already well cited and therefore of interest to other researchers. |
Sectors | Education |
Description | The unreliable clade Deuterostomia and implications for bilaterian evolution |
Amount | £215,006 (GBP) |
Funding ID | RPG-2021-433 |
Organisation | The Leverhulme Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2022 |
End | 03/2025 |
Title | Data from manuscript: "Lack of support for Deuterostomia prompts reinterpretation of the first Bilateria." |
Description | Data and scripts from paper "Lack of support for Deuterostomia prompts reinterpretation of the first Bilateria." |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | NA |
URL | https://github.com/MaxTelford/MonoDeutData |
Title | Supplementary data for: DNA sequences are as useful as protein sequences for inferring deep phylogenies |
Description | Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences, based on the perception that protein sequences are less prone to homoplasy and saturation or to issues of compositional heterogeneity than DNA sequences. Here we analyze a model of codon evolution under an idealized genetic code and demonstrate that those perceptions may be misconceptions. We conduct a simulation study to assess the utility of protein versus DNA sequences for inferring deep phylogenies, with protein-coding data generated under models of heterogeneous substitution processes across sites in the sequence and among lineages on the tree, and then analyzed using nucleotide, amino acid, and codon models. Analysis of DNA sequences under nucleotide-substitution models (possibly with the third codon positions excluded) recovered the correct tree at least as often as analysis of the corresponding protein sequences under modern amino acid models. We also applied the different data-analysis strategies to an empirical dataset to infer the metazoan phylogeny. Our results from both simulated and real data suggest that DNA sequences may be as useful as proteins for inferring deep phylogenies and should not be excluded from such analyses. Analysis of DNA data under nucleotide models has a major computational advantage over protein-data analysis, potentially making it feasible to use advanced models that account for among-site and among-lineage heterogeneity in the nucleotide-substitution process in inference of deep phylogenies. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
URL | https://datadryad.org/stash/dataset/doi:10.5061/dryad.sbcc2fr85 |
Title | The folder contains results and code produced in the framework of the study "Topology dependent asymmetry in systematic error affects Ctenophora and Xenacoelomor |
Description | The folder contains results and code produced in the framework of the study "Topology dependent asymmetry in systematic error affects Ctenophora and Xenacoelomor |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | na |
URL | https://github.com/MaxTelford/XenoCtenoSims |
Title | This repository contains code and results produced in the framework of the study "Systematic errors in orthology inference: a bug or a feature for evolutionary analyses?" |
Description | This repository contains code and results produced in the framework of the study "Systematic errors in orthology inference: a bug or a feature for evolutionary analyses?" |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Has prompted new collaboration that will use our data. |
URL | https://github.com/MaxTelford/Gainsandlosses |
Description | Article in The Conversation online magazine. |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Digest of our new research publication for general public. |
Year(s) Of Engagement Activity | 2020 |
URL | https://theconversation.com/is-our-most-distant-animal-relative-a-sponge-or-a-comb-jelly-our-study-p... |
Description | Interviewed as part of a BBC World Service programme on animal evolution |
Form Of Engagement Activity | A broadcast e.g. TV/radio/film/podcast (other than news/press) |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Interviewed as part of a BBC World Service programme on animal evolution. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.bbc.co.uk/programmes/w3cswhkp |
Description | Research talk to IGNITE training workshop |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | IGNITE training network of researchers. A lecture as part of a graduate school. |
Year(s) Of Engagement Activity | 2021 |