Addressing the problem of deep coalescence in ancient radiations: Resolving the explosive radiation of the Lophotrochozoa.

Lead Research Organisation: University College London
Department Name: Genetics Evolution and Environment

Abstract

Using molecular data (sequences of genes and proteins) to resolve the relationships between living organisms has been a major international research goal for over 25 years. Our improved knowledge of the relationships between species has already radically affected our understanding of how the great diversity of life evolved. Recently, completing this project seemed within reach, thanks to plentiful molecular data from so-called next-generation sequencing. Despite the ready availability of data, some major parts of the tree of life have nevertheless proved impossible to resolve. While this is a general problem affecting many different groups of organisms, we will focus on resolving the relationships within a significant portion of the animal kingdom.

One of the most striking features of the animal tree is the short time frame over which the constituent major groups (phyla such as chordates, annelids, arthropods and molluscs) diverged from one another over half a billion years ago in so-called rapid, or even explosive, radiations. Rapid radiations can cause two problems. First is a lack of phylogenetic signal due to the limited opportunity to accumulate genetic changes that identify individual groups. Second is the insidious problem of Incomplete Lineage Sorting (ILS) or deep coalescence. Due to polymorphism in the ancestral species, gene sequences may not all coalesce within each species but earlier, causing gene trees to diverge from the actual species tree. Put more simply, ILS results in different genes telling us different things about the relationships of the organisms.

Recent efforts to solve these problems focussed on compiling very large genomic datasets that were concatenated into a single data set and analysed as single supergenes. This approach masks the incongruence between genes that we expect to exist and may lead to positively misleading estimates. To solve these final and most recalcitrant aspects of the animal phylogeny we must use methods designed to deal with data incongruence.

Within the animals we will focus specifically on resolving the relationships between the phyla within the Lophotrochozoa. This is one of three ancient groups of bilaterally symmetrical animals. Lophotrochozoa contains approximately half of animal phyla including flatworms, annelids, and molluscs.

Our ultimate goals are: I. to use the example of the Lophotrochozoa to better understand the processes involved in rapid radiations and how best to resolve them; and II. to use our understanding to reconstruct the phylogenetic relationships between the animal phyla in the Lophotrochozoa. Achieving these practical goals will provide the phylogenetic framework that is the essential foundation for understanding the evolution of this major portion of animal diversity.

Our work should give us insight into the processes at work during an ancient, explosive radiation. Such radiation events have repeatedly had a major impact on the evolution of life, meaning that methods developed to address such questions will be of broad use. The easy problems having been solved, many remaining phylogenetic questions across the tree of life will involve similar rapid radiations.

Technical Summary

Resolving ancient, rapid radiations is among the most challenging tasks in phylogenetics. Rapid divergences allow the accumulation of limited phylogenetic information on short internal branches. Long terminal branches mean multiple substitutions, eroding phylogenetic signal.
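
The erosion of signal on long branches can be illustrated with a minimal sketch. It assumes the Jukes-Cantor (JC69) nucleotide model, which serves here only as a simple stand-in and is not a model named in this summary; the function name is illustrative. Under JC69 the expected proportion of sites that differ between two sequences is p = 3/4 (1 - exp(-4d/3)), where d is the true number of substitutions per site, so p levels off near 0.75 however large d becomes.

```python
import math


def jc69_observed_difference(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


if __name__ == "__main__":
    # Short branches: observed difference tracks d closely.
    # Long branches: observed difference saturates, hiding further change.
    for d in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0):
        p = jc69_observed_difference(d)
        print(f"d = {d:4.2f}  expected observed difference = {p:.3f}")
```

On short branches the observed difference is close to d itself, while on long branches many substitutions overwrite earlier ones and the observed difference carries little information about how much change really occurred.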

An even more problematic product of rapid and ancient divergences is the phenomenon of deep coalescence or incomplete lineage sorting (ILS). Because of polymorphism in the ancestral species, gene sequences may not all coalesce within each species but earlier, in the ancestral population. Deep coalescence means some gene trees do not match the species phylogeny.
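
How strongly a short internal branch promotes deep coalescence can be sketched for the simplest case. The example below assumes a rooted three-species tree ((A,B),C), one sampled sequence per species, and an internal branch of length T measured in coalescent units (generations divided by twice the effective population size, for diploids). Under the standard coalescent, the probability that a gene tree disagrees with the species tree is then (2/3) exp(-T). The function names and parameter values are illustrative and are not taken from the project.

```python
import math
import random


def discordance_probability(t: float) -> float:
    """Probability that a three-taxon gene tree differs from the species tree.

    t is the internal branch length in coalescent units.
    """
    return (2.0 / 3.0) * math.exp(-t)


def simulate_discordance(t: float, n_loci: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability over n_loci independent loci."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(n_loci):
        # The A and B lineages coalesce at rate 1 per coalescent unit.
        if rng.expovariate(1.0) < t:
            continue  # coalesced on the internal branch: gene tree matches
        # Otherwise all three lineages reach the root population, where
        # each of the three pairs is equally likely to coalesce first.
        if rng.randrange(3) != 0:  # 0 stands for the A-B pair
            discordant += 1
    return discordant / n_loci


if __name__ == "__main__":
    for t in (0.05, 0.5, 2.0):
        print(t, round(discordance_probability(t), 3),
              round(simulate_discordance(t, 20000), 3))
```

As T approaches zero the discordance probability approaches two thirds, so the three possible topologies become almost equally frequent among gene trees, which is the situation a rapid radiation is expected to produce.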

The multispecies coalescent (MSC) model avoids the incorrect assumption that all markers/loci share the same genealogical history. Within this framework, ILS or incongruence among gene-trees can become a rich source of information regarding the duration of the radiation and the evolutionary history.

Focussing on the unresolved relationships between phyla of the major animal clade of Lophotrochozoa as an important example, we will gather a dataset of up to 1000 nuclear genes from existing and novel transcriptomic data sets from diverse Lophotrochozoans and outgroups.

We will investigate the subset of all possible topologies that are supported by each of these 1000 loci to gauge the level of discordance between genes. We will compare the results of reconstructing trees using standard analyses of concatenated data with those from coalescence-aware methods.
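
One simple way to gauge discordance between loci is to count how many gene trees contain each candidate grouping. The sketch below is an assumption about how such a tally might look, not the project's actual pipeline: each gene tree is reduced to the set of clades it contains, and the input data are made up purely for illustration.

```python
from collections import Counter


def clade_support(gene_trees):
    """Return, for each clade, the fraction of gene trees that contain it.

    gene_trees is a list with one entry per locus; each entry is a
    collection of clades, and each clade is a frozenset of taxon names.
    """
    if not gene_trees:
        return {}
    counts = Counter()
    for clades in gene_trees:
        counts.update(set(clades))  # count each clade once per locus
    total = len(gene_trees)
    return {clade: n / total for clade, n in counts.items()}


# Toy, invented input: five loci, each supporting one pairing of three phyla.
AM = frozenset({"Annelida", "Mollusca"})
AP = frozenset({"Annelida", "Platyhelminthes"})
MP = frozenset({"Mollusca", "Platyhelminthes"})
toy_gene_trees = [{AM}, {AM}, {AP}, {MP}, {AM}]

support = clade_support(toy_gene_trees)

if __name__ == "__main__":
    for clade, fraction in sorted(support.items(), key=lambda kv: -kv[1]):
        print(sorted(clade), fraction)
```

Support fractions that are spread fairly evenly across conflicting clades point to substantial discordance, whereas one dominant clade suggests the loci largely agree.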

Using extensive simulation based on the MSC model parameters measured from our data, we will explore the robustness of different approaches to violations of model assumptions (variable substitution rates both globally and across branches; drifts in amino acid composition across sites and across branches; and non-homogeneous patterns of substitution across sites). Simulation will allow us to understand the problems of ILS and untangle the lophotrochozoan polytomy.

Planned Impact

Our work will have the most immediate impact in two obvious areas of biology:

Specific major question in Zoology/Evolution/Palaeontology.
We are addressing an important and long-standing question in the evolution of the animal phyla. The Lophotrochozoan clade contains roughly half of all animal phyla but the relationships between these groups are entirely unclear. If we are able to solve this problem then an accurate phylogeny of the Lophotrochozoa will mean we have resolved the relationships between a significant portion of the animal tree.

The impact of this stems not simply from understanding this phylogenetic question. An accurate phylogeny is the essential underlying framework for studying all aspects of the evolution of this group. To understand the evolution of the varied morphologies that exist in this very diverse group, knowledge of the phylogenetic background is essential. The same is true for comparative analyses of genetics, genomes, genome structure etc. Finally, there are many Cambrian fossils currently recognised as stem lophotrochozoan species but impossible to interpret further due to the impossibility of relating them to modern groups. Understanding how living lophotrochozoans are related and the likely pathways of their evolution will help interpret the morphological characters of the fossil members of this group.

General problem in resolving the relationships across the tree of life.
Rapid radiations are a feature of many groups across the tree of life. The difficulty in resolving these stems from lack of informative characters and this may be seriously compounded by the problem of deep coalescence. We are studying this phenomenon in one ancient radiation and the lessons we learn and the approaches we develop will be much more broadly applicable. Familiar instances include the radiations of mammals, birds and land plants but the phenomenon of rapid radiations is very common across the tree of life.

The methods developed in this project, for understanding and resolving rapid radiations and deep coalescence, will provide powerful tools for phylogenetic analysis of genomic/transcriptomic datasets. The methods will be widely applicable, and results obtained from such analyses, applied across the tree of life, may be critical to effective decision making concerning any downstream application of an accurately resolved phylogeny: biodiversity studies, palaeontology, comparative genomics, comparative morphology.
 
Description We have conducted an experiment using simulation to test a key idea. We have shown that inequalities of the evolutionary process, especially inequalities in rate of evolution, would be expected to result in specific artefacts as observed using real data. This has been published in Science Advances.

We worked on the phylogeny of the deuterostomes as a prelude (simpler problem) to the Lophotrochozoa problem. We have published a paper on this work in Science Advances.

We also looked at the use of gene presence/absence as phylogenetic characters and showed, using simulation, problems with published papers. This is published.

We have written a review paper outlining the use of large data sets for phylogenetics.

We have written a primer giving a detailed explanation of the problem of systematic error in tree reconstruction.

We have submitted a paper on the use of nucleotides versus amino acids for phylogeny reconstruction.

We have assembled the data set required to answer the Lophotrochozoa problem. The data set is more extensive than originally envisioned.

We have worked on the methods required to answer the Lophotrochozoa problem.

We expect to publish papers on the Lophotrochozoa problem this year, as well as additional papers on metazoan phylogeny supported by this grant (work by postdoc Paschalia Kapli).
Exploitation Route The papers published are already well cited and therefore of interest to other researchers.
Sectors Education

 
Description The unreliable clade Deuterostomia and implications for bilaterian evolution
Amount £215,006 (GBP)
Funding ID RPG-2021-433 
Organisation The Leverhulme Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 03/2022 
End 03/2025
 
Title Data from manuscript: "Lack of support for Deuterostomia prompts reinterpretation of the first Bilateria." 
Description Data and scripts from paper "Lack of support for Deuterostomia prompts reinterpretation of the first Bilateria." 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact NA 
URL https://github.com/MaxTelford/MonoDeutData
 
Title Supplementary data for: DNA sequences are as useful as protein sequences for inferring deep phylogenies 
Description Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences, based on the perception that protein sequences are less prone to homoplasy and saturation or to issues of compositional heterogeneity than DNA sequences. Here we analyze a model of codon evolution under an idealized genetic code and demonstrate that those perceptions may be misconceptions. We conduct a simulation study to assess the utility of protein versus DNA sequences for inferring deep phylogenies, with protein-coding data generated under models of heterogeneous substitution processes across sites in the sequence and among lineages on the tree, and then analyzed using nucleotide, amino acid, and codon models. Analysis of DNA sequences under nucleotide-substitution models (possibly with the third codon positions excluded) recovered the correct tree at least as often as analysis of the corresponding protein sequences under modern amino acid models. We also applied the different data-analysis strategies to an empirical dataset to infer the metazoan phylogeny. Our results from both simulated and real data suggest that DNA sequences may be as useful as proteins for inferring deep phylogenies and should not be excluded from such analyses. Analysis of DNA data under nucleotide models has a major computational advantage over protein-data analysis, potentially making it feasible to use advanced models that account for among-site and among-lineage heterogeneity in the nucleotide-substitution process in inference of deep phylogenies. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://datadryad.org/stash/dataset/doi:10.5061/dryad.sbcc2fr85
 
Title The folder contains results and code produced in the framework of the study "Topology dependent asymmetry in systematic error affects Ctenophora and Xenacoelomor 
Description The folder contains results and code produced in the framework of the study "Topology dependent asymmetry in systematic error affects Ctenophora and Xenacoelomor 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact na 
URL https://github.com/MaxTelford/XenoCtenoSims
 
Title This repository contains code and results produced in the framework of the study "Systematic errors in orthology inference: a bug or a feature for evolutionary analyses?" 
Description This repository contains code and results produced in the framework of the study "Systematic errors in orthology inference: a bug or a feature for evolutionary analyses?" 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact Has prompted new collaboration that will use our data. 
URL https://github.com/MaxTelford/Gainsandlosses
 
Description Article in The Conversation online magazine. 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Digest of our new research publication for general public.
Year(s) Of Engagement Activity 2020
URL https://theconversation.com/is-our-most-distant-animal-relative-a-sponge-or-a-comb-jelly-our-study-p...
 
Description Interviewed as part of BBC World Service programme on animal evolution 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Interviewed as part of BBC World Service programme on animal evolution.
Year(s) Of Engagement Activity 2018
URL https://www.bbc.co.uk/programmes/w3cswhkp
 
Description Research talk to IGNITE training workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact IGNITE training network of researchers. A lecture as part of a graduate school.
Year(s) Of Engagement Activity 2021