Overcoming the morphology problem of phylogenetics

Lead Research Organisation: University of Manchester
Department Name: Earth Atmospheric and Env Sciences

Abstract

Phylogenetic trees underpin reconstructions of evolutionary history and tests of evolutionary hypotheses. As such, they address major questions such as how and when our modern biota came into being, and what the shape and distribution of biodiversity and extinction are. Advances in the acquisition and analysis of genetic sequence data have led to an increasing emphasis and reliance on molecular phylogenies, yet phenotypic evidence (morphology) remains vital. It enables us to make the link between organisms and their environment and thus demonstrate the mechanisms of evolutionary change. It is also the only way to include extinct taxa, and therefore provide a deep time perspective; fossils break up large gaps between the depauperate modern fauna, unlock sequences of evolutionary change (so-called 'missing links') and provide a timescale for estimating evolutionary rates, including calibration of molecular clocks. Although morphology is acknowledged to be essential for phylogeny, it is also widely recognized as intrinsically problematic. Developmental and functional linkage can result in suites or modules of non-independent morphological characters and thus misleading patterns with respect to phylogeny reconstruction. These problematic phenomena are acknowledged to exist, but they have been largely ignored; morphological data are routinely taken at 'face value' and are treated as equivalent by both molecular and palaeontological studies. Furthermore, the distribution and influence of these phenomena are unknown. As such, our understanding of a range of evolutionary events is undermined, and our ability to reconstruct evolutionary history is limited.

To directly address the limitations of morphology we will: 1) quantify the extent and distribution of the problem across the tree of life and across morphological modules; 2) develop methodological toolkits for reliable phylogenetic inference using morphology, with independent molecular and stratigraphic data acting as benchmarks; 3) apply those methods to important evolutionary events that rely on interpretations of morphology but have thus far proved intractable or equivocal, for example, human origins. The outcomes will address specific evolutionary hypotheses and establish new ways of working, with powerful tools, workflows and guides for future analyses. This project will bring morphology out of the 19th century and into the 21st.

Technical Summary

Morphological data are essential for phylogenetic reconstruction, but are acknowledged to be intrinsically problematic. Character independence is violated through over-saturation and through functional and developmental linkage. Furthermore, homoplasy is widespread owing to convergence and to subjectivity in character definition.

This project aims to identify the scale and distribution of these problematic phylogenetic phenomena and establish new ways of working that minimize their effects. Identification of misleading morphological moieties will be achieved by 1) investigating the performance of logical a priori subdivisions and modules of morphology (e.g. cranium vs post-cranium, hard vs soft, appendage vs non-appendage, flower vs leaf), and 2) investigating the distribution and prevalence of cliques of inter-dependent and ecologically linked characters as identified a posteriori. The performance of different phylogenetic techniques will be directly compared and assessed: 1) implied weighting, which minimizes homoplasy, 2) clique removal, to minimize character over-saturation, and 3) Bayesian inference, to overcome the problems associated with traditional parsimony analysis. Congruence with independent data will be used as a benchmark; these data are primarily molecular trees, which are comparatively well-validated and based on abundant data. No single instance of congruence is authoritative, but meta-analysis of combined data from different sources, clades, and authors mitigates the effect of spurious congruence that might occur in any one dataset. Meta-analysis therefore provides a powerful tool to identify legitimate broad-scale patterns that hold across that range of data. The application of these meta-analyses will provide a guide for future phylogenetic research.
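As an illustration of the implied weighting technique named above, the following is a minimal sketch of the fit function commonly attributed to Goloboff, in which a character with h extra (homoplastic) steps on a tree contributes k / (k + h) to that tree's total fit. The function name `implied_weight_fit`, the concavity value k = 3 and the step counts are illustrative assumptions, not values taken from this project.

```python
def implied_weight_fit(extra_steps, k=3.0):
    """Total implied-weights fit of one tree.

    extra_steps: per-character homoplasy, i.e. observed steps on the tree
                 minus the minimum possible steps for that character.
    k:           concavity constant (assumed value; larger k weights
                 against homoplasy less severely).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if any(h < 0 for h in extra_steps):
        raise ValueError("extra steps cannot be negative")
    # Each character contributes k / (k + h); a character with no
    # homoplasy contributes exactly 1.
    return sum(k / (k + h) for h in extra_steps)


# Two hypothetical trees with the same total homoplasy (6 extra steps),
# so equal-weights parsimony cannot choose between them.
tree_a_extra = [0, 0, 6]   # homoplasy concentrated in one character
tree_b_extra = [2, 2, 2]   # homoplasy spread across all characters

fit_a = implied_weight_fit(tree_a_extra)
fit_b = implied_weight_fit(tree_b_extra)
preferred = "A" if fit_a > fit_b else "B"
```

Under this sketch the tree with the higher total fit is preferred, so a tree that confines homoplasy to a few unreliable characters scores better than one that spreads the same number of extra steps across many characters.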

Planned Impact

The research findings, namely new ways of working with phylogeny, will have direct impact in a range of areas from evolutionary biology to biodiversity and conservation. Much of this impact will be achieved via academic intermediaries following uptake of those guidelines, application to morphological data, and then publication and utilization of the resulting phylogenies. To maximize this, research findings will be disseminated to a broad range of phylogenetic practitioners through varied conferences and publication in high-profile and specialized journals. Outside of this, more direct impact will be achieved in the field of systematics. Studies of biodiversity and conservation are entirely reliant on taxonomy (i.e. species classification), which is in turn based on interpretation of phenotype and morphological characteristics. As such, our research findings will have a direct impact on taxonomic practices, and thus on the study and preservation of biodiversity. Our roadmap of aspects of morphology that better reflect evolutionary history will therefore be informative for taxonomy. We will directly target the communities of academic and non-academic taxonomists (museum professionals, curators, collectors etc.) through dedicated meetings and articles.

More broadly, the research will be a great vehicle to improve public understanding of science. As evidenced by extensive media coverage of palaeontology, the fossil record and evolution are of great interest to the public. This is especially true of human origins, which is one of the key case studies to be analysed in this project. The results from this part of the project will be prioritized for outreach, both through exhibition in collaboration with the Manchester Museum, as well as other events and press releases. Both the PI and Co-I have a strong track record of public engagement through participation in events, online presence, and broader dissemination.

Publications


 
Description Initial investigations have revealed that different partitions of anatomical data (morphology) can convey different kinds of evolutionary information. In mammals, teeth are found to be poor guides to evolutionary history, which is unfortunate given their previous use in this context. Hard and soft characters (i.e. bones and teeth versus soft, less fossilizable tissues like muscles and nerves) are found to convey different phylogenetic signals, and these differences vary in scale and direction across different animal groups. More recent work has compared different methods for reconstructing phylogenetic trees (i.e. Bayesian and parsimony methods). The investigations have used empirical data collected from the literature and new bespoke computer simulations that generate evolutionarily realistic data. The results are mixed, depending on which criteria are used. We have also compiled a database of empirical datasets which will be used to compare molecular and morphological rates of evolution, and to contrast different methods of phylogenetic reconstruction.
Exploitation Route The results provide guidance about the best way to reconstruct evolutionary trees from morphological data (from both living groups and fossils).
We have also created a new piece of open software, available for those who want to generate their own phylogenetic data.
We will also be providing a database of empirical data sets that can be used by other researchers.
Sectors Culture, Heritage, Museums and Collections; Other

 
Title Data from: Dental data perform relatively poorly in reconstructing mammal phylogenies: morphological partitions evaluated with molecular benchmarks 
Description  
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
 
Title Data from: Differences between hard and soft phylogenetic data 
Description When building the tree of life, variability of phylogenetic signal is often accounted for by partitioning gene sequences and testing for differences. The same considerations however are rarely applied to morphological data, potentially undermining its use in evolutionary contexts. Here we apply partition heterogeneity tests to 59 animal datasets to demonstrate that significant differences exist between the phylogenetic signal conveyed by 'hard' and 'soft' characters (bones, teeth and shells versus myology, integument etc). Furthermore, the morphological partitions differ significantly in their consistency relative to independent molecular trees. The observed morphological differences correspond with missing data biases, and as such their existence presents a problem not only for phylogeny reconstruction, but also for interpretations of fossil data. Evolutionary inferences drawn from clades in which hard, readily-fossilizable characters are relatively less consistent and different from other morphology (mammals, bivalves) may be less secure. More secure inferences might be drawn from the fossil record of clades that exhibit fewer differences, or exhibit more consistent hard characters (fishes, birds). In all cases it will be necessary to consider the impact of missing data on empirical data, and the differences that exist between morphological modules. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
URL https://datadryad.org/stash/dataset/doi:10.5061/dryad.541pt
 
Title Data from: Parsimony, not Bayesian analysis, recovers more stratigraphically congruent phylogenetic trees 
Description Reconstructing evolutionary histories requires accurate phylogenetic trees. Recent simulation studies suggest that probabilistic phylogenetic analyses of morphological data are more accurate than traditional parsimony techniques. Here we use empirical data to compare Bayesian and parsimony phylogenies in terms of their congruence with the distribution of age ranges of the component taxa. Analysis of 167 independent morphological data matrices of fossil tetrapods finds that Bayesian trees exhibit significantly lower stratigraphic congruence than the equivalent parsimony trees. As such, taking stratigraphic data as an independent benchmark indicates that parsimony analyses are more accurate for phylogenetic reconstruction of morphological data. The discrepancy between simulated and empirical studies may result from historic data-peeking practices or some complexities of empirical data as yet unaccounted for. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
URL https://datadryad.org/stash/dataset/doi:10.5061/dryad.f9v3778
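The record above does not say which measure of stratigraphic congruence was used. As one common example, the sketch below computes a Stratigraphic Consistency Index in the sense usually credited to Huelsenbeck: the proportion of internal nodes, excluding the root, whose oldest first appearance is no older than that of their sister group. The nested-tuple tree encoding, the function names and the first-appearance ages (in millions of years, larger meaning older) are all illustrative assumptions.

```python
def oldest_first_appearance(node, first_appearance):
    """Oldest first-appearance age (Ma) among the taxa below a node."""
    if isinstance(node, str):
        return first_appearance[node]
    return max(oldest_first_appearance(child, first_appearance) for child in node)


def stratigraphic_consistency_index(tree, first_appearance):
    """Proportion of non-root internal nodes that are stratigraphically
    consistent. The tree is a rooted binary tree written as nested
    2-tuples of taxon names (an assumed encoding)."""
    consistent = 0
    total = 0

    def visit(node):
        nonlocal consistent, total
        if isinstance(node, str):
            return
        left, right = node
        for child, sister in ((left, right), (right, left)):
            if not isinstance(child, str):
                total += 1
                # Consistent if the clade appears no earlier than its sister.
                if (oldest_first_appearance(child, first_appearance)
                        <= oldest_first_appearance(sister, first_appearance)):
                    consistent += 1
            visit(child)

    visit(tree)
    return consistent / total if total else float("nan")


# Hypothetical four-taxon example.
tree = ((("A", "B"), "C"), "D")
ages_good = {"A": 100, "B": 110, "C": 120, "D": 130}
ages_mixed = {"A": 125, "B": 110, "C": 120, "D": 130}

sci_good = stratigraphic_consistency_index(tree, ages_good)
sci_mixed = stratigraphic_consistency_index(tree, ages_mixed)
```

A value of 1 means every assessed node agrees with the order of first appearances; lower values indicate more conflict between the tree and the fossil record.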
 
Title Supplemental material for: Morphological phylogenetics evaluated using novel evolutionary simulations 
Description Evolutionary inferences require reliable phylogenies. Morphological data has traditionally been analysed using maximum parsimony, but recent simulation studies have suggested that Bayesian analyses yield more accurate trees. This debate is ongoing, in part, because of ambiguity over modes of morphological evolution and a lack of appropriate models. Here we investigate phylogenetic methods using two novel simulation models - one in which morphological characters evolve stochastically along lineages and another in which individuals undergo selection. Both models generate character data and lineage splitting simultaneously: the resulting trees are an emergent property, rather than a fixed parameter. Standard consensus methods for Bayesian searches (Mki) yield fewer incorrect nodes and quartets than the standard consensus trees recovered using equal weighting and implied weighting parsimony searches. Distances between the pool of derived trees (most parsimonious or posterior distribution) and the true trees - measured using Robinson-Foulds (RF), subtree prune and regraft (SPR), and tree bisection reconnection (TBR) metrics - demonstrate that this is related to the search strategy and consensus method of each technique. The amount and structure of homoplasy in character data differs between models. Morphological coherence, which has previously not been considered in this context, proves to be a more important factor for phylogenetic accuracy than homoplasy. Selection-based models exhibit relatively lower homoplasy, lower morphological coherence, and higher inaccuracy in inferred trees. Selection is a dominant driver of morphological evolution, but we demonstrate that it has a confounding effect on numerous character properties which are fundamental to phylogenetic inference. 
We suggest that the current debate should move beyond considerations of parsimony versus Bayesian, towards identifying modes of morphological evolution and using these to build models for probabilistic search methods. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL http://datadryad.org/stash/dataset/doi:10.5061/dryad.4b8gtht8h
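The Robinson-Foulds (RF) metric mentioned in the description above counts the bipartitions (splits) of the taxon set that appear in one tree but not the other. The following is a minimal sketch for two trees on the same taxa, treated as unrooted; the nested-tuple encoding and the function names are assumptions made for illustration and are not taken from the study's own code.

```python
def leaf_set(tree):
    """All taxon names in a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def nontrivial_splits(tree):
    """Set of non-trivial bipartitions, each stored as the side that
    does not contain a fixed reference taxon."""
    taxa = leaf_set(tree)
    reference = min(taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = taxa - clade if reference in clade else clade
        # Skip splits that separate fewer than two taxa from the rest.
        if 1 < len(side) < len(taxa) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree_1, tree_2):
    """Number of splits found in exactly one of the two trees."""
    if leaf_set(tree_1) != leaf_set(tree_2):
        raise ValueError("trees must contain the same taxa")
    return len(nontrivial_splits(tree_1) ^ nontrivial_splits(tree_2))


# Hypothetical five-taxon trees that differ in the placement of B and C.
true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
distance = robinson_foulds(true_tree, inferred_tree)
```

A distance of zero means the two trees share every non-trivial split; the SPR and TBR distances also named above are harder to compute and are not covered by this sketch.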
 
Title Supplementary Information from Differences between hard and soft phylogenetic data 
Description Details of the morphological data matrices and molecular trees 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
URL https://rs.figshare.com/articles/Supplementary_Information_from_Differences_between_hard_and_soft_ph...
 
Title Supplementary Table from Parsimony, not Bayesian analysis, recovers more stratigraphically congruent phylogenetic trees 
Description Supplementary Results 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
URL https://rs.figshare.com/articles/dataset/Supplementary_Table_from_Parsimony_not_Bayesian_analysis_re...
 
Title Supplementary Table from Parsimony, not Bayesian analysis, recovers more stratigraphically congruent phylogenetic trees. 
Description Supplementary Results 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
URL https://rs.figshare.com/articles/dataset/Supplementary_Table_from_Parsimony_not_Bayesian_analysis_re...
 
Title palaeoware/trevosim: TREvoSim v2.0.0 
Description This is the release of a new version (v2.0.0) of TREvoSim. The first release and the underlying model were described in detail in the following paper: Keating, J.N., Sansom, R.S., Sutton, M.D., Knight, C.G. & Garwood, R.J. 2020. Morphological phylogenetics evaluated using novel evolutionary simulations. Systematic Biology 69(5): 897-912. doi:10.1093/sysbio/syaa012. Version 2.0.0 accompanies the preprint and paper below: Mongiardino Koch, N., Garwood, R.J. & Parry, L.A. Preprint. Fossils improve phylogenetic analyses of morphological characters. bioRxiv. doi: 10.1101/2020.12.03.410068v1; Mongiardino Koch, N., Garwood, R.J. & Parry, L.A. 2021. Fossils improve phylogenetic analyses of morphological characters. Proceedings of the Royal Society B: Biological Sciences. The code is archived on zenodo.org. Documentation: TREvoSim Online Documentation.
Change log: The changes in v2.0.0 are described and discussed fully in the associated paper. In brief, these allow TREvoSim v2.0.0 trees and data to achieve benchmarks calculated from twelve total evidence analyses, as well as representing ongoing development of the package. Changes are: 1) Addition of a multiple playing fields option; playing fields can have independent or identical environments. 2) The option to overwrite a random individual when a juvenile is returned to the playing field (instead of the least fit one). 3) User control of the fitness target in the fitness algorithm (see Keating et al. 2020). 4) A fitness histogram functionality to assess the fitness landscape in the simulation. 5) User control of the strength of selection (see Mongiardino Koch et al. 2021 for discussion). 6) Multiple environments per playing field; organism fitness is assessed against each environment, and the fitness of an organism is defined by the environment it is best suited to. The code has been refactored (the biggest change being to the underlying data structures/classes), and the simulation now uses Qt QRandomGenerator tools rather than incorporating random data. A user-accessible test suite has been added.
Release information: Windows: A zip containing all required binaries can be downloaded from the assets below. Alternatively, an installer is provided. See the note below. Note 1: The .zip archive contains an executable, TREvoSim_2.0.0.exe. The .zip can be extracted and the program run by double-clicking this .exe file in the ./bin folder. All the required libraries have been included and are found in the ./bin folder. Mac: A zip containing TREvoSim can be downloaded from the assets below. To install the software, drag and drop the required .app folder(s) into the Applications folder. You may be required to approve the software in security and privacy settings before it will launch. Linux: Any Linux users willing to test a Linux build should contact palaeoware@gmail.com. 
Type Of Technology Software 
Year Produced 2021 
URL https://zenodo.org/record/3619355
 
Description Guest academic lecture series 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Visited around five regional schools to deliver a presentation to sixth-form and Year 11 students on evolution and how my research relates to it.
Year(s) Of Engagement Activity 2018,2019
 
Description Pint of Science talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Contributed a talk and discussion session for the Manchester Pint of Science programme to around 50 members of the general public.
Year(s) Of Engagement Activity 2015,2018
URL https://pintofscience.co.uk/event/the-rotten-side-of-ancient-life/