The Arthropod Supertree of Life: An Online Interactive Resource for Testing Patterns in Arthropod Evolution and Biodiversity

Lead Research Organisation: The Natural History Museum
Department Name: Life Sciences


Around 80% of all animal species are arthropods: the group that includes insects, crabs and spiders. From their rapid radiation over 550 million years ago, they evolved to fill almost every habitat and exploit most imaginable lifestyles. Today, arthropods underpin virtually all ecological communities and food webs. They are of immense economic and medical importance to humans: as sources of food, crop pests and vectors of disease.

To understand the biodiversity of arthropods, to investigate the mechanisms by which they evolved, and to plan for their conservation, it is vitally important that we have a clear picture of their evolutionary relationships. There are many thousands of published evolutionary trees for particular arthropod groups at a shallow level (e.g., species within families) as well as many that attempt to resolve the more ancient branching events. These published trees represent an enormously rich resource, but one that largely remains locked within the pages of journals. This project will digitise 5,000 or more trees from across the arthropods, and make them available online to all researchers.

Unfortunately, there are serious difficulties when researchers try to compare published trees: partly because they are derived from many different types of data (anatomy, molecules, genomes and fossils) and partly because they are analysed in an even greater variety of ways. More problematically, they often imply contradictory patterns of evolution. How, then, can we bring all of this information together to yield the giant, all-inclusive trees that evolutionary biologists and conservationists need, and do so without cherry-picking the data? Supertree methods are presently the most tractable approach, resolving conflict and finding overlap between the source trees using objective and repeatable rules. Such approaches have yielded the largest trees ever published.

Unfortunately, again, the construction of supertrees is presently very time-consuming and labour-intensive. Moreover, once constructed, it is extremely difficult or impossible to add new trees, to sub-sample the data (e.g., molecules or morphology), or to generate supertrees using different methods. Another core objective of this project is therefore to develop a set of software tools that will largely automate the process, providing inexperienced users with the ability to construct a supertree for any arthropod group at any taxonomic level (e.g., species, genera, families, etc.), and using multiple filtering criteria (e.g., only the most robust or recent source trees). We will then embed these tools in the website containing our data.

Existing, fast supertree methods are not without their problems, and another key objective of the project will therefore be to realise and program novel approaches (new Quartet Joining, Maximum Likelihood, Conservative and Bayesian methods are all under development by members of the team and our collaborators). The properties of these new methods need characterisation, and our arthropod dataset will offer the perfect test case against which to benchmark their performance.

We will then use our supertrees to ask a range of important questions in the study of arthropod biodiversity. Which evolutionary relationships are well-understood, and which are most uncertain and in need of further research? Which arthropod groups have an evolutionary branching sequence that matches the order in which they appear as fossils (such groups are useful for calibrating 'molecular clocks')? Is there a relationship between the age of arthropod groups and their present day diversity? We will also explore the utility of supertrees for addressing conservation priorities. Species that are alone on isolated branches of the supertree have greater than average 'evolutionary distinctiveness'. Where these are also imminently endangered, a powerful case can be mounted to prioritise their preservation.
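The idea of 'evolutionary distinctiveness' can be made concrete with a small sketch. One common formulation, the 'fair proportion' measure, shares each branch length equally among the tips that descend from it and sums those shares along the path from a tip to the root. The code below is an illustration only: the toy tree, the taxon names and the branch lengths are invented, and the project may well use a different measure (supertrees often carry no branch lengths at all).

```python
# Toy rooted tree, child -> (parent, branch_length).
# All names and lengths are hypothetical, chosen only for illustration.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 2.0),
    "C": ("root", 3.0),
}


def tips_below(tree):
    """Map every non-root node to the set of tips that descend from it."""
    parents = {parent for parent, _ in tree.values()}
    tips = [node for node in tree if node not in parents]
    below = {}
    for tip in tips:
        node = tip
        # Walk from the tip towards the root, registering the tip at each node.
        while node in tree:
            below.setdefault(node, set()).add(tip)
            node = tree[node][0]
    return below


def fair_proportion(tree):
    """Share each branch length equally among its descendant tips."""
    scores = {}
    for node, tipset in tips_below(tree).items():
        share = tree[node][1] / len(tipset)
        for tip in tipset:
            scores[tip] = scores.get(tip, 0.0) + share
    return scores


scores = fair_proportion(TREE)
print(scores)
```

In this toy tree, C sits alone on a long branch and so receives the highest score, while A and B split the branch they share. The scores sum to the total branch length of the tree, which is a useful sanity check on any implementation of this measure.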

Technical Summary

1. We will construct the largest ever supertrees of arthropods by synthesising 5,000+ peer-reviewed cladograms from the literature. The Researcher Co-I and Data Clerk will archive these in Newick format along with rich metadata in XML (character type, analysis type, branch support measures and complete bibliographic information) that will add significant value. A SynTax-funded prototype and proof of principle for crabs is already online.
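As a rough illustration of what one archived record might look like, the sketch below pairs a Newick string with XML metadata and reads it back using only the Python standard library. The element names, the taxa chosen, the support value and the year are all invented for this example and are not the project's actual schema; the tip-label extraction is a simple regular expression that does not handle quoted labels.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical record: element names and values are assumptions for illustration.
RECORD = """
<source_tree>
  <newick>((Cancer_pagurus:1.0,Carcinus_maenas:1.0)0.95,Homarus_gammarus:2.0);</newick>
  <character_type>molecular</character_type>
  <analysis_type>maximum likelihood</analysis_type>
  <year>2009</year>
</source_tree>
"""


def tip_labels(newick):
    """Return tip labels: names that directly follow '(' or ','.

    Internal-node labels and support values follow ')' and branch lengths
    follow ':', so neither is picked up. Quoted labels are not supported.
    """
    return re.findall(r"(?<=[(,])\s*([^(),:;\s]+)", newick)


def load_record(xml_text):
    """Parse one record into a dict of metadata plus the list of tip labels."""
    root = ET.fromstring(xml_text.strip())
    record = {child.tag: (child.text or "").strip() for child in root}
    record["taxa"] = tip_labels(record["newick"])
    return record


record = load_record(RECORD)
print(record["taxa"])
print(record["character_type"], record["analysis_type"], record["year"])
```

Keeping the metadata alongside each tree is what would allow source trees to be filtered later, for example by data type or publication year, before a supertree is built.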

2. We will develop and implement new supertree algorithms, including quartet joining, conservative, maximum likelihood and Bayesian methods. These will be incorporated into new versions of the open-source Supertree Toolkit (STK) alongside MRP variants, making it the most versatile supertree software available. We will also include tools to test for adequate overlap, a necessary prerequisite for efficient analyses. We will additionally incorporate 'taxonomic awareness', enabling trees to be produced at various hierarchical levels with no recoding of the source trees. We will also explore and program measures of supertree support, of congruence/conflict between data partitions, and of congruence between our trees and stratigraphic data. The arthropod case-study will be used for benchmarking.
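One simple way to think about the overlap test mentioned above is as a connectivity check: treat each source tree as a node, link two trees when they share enough taxa, and ask whether all trees fall into a single connected group. The sketch below follows that idea. The function name, the threshold of two shared taxa and the toy taxon sets are assumptions made for illustration, not the Supertree Toolkit's actual implementation.

```python
from itertools import combinations


def overlap_components(tree_taxa, min_shared=2):
    """Group source trees into components linked by shared taxa.

    tree_taxa maps a tree name to the set of taxa it contains. Two trees are
    linked when they share at least min_shared taxa (an assumed threshold).
    """
    names = list(tree_taxa)
    neighbours = {name: set() for name in names}
    for a, b in combinations(names, 2):
        if len(tree_taxa[a] & tree_taxa[b]) >= min_shared:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, components = set(), []
    for start in names:
        if start in seen:
            continue
        # Depth-first search collects every tree reachable from start.
        stack, component = [start], set()
        while stack:
            name = stack.pop()
            if name in component:
                continue
            component.add(name)
            stack.extend(neighbours[name] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical source trees, reduced to their taxon sets.
trees = {
    "t1": {"A", "B", "C"},
    "t2": {"B", "C", "D"},
    "t3": {"X", "Y", "Z"},
}
components = overlap_components(trees)
print(components)
```

More than one component means that some source trees cannot be connected to the rest through shared taxa, so a single supertree built from all of them would be poorly constrained. Here t3 shares no taxa with t1 or t2 and forms its own component.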

3. All data and tools will be embedded online, and linked to analytical software written in Python and released under GNU GPL. A user-friendly GUI will enable anyone to produce supertrees easily but rigorously from any sub-sample of the data, and by multiple methods. Users will also be able to upload their own trees and metadata, enabling our resource to grow organically.

4. We will conduct pilot studies for several focal clades (crabs, crayfish, bumblebees, dung beetles and butterflies). Specifically, we will collaborate with conservationists (Ben Collen and Richard Grenyer) to identify EDGE species, and to investigate the relationship between measures of phylogenetic spread and biogeographical distribution.

Planned Impact

Academic Impact

This project stands as a proof of principle for managing, curating and maximising the impact of a much larger database of published trees than assembled hitherto, along with its associated metadata. The project entails the development and implementation of important new methods for supertree construction, with applications for evolutionary biology, ecology, behavioural science and conservation. It will create a lasting legacy for the wider academic community in the form of the revised Supertree Toolkit (STK) and its associated website. The latter will comprise data, software and in-built data processing capabilities, all of which will benefit the wider biological community in future projects.
All of the new quartet joining, conservative, Bayesian and likelihood algorithms within the updated Supertree Toolkit will be released under an open source license, enabling other theoreticians and programmers to build upon its functionality. The front-end of the Toolkit will be easy for any researcher to use, and we envisage a lasting legacy from its redeployment on other groups of organisms.

This project is keenly supported by researchers on all major arthropod clades, for whom our resources will offer a comprehensive synthesis of the state of published knowledge. It will also highlight where disparate sources of data concur, and where there is significant conflict necessitating further research. Collaborative links have already been established with: Jonathan Coddington, Smithsonian (Arachnida); Jason Dunlop, Museum für Naturkunde (Chelicerata); Bill Shear, Hampden-Sydney College VA (Chilopoda); Adam Slipinski, CSIRO (Coleoptera); Geoff Boxshall, NHM (Copepoda and all Crustacea); Keith Crandall, Brigham Young (Decapoda); Greg Edgecombe, NHM (Diplopoda); Rudolf Meier, University of Singapore (Diptera); Richard Brusca (Isopoda); Stefan Richter, University of Rostock (Malacostraca); John Trueman (Odonata); Darren Mann, Oxford (Scarabaeida).
This project will generate robust supertrees for use in secondary analyses by conservationists, ecologists, ethologists and evolutionary biologists. More importantly, these workers will be able to produce their own trees using any desired data filtering and processing criteria, as well as using powerful new supertree methods. We have established links with Richard Grenyer (Geography, Oxford) and Ben Collen (Head of Indicators and Assessment Unit, ZSL) in order to design our resources with this objective in view.

Economic and Societal Impact

Conservative estimates of the economic costs of biodiversity loss are around £40 billion per annum, although these figures are not currently included in estimates of GDP. An equivalent loss of 7% of GDP is predicted by 2050 if current rates continue. Approaches to conservation that simply count species are crude; the additional information imparted by large phylogenies allows evolutionary distinctiveness to be factored into policy-making decisions. This project will hugely simplify the synthesis of existing phylogenetic information for all groups by providing new methods and tools. It will specifically and immediately enhance our understanding of the biodiversity of arthropods, a clade containing 80% of all animal species. If our resource helps to slow the decline by just one thousandth of one percent over the next ten years (a modest claim), its value might conservatively be placed at £4 million.
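The £4 million figure appears to follow from multiplying the annual cost by ten years and by the stated fractional reduction. The short check below spells out that arithmetic; treating the calculation as a simple product of these three quantities is an assumption about how the figure was reached, and the variable names are ours.

```python
# Figures quoted in the text; combining them as a simple product is an assumption.
ANNUAL_COST_GBP = 40_000_000_000  # about £40 billion per annum
YEARS = 10

# "One thousandth of one percent" = (1/1000) * (1/100) = 1/100000.
REDUCTION_DENOMINATOR = 1000 * 100


def value_of_slowing(annual_cost, years, denominator):
    """Total cost over the period scaled by the fractional reduction.

    Integer arithmetic is used so that the result is exact.
    """
    return annual_cost * years // denominator


value = value_of_slowing(ANNUAL_COST_GBP, YEARS, REDUCTION_DENOMINATOR)
print(value)
```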
Public interest in biodiversity loss is enormous. The scale of this project, and the sheer size and inclusiveness of the trees that we will generate, will make our work of great public interest. By adding an accessible 'public front end' to our website (linked to 'ARKive' images and species notes), we will improve public understanding of phylogeny and evolution, and raise awareness of the importance of, and applications for, systematics in general. This is of vital importance at a time when teaching of the discipline is declining.


Akanni WA (2015) Horizontal gene flow from Eubacteria to Archaebacteria and what it means for our understanding of eukaryogenesis. in Philosophical transactions of the Royal Society of London. Series B, Biological sciences

Haggerty LS (2014) A pluralistic account of homology: adapting the models to the data. in Molecular biology and evolution

Wilkinson M (2016) Comments on detecting rogue taxa using RogueNaRok in Systematics and Biodiversity

Description: We have implemented the loose supertree method.
We have implemented and published a simple Maximum Likelihood supertree method, and developed and implemented associated statistical tests of inferred trees.
We have implemented and published a Bayesian supertree method and applied it to some high-profile case studies.
We have developed, implemented and published methods for identifying ineffective overlap and rogue taxa in input trees and phylogenomic data sets.
We have tested experimentally alternative approaches to incorporating previously unsampled taxa into phylogenies.
Exploitation Route: We have developed general tools that can be used by biologists needing to build phylogenies, focussed on issues of accuracy and efficiency.
Sectors: Agriculture, Food and Drink; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology

Description: We have used the supertree methods to investigate patterns of horizontal gene transfer in the Eubacteria and Archaea. The supertree methods we developed and implemented have been used by others in studying the origins of land plants (Puttick, M.N., Morris, J.L., Williams, T.A., Cox, C.J., Edwards, D., Kenrick, P., Pressel, S., Wellman, C.H., Schneider, H., Pisani, D. and Donoghue, P.C., 2018. The Interrelationships of Land Plants and the Nature of the Ancestral Embryophyte. Current Biology) and the evolution of Fungi (McCarthy, C.G. and Fitzpatrick, D.A., 2017. Multiple Approaches to Phylogenomic Reconstruction of the Fungal Kingdom. In Advances in Genetics (Vol. 100, pp. 211-266). Academic Press).
First Year Of Impact: 2016
Sector: Education
Impact Types: Cultural

Description: Collaboration with Chufei Tang and Professor Ding Yang from China Agricultural University
Organisation: China Agricultural University (CAU)
Country: China
Sector: Academic/University
PI Contribution: We are producing trees of Mycetophilidae, Asilidae and Culicidae, as well as undertaking analyses of partitioned data sets for Diptera as a whole.
Collaborator Contribution: Chufei has come to the UK for 6 months to carry out this work, and is providing taxonomic expertise, inputting data and running analyses.
Impact: We have two papers in preparation. Chufei was awarded a grant that paid for her travel to the UK.
Start Year: 2017
Description: Collaboration with Sammy de Grave, Oxford
Organisation: University of Oxford
Department: Oxford University Museum of Natural History
Country: United Kingdom
Sector: Academic/University
PI Contribution: We are producing supertrees of Caridea and all Decapoda. Katie Davis has generated trees.
Collaborator Contribution: Sammy is providing taxonomic expertise and writing the paper with us.
Impact: Papers are in preparation.
Start Year: 2016
Title: Concatabominations
Description: This software implements an approach to detecting rogue taxa and ineffective overlap in phylogenetic and phylogenomic data sets.
Type Of Technology: Software
Year Produced: 2015
Open Source License? Yes
Impact: This work is generating a lot of interest and has led to several seminar invitations.
Title: LUs.t.
Description: This is an implementation of a Maximum Likelihood supertree method.
Type Of Technology: Software
Year Produced: 2014
Open Source License? Yes
Impact: Proof of concept
Description: Effective Overlap talk
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme? No
Geographic Reach: International
Primary Audience: Professional Practitioners
Results and Impact: Scientific seminars on rogue taxa and effective overlap in phylogenetic and phylogenomic studies. These have been given at the University of Greifswald Phylogenetics Meeting (2014), the University of Frankfurt (2015), the Systematics Association Biennial at Oxford (2016), the Museum Alexander Koenig, Bonn (2015), an EMBO short course in Phylogenomics in Iquitos, Peru (2016) and the University of Michigan (2016).
Year(s) Of Engagement Activity: 2014, 2015, 2016
Description: Outreach activity day for local school children
Form Of Engagement Activity: Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach: Local
Primary Audience: Schools
Results and Impact: The Milner Centre for Evolution at the University of Bath celebrated its launch on 21 September 2018 by inviting 120 local school children to come to the labs and learn more about evolution.
Year(s) Of Engagement Activity: 2018
Description: School Visit - St Augustine's - The Evidence for Evolution. Year 9 and 10 (120 pupils). Three talks.
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme? No
Geographic Reach: Local
Primary Audience: Schools
Results and Impact: School Visit - St Augustine's - The Evidence for Evolution. Year 9 and 10 (120 pupils). Three talks.
Year(s) Of Engagement Activity: 2019
Description: Talk at Bath Royal Scientific and Literary Institution
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme? No
Geographic Reach: Regional
Primary Audience: Public/other audiences
Results and Impact: Re-running the Tape of Life. Is Evolution Predictable?
Is evolution an essentially open-ended process of unlimited potential, or is its outcome predictable? If we could re-run the Tape of Life, would small perturbations to starting conditions yield radically different outcomes, or would the course of evolution follow a familiar path, differing only in details? Matthew Wills will explore how major animal groups have evolved according to a common template, seeking evidence for actively driven evolutionary trends in morphological complexity and possible rules governing mass extinctions.
Year(s) Of Engagement Activity: 2017
Description: Talk to pupils from the Social Mobility Foundation charity. The visitors are 34 students who are supported by the Social Mobility Foundation charity in raising their aspirations to apply to top universities.
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme? No
Geographic Reach: Regional
Primary Audience: Schools
Results and Impact: Friday 2nd August, 1230 - 1330. The visitors are 34 students who are supported by the Social Mobility Foundation charity in raising their aspirations to apply to top universities.
Year(s) Of Engagement Activity: 2019