Phylogenetic combinatorics: A mathematical theory for the analysis of phylogenetic trees and networks

Lead Research Organisation: University of East Anglia
Department Name: Computing Sciences

Abstract

According to Charles Darwin's theory of evolution, present-day species can be related by an evolutionary tree, in much the same way as members of a family can be related by a family tree. One of the central problems in biology is to work out what this tree is (or, more usually, small parts of it), since this can be helpful, for example, in understanding how organisms work. The flourishing area of phylogenetics is concerned with solving this problem. However, since it is by no means an easy problem, advanced mathematical theories are required to help discover its solution. The main aim of this project is to investigate and develop such mathematical theories. We expect that these theories will not only enable biologists to better understand their data, but will almost certainly lead to new research directions in pure mathematics, in much the same way that physics inspired new mathematics in the 20th century.

Publications

Dress A (2009) Barriers in metric spaces in Applied Mathematics Letters

Dress A (2010) Species, clusters and the 'Tree of life': a graph-theoretic perspective. in Journal of theoretical biology

Dress A (2008) Compatible decompositions and block realizations of finite metrics in European Journal of Combinatorics

Dress A (2011) Blocks and Cut Vertices of the Buneman Graph in SIAM Journal on Discrete Mathematics

Dress A (2010) An Algorithm for Computing Cutpoints in Finite Metric Spaces in Journal of Classification

Dress AW (2012) 'Lassoing' a phylogenetic tree I: basic properties, shellings, and covers. in Journal of mathematical biology

Grünewald S (2009) Characterizing weak compatibility in terms of weighted quartets in Advances in Applied Mathematics

Grünewald S (2009) Maximum parsimony for tree mixtures. in IEEE/ACM transactions on computational biology and bioinformatics

Huber KT (2008) The complexity of deriving multi-labeled trees from bipartitions. in Journal of computational biology : a journal of computational molecular cell biology

 
Description: According to Charles Darwin's theory of evolution, present-day species can be related by an evolutionary tree, the so-called "Tree of Life", in much the same way as members of a family can be related by a family tree. One of the central problems in biology is to work out what this evolutionary tree is (or, more often, small parts of it), since this can be helpful, for example, in understanding how genes, genomes and, ultimately, organisms function. The flourishing area of phylogenetics is concerned with solving this problem. However, since it is by no means an easy problem, advanced mathematical theories are required to help find its solution.

In this project we investigated and developed such mathematical theories. New findings include (i) novel ways to construct phylogenetic trees and networks, (ii) new discoveries concerning the structure of optimal realizations of metric spaces, (iii) a new theoretical framework to help better understand biodiversity and its conservation, and (iv) improved ways to decompose distances that commonly arise in evolutionary studies, obtained by proving new mathematical results concerning their tight-spans. These results have been presented in 19 papers that have already appeared in international, refereed journals. The project has also led to new, freely available software for computing phylogenetic networks, which biologists can use to uncover complex evolutionary patterns relating organisms such as plants and viruses. Moreover, many of the results have been presented by the project participants in, for example, (i) international conferences and workshops, (ii) the Isaac Newton Institute for Mathematical Sciences' "Phylogenetics" programme in autumn 2007, (iii) a postgraduate course in Phylogenetic Combinatorics run by the project participants in South Korea in 2008, and (iv) a contribution by the principal investigator to a series of radio programmes on Charles Darwin at a local, Norwich-based radio station.

The funding also provided important resources which helped develop and strengthen the UEA Phylogenetics group through funding of (i) a PhD studentship and a post-doctoral researcher position, (ii) various visits to and from project collaborators in New Zealand, South Korea and China, which strengthened the collaboration with their groups, and (iii) attendance at several workshops and conferences in phylogenetics. As a result, (i) two post-doctoral researchers have been successfully trained in phylogenetics and have now both found employment in academic institutions where they will continue their work in this area, and (ii) the PhD student has co-authored two papers that have appeared in refereed, international journals, and is due to submit a thesis in June 2010 concerning new mathematical theories underpinning phylogenetic networks and phylogenetic diversity.

In summary, all of the original objectives of this project were successfully achieved, and some fascinating new directions in phylogenetics have been discovered which are being followed up in new collaborations. The theories and software developed in this project will not only enable biologists to better understand their data, but are already leading to exciting new results in mathematics and computer science.
Exploitation Route: Evolutionary biologists and environmental scientists can use the techniques that we have developed to construct evolutionary trees and networks, to understand biodiversity, and for comparative genomics and other applications, such as understanding the origins and evolution of viruses.
Sectors: Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Education; Environment

 
Description: The findings have been used to understand evolutionary relationships.
First Year Of Impact: 2010
Sector: Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Environment
Impact Types: Societal

 
Title: QNet
Description: QNet is a software package, written in Java, that provides methods for generating phylogenetic networks from quartet data. The method takes as input the set of all quartet tree topologies for a given data set, together with a branch length for each topology. These branch lengths must be precomputed by the user. Included in the package are two simple applications for generating quartet weight data. GWeight utilizes statistical geometry to generate quartet weights. LengthWeighter is a front-end for the Tree-Puzzle software for generating likelihood quartet weights. Using the quartet data, QNet employs an agglomerative approach to construct a planar split network that summarises the relations described by the quartets. The resulting networks are output in Nexus format for easy visualization using the SplitsTree software package.
Type Of Technology: Webtool/Application
Impact: The software has been used by biologists to analyse evolutionary data.
URL: http://www.uea.ac.uk/computing/qnet
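
To give a feel for the input described above ("the set of all quartet tree topologies for a given data set, together with a branch length for each topology"), the following is a minimal Python sketch that enumerates every quartet topology over a small set of taxa and attaches a weight to each. It is not QNet's API or file format: the function name `quartet_topologies`, the taxon labels, and the constant placeholder weight of 1.0 are all assumptions made for illustration. In practice the weights would be precomputed by the user, for example with GWeight or LengthWeighter.

```python
from itertools import combinations


def quartet_topologies(taxa):
    """Yield every quartet topology ab|cd over the given taxa.

    Each choice of four taxa {a, b, c, d} admits exactly three
    resolved topologies: ab|cd, ac|bd and ad|bc.
    (Hypothetical helper, not part of QNet.)
    """
    for a, b, c, d in combinations(sorted(taxa), 4):
        yield ((a, b), (c, d))
        yield ((a, c), (b, d))
        yield ((a, d), (b, c))


# Illustrative taxon labels (assumed, not taken from any data set).
taxa = ["A", "B", "C", "D", "E"]
topologies = list(quartet_topologies(taxa))

# Placeholder weights: a real analysis would supply a precomputed
# branch length for each topology instead of the constant 1.0.
weights = {topology: 1.0 for topology in topologies}

print(len(topologies), "weighted quartet topologies")
```

For n taxa there are three topologies for each of the C(n, 4) four-taxon subsets, so the size of the input grows quickly with the number of taxa.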