PRPTree: Novel bioinformatics software for improved understanding of the temporal context of evolutionary divergence

Lead Research Organisation: University of Reading
Department Name: Sch of Biological Sciences

Abstract

In the life sciences, the field of bioinformatics is concerned with compiling large databases of information on organisms - such as DNA or gene sequences - and with developing methods for analysing those databases. Because all living things are connected to each other by their common ancestry, one of the most common bioinformatics tasks is to develop an evolutionary tree or 'phylogeny' of a group of species. A phylogeny is like a family tree but traces the historical lines of descent from older to younger species rather than from parents to their offspring. Scientists reconstruct them by studying the similarities and differences among species in their DNA sequences. They then use the phylogenies to describe the probable course of evolution over millions of years, to see how species evolve and adapt, to study the ways genes and proteins evolve and to reconstruct the timings and pace of change of past events.

The Principal Investigator of this project has derived a new statistical method that could greatly improve our ability to estimate the dates and timings of past evolutionary events on phylogenetic trees. The proposed research will i) implement and test this new method, ii) compile a database of the rates at which species evolve and iii) make the new method and the database available to others as an easy-to-use bioinformatics tool.

Regarding the first goal, the new method we will develop improves our ability to estimate the timings of past evolutionary events by allowing investigators to measure more accurately the rate or speed with which DNA sequences acquire differences. This improved dating makes the phylogenies more useful for scientific research. In turn, improved phylogenies will allow scientists to provide better answers to questions such as: when did the common ancestor of humans and chimpanzees live? When and where did the Zika or Ebola viruses emerge?

The second goal of our work is to compile a database of the rates of evolution observed across a variety of species including plants, animals, bacteria and viruses. Our preliminary work suggests that greatly elevated rates of evolution can occur - 10 or more times the background rate. Knowledge of these rates is important for dating phylogenies, but also for understanding questions such as how quickly bacteria can acquire resistance to antibiotics, how quickly viruses can outwit our immune systems, or whether species can evolve fast enough to respond to climate change.

Our third goal is to make our new method freely available to other researchers as a bioinformatics tool known as PRPTree. We produce a software package, as do several other groups, for inferring phylogenetic trees, and we will share our methods with those other developers.

Technical Summary

Our objectives are to implement, test and then make available as easy-to-use bioinformatics software a new Bayesian phylogenetic inference method -- PRPTree -- for improved dating of phylogenetic trees. At least 7,000 articles per year use phylogenetic trees, and dated trees now make up a rapidly growing proportion of them. Biological scientists routinely need to establish the timings of events on phylogenies to measure rates of evolution and adaptation of genes, proteins and phenotypes, to date events such as gene duplications and inversions, and to link these processes to each other and to the generation of biodiversity.

But a limitation of methods for inferring dated phylogenies is that they frequently yield great uncertainty about dates or timings of events. They suffer this uncertainty because, without outside information on the rates of gene or protein sequence evolution (i.e., substitutions per unit time), they cannot distinguish fast rates of evolution over short time periods from slow rates over long time periods - this is the problem of identifiability.
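The identifiability problem can be illustrated with a minimal sketch. It assumes the standard Jukes-Cantor (JC69) substitution model, chosen here only for simplicity and not named in this proposal; the function name and the numerical values are hypothetical. Because the expected sequence divergence along a branch depends only on the product of rate and time, a fast rate over a short period and a slow rate over a long period predict the same data.

```python
import math


def jc69_expected_divergence(rate, time):
    """Expected proportion of differing sites along one branch under JC69.

    rate: substitutions per site per unit time (illustrative value)
    time: duration of the branch in the same time units
    Only the product rate * time enters the formula.
    """
    branch_length = rate * time
    return 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))


# Fast rate over a short period versus slow rate over a long period.
fast_short = jc69_expected_divergence(rate=0.01, time=10.0)
slow_long = jc69_expected_divergence(rate=0.001, time=100.0)

print(fast_short, slow_long)
```

The two printed values agree, so sequence data alone cannot separate the two scenarios; outside information on rates or on times is needed to tell them apart.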

The new method we will develop incorporates a 'Poisson rate prior' (probability distribution of rates), derived by the PI, that provides an exact statistical description of neutral rate variation. This improves the estimation of rates of gene or protein sequence evolution throughout the phylogenetic tree by admitting a narrower range of plausible rates than conventional approaches, thereby reducing uncertainty about inferred times. The method shows that current models allow improbably high rates, and preliminary trials with simulated data show that the Poisson-prior can improve the prediction of the true historical time.
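The following sketch is not the PI's derivation, which this summary does not give. It is only a toy comparison, under assumptions introduced here, of why a Poisson description of substitution counts can admit a narrower range of rates than a broad conventional prior. It assumes that the number of substitutions on a branch is Poisson-distributed around the background expectation, and it uses a lognormal distribution with a log-scale standard deviation of 1 as a stand-in for a conventional relaxed-clock prior. All names and numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_quantile(mean, q):
    """Smallest count k with P(K <= k) >= q for K ~ Poisson(mean)."""
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    while cdf < q:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def lognormal_quantile(mean, sigma, q):
    """Quantile q of a lognormal with the given mean and log-scale sigma."""
    mu = math.log(mean) - 0.5 * sigma ** 2
    return math.exp(mu + sigma * NormalDist().inv_cdf(q))


# Illustrative assumptions, not values from the proposal.
background_rate = 0.001  # substitutions per site per unit time
branch_time = 50.0       # duration of the branch
n_sites = 1000           # alignment length
expected_count = background_rate * branch_time * n_sites

q = 0.999
# Highest realised rate the Poisson description treats as plausible.
poisson_upper_rate = poisson_quantile(expected_count, q) / (branch_time * n_sites)
# Highest rate the broad lognormal stand-in treats as plausible.
lognormal_upper_rate = lognormal_quantile(background_rate, 1.0, q)

print(poisson_upper_rate / background_rate)
print(lognormal_upper_rate / background_rate)
```

Under these assumptions the upper tail of the lognormal stand-in reaches rates more than ten times the background rate, while the Poisson description stays within a factor of two of it.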

We will incorporate the method into our BayesPhylogenies statistical package as an easy-to-use bioinformatics application, PRPTree, capable of handling large datasets in a bioinformatics pipeline, and make it freely available to users and to authors of other phylogenetic inference packages.

Planned Impact

Non-academic groups that could benefit from the proposed research over different timescales?

Phylogenetic trees that properly account for variation in rates of evolution (the major thrust of our work) can improve the accuracy of phylogenetic inference as well as improve estimates of the timings of past events (Case for Support). PRPTree can also improve the estimation of rates of evolution. These features make PRPTree potentially valuable even to organisations that otherwise might rely on un-dated trees.

Thus, beneficiaries will be national and international organisations whose programmes make use of information on the relatedness of organisms (species and individuals), on the timings of events linking those individuals and on the pace of change at the genetic level. These will include members of the European Association of Zoos and Aquaria (EAZA), the European Endangered Species Programmes (EEP), the European Studbooks (ESB) and the Regional Collection Plans (RCP); captive breeding programmes in zoos, including those at Paignton, Chester, Edinburgh, London and Whipsnade; conservation biology organisations such as the Nature Conservancy, the Royal Society for the Protection of Birds (RSPB), the British Trust for Ornithology (BTO) and the World Wide Fund for Nature (WWF), plus numerous smaller and regional/local wildlife trusts; bio-pharmaceutical and agricultural organisations such as Syngenta or Rothamsted Research; public health and epidemiological organisations such as Public Health England or the Blood Borne Virus Unit; and natural history museums (see also Pathways to Impact).

How might the potential beneficiaries benefit?

Captive breeding programmes

These programmes seek suitable matches to maintain healthy levels of genetic diversity in offspring and to avoid inbreeding. Phylogenetic trees assess relatedness among individuals drawn from collections of captive populations dispersed potentially around the world. PRPTree can improve the inference of those trees, the dates or timings of events on those trees, and provide better estimates of the rates of evolution.

Conservation biology.

Which species should be given priority for conservation efforts? One approach is to choose the set of species that somehow maximizes genetic diversity. To do this requires a phylogenetic tree of the candidate species, with well-estimated branch lengths. Maximising genetic diversity is then a matter of choosing some subset of species whose collective branch lengths are greatest.
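The selection step described above can be sketched in a few lines. This is a minimal illustration on a hypothetical four-species tree with made-up branch lengths, using a brute-force search that is only practical for small numbers of candidates; it is not part of PRPTree. The diversity of a subset is taken to be the total length of all branches on the paths connecting its members to the root.

```python
from itertools import combinations

# Hypothetical toy tree: node -> (parent, length of the branch to that parent).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 3.0),
    "D": ("root", 5.0),
    "n1": ("n2", 2.0),
    "n2": ("root", 2.0),
}
TIPS = ["A", "B", "C", "D"]


def phylogenetic_diversity(species):
    """Total length of the branches linking the chosen species to the root."""
    branches = set()
    for tip in species:
        node = tip
        # Walk from the tip towards the root, counting each branch once.
        while node in TREE:
            branches.add(node)
            node = TREE[node][0]
    return sum(TREE[branch][1] for branch in branches)


def best_subset(k):
    """Subset of k species with the greatest collective branch length."""
    return max(combinations(TIPS, k), key=phylogenetic_diversity)


chosen = best_subset(2)
print(chosen, phylogenetic_diversity(chosen))
```

Because the result depends directly on the branch lengths, poorly estimated lengths can change which species are selected, which is why well-estimated branch lengths matter for this application.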

Bio-Pharmaceutical and Agricultural research industries.

Some bio-pharmaceutical companies seek to identify compounds likely to be useful in the treatment of disease. Phylogenetic trees of, for example, plant species can be used to identify species closely related to ones of known medicinal value, on the assumption that the relatives will also be likely to contain useful compounds. Improved phylogenies are better able to identify closest relatives.

Agricultural scientists frequently look for new crop types to be developed from wild strains. Phylogenetic trees that include current crop varieties and wild-types can identify which of the wild-types might have characteristics similar to those that have already been successfully domesticated.

Public Health/Epidemiology.

Rapidly evolving viruses such as Zika and Ebola can be tracked and analysed phylogenetically. Phylogenetic studies pinpointed the source of HIV in West Africa, showing that it arose about a century ago and that it was almost certainly acquired from a chimpanzee in the form of a 'simian immunodeficiency virus'.

Natural History Museums.

Natural history museums use phylogenies in their public displays to trace the history of life. Dated phylogenetic trees in combination with geographical information can also trace the emergence and movements of organisms over millions of years. PRPTree can improve the accuracy of these stories.

Publications

 
Description: Our work on this grant was severely curtailed by health issues suffered by the postdoc on the grant as a result of the COVID-19 epidemic. As a consequence, we have been granted an extension to the award that will continue through August of this year, and we hope by that time to have a fully working model as described in the initial grant application.
Exploitation Route: Too early to say (award is still active)
Sectors: Environment

 
Description: The evolution of SARS-CoV-2
Organisation: University of Montana
Country: United States
Sector: Academic/University
PI Contribution: We provided the methods used in this partnership and we provided consultation on the paper that has been written.
Collaborator Contribution: Our partners collected all the data, ran all of the analyses and wrote the paper.
Impact: Our partners, using our methods, discovered that early in its evolution, SARS-CoV-2 experienced punctuational bouts of evolution during which the genome changed rapidly rather than acquiring new mutations gradually.
Start Year: 2021