A temporal hidden Markov model approach to investigating the evolutionary fate of duplicated genes

Lead Research Organisation: University of Manchester
Department Name: Life Sciences

Abstract

Evolutionary theory ties together experimental observations from different organisms, providing a framework for understanding the natural world and how life came to be. Basic research in molecular biology and genetics, coupled with evolutionary theory, is directly responsible for many economically and medically important discoveries, ranging from predicting the strains of virus responsible for influenza epidemics to the engineering of organisms and proteins involved in industrial processes. This proposal uses evolutionary theory to study the origins of new genes, which are one of the main driving forces behind biological adaptation. The process of gene duplication is a major source of new genes, and is known to be responsible for a significant portion of the functional innovation and diversity seen across genomes. Duplication plays a crucial role in phenotypic diversity, speciation, and development. The proposed research will investigate the whole genome duplication that took place in yeast c. 100 million years ago, which offers an excellent opportunity to learn more about the evolutionary mechanisms affecting gene duplication. The usual outcome of gene duplication is that one of the two copies of the gene is lost because they both initially have the same function. This proposal examines the portion of duplicates that are not lost post-duplication because they develop different functions and/or patterns of expression. Sometimes one of the duplicates acquires a novel function, resulting in the organism being better able to exploit its environment. Alternatively, existing functions and/or expression patterns are distributed between the pair of duplicates, meaning that both need to be kept for the organism to survive. The forces of natural selection responsible for these two outcomes remain to be fully elucidated, and understanding these forces is fundamental to addressing questions about the origins of gene function.
The proposed research builds upon novel computational methods created by the PI by developing tools that allow new questions to be addressed about these crucial evolutionary events. These methods use sophisticated statistical approaches to identify changes in the selective forces acting on genes during their evolutionary history, and where in the gene the selective forces have changed. The development of computational tools rather than laboratory experiments means that the whole genome of yeast can be examined, and the specific selective forces responsible for maintaining particular duplicates after the whole genome duplication can be identified. The results of huge numbers of laboratory studies on yeast species are housed in computer databases across the world. This proposal will link together these functional studies and the evolutionary events inferred from the new methodology. This will cast light on which biological factors affect the chances of both genes being maintained after duplication. Furthermore, combining knowledge in this manner has the potential to provide the complete story of how some specific genes have been maintained post-duplication. When the proposed programme of work is completed, the mathematical theory and computer programs used for these analyses could be applied to many other evolutionary and functional problems, including questions relating to the role of genetic diversification in speciation, the origins of the metazoan body plan (Hox genes), and how pathogenic organisms and their hosts interact.

Technical Summary

This proposal will use computational methods to study the evolutionary origins of genes. Understanding how genes are created and acquire new functions impacts many research areas, including understanding speciation and appreciating phenotypic diversity in the natural world (conservation), insight into how gene expression and physiological development evolve (developmental biology), and how directed evolution and biochemistry can be used to create genes with new functions (synthetic biology). Gene duplication, which plays a major role in the creation of new genes, usually leads to one of three outcomes: (i) non-functionalisation, where one duplicate becomes non-functional; (ii) neo-functionalisation, where one of the duplicates acquires a new function, becoming a valuable asset for the organism; (iii) sub-functionalisation, where each of the duplicate pair ends up with a subset of the functions and gene expression patterns of the original gene, meaning neither can be lost from the organism. This proposal uses new statistical modelling procedures developed by the PI (temporal hidden Markov models) to investigate and discriminate between the selective forces acting during neo- and sub-functionalisation. The new technology promises to provide much better resolution of these factors than existing methods, and has many promising applications beyond this proposal. The methods developed will be applied to the yeast whole genome duplication, which took place c. 100 MYA. This large-scale duplication event, occurring in a powerful and well-studied model organism, offers excellent opportunities for studying gene duplication. This proposal will use existing resources, such as the Yeast Genome Order Browser and yeastgenome.org, to source sequence and experimental data. The new methodology will identify the location on the tree and the position in a gene where selective events have taken place during evolution. Combining existing and new results will offer new insights into the origins of genes.

Description This grant successfully created tools for studying how the evolutionary process changes over time, but also highlighted that the inferred changes are highly dependent on the types of data studied. Because of this, the aims of the grant shifted towards methods for identifying the errors this dependence introduces and for understanding how they affect many other types of analysis.
Exploitation Route The findings relating to multiple sequence alignment are currently being used by my lab and others to improve data analysis tools and to find out about the evolutionary forces affecting insertion and deletion mutations.
Sectors Creative Economy; Digital/Communication/Information Technologies (including Software); Education
 
Title MetAl 
Description MetAl is a command-line utility for calculating metric distances between alternative alignments of the same sequences. The current version is 1.1. See doi: 10.1093/bioinformatics/btr701 for a manuscript describing both the importance of the tool and its innovations.
Type Of Material Improvements to research infrastructure 
Year Produced 2012 
Provided To Others? Yes  
Impact Metrics between alignments (MetAl) is the first true distance metric for comparing multiple sequence alignments. It has been used and cited by my own and other research groups working in this area, including the creators of major alignment programs such as MAFFT and T-COFFEE.
URL https://github.com/benb/MetAl
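
To illustrate what a metric distance between two alternative alignments of the same sequences can look like, the following is a minimal Python sketch. It is not MetAl's code and does not reproduce the definitions in the cited manuscript: it assumes a simplified measure in which each alignment is reduced to the set of residue pairs it places in the same column, and the distance is the Jaccard distance between the two sets (the size of their symmetric difference divided by the size of their union). The function names `residue_pairs` and `pair_distance` are hypothetical.

```python
from itertools import combinations


def residue_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column in an alignment.

    `alignment` is a list of equal-length gapped strings, one per sequence.
    A residue is identified by (sequence index, position in the ungapped
    sequence), so the result does not depend on column numbering.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_index, char in enumerate(column):
            if char != gap:
                present.append((seq_index, counters[seq_index]))
                counters[seq_index] += 1
        # every pair of residues in this column is treated as aligned
        pairs.update(combinations(present, 2))
    return pairs


def pair_distance(aln_a, aln_b, gap="-"):
    """Jaccard distance between the aligned-pair sets of two alignments.

    Assumed illustrative measure: 0.0 when both alignments imply exactly
    the same residue pairs, 1.0 when they share none.
    """
    ungapped_a = [row.replace(gap, "") for row in aln_a]
    ungapped_b = [row.replace(gap, "") for row in aln_b]
    if ungapped_a != ungapped_b:
        raise ValueError("alignments must contain the same sequences in the same order")
    pairs_a = residue_pairs(aln_a, gap)
    pairs_b = residue_pairs(aln_b, gap)
    union = pairs_a | pairs_b
    if not union:
        return 0.0
    return len(pairs_a ^ pairs_b) / len(union)


if __name__ == "__main__":
    first = ["AC-GT", "ACGGT"]
    second = ["ACG-T", "ACGGT"]
    print(pair_distance(first, first))   # 0.0
    print(pair_distance(first, second))  # 0.4
```

In the example, the two alignments differ only in which G of the second sequence is paired with the single G of the first, so two of the five distinct residue pairs disagree. Because the Jaccard distance on sets satisfies the triangle inequality, this simplified measure is itself a true metric, which is the property the tool description emphasises.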