Computing a yeast Tree of Life

Lead Research Organisation: University of East Anglia
Department Name: Graduate Office

Abstract

Yeasts are hugely important organisms used in beer and bread making, in addition to a wide variety of bioindustries. To optimise our exploitation of the 4000+ yeast strains in the National Collection of Yeast Cultures (NCYC), we need an accurate yeast Tree of Life that encompasses the NCYC strains. However, no such tree is currently in existence.

We have recently sequenced the genomes of approximately 600 NCYC strains. The resulting dataset, potentially the largest of its kind worldwide, will enable us to compare current computational approaches to developing such a yeast Tree of Life and to further develop new Next Generation Sequencing (NGS)-based approaches for evolutionary tree reconstruction.


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
BB/M011216/1                                   01/10/2015  31/03/2024
1786363            Studentship   BB/M011216/1  01/10/2016  31/12/2020  Ann-Marie Keane
 
Description The genomic data (raw reads and draft assemblies) of fourteen yeast strains were generated in this project. These were added to previously generated and publicly available yeast datasets to undertake a comparison of phylogenetic methodologies using a large (75 species) key yeast dataset.

The species in this dataset include strains of academic, industrial and medical importance. An investigation into a good-quality subset of this dataset showed key genomic similarities and differences between these species. A number of these strains have no genome or genomic data publicly available.
A number of phylogenetic methodologies were compared using a key yeast NGS dataset, and clear differences in accuracy were found between the approaches.
An alignment-free approach, the feature frequency profile (FFP) method, which uses whole-genome datasets, was investigated further with a simulation study. The study showed that the method has both a sequence length bias and a GC bias.
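
To make the idea of an alignment-free comparison concrete, the following is a minimal sketch of a feature-frequency-style distance, assuming the commonly described formulation in which each genome is reduced to a profile of k-mer frequencies and two profiles are compared with the Jensen-Shannon divergence. It is not the FFP package or the software developed in this project: the function names (`kmer_profile`, `jensen_shannon_divergence`, `distance_matrix`) are hypothetical, and details such as feature filtering and the choice of k are left out.

```python
from collections import Counter
from math import log2

VALID_BASES = set("ACGT")


def kmer_profile(sequence, k):
    """Return the relative frequency of each k-mer found in `sequence`."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Skip k-mers that contain ambiguous bases such as N.
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k or has no valid k-mers")
    return {kmer: n / total for kmer, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two profiles (0 to 1)."""
    divergence = 0.0
    for kmer in set(p) | set(q):
        p_i = p.get(kmer, 0.0)
        q_i = q.get(kmer, 0.0)
        m_i = (p_i + q_i) / 2  # frequency in the averaged profile
        if p_i > 0:
            divergence += 0.5 * p_i * log2(p_i / m_i)
        if q_i > 0:
            divergence += 0.5 * q_i * log2(q_i / m_i)
    return divergence


def distance_matrix(genomes, k):
    """Pairwise divergences for a dict of {name: sequence}."""
    profiles = {name: kmer_profile(seq, k) for name, seq in genomes.items()}
    names = sorted(profiles)
    matrix = [
        [jensen_shannon_divergence(profiles[a], profiles[b]) for b in names]
        for a in names
    ]
    return names, matrix
```

A matrix produced by `distance_matrix` could then be passed to a distance-based tree builder such as neighbour joining. Because the profiles depend directly on base composition and on how many distinct k-mers a sequence is long enough to contain, a sketch like this also gives an intuition for how GC content and sequence length might influence the resulting distances.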

The base program for a piece of alignment-free phylogenetic software was developed with the intention of overcoming the biases seen in the FFP software, but it awaits further testing. It is available on GitHub (see the URL below).

My thesis is now submitted and I am planning two or more publications from the above findings in the coming year.
Exploitation Route Two or more papers will be published on the findings of the project, and the good-quality datasets generated in this study will be made available. Making the datasets public could be of use for academic, health and industrial research.

More work could be done to investigate the genomic similarities and differences within this key yeast dataset. A more detailed comparative genomics study, and genome annotation, would require resequencing the genomes of a number of species whose assemblies are of sub-optimal quality. This could lead to the identification of genes present in non-conventional yeasts which may be of use to industry.

The software was not finished, but it could be developed further and completed so that it harnesses the efficient method of the FFP approach while also accounting for the biases identified in that software.
Sectors Manufacturing, including Industrial Biotechnology

URL https://github.com/aKeaneScientist/jellyphy
 
Description Midsummer Phylogenetics meeting: talks given at UEA in 2018 and 2019
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact I presented my research findings to date at this annual phylogenetics meeting at the University of East Anglia. A mixture of academics and students in related fields, at different points in their careers, attended the meeting. Talks were followed by questions and debates regarding current findings in the field.
Year(s) Of Engagement Activity 2018,2019
 
Description Oral presentations at British Yeast Group Meeting 2017 and 2018, poster at 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I have presented my research project and findings to date at the annual British Yeast Group Meeting for the last three years. I gave talks at the 2017 and 2018 meetings and presented a poster of my work at the 2019 meeting. The audience is over 100 research and industry scientists working with yeast. All three presentations led to questions and discussions regarding the findings, and also yielded ideas for further research.
Year(s) Of Engagement Activity 2017,2018,2019
 
Description Poster Presentation at ISMB conference 2018 and 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I presented a poster of my research project and findings at the International Society for Computational Biology's annual international conference, Intelligent Systems for Molecular Biology (ISMB), in 2018 in Chicago, USA, and in 2019 in Basel, Switzerland. Both allowed for much discussion of my findings and of ideas for further research.
Year(s) Of Engagement Activity 2018,2019
 
Description Poster presentation at ISSY 2017 conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I presented a poster of my project and findings to academic and industry researchers at the International Specialised Symposium on Yeasts (ISSY) in 2017 in Cork, Ireland.
Year(s) Of Engagement Activity 2017