Ribosomal DNA variation in multi-locus systems

Lead Research Organisation: Earlham Institute
Department Name: Research Faculty

Technical Summary

The ribosomal DNA (rDNA) evolves under a balance of heterogeneity-inducing point mutations acting against homogeneity-inducing concerted evolutionary processes. This balance is now known to be imperfect, leading to intra- and inter-organism variation in rDNA copy number and unit sequence. This situation is made even more complex by the presence, in many organisms, of multiple rDNA loci that are believed to be homogenised by processes such as gene conversion. Evidence is growing rapidly that rDNA variation is tightly linked to phenotype. Furthermore, the rDNA has now been implicated in vital biological phenomena such as genome integrity and ageing. We therefore urgently need to discover how rDNA variation is organised within a genome, how it is expressed and how it underpins the functionality of an organism.

We have developed VariantLister, a software tool that enables us to systematically characterise rDNA variation in organisms with a single rDNA locus. However, to date we have been unable to attribute rDNA variants to a specific locus in organisms with multiple rDNA loci. Through analysis of single chromosome datasets and further VariantLister development, we will carry out such an analysis for the first time, in yeast and wheat. We will also carefully assess whether clustering of rDNA variants called from whole genome sequence datasets can be accurately ascribed to distinct loci for organisms with multiple rDNA loci. Finally, by analysis of transcriptome datasets, we will discover which of the identified rDNA variants are expressed and how a variant's expression fate depends on its locus and the environment.

The results of these tasks will provide vital new knowledge on the organisation and expression of rDNA variants in two key eukaryotes, which will underpin future investigations of rDNA function and ultimately species improvement programs. Finally, we will disseminate all project software and data on a dedicated project website.

Planned Impact

This project has considerable potential to benefit UK society and the economy, the general public and the project participants. While economic and societal impact will be derived in the long term, benefits to the general public and project participants are expected to be realised both during and following the project.

1) The UK society and economy
Our society currently faces significant challenges, ranging from climate change to a growing and ageing population. We urgently need to secure and optimise future food production while also using food and agricultural waste to replace petroleum as a sustainable source of key chemicals. This project will address both of these needs, to the benefit of our society and economy.

a) Crop breeders/Agri-food industry
Wheat is the UK's most important cereal crop, yielding 16.68 million tonnes in 2015, and a vital component of the UK diet. New knowledge of rDNA variation and expression in bread wheat will be communicated to crop breeders and the agri-food industry, to be used for the development of new varieties of wheat tailored to specific environmental conditions. In particular, analysis of RNA-Seq datasets will kick-start the identification of rDNA variants that are preferentially expressed under stress, including conditions of high temperature and low water. Dr Davey's position in the wheat community, including the BBSRC Wheat ISP, will be key to effective knowledge dissemination in the pursuit of continued food security.

b) Industrial biotechnology/Biopharma
The vast quantities of wheat straw left over from food production (e.g. 6.3 million tonnes in 2007), in particular in the East of England close to the project's location, are a key target substrate for secondary biorefining in the UK. Here, sugars released from the straw are fermented by yeast to produce a wide spectrum of platform chemicals and fuels. Harnessing the vast biodiversity of yeast is a fast-emerging area of interest to a wide range of UK companies, and NCYC has recently developed a new collaboration on yeast natural products with Croda, a FTSE 250 company. Yeast rDNA variants and expression profiles discovered in this project are expected to lead to the development of new strains that efficiently produce optimal quantities of a required chemical product. Consortia such as Sc2.0 and existing relationships with key companies such as Croda will ensure broad communication of our results.

2) The general public
There is a growing public appetite for scientific knowledge, with wide recognition of the enormous impact that science has on our prosperity and continued well-being. The project team are highly committed to public scientific outreach, each tending to focus on a different part of this broad sector. Within this project, we will engage directly with members of the public, from schoolchildren to our society's most senior members, to explain the project's most important aspects. In particular, we will use our existing contacts within local organisations such as the SAW Trust and BBC Look East to introduce concepts such as genetic variation, synthetic biology and industrial biotechnology, to explain why we are carrying out this project and what benefits we anticipate it will bring to the local population and to the wider UK community.

3) The project participants
The three project investigators are all highly skilled in the training of new members of their field, and combined they have passed on a wide range of scientific and transferable skills to dozens of scientists in the UK and beyond, many of them now holding senior scientific positions of their own. Within this project, the post-doctoral research assistant will benefit from this expertise, gaining excellent inter-disciplinary training, with the project focus ensuring they possess the skill sets essential to the next generation of UK scientists.

Description We have developed a variation simulator in order to test how various polymorphisms are represented in the yeast rDNA. This new tool, PARSLEY ROOT (Pipeline for Analysis of Ribosomal Locus Evolution in Yeast (Reusable for Other Organisms Though)), discovers variants by comparing Illumina reads from variant yeast strains against a reference ribosomal DNA sequence, and reports them in VCF format.
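
As an illustration of how VCF output of this kind might be inspected downstream, the following is a minimal sketch that reads VCF records and counts variant types. It is not part of PARSLEY ROOT: the names `Variant`, `parse_vcf` and `variant_type`, the contig name `rDNA_ref` and the example records are all hypothetical. It relies only on the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT) and does not handle symbolic alleles such as `<DEL>`.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    contig: str
    pos: int  # 1-based position, as in the VCF specification
    ref: str
    alt: str


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per ALT allele from the lines of a VCF file."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        contig, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):
            if alt != ".":  # "." means no alternate allele was called
                yield Variant(contig, int(pos), ref, alt)


def variant_type(v: Variant) -> str:
    """Classify a variant by comparing REF and ALT allele lengths."""
    if len(v.ref) == len(v.alt):
        return "SNP" if len(v.ref) == 1 else "MNP"
    return "insertion" if len(v.alt) > len(v.ref) else "deletion"


# Hypothetical records; the contig name and positions are invented.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "rDNA_ref\t120\t.\tA\tG\t.\t.\t.",
    "rDNA_ref\t455\t.\tT\tTA,C\t.\t.\t.",
    "rDNA_ref\t900\t.\tGCC\tG\t.\t.\t.",
])

variants = list(parse_vcf(EXAMPLE_VCF.splitlines()))
type_counts = Counter(variant_type(v) for v in variants)
print(dict(type_counts))
```

Multi-allelic records are split so that each alternate allele is counted separately, which keeps the per-type tally simple at the cost of losing the link between alleles called at the same site.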

The postdoctoral researcher on the project, Dr Ziauddin Ursani, was in post for 24 months. In that time, Dr Ursani developed two computational pipelines for the prediction of rDNA sequence variants from Next Generation Sequencing datasets. The first pipeline, a "linear" pipeline, is complete and has been tested successfully on yeast genomic datasets. It detects a broader range of variants, and with greater accuracy, than we had previously been able to achieve, most importantly without the need for time-consuming manual checking of results. A Python program named Parsley, which encapsulates this pipeline and includes additional facilities such as dataset simulation and evaluation of software accuracy, has been published at https://github.com/ziaursani/parsley_root. It is currently being used to investigate rDNA sequence and copy number variation in a range of real datasets, and the results of these investigations are expected to lead to a series of publications. The second pipeline, a "graphical" pipeline, is at the prototype stage. Dr Ursani showed in simulations that the graphical pipeline is capable of detecting an even greater number of sequence variants than the linear pipeline, but it is more complex to use. As graphical variant calling becomes more embedded in everyday bioinformatics, it is likely that this pipeline will be ported into a future version of the Parsley software.
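
The idea of evaluating software accuracy against a simulated dataset can be sketched as follows. This is a generic illustration under stated assumptions, not Parsley's own evaluation code: the function name `evaluate_calls`, the representation of a variant as a `(position, ref, alt)` tuple and the example variants are all hypothetical. The simulated variants act as the truth set, and the called variants are scored by precision and recall.

```python
from typing import Dict, Set, Tuple

# A variant is represented here as (1-based position, REF allele, ALT allele).
VariantKey = Tuple[int, str, str]


def evaluate_calls(truth: Set[VariantKey], called: Set[VariantKey]) -> Dict[str, float]:
    """Score called variants against a simulated truth set."""
    true_pos = len(truth & called)    # simulated variants that were recovered
    false_pos = len(called - truth)   # calls with no simulated counterpart
    false_neg = len(truth - called)   # simulated variants that were missed
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical example: three simulated variants, three calls, two in common.
simulated = {(100, "A", "G"), (250, "C", "T"), (400, "G", "GA")}
calls = {(100, "A", "G"), (400, "G", "GA"), (512, "T", "C")}

scores = evaluate_calls(simulated, calls)
print(scores)
```

Exact matching on position and alleles is the simplest possible criterion. In repetitive regions the same indel can often be written at several equivalent positions, so a realistic comparison would normalise variant representation before matching.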

Due to the short timeframe of the grant, further outputs are expected to follow once the PARSLEY tool has been developed further and applied to new datasets being generated in collaboration with Jane Usher in Exeter.
Exploitation Route We will test the Parsley software on wheat single chromosome and yeast transcriptome datasets in the coming months. This will generate new knowledge on the mechanisms of rDNA evolution and the potential functional effects of rDNA variation, which could impact on areas such as (for example) plant breeding. Several publications are anticipated to arise from this new knowledge.

The Parsley software developed within this project is also now freely available to all. The software is open source and can be reused with easy-to-configure parameters to model variation in yeast rDNA. Members of the biological and bioinformatics communities can therefore now use the software for the analysis of ribosomal DNA in any eukaryotic organism from Next Generation Sequencing datasets.
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software)

URL https://github.com/ziaursani/parsley_root
 
Title The PARSLEY ROOT Variant Simulator 
Description A Python toolkit to simulate variants within the yeast rDNA. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This software was presented at the recent Genome Informatics 2019 conference at Cold Spring Harbor 
URL https://github.com/ziaursani/parsley_root
 
Description Genome Informatics 2019 - poster presentation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Zia Ursani presented the poster "A computational pipeline for precision variation discovery in repetitive DNA"
Year(s) Of Engagement Activity 2019