Deciphering the RNA degradome: A new tool for small RNA target discovery

Lead Research Organisation: University of East Anglia
Department Name: Computing Sciences


RNA silencing is a complex and highly conserved regulatory mechanism that is now known to be involved in diverse processes such as development, pathogen control, genome maintenance and response to environmental changes. Since its recent discovery, RNA silencing has become a fast-moving area of research of great importance in both plant and animal molecular biology. Research in this field has benefited greatly from developments in high-throughput sequencing technologies (such as 454 and Solexa/Illumina), which in a single run can generate several datasets, each containing millions of small RNA (sRNA) molecules, the key players in all RNA silencing phenomena. A key challenge in sRNA research at present is to understand the function of the millions of sRNAs that have recently been sequenced and deposited in public databases. A first step in meeting this challenge is to find which genes, if any, each sRNA regulates. Several computational approaches have been devised for sRNA target prediction in plants and animals, but these are only predictions and cannot be relied upon without experimental validation. However, in the last year or so a new high-throughput experimental technique has been described for sequencing the 5' ends of uncapped mRNAs, including transcripts that are targeted by sRNAs and subjected to endonucleolytic cleavage. In plants, these degraded mRNA fragments provide evidence of the interaction between an sRNA and its complementary mRNA target that leads to cleavage and degradation of the mRNA. Sequencing the 'RNA degradome' of an organism in this manner is therefore set to revolutionise target validation in plants, since it permits genome-scale sequencing of cleaved mRNAs for the first time. Currently only one tool is available for degradome analysis, and it has several limitations that make it unsuitable for large-scale analysis of such data.
In this proposal we will develop and implement a novel approach to high-throughput analysis of degradome data which will allow users to validate targets across the entire sRNAome and produce a network of sRNA/mRNA interactions based on degradome evidence. The tool will be made available online and for download through the UEA Plant sRNA Toolkit, a collection of user-friendly tools allowing the analysis of high-throughput plant sRNA datasets with high-performance computing without the need for expert knowledge or dedicated bioinformatics support.

Technical Summary

Novel sequencing technologies, such as 454 and Solexa/Illumina, have become important tools for researchers in the field of RNA silencing because they can sequence millions of sRNAs in a single experiment. Over recent years researchers have used this technology to catalogue entire 'sRNAomes' from a variety of organisms but, at present, the function of only relatively few sRNAs is known. In the past year or so a new high-throughput experimental technique has been described that allows researchers to sequence the 5' ends of uncapped mRNAs, including all transcripts targeted by sRNAs and subjected to endonucleolytic cleavage. In plants, these degraded mRNA fragments provide evidence of the interaction between an sRNA and its complementary mRNA target that leads to cleavage and degradation of the mRNA. Sequencing the 'RNA degradome' of an organism in this manner is therefore set to revolutionise target validation in plants, since it permits genome-scale sequencing of cleaved mRNAs for the first time. In this proposal we will develop a new bioinformatics method for plant degradome analysis which will allow the identification of all sRNAs that are able to target and cleave mRNAs. The method will take as input degradome and sRNA data generated using next-generation sequencing technologies, together with the relevant transcriptome. The tool will output all possible interactions between sRNAs and mRNAs in a format that can be viewed using a standard open source network visualisation tool. To make the tool accessible to wet-lab biologists, we will integrate it into the existing UEA Plant sRNA Tools website, which will enable users to run jobs on a 100+ node compute cluster, thus removing any requirement for users to have access to high-performance computing facilities. In addition, a standalone command-line version of the tool will be made freely available for researchers with appropriate computing resources.
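The proposal does not name the network visualisation tool or its file format. As a minimal sketch, assuming Cytoscape (an open-source network viewer) and its simple interaction format (SIF), in which each line lists a source node, a relationship type and a target node, the predicted sRNA/mRNA interactions could be written out as below. The `to_sif` helper, the `cleaves` relationship label and the identifiers are hypothetical.

```python
def to_sif(interactions, relation="cleaves"):
    """Format (sRNA id, transcript id) pairs as SIF text.

    Each edge becomes one tab-separated 'source relation target' line.
    Duplicate pairs are written once, in first-seen order.
    """
    seen = set()
    lines = []
    for srna_id, transcript_id in interactions:
        if (srna_id, transcript_id) in seen:
            continue  # skip repeated evidence for the same edge
        seen.add((srna_id, transcript_id))
        lines.append(f"{srna_id}\t{relation}\t{transcript_id}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    edges = [("sRNA_1", "transcript_A"), ("sRNA_1", "transcript_B"),
             ("sRNA_2", "transcript_A"), ("sRNA_1", "transcript_A")]
    print(to_sif(edges), end="")
```

Because SIF stores one edge per line, an sRNA that targets several transcripts simply appears on several lines, and the viewer merges repeated node names into a single node, so the output forms a connected sRNA/mRNA network.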

Planned Impact

Over the past year, the UEA Plant sRNA Toolkit (UPsT) has been used extensively by plant small RNA researchers in the UK and worldwide, with an average of 30-50 analyses performed per week. It has an attractive website, is easy to use, and is the only complete solution, in the UK or internationally, for processing and analysing plant high-throughput sRNA data. We regularly receive emails from users requesting extra tools and features such as those that we intend to implement in this project. Moreover, several UK groups work with next-generation sequencing sRNA data, for example in miRNA discovery. User demand for tools such as those to be developed in this project is therefore strong. Moulton's group has a strong track record of publicly releasing and promoting the algorithms and software that it develops, and all websites and software created as part of the project will be provided free to all under open source licences. This will allow both academic and commercial users to benefit directly from the resources generated in this project. The Genome Analysis Centre (TGAC), a recently established genomics research institute, will also promote the use of the tools developed in this project in the context of high-throughput annotation of the plant genomes it is sequencing. This will also allow us to investigate the potential for commercial exploitation of our new methodologies through links that TGAC is developing with SMEs working in plant breeding. Previously, the UPsT resource has been promoted through national and international conferences, publications and links from other relevant sites and resources. In addition, members of the Moulton lab have demonstrated the application of the current tools at numerous workshops and conferences, and have helped users of UPsT by providing tutorials on the tools at various national and international workshops.
We will continue to use a similar strategy to promote the tools developed in this project. Future maintenance of the UPsT tools requires minimal dedicated resources, and future development of new features can be driven by short-term projects. For instance, a PhD project could lead to the development of new tool(s) required by the community. Such projects will also help train PhD students in developing cutting-edge research tools and promoting them to a broad user base. The degradome assay is a powerful tool for experimentally identifying targets of plant small RNAs at the genomic scale but, owing to a lack of appropriate computational tools, plant scientists currently cannot fully analyse such data. The proposed software will enable plant scientists to make full use of this powerful technique, which will generate a wealth of knowledge on the function of sRNAs. In this way, we expect the proposed tools ultimately to contribute to improving quality of life by enabling plant and molecular biologists to understand and exploit key molecular pathways in important crop plants such as tomato and grape.
Description

The building blocks of DNA are four nucleotides, usually referred to as A, T, C and G. A gene is a stretch of DNA, often several thousand nucleotides long. Plant and animal genomes usually contain tens of thousands of genes, although these genes can make up a relatively small proportion of the entire genome. The remainder is called non-coding DNA.

At any given moment, usually only some of an organism's genes are active. A gene is active when it is being expressed, that is, used to make a protein. During gene expression, the two strands of the DNA molecule are locally separated and one strand is used as a template to copy the gene into a messenger RNA (mRNA) molecule; the mRNA is then decoded to build the protein. The first step is called transcription and the second is called translation.
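The two steps can be illustrated with a toy sketch. It assumes the template strand is supplied 5'→3', so the mRNA is its reverse complement with U in place of T, and it uses only four codons from the standard genetic code (a full table has 64). Translation here simply starts at the first AUG; the function names are illustrative.

```python
# DNA template base -> complementary RNA base.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Four codons from the standard genetic code; a full table has 64 entries.
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "Stop"}


def transcribe(template_5_to_3):
    """Transcription: RNA polymerase reads the template strand 3'->5',
    so the mRNA (5'->3') is the template's reverse complement, with U for T."""
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_5_to_3))


def translate(mrna):
    """Translation: read codons from the first AUG until a stop codon.
    Codons missing from the toy table are shown as '?'."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    template = "TTAGCCAAACAT"  # template strand for the coding sequence ATGTTTGGCTAA
    mrna = transcribe(template)
    print(mrna, translate(mrna))
```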

Some short RNA molecules (around 20-25 nucleotides in length) transcribed from non-coding DNA, called small RNAs (sRNAs), have been found to affect gene expression in plants, in some cases suppressing the expression of a gene completely. The gene is suppressed when the sRNA binds to a complementary sequence in the gene's mRNA transcript and guides its cleavage into fragments. The regulation machinery may be activated during growth, by viral attack, by an environmental factor or by some other effect. Gene regulation therefore plays a pivotal role throughout an organism's life cycle, and it is of paramount importance to understand the regulatory role that sRNAs play in gene expression.
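In plants, the sRNA pairs with a largely complementary site in the mRNA, and cleavage typically occurs between the target nucleotides paired with positions 10 and 11 of the sRNA (counted from its 5' end); the downstream fragment is what a degradome experiment captures. The sketch below handles only perfectly complementary sites (real targets tolerate some mismatches) and shows how such a site and the predicted 5' end of the downstream fragment can be located. The function names and the example sequences are illustrative, not part of PAREsnip.

```python
# RNA base-pairing partners (Watson-Crick only).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the sequence that pairs with `rna`, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def predicted_cleavage_sites(srna, mrna):
    """Return 0-based mRNA indices at which a 3' cleavage fragment would start.

    A perfect target site is the sRNA's reverse complement. sRNA position p
    (1-based from its 5' end) pairs with mRNA index start + len(srna) - p, so
    cutting between the partners of positions 11 and 10 makes the downstream
    fragment begin at start + len(srna) - 10.
    """
    site = reverse_complement(srna)
    sites = []
    start = mrna.find(site)
    while start != -1:
        sites.append(start + len(srna) - 10)
        start = mrna.find(site, start + 1)
    return sites


if __name__ == "__main__":
    srna = "UCGGACCAGGCUUCAUUCCCC"  # example 21-nt sRNA
    mrna = "GAUCGAUC" + reverse_complement(srna) + "GAUCGAUCAAGG"
    for cut in predicted_cleavage_sites(srna, mrna):
        print(cut, mrna[cut:])
```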

Interactions between a few sRNAs and the genes that they target are known. However, the full picture of interactions is far from clear, and cannot be achieved with the prediction methods currently available. This grant focused on developing a new, fast software tool called PAREsnip, which analyses genome-wide experimental data from plants to provide a comprehensive set of possible sRNA/gene interactions. This tool is freely available for download within the UEA small RNA workbench.

To search for regulatory interactions the software requires three inputs: a database of transcribed genes, a list of sRNAs to be tested and the data set of cleaved gene-transcript fragments (also called the "degradome"). Thousands of genes and millions of sRNAs are publicly available in databases. The recently developed high-throughput technique known as Parallel Analysis of RNA Ends (PARE) is used to obtain the degradome. The gene fragments in the degradome are matched to the genes to give a database of targets for the search. Each candidate sRNA/target duplex is then tested using a well-established set of rules to check whether the sRNA could have guided cleavage of the gene transcript. Each duplex that passes this test is then assessed to estimate how likely it is to have arisen by chance, and the sRNA/target duplex is reported if this probability is lower than a prescribed threshold.
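The search steps can be sketched end to end. This is a simplified illustration under stated assumptions, not PAREsnip's implementation: degradome tags are matched exactly to transcripts; the duplex "rules" are a hypothetical simplification of commonly used plant scoring schemes (mismatch penalty 1, G:U wobble 0.5, both doubled at sRNA positions 2-13, Watson-Crick pairs required at positions 10 and 11, maximum score 4); and the chance assessment is an empirical p-value from shuffled copies of the sRNA, one common approach, since the text does not say which test PAREsnip uses. All function names and thresholds are hypothetical, and all sequences are assumed to use the RNA alphabet (T already converted to U).

```python
import random

# sRNA base -> target base pairs, with the sRNA written 5'->3'.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def map_degradome(tags, transcripts):
    """Match degradome tags exactly to transcripts.

    `tags` maps tag sequence -> read count; `transcripts` maps id -> sequence.
    Returns {(transcript_id, position): abundance}, where position is the
    0-based index of the tag's 5' end, i.e. the first base after a cut.
    """
    hits = {}
    for tag, count in tags.items():
        for tid, seq in transcripts.items():
            pos = seq.find(tag)
            while pos != -1:
                hits[(tid, pos)] = hits.get((tid, pos), 0) + count
                pos = seq.find(tag, pos + 1)
    return hits


def duplex_score(srna, window):
    """Penalty for an ungapped duplex; sRNA position p pairs with window[n - p]."""
    n = len(srna)
    score = 0.0
    for p in range(1, n + 1):
        pair = (srna[p - 1], window[n - p])
        if pair in WATSON_CRICK:
            continue
        penalty = 0.5 if pair in WOBBLE else 1.0
        score += 2 * penalty if 2 <= p <= 13 else penalty
    return score


def passes_rules(srna, window, max_score=4.0):
    """Hypothetical rules (sRNAs of at least 11 nt): Watson-Crick pairs at
    positions 10 and 11, and a total penalty no greater than max_score."""
    n = len(srna)
    if any((srna[p - 1], window[n - p]) not in WATSON_CRICK for p in (10, 11)):
        return False
    return duplex_score(srna, window) <= max_score


def shuffled_p_value(srna, window, n_shuffles=100, seed=0):
    """Fraction of shuffled sRNAs scoring at least as well (add-one corrected)."""
    rng = random.Random(seed)
    observed = duplex_score(srna, window)
    bases = list(srna)
    as_good = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if duplex_score("".join(bases), window) <= observed:
            as_good += 1
    return (as_good + 1) / (n_shuffles + 1)


def find_interactions(srnas, tags, transcripts, max_score=4.0, p_threshold=0.05):
    """Return (sRNA id, transcript id, cut position, tag abundance, p-value)."""
    hits = map_degradome(tags, transcripts)
    results = []
    for sid, srna in srnas.items():
        n = len(srna)
        for (tid, cut), abundance in sorted(hits.items()):
            seq = transcripts[tid]
            # Align the sRNA so the cut falls between its positions 10 and 11.
            start = cut + 10 - n
            if start < 0 or start + n > len(seq):
                continue
            window = seq[start:start + n]
            if not passes_rules(srna, window, max_score):
                continue
            p = shuffled_p_value(srna, window)
            if p < p_threshold:
                results.append((sid, tid, cut, abundance, p))
    return results
```

Calling `find_interactions` with dictionaries of sRNAs, degradome tags and transcripts returns one tuple per supported interaction. The tag-to-transcript scan here is deliberately naive; a tool aiming at millions of sRNAs would index the transcripts instead of rescanning them for every tag.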

PAREsnip can analyse over 10,000 sRNAs in a few minutes, and a million in a few hours, on a desktop PC. No other tool available at present can claim similar performance. PAREsnip therefore facilitates large-scale identification of sRNA/target interactions, which should permit the discovery of new regulatory networks.
Exploitation Route

The PAREsnip tool could be used by researchers to find new small RNA targets. This is useful for researchers in RNA interference and has applications in areas such as crop plant research.
Sectors: Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Pharmaceuticals and Medical Biotechnology

Description

The tool has been used by various groups to understand the roles of small RNAs. Examples include how miRNAs mediate SnRK1-dependent energy signalling in Arabidopsis, and the discovery of an endogenous microRNA target in C. elegans.
First Year Of Impact: 2012
Sector: Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software)
Impact Types: Societal, Economic