Predicting plant microRNAs based on functional and biogenesis data

Lead Research Organisation: University of East Anglia
Department Name: Biological Sciences

Abstract

How do sharks and roses develop from a single cell? This question has long intrigued scientists because all plants and animals derive from a single cell, the fertilised egg. That single cell develops into an entire organism through many cell divisions, but the genetic information does not change during those divisions. Therefore all our cells contain the same genetic information. However, our bodies contain many different tissues with specialised functions. These tissues differ from each other because a different set of proteins is present in the cells that make up each tissue. The reason for this is that only a certain set of genes is active in each cell. Gene expression is a complex process and can therefore be regulated at several levels. First the chromosomal DNA is transcribed into mRNA, and this step is regulated by various mechanisms. The mRNAs are then processed and translocated to the cytoplasm, where they are translated into proteins. Accumulation of a protein can also be regulated by various mechanisms. One of the most recently discovered regulatory layers involves short RNAs that regulate the translation efficiency of mRNAs. One group of these short RNAs is called microRNAs (miRNAs) because the molecules are very short, only 21-24 nucleotides. Research over the last ten years has found that miRNAs play a very important role in normal development.
miRNAs are generated from a longer precursor molecule, which is folded into a characteristic stem-loop shape. This shape and a few other features of the precursor and mature miRNAs, all of which are linked to how the miRNAs are produced (biogenesis), can be identified by computer programs. Several computer programs have been developed that can predict miRNAs from a large number of short sequences captured from different tissues. These programs rely on different parameters, such as the length of the stem and the size of the loop, but the values of these parameters are subjective and not experimentally proven. Naturally, programs using more stringent parameters predict fewer miRNAs than programs using more relaxed parameters. Until now, stringent parameters have been applied to ensure a low rate of false positive predictions. However, because of these strict criteria, miRNAs with a slightly shorter stem, a bigger loop, etc. might have been missed.
miRNAs recognise specific sequences on the mRNAs they regulate, and in plants they cause a cleavage between the 10th and 11th positions within that sequence. The cleaved fragments can be captured, providing functional information in addition to biogenesis features. Recently we developed a program that can compare all cleaved mRNA fragments to all small RNAs found in a sample. This program found more than 4000 fragments that were potentially cleaved by about 3500 small RNAs in the model plant Arabidopsis thaliana. Hundreds of those small RNAs showed features similar to miRNAs but marginally missed the stringent criteria of the prediction programs. Based on these results, we hypothesise that many miRNAs have been missed by computer programs and that these could be identified by using functional and biogenesis data together.
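The comparison of small RNAs with cleaved fragments described above can be sketched as follows. This is a minimal illustration, not the program mentioned in the text: it assumes perfect complementarity between the small RNA and its target site (real plant miRNA targets tolerate some mismatches), it counts positions from the 5' end of the small RNA, and all function names and sequences are hypothetical.

```python
def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/U/G/C only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def predicted_cleavage_sites(small_rna: str, mrna: str) -> list[int]:
    """Return 0-based mRNA indices where a 3' cleavage fragment would start.

    Assumption: position i of the small RNA (1-based, from its 5' end)
    pairs with mRNA index start + n - i, so a cut between positions 10
    and 11 leaves a downstream fragment beginning at start + n - 10.
    """
    site = reverse_complement(small_rna)
    n = len(small_rna)
    sites = []
    start = mrna.find(site)
    while start != -1:
        sites.append(start + n - 10)
        start = mrna.find(site, start + 1)
    return sites


def supported_by_fragments(small_rna: str, mrna: str, fragment_starts: set[int]) -> bool:
    """True if any observed fragment 5' end coincides with a predicted cut."""
    return any(s in fragment_starts for s in predicted_cleavage_sites(small_rna, mrna))


# Toy data: a made-up 21-nt small RNA and an mRNA containing its target site.
small_rna = "UGACAGAAGAGAGUGAGCACU"
mrna = "CCCCC" + reverse_complement(small_rna) + "GGGGG"
observed_fragment_starts = set(predicted_cleavage_sites(small_rna, mrna))
print(supported_by_fragments(small_rna, mrna, observed_fragment_starts))
```

A small RNA whose predicted cut coincides with the 5' end of a captured fragment gains functional support; one without such a fragment does not.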
The aim of the proposal is to test this hypothesis by developing a new program that considers both biogenesis and functional data, and to use it to identify many new miRNAs in the model species Arabidopsis and the crop species tomato. Where functional data are available, we propose to use slightly less stringent biogenesis parameters to confidently predict new miRNAs. We will generate specific small RNA and cleaved mRNA data from normal plants and also from plants that contain reduced levels of miRNAs or cleaved mRNA fragments. These data will be used to identify experimentally validated parameters.
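The idea of relaxing biogenesis criteria only when functional evidence exists can be illustrated with a small sketch. Everything here is an assumption made for illustration: the threshold values are invented, only two of the many possible parameters (stem length and loop size) are used, and the names are hypothetical rather than taken from the proposed program.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    stem_length: int      # paired nucleotides in the precursor stem
    loop_size: int        # unpaired nucleotides in the terminal loop
    cleaves_target: bool  # a cleaved mRNA fragment matches this small RNA


# Hypothetical thresholds, chosen only to show the two-tier logic.
STRICT = {"min_stem": 22, "max_loop": 20}
RELAXED = {"min_stem": 19, "max_loop": 30}


def passes(candidate: Candidate, limits: dict) -> bool:
    """Check the biogenesis features against one set of thresholds."""
    return (candidate.stem_length >= limits["min_stem"]
            and candidate.loop_size <= limits["max_loop"])


def is_predicted_mirna(candidate: Candidate) -> bool:
    """Apply relaxed thresholds only when functional evidence is present."""
    limits = RELAXED if candidate.cleaves_target else STRICT
    return passes(candidate, limits)


borderline = Candidate("cand_1", stem_length=20, loop_size=25, cleaves_target=True)
print(is_predicted_mirna(borderline))
```

In this sketch a borderline candidate is accepted only if a matching cleaved fragment exists; without that support it must meet the strict thresholds.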

Technical Summary

One of the most recently discovered gene expression regulation pathways involves small RNAs. The best characterised small RNAs are the microRNAs (miRNAs), which are generated from a longer precursor molecule with a stem-loop structure. Most miRNAs were identified by computer programs from deep sequencing of small RNA cDNA libraries. These programs predict miRNAs based on their biogenesis features and use different parameters, such as the minimum free energy of the stem and the size of the bulges, but the values of these parameters are somewhat arbitrary. Currently stringent parameters are applied to avoid false positives; however, there is no information about the false negatives.
Plant miRNAs cause a cleavage on target mRNAs between the 10th and 11th positions of the target sequence, and these cleaved fragments can be sequenced through a genome-wide 5' RACE. Recently we developed a program that can compare all cleaved mRNA fragments to all small RNAs found in a sample. We found more than 4000 fragments that were potentially cleaved by about 3500 small RNAs in the model plant Arabidopsis thaliana, suggesting that there are many more miRNAs than the 338 deposited in miRBase.
The aim of the proposal is to test the hypothesis that computer programs using stringent biogenesis parameters miss many functional miRNAs. We propose to develop a new program that will consider both biogenesis and functional data; where functional data are available, less stringent biogenesis parameters will be used to confidently predict new miRNAs. We will generate specific small RNA and cleaved mRNA data from wild-type plants and from mutant plants with reduced miRNA or cleaved target levels. miRNAs will be predicted using the new program, and the optimal parameters will be identified through an iterative process of relaxing the parameters and testing the predictions in the mutant plants. Predictions by the new program will be experimentally tested in a model species (Arabidopsis) and a crop species (tomato).
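The iterative relaxation described above might look something like the following sketch. The stopping rule is an assumption, since the summary does not specify one: here a single parameter (minimum stem length) is lowered step by step for as long as a chosen fraction of the predictions remains supported by the mutant data. The function name, the input format and the 0.7 cut-off are all hypothetical.

```python
def relax_min_stem(candidates, start_min_stem, floor_min_stem, min_validated_fraction):
    """Lower the minimum stem length until predictions stop being supported.

    candidates: list of (stem_length, reduced_in_mutant) tuples, where
    reduced_in_mutant is True if the candidate (or its cleaved target)
    is depleted in the mutant, as expected for a genuine miRNA.
    Returns the most relaxed threshold that still met the cut-off,
    or None if no threshold did.
    """
    best = None
    for min_stem in range(start_min_stem, floor_min_stem - 1, -1):
        predicted = [c for c in candidates if c[0] >= min_stem]
        if not predicted:
            continue  # nothing predicted yet at this stringency
        validated = sum(1 for c in predicted if c[1])
        if validated / len(predicted) < min_validated_fraction:
            break  # relaxing further admits too many unsupported calls
        best = min_stem
    return best


# Toy data: shorter stems are increasingly unsupported in the mutant.
toy = [(24, True), (22, True), (21, True), (20, False), (19, False), (18, False)]
print(relax_min_stem(toy, start_min_stem=24, floor_min_stem=18,
                     min_validated_fraction=0.7))
```

A real search would cover several parameters at once and would need a more careful measure of validation than a single fraction.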

Planned Impact

Who will benefit from this research?

MicroRNAs (miRNAs) are involved in very diverse processes. They are essential for normal plant development and also for optimal responses to environmental changes. Discovering new miRNAs can benefit a wide range of groups, since the new miRNAs may be involved in all kinds of developmental processes such as root, leaf and fruit development. Understanding the development of these tissues has a direct impact on food security, since these tissues are consumed by people and by farm animals. In addition, new miRNAs may be involved in both biotic stress responses (affecting resistance to viruses, bacteria and fungi) and abiotic stress responses. Understanding the stress responses of plants has an impact on food security, since a significant percentage of crop loss is due to biotic and abiotic stresses. Therefore the non-academic groups who would benefit from the work described in this proposal, and who could potentially use its outputs, include plant biotechnology and breeding companies, the agricultural community, including farmers (both in the UK and worldwide), and potentially the consumers of the food industry.

How will they benefit from this research?

This project addresses a basic scientific question, i.e. the discovery of new miRNAs. Discovering new miRNAs would not lead to new products in the short term. However, the identification of new miRNAs may have an impact on breeding crop plants with bigger yields, higher tolerance to changing environments and better pathogen resistance, since miRNAs are involved in all these processes. Plant biotechnology and breeding companies could use the generated knowledge in their future programs. This would ultimately benefit the consumers of the food industry.

Publications

Description: We have generated all the libraries that we proposed, and all of them have been sequenced. We have identified a number of new miRNAs through this approach and submitted a manuscript last week to Bioinformatics.
Exploitation Route: It is still a bit too early. Once the paper is published, others will probably start using this approach to identify microRNAs.
Sectors: Agriculture, Food and Drink; Pharmaceuticals and Medical Biotechnology