RNAcentral, the RNA sequence database

Lead Research Organisation: University of Manchester
Department Name: School of Biological Sciences

Abstract

In molecular biology, the central dogma states that genes in DNA code for RNA, and that RNA molecules are then translated into proteins, the mini-machines that carry out the main processes in the cell. Recently it has become apparent that potentially many thousands of human genes code for RNAs that are not translated into proteins but instead carry out important functions in the cell as RNA. These molecules are known as non-coding RNAs. Over the past thirty years, much of the focus in biology has been on DNA and proteins, but there has recently been a surge of interest in non-coding RNAs. Indeed, the core of the ribosome, the machine that makes proteins from RNA, has itself been shown to be made of RNA. Non-coding RNAs have also been shown to be widely involved in regulating the expression of other genes, and may be useful in developing treatments for patients with a variety of diseases. The role of non-coding RNAs in plant and animal development is evident, but a deeper understanding of their biology is essential before they can be modulated to enhance traits such as yield or disease resistance. Unsurprisingly, aberrant expression of non-coding RNAs has also been implicated in numerous disease states.

Research and innovation in the area of non-coding RNAs, and in molecular biology more generally, is hampered by the lack of an authoritative and complete resource collecting together all known non-coding RNAs. There are over 30 different online databases that contain information about different types of RNA molecules, and each of these resources makes its data available in a different way. The scattered nature of these resources has made it nearly impossible for biologists to discover what is known about the non-coding RNAs related to their research area. To address this problem we created a resource called RNAcentral that brings together information from all the different RNA databases in one place. The most important information stored in RNAcentral is the RNA sequence. Many existing RNA resources (called RNAcentral Expert Databases) have provided their data to RNAcentral. In this project we will add more detailed information about the structure and function of RNAs to RNAcentral. We will work closely with one specific expert database, miRBase, based at the University of Manchester, which will test the system for searching the RNAcentral sequence database on specific subsets of RNAs.

By the end of this project, researchers from around the UK and the rest of the world will have access to a larger body of information about RNAs. This information will be freely available in a variety of ways, including via a website and as a downloadable database. Access to this information will help researchers incorporate RNAs into their work and make new discoveries sooner.

Technical Summary

Under this proposal, we will continue the development of RNAcentral, an international database of non-coding RNA sequences, currently made up of sequence data contributed by 15 member databases. To make RNAcentral more comprehensive, we will import 21 additional ncRNA databases and carry out regular data releases. In addition to the core sequence data, our users care most about functional annotation of ncRNAs. We will therefore focus on incorporating additional types of annotation, such as high-quality secondary structures, inter-molecular interactions, Gene Ontology (GO) and Sequence Ontology (SO) terms, and textual annotation from Wikipedia. We will map RNAcentral sequences onto appropriate reference genomes and provide new functionality, such as exploring overlapping sequences in the same species. New visualisations will be developed to display these data, taking advantage of modern web technologies. To increase the sustainability of RNA databases worldwide, we will develop prototype RNAcentral infrastructure elements and make them available to RNAcentral database contributors. To this end, we will develop an improved sequence search facility in collaboration with the miRBase database, which miRBase will be able to use to search its own sequence data and display the results on its own website via a RESTful API. This functionality will subsequently be made available to other RNAcentral databases. To disseminate information about RNAcentral, we will engage in outreach and training activities by hosting workshops, holding annual Scientific Advisory Board (SAB) meetings, and publishing biennial papers in the Nucleic Acids Research (NAR) Database Issue. As a comprehensive repository of ncRNAs, RNAcentral will underpin a global effort to unravel the functions of ncRNAs.

Planned Impact

Non-coding RNAs are found in every living organism, and advances in ncRNA research, reflected in and supported by RNAcentral, will contribute to new applications in biotechnology, therapeutics, agriculture, and ecology. As a comprehensive database of ncRNA sequences, RNAcentral indirectly contributes to all BBSRC strategic objectives: food security, biofuels, industrial biotechnology and human health. RNAcentral will be used by bioinformaticians and wet-lab scientists in both academia and industry working on all aspects of ncRNA biology. As sequencing technologies become more advanced and new RNA structure probing technologies emerge, there is a growing need to maintain a comprehensive and well-annotated collection of all ncRNAs.

By capturing and disseminating this valuable knowledge, we will be addressing the BBSRC's enabling theme of innovation, allowing industrial partners to make more rapid discoveries and inventions of benefit to society. RNAs hold great promise for ever-wider clinical and biotechnological applications. For example, microRNAs have been implicated as diagnostic signatures for cancer, snoRNAs in the major phenotypes of Prader-Willi syndrome, bacterial small RNAs in pathogenicity, plant small RNAs in hybrid necrosis, and ribozymes in the cleavage of specific target RNAs. Improved annotation of and access to RNA data will facilitate the discovery and use of novel RNAs as diagnostic markers and drug targets. RNA-based therapeutics are the subject of intense research and hold promise for improving health and welfare internationally. In the plant sciences, we expect our annotations to be of use in genome engineering to improve disease resistance and crop yields. In addition, the ability to make RNAs in very large quantities has raised the idea of using RNA directly as a weed and pest control measure through crop spraying.

A number of commercial organisations manufacture experimental resources, for example microarrays, based on up-to-date gene annotation. Some resources have also been made available for specific classes of non-coding RNA genes; for example, several companies make microRNA detection kits. These companies will therefore benefit from improved annotation of non-coding RNAs, and their products underpin experimental studies in commercial and academic organisations. Along with the more clinical aspects described above, RNAcentral helps to foster wealth creation through innovative application of RNA sequence information.

Non-coding RNAs such as ribosomal RNAs have long been used as tags to identify species. Application of high-throughput sequencing has opened up opportunities to understand biodiversity on an unprecedented scale. A better understanding of biodiversity, and of how it is changing, will enhance our ability to manage and conserve the world's great natural genetic resources.

Having all known non-coding RNA sequences in a single resource makes it much easier to gauge the growth and impact of RNA data. For example, one can compare the number of RNA genes with the number of protein-coding genes in a genome. This will allow policy makers and funders to better gauge the scale of support needed to maximise output relative to other priorities.

Publications

The RNAcentral Consortium (2019). RNAcentral: a hub of information for non-coding RNA sequences. Nucleic Acids Research.

The RNAcentral Consortium (2017). RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Research.

 
Description The RNAcentral resource has had 17 public releases, and 40 different databases contribute their RNA data. The number of users has been steadily increasing. RNAcentral has developed methods and interfaces to search the millions of sequences in the database. Under this grant, we have developed a process by which other RNA databases and resources can embed these RNAcentral searches in their own web pages. The miRBase database (funded under BB/M011275/1) has implemented this search interface.
Exploitation Route RNAcentral is used by RNA researchers around the world. Its continued availability is vital to many different research programmes in academia and industry.
Sectors Healthcare, Pharmaceuticals and Medical Biotechnology

URL http://rnacentral.org/
 
Description This grant funds the development of a sequence database, RNAcentral (http://rnacentral.org/). RNAcentral is used by researchers around the world as a primary database of RNA gene sequences, and is the only resource of its kind. Its uses are wide-ranging and include commercial and pharmaceutical organisations with interests in non-coding RNA sequence and function. The papers describing RNAcentral have been cited over 320 times (Google Scholar).
Sector Education, Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Title RNAcentral 
Description RNAcentral aims to collect all known non-coding RNA sequences in a single resource. 
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact RNAcentral has brought together over 40 independent expert RNA databases. There have been 8 releases of the database. 
URL http://rnacentral.org/