RNA proposal title - The RNAcentral database of non-coding RNAs

Lead Research Organisation: EMBL - European Bioinformatics Institute
Department Name: Ensembl Genomes

Abstract

In molecular biology, the central dogma says that the genes in DNA code for RNA. RNA molecules are then translated into proteins, the mini-machines that carry out the main processes in the cell. It has only recently become apparent that many genes code for RNAs that are not translated into proteins and that carry out important functions in the cell as RNA. These molecules are often known as non-coding RNAs. Much of the focus in biology has been on DNA and proteins, but recently there has been a surge of interest in non-coding RNAs. In fact, the mini-machine that makes proteins from RNA, called the ribosome, has itself been shown to be an RNA-based machine. Non-coding RNAs have also been shown to be widely involved in regulating the levels of other genes.

Research and innovation in the area of non-coding RNAs, and in molecular biology more generally, is hampered by the lack of an authoritative and comprehensive resource collecting together all known non-coding RNAs. There are over 20 different online databases that contain information about different types of RNA molecules. Each of these resources makes its information available in a different way. The scattered nature of these resources makes it nearly impossible for biologists to discover what is known about non-coding RNAs related to their research area. In this proposal, we will create a new online resource to collect together information about non-coding RNAs. This resource, called RNAcentral, will be a central warehouse for holding many types of information. The most important information stored is the sequence of the RNA. Many existing RNA resources (called RNAcentral Expert Databases) will provide their data to RNAcentral using software and interfaces that will be created as part of this proposal. One specific expert database, called miRBase, based at the University of Manchester, will test out the systems for providing data to RNAcentral. RNAcentral will hold the common information about each type of RNA. For more specialised information, RNAcentral will provide links back to the RNAcentral Expert Databases.

In order to make the RNAcentral resource cost effective, we will be reusing and modifying code that is already in use by the European Nucleotide Archive and Ensembl Genomes. These two databases are based at the European Bioinformatics Institute near Cambridge. By the end of this project, researchers from around the UK and the rest of the world will have access to a single resource of RNA sequence information. This information will be freely available in a variety of ways including via a website and as a downloadable database.

Technical Summary

We will create a federated database and associated web portal, RNAcentral, to accession, store and represent non-coding RNA sequence data. A database repository (using the Oracle Relational Database Management System) will be constructed as an extension to the European Nucleotide Archive, and new tools will be developed to facilitate the submission of RNA sequences. In addition to direct submission, the repository will also be populated through import pipelines, based on agreed standards for data representation, developed with the expert databases that have agreed to support the project (initially gtRNAdb, HGNC, lncRNAdb, miRBase, Modomics, piRNAbank, Pombase, Refseq, Rfam, the Ribosomal Database Project, RNAdb, sRNAmap, SRPDB, tmRDB, the tmRNA website and VEGA). A web portal will be developed (using the Drupal open-source content management system) providing access to the submitted and imported sequences, and providing links out to the expert resources' own sites. A data warehouse (using a common biological data warehousing tool such as BioMart or InterMine) will also be developed, and bulk downloads of sequence sets will be provided. In the second period of the project, we will develop further pipelines to identify redundancy among submissions, assign submitted sequences to defined families, and (with the aid of prediction tools such as Rfam and RNAmmer, and in collaboration with genomic and model organism resources) systematically provide complete sets of non-coding RNA annotations across all complete genomes. The resources developed under this proposal will serve as the core infrastructural component of a wider international initiative to coordinate work on functional RNAs.
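As an illustration of the redundancy-identification step mentioned above, the following is a minimal sketch, not the project's actual pipeline. It assumes that two submissions are redundant when their sequences are identical after normalisation (whitespace removed, upper-cased, U mapped to T) and groups them by a SHA-256 digest. The function names, the record layout and the accessions are hypothetical.

```python
import hashlib
from collections import defaultdict


def normalise(sequence: str) -> str:
    """Remove whitespace, upper-case, and map U to T (assumed normalisation)."""
    return "".join(sequence.split()).upper().replace("U", "T")


def sequence_digest(sequence: str) -> str:
    """Return a hex digest that is identical for identical normalised sequences."""
    return hashlib.sha256(normalise(sequence).encode("ascii")).hexdigest()


def group_redundant(submissions):
    """Group (source_db, accession, sequence) records by sequence digest."""
    groups = defaultdict(list)
    for source_db, accession, sequence in submissions:
        groups[sequence_digest(sequence)].append((source_db, accession))
    return dict(groups)


# Hypothetical records: the first two spell the same molecule differently.
submissions = [
    ("miRBase", "EXAMPLE0001", "UGAGGUAGUAGGUUGUAUAGUU"),
    ("Rfam", "EXAMPLE0002", "tgaggtagtaggttgtatagtt"),
    ("lncRNAdb", "EXAMPLE0003", "ACGUACGUACGU"),
]

groups = group_redundant(submissions)
for digest, records in groups.items():
    print(digest[:8], records)
```

Exact-match grouping of this kind only detects identical sequences; near-duplicates and family assignment would need alignment or covariance-model tools such as those named in the summary.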

Planned Impact

RNAcentral will provide an underpinning resource contributing indirectly to all BBSRC strategic objectives: food security, biofuels, industrial biotechnology and human health. It will be used by members of diverse life science research communities, ranging from bioinformaticians, to experimental biologists, to academic clinicians. RNAcentral will have an important impact in applications such as biotechnology, therapeutics, agriculture and ecology. The need for RNAcentral has become critical through the huge growth in discovery of non-coding RNAs from next generation sequencing. By capturing and disseminating this valuable knowledge, we will be directly addressing the BBSRC's enabling themes: data-driven biology, systems approaches to biosciences and synthetic biology. A fundamental part of the latter two themes is a complete "parts list" for each genome, and RNAcentral will help move science towards that goal and allow researchers to find all RNA genes in an organism easily.

RNAs hold great hope for ever-wider clinical and biotechnological applications. For example, microRNAs have been implicated as diagnostic signatures for cancer, snoRNAs in the major Prader-Willi phenotypes, bacterial small RNAs in pathogenicity, plant small RNAs in hybrid necrosis, and ribozymes in the cleavage of specific target RNAs. Improved annotation of and access to RNA data will improve the discovery and utilization of novel RNA targets for diagnostics and drugs. There is intense research in the field of RNA-based therapeutics, which hold some promise to improve health and welfare internationally.

A number of commercial organisations manufacture experimental resources, for example microarrays, based on up-to-date gene annotation. Some resources have also been made available for specific classes of non-coding RNA gene; for example, several companies make microRNA detection kits. The companies themselves will therefore benefit from improved annotation of non-coding RNAs, and these resources underpin experimental studies in commercial and academic organisations. Along with the more clinical aspects described above, RNAcentral will help to foster wealth creation through innovative application of RNA sequence information.

Non-coding RNAs such as ribosomal RNAs have long been used as a tag to identify species. Application of high throughput sequencing has opened up opportunities to understand biodiversity on an unprecedented scale. Better understanding biodiversity and how it is changing will enhance our ability to manage and conserve the world's great natural genetic resources.

Having all known non-coding RNA sequences in a single resource will allow for a much easier overview of the growth and impact of RNA data. For example, one will be able to compare the number of RNA genes with the number of protein-coding genes in a genome. This will allow policy makers and funders to better gauge the scale of support needed to maximise output compared to other priorities.

Publications

Pakseresht N (2014) Assembly information services in the European Nucleotide Archive. in Nucleic acids research

RNAcentral Consortium (2015) RNAcentral: an international database of ncRNA sequences. in Nucleic acids research

Silvester N (2018) The European Nucleotide Archive in 2017. in Nucleic acids research

The RNAcentral Consortium (2017) RNAcentral: a comprehensive database of non-coding RNA sequences. in Nucleic acids research

 
Description A new resource, RNAcentral (http://rnacentral.org), has been developed for accessing information about non-coding RNAs (i.e. functional RNA molecules that do not encode proteins). Over the course of 6 releases, information was integrated from 23 collaborating databases, and over 12,000,000 RNA sequences were included in the resource. Tools have been developed for clustering similar RNAs, exploring taxonomic distributions, searching for data by sequence or description, and for visualising non-coding RNA genes on the genome. As of January 2019, the database holds 14,476,418 sequences.
Exploitation Route Non-coding RNAs (ncRNAs) have a vital role in biology, but hitherto there have been few resources dedicated to them and it has been hard for researchers to find, search and download all relevant data. With the increasing prevalence of whole-genome and transcriptome sequencing, however, the number of known ncRNAs has significantly increased. RNAcentral makes this data available to researchers through an integrated portal for the first time, enabling exploration of all relevant data.
Sectors Agriculture, Food and Drink; Healthcare; Pharmaceuticals and Medical Biotechnology

URL http://rnacentral.org
 
Description The bioinformatics company Era7 uses RNAcentral to build a reference sequence database for metagenomics analysis, which is available in open-source and paid-for versions.
First Year Of Impact 2017
Sector Environment; Healthcare
Impact Types Economic

 
Description Bioinformatics and Biological Resources
Amount £843,027 (GBP)
Funding ID BB/N019199/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 01/2017 
End 12/2019
 
Title RNAcentral 
Description A new database providing access to data about non-coding RNAs. The website has been continuously improved during 2016 with new features including a redesigned homepage, more relevant search results, and a lightweight genome browser. Over 1,000,000 new sequences were added in 2016. 
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact RNAcentral currently integrates data from 26 RNA-centric resources, with 4 new databases added in 2017. Having established itself as the leading resource for RNA sequence data, RNAcentral now also provides additional analysis by annotating all sequences with Rfam families and using these annotations for quality control. RNAcentral is regularly updated with 8 releases made available since its launch in 2014. The annual RNAcentral consortium meeting brings together numerous individual databases and provides a forum for discussion of common interests and priorities. The RNAcentral website had approximately 20,000 users in 2017 (measured by unique IP address). 
URL http://rnacentral.org
 
Description A presentation on "RNAcentral: The Non-coding RNA Sequence Database" to the UK RNA meeting. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A presentation on "RNAcentral: The Non-coding RNA Sequence Database" was made to the UK RNA meeting.
Year(s) Of Engagement Activity 2016
URL https://rnauk2016.wordpress.com
 
Description Presentation in EMBL-EBI workshop on "Databases for microRNA and lncRNA Biology" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact RNAcentral was presented and demonstrated at the EMBL-EBI workshop on "Databases for miRNA and lncRNA Biology".
Year(s) Of Engagement Activity 2016
URL http://blog.rnacentral.org/2016/09/upcoming-workshop-databases-for-mirna.html
 
Description Presentation on "Bioinformatic Resources for ncRNA Analysis: RNAcentral and Rfam" at ELIXIR meeting in Aarhus. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A presentation on "Bioinformatic Resources for ncRNA Analysis: RNAcentral and Rfam" was given at the ELIXIR meeting at the Technical University of Denmark, Aarhus.
Year(s) Of Engagement Activity 2016
 
Description Presentation on "Bioinformatic Resources for ncRNA Analysis: RNAcentral and Rfam" at St. Petersburg State University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A presentation on "Bioinformatic Resources for ncRNA Analysis: RNAcentral and Rfam" was delivered at St. Petersburg State University, Russia.
Year(s) Of Engagement Activity 2016
URL https://bioseminars.wordpress.com/2016/04/11/bioinformatucs-ncrna/
 
Description Presentation on "RNAcentral and Rfam: databases for non-coding RNA sequences and RNA families" at the Adam Mickiewicz University. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A seminar was delivered on "RNAcentral and Rfam: databases for non-coding RNA sequences and RNA families" at the Adam Mickiewicz University in Poznan, Poland.
Year(s) Of Engagement Activity 2016
URL http://know-rna.amu.edu.pl/lecture-dr-petrov/
 
Description Presentation on "RNAcentral and Rfam: databases for non-coding RNA sequences and RNA families" at the University of Toronto. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A seminar was given on "RNAcentral and Rfam: databases for non-coding RNA sequences and RNA families" at the University of Toronto.
Year(s) Of Engagement Activity 2016
URL https://torbug.org/schedule
 
Description Presentation on "Rfam and RNAcentral: Tools for understanding the RNA universe" at the University of Cambridge 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A seminar was delivered on "Rfam and RNAcentral: Tools for understanding the RNA universe" at the University of Cambridge.
Year(s) Of Engagement Activity 2016
URL http://talks.cam.ac.uk/talk/index/64142
 
Description Talk and demonstration on "Introduction to RNAcentral, a non-coding RNA sequence database" at Mahidol University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A talk and demonstration were given on the subject, "Introduction to RNAcentral, a non-coding RNA sequence database" at Mahidol University, Bangkok, Thailand.
Year(s) Of Engagement Activity 2016
 
Description Talk and interactive session on "Searching and accessing non-coding RNA sequences with RNAcentral and Rfam" at EMBL-EBI workshop on "Exploring Biological Sequence Data" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A talk and interactive workshop on "Searching and accessing non-coding RNA sequences with RNAcentral and Rfam" were delivered at the EMBL-EBI workshop on "Exploring Biological Sequence Data".
Year(s) Of Engagement Activity 2016
URL http://www.ebi.ac.uk/training/events/2016/exploring-biological-sequence-data
 
Description Talk on RNAcentral at the Benasque meeting on Computational Analysis of RNA Structure and Function 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A presentation on RNAcentral was made at the Benasque meeting on Computational Analysis of RNA Structure and Function.
Year(s) Of Engagement Activity 2015
URL http://benasque.org/2015rna/cgi-bin/talks/allprint.pl
 
Description Talk/demonstration on RNAcentral at EBI training course on Online resources for Non-Coding RNA 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A talk and demonstration on RNAcentral were given at the EBI training course on Online resources for Non-Coding RNA.
Year(s) Of Engagement Activity 2015
 
Description Webinar on RNAcentral: an international database of ncRNA sequences 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A webinar was delivered on "RNAcentral: an international database of ncRNA sequences". The webinar remains available online.
Year(s) Of Engagement Activity 2015
URL http://www.ebi.ac.uk/training/online/course/rnacentral-webinar