The RNAcentral database of non-coding RNAs

Lead Research Organisation: EMBL - European Bioinformatics Institute
Department Name: Protein Data Bank in Europe

Abstract

In molecular biology, the central dogma says that the genes in DNA code for RNA. RNA molecules are then translated into proteins, the mini-machines that carry out the main processes in the cell. It has only recently become apparent that many genes code for RNAs that are not translated into proteins and that carry out important functions in the cell as RNA. These molecules are often known as non-coding RNAs. Much of the focus in biology has been on DNA and proteins, but recently there has been a surge of interest in non-coding RNAs. In fact, the mini-machine that makes proteins from RNA, called the ribosome, has itself been shown to be an RNA-based machine. Non-coding RNAs have also been shown to be widely involved in regulating the levels of other genes.

Research and innovation in the area of non-coding RNAs, and in molecular biology more generally, are hampered by the lack of an authoritative and comprehensive resource collecting together all known non-coding RNAs. There are over 20 different online databases that contain information about different types of RNA molecules. Each of these resources makes its information available in a different way. The scattered nature of these resources makes it nearly impossible for biologists to discover what is known about non-coding RNAs related to their research area. In this proposal, we will create a new online resource to collect together information about non-coding RNAs. This resource, called RNAcentral, will be a central warehouse for holding many types of information. The most important information stored is the sequence of each RNA. Many existing RNA resources (called RNAcentral Expert Databases) will provide their data to RNAcentral using software and interfaces that will be created as part of this proposal. One specific expert database, called miRBase, based at the University of Manchester, will test out the systems for providing data to RNAcentral. RNAcentral will hold the common information about each type of RNA. For more specialised information, RNAcentral will provide links back to the RNAcentral Expert Databases.

To make the RNAcentral resource cost-effective, we will reuse and modify code that is already in use by the European Nucleotide Archive and Ensembl Genomes. These two databases are based at the European Bioinformatics Institute near Cambridge. By the end of this project, researchers from around the UK and the rest of the world will have access to a single resource of RNA sequence information. This information will be freely available in a variety of ways, including via a website and as a downloadable database.

Technical Summary

We will create a federated database and associated web portal, RNAcentral, to accession, store and represent non-coding RNA sequence data. A database repository (using the Oracle Relational Database Management System) will be constructed as an extension to the European Nucleotide Archive, and new tools developed to facilitate the submission of RNA sequences. In addition to direct submission, the repository will also be populated through the development of import pipelines based on agreed standards for data representation with expert databases that have agreed to support the project (initially gtRNAdb, HGNC, lncRNAdb, miRBase, Modomics, piRNAbank, PomBase, RefSeq, Rfam, the Ribosomal Database Project, RNAdb, sRNAmap, SRPDB, tmRDB, the tmRNA website and VEGA). A web portal will be developed (using the Drupal open-source content management system) providing access to the submitted and imported sequences, and providing links out to the expert resources' own sites. A data warehouse (using a common biological data warehousing tool such as BioMart or InterMine) will also be developed, and bulk downloads of sequence sets will be provided. In the second period of the project, we will develop further pipelines to identify redundancy among submissions, assign submitted sequences to defined families, and (with the aid of prediction tools such as Rfam and RNAmmer, and in collaboration with genomic and model organism resources) systematically provide complete sets of non-coding RNA annotations across all complete genomes. The resources developed under this proposal will serve as the core infrastructural component of a wider international initiative to coordinate work on functional RNAs.
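
As an illustration of what identifying redundancy among submissions could involve, the following is a minimal sketch rather than the project's actual pipeline. It assumes that two submissions are redundant when their sequences are identical after normalisation (whitespace removed, upper case, U written as T), and it groups them by an MD5 digest of the normalised sequence. The function names, database labels and accessions are hypothetical.

```python
import hashlib
from collections import defaultdict


def normalise(seq: str) -> str:
    """Remove whitespace, upper-case, and write U as T so that RNA and DNA
    spellings of the same sequence compare equal (an assumption of this sketch)."""
    return "".join(seq.split()).upper().replace("U", "T")


def sequence_key(seq: str) -> str:
    """Return a fixed-length digest of the normalised sequence."""
    return hashlib.md5(normalise(seq).encode("ascii")).hexdigest()


def group_redundant(submissions):
    """Group (source_db, accession, sequence) records that share a sequence.

    Returns a dict mapping each digest to the list of (source_db, accession)
    pairs that submitted that sequence.
    """
    groups = defaultdict(list)
    for source_db, accession, seq in submissions:
        groups[sequence_key(seq)].append((source_db, accession))
    return dict(groups)


# Hypothetical submissions: the first two differ only in case and U/T spelling.
submissions = [
    ("db_a", "A0001", "ugagguaguagguuguauaguu"),
    ("db_b", "B0042", "TGAGGTAGTAGGTTGTATAGTT"),
    ("db_c", "C0007", "GGGCCCAUAGCUCAGUGGUAGAGUG"),
]

groups = group_redundant(submissions)
for digest, records in groups.items():
    print(digest, records)
```

Each group could then receive a single shared identifier, with links back to every contributing expert database. Exact matching of this kind does not detect near-identical sequences or fragments, which would need alignment-based comparison.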

Planned Impact

RNAcentral will provide an underpinning resource contributing indirectly to all BBSRC strategic objectives: food security, biofuels, industrial biotechnology and human health. It will be used by members of diverse life science research communities, ranging from bioinformaticians, to experimental biologists, to academic clinicians. RNAcentral will have an important impact in applications such as biotechnology, therapeutics, agriculture and ecology. The need for RNAcentral has become critical through the huge growth in discovery of non-coding RNAs from next-generation sequencing. By capturing and disseminating this valuable knowledge, we will be directly addressing the BBSRC's enabling themes: data-driven biology, systems approaches to biosciences, and synthetic biology. A fundamental part of the latter two themes is a complete "parts list" for each genome, and RNAcentral will help move science towards that goal and allow researchers to find all RNA genes in an organism easily.

RNAs hold great hope for ever-wider clinical and biotechnological applications. For example, microRNAs have been implicated as diagnostic signatures for cancer, snoRNAs in the major Prader-Willi phenotypes, bacterial small RNAs in pathogenicity, plant small RNAs in hybrid necrosis, and ribozymes in the cleavage of specific target RNAs. Improved annotation of, and access to, RNA data will improve the discovery and utilisation of novel RNA targets for diagnostics and drugs. There is intense research in the field of RNA-based therapeutics, which hold some promise to improve health and welfare internationally.

A number of commercial organisations manufacture experimental resources, for example microarrays, based on up-to-date gene annotation. Some resources have also been made available for specific classes of non-coding RNA genes; for example, several companies make microRNA detection kits. The companies themselves will therefore benefit from improved annotation of non-coding RNAs, and these resources underpin experimental studies in commercial and academic organisations. Along with the more clinical aspects described above, RNAcentral will help to foster wealth creation through innovative application of RNA sequence information.

Non-coding RNAs such as ribosomal RNAs have long been used as a tag to identify species. Application of high-throughput sequencing has opened up opportunities to understand biodiversity on an unprecedented scale. A better understanding of biodiversity and how it is changing will enhance our ability to manage and conserve the world's great natural genetic resources.

Having all known non-coding RNA sequences in a single resource will allow a much easier overview of the growth and impact of RNA data. For example, one will be able to compare the number of RNA genes with the number of protein-coding genes in a genome. This will allow policy makers and funders to better gauge the scale of support needed to maximise output compared to other priorities.

Publications

RNAcentral Consortium (2015) RNAcentral: an international database of ncRNA sequences. in Nucleic acids research

The RNAcentral Consortium (2017) RNAcentral: a comprehensive database of non-coding RNA sequences. in Nucleic acids research

 
Description We have developed a comprehensive non-coding RNA sequence database resource called RNAcentral. This resource provides a set of common identifiers across all types of non-coding RNAs. This will make it much easier for researchers to carry out large-scale analyses of all types of RNAs. In addition, having consistent identifiers will enable new research. For example, it is now possible to annotate RNA sequences with GO terms using RNAcentral identifiers.
Exploitation Route RNAcentral is like UniProt for RNA. A wide range of researchers will be able to use data in RNAcentral to create new software tools and databases. Biologists can use the data to better understand the RNA component of a genome and thus make new discoveries.
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Energy; Environment; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology

 
Description RNAcentral is already used by industry as a source of non-coding RNA sequence data. Since 2016, the Spain-based company Era7 Bioinformatics has been using RNAcentral to build a reference database for a custom computational pipeline called MG7, which can provide taxonomic assignments for large metagenomic datasets. More details can be found at https://era7bioinformatics.com/en/page.cfm?id=464
First Year Of Impact 2016
Sector Digital/Communication/Information Technologies (including Software); Pharmaceuticals and Medical Biotechnology
 
Description BBSRC BBR
Amount £685,407 (GBP)
Funding ID BB/N019199/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 06/2016 
End 06/2019
 
Title RNAcentral 
Description RNAcentral is a database of non-coding RNA sequences. Since its launch in September 2014 it has had 4 public releases and now stores over 8 million non-coding RNA sequences. 
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact More than 20,000 unique users from 85 countries have used the resource since its first public release in September 2014. The Gene Ontology consortium has adopted RNAcentral identifiers as a standard to which gene ontology terms are annotated. In addition, the IntAct database of molecular interactions has adopted RNAcentral identifiers for non-coding RNA molecules.
URL http://rnacentral.org/
 
Description Dictybase 
Organisation Northwestern University
Country United States 
Sector Academic/University 
PI Contribution RNAcentral integrated the dictyBase set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution dictyBase provided a set of ncRNA sequences and annotations for Dictyostelium discoideum, which is an important model organism.
Impact PMID:25352543
Start Year 2015
 
Description GreenGenes 
Organisation University of Colorado
Country United States 
Sector Academic/University 
PI Contribution RNAcentral integrated the GreenGenes set of rRNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution The GreenGenes database provided RNAcentral with a set of rRNA sequences.
Impact PMID:25352543
Start Year 2015
 
Description LNCipedia 
Organisation University of Ghent
Country Belgium 
Sector Academic/University 
PI Contribution RNAcentral integrated the LNCipedia set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution LNCipedia provided RNAcentral with a set of human lncRNA sequences and annotations.
Impact PMID:25352543
Start Year 2015
 
Description Modomics 
Organisation International Institute of Molecular and Cell Biology
Country Poland 
Sector Public 
PI Contribution RNAcentral integrated the MODOMICS set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral. The locations of modified nucleotides are visualised on the RNAcentral pages, for example http://rnacentral.org/rna/URS00003833FF/9606
Collaborator Contribution MODOMICS provided RNAcentral with a set of rRNA and tRNA sequences containing modified nucleotides.
Impact PMID:25352543
Start Year 2015
 
Description NONCODE 
Organisation Chinese Academy of Sciences
Department Institute of Biophysics
Country China 
Sector Academic/University 
PI Contribution RNAcentral integrated the NONCODE set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution The NONCODE database provided a set of lncRNA sequences and annotations to RNAcentral.
Impact PMID:25352543
Start Year 2015
 
Description RDP 
Organisation Michigan State University
Country United States 
Sector Academic/University 
PI Contribution RNAcentral integrated the RDP set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution Ribosomal Database Project (RDP) provided RNAcentral with a high-quality subset of rRNA sequences.
Impact PMID:25352543
Start Year 2014
 
Description RefSeq 
Organisation National Center for Biotechnology Information (NCBI)
Country United States 
Sector Public 
PI Contribution RNAcentral integrated the RefSeq set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution RefSeq is a database of reference sequences maintained at NCBI. Since 2014 RefSeq has been providing RNAcentral with a set of non-coding RNA sequences and literature annotations.
Impact PMID:25352543
Start Year 2014
 
Description SGD 
Organisation Stanford University
Country United States 
Sector Academic/University 
PI Contribution RNAcentral integrated the SGD set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution The Saccharomyces Genome Database (SGD, https://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. SGD provided RNAcentral with ncRNA sequences and annotations.
Impact PMID:25352543
Start Year 2015
 
Description SILVA 
Organisation Jacobs University Bremen
Country Germany 
Sector Academic/University 
PI Contribution RNAcentral integrated the SILVA set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution The SILVA database provided RNAcentral with a set of rRNA sequences and annotations.
Impact PMID:25352543
Start Year 2015
 
Description TAIR 
Organisation Phoenix Bioinformatics Corporation
PI Contribution RNAcentral integrated the TAIR set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution TAIR provided RNAcentral with a set of ncRNA sequences from Arabidopsis thaliana, which is an important model organism.
Impact PMID:25352543
Start Year 2015
 
Description snoPY 
Organisation University of Miyazaki
Country Japan 
Sector Academic/University 
PI Contribution RNAcentral integrated the snoPY set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution snoPY provided RNAcentral with snoRNA sequences and annotations for multiple species.
Impact PMID:25352543
Start Year 2015
 
Description tmRNA Website 
Organisation Sandia Laboratories
Country United States 
Sector Private 
PI Contribution RNAcentral integrated the tmRNA Website set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution The tmRNA Website is a specialist database providing high-quality annotations of transfer-messenger RNAs. The tmRNA Website provided RNAcentral with a set of ncRNA sequences.
Impact PMID:25352543
Start Year 2014
 
Description Booth at RNA Society in Kyoto 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We hosted an exhibition booth at the RNA Society meeting held in Kyoto, Japan. The conference provided an opportunity to engage with existing users and attract new audiences.
Year(s) Of Engagement Activity 2016
 
Description Dark Daily Interview 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact The DARK Daily website, which covers laboratory medicine and laboratory management news, asked the RNAcentral team to comment on the launch of RNAcentral.
Year(s) Of Engagement Activity 2015
URL https://www.darkdaily.com/with-launch-of-rnacentral-database-pathologists-now-have-unprecedented-acc...
 
Description RNAcentral online training course 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact An online training course was launched to show how to use the RNAcentral database, allowing anyone to learn about RNAcentral at their own pace from anywhere in the world.
Year(s) Of Engagement Activity 2016
URL https://www.ebi.ac.uk/training/online/course/rnacentral-exploring-non-coding-rna-sequences
 
Description RNAcentral press release 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact The EMBL-EBI published a press release announcing the first official version of RNAcentral, the unified resource for non-coding RNA sequences. The press release was used by several blogs and media outlets to raise awareness of the new RNA database.
Year(s) Of Engagement Activity 2014
URL https://www.ebi.ac.uk/about/news/press-releases/rnacentral-launch
 
Description RNAcentral talk in Odense 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Anton Petrov gave a talk about RNAcentral in Odense, Denmark at the Danish Bioinformatics Conference. This sparked a discussion with Prof Jan Gorodkin (University of Copenhagen) about integrating the Elixir RNA Tools registry with RNAcentral.
Year(s) Of Engagement Activity 2016
 
Description RNAcentral webinar 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The EMBL-EBI Training team hosted a live webinar with Anton Petrov, the RNAcentral developer, who gave an overview of the RNAcentral project and answered questions from the audience.
Year(s) Of Engagement Activity 2015
URL https://www.youtube.com/watch?v=SSLEgu5R6qw
 
Description Talk at Benasque 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Anton Petrov presented a talk and live demonstration of the newly launched RNAcentral website at the Computational Analysis of RNA Structure and Function meeting held in Benasque, Spain. The meeting attracts prominent RNA scientists, and the talk made them aware of RNAcentral and how it can be used in their work.
Year(s) Of Engagement Activity 2015
URL http://benasque.org/2015rna/cgi-bin/talks/allprint.pl
 
Description Talk at Cambridge RNA Club 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Dr Anton Petrov gave a talk at the Cambridge RNA Club at the Gurdon Institute, Cambridge. The event provided an opportunity to interact with postgraduate and undergraduate students from the University of Cambridge and raise awareness of RNAcentral.
Year(s) Of Engagement Activity 2016
 
Description Training course 2 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Held on October 19th, 2016, the event focused on resources for long non-coding RNAs and microRNAs and explored what data are available in databases like GENCODE, Vega, miRBase, RNAcentral, and others. The interactive sessions consisted of short presentations explaining how and when to use each resource, followed by hands-on exercises and an opportunity to ask questions of the database developers.
Year(s) Of Engagement Activity 2016
URL http://blog.rnacentral.org/2016/09/upcoming-workshop-databases-for-mirna.html
 
Description Training workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A one-day training course was held at the Wellcome Genome Campus near Cambridge, UK, which introduced the audience to online resources for non-coding RNA, including RNAcentral, miRBase, Rfam, and GENCODE. The course gave an overview of the types of ncRNA data that are available in each resource and introduced the tools for searching and exploring the data. The course provided an opportunity for direct interaction between the users and database developers.
Year(s) Of Engagement Activity 2015
URL http://blog.rnacentral.org/2015/04/new-training-course-online-resources.html
 
Description Twitter 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The RNAcentral Twitter account is an important channel for engaging with users and the general public. Launched in 2014, the account has attracted hundreds of followers.
Year(s) Of Engagement Activity 2014, 2015, 2016
URL https://twitter.com/rnacentral