RNAcentral, the RNA sequence database

Lead Research Organisation: EMBL - European Bioinformatics Institute
Department Name: Sequence Database Group

Abstract

In molecular biology, the central dogma explains that the genes in DNA code for RNA. RNA molecules are then translated into proteins, the mini-machines that carry out the main processes in the cell. Recently it has become apparent that potentially many thousands of human genes code for RNAs that are not translated into proteins, but rather carry out important functions in the cell as RNA. These molecules are often known as non-coding RNAs. Much of the focus in biology over the past thirty years of research has been on DNA and proteins, but recently there has been a surge of interest in non-coding RNAs. In fact, the core of the machine that makes proteins from RNA, called the ribosome, has itself been shown to be made of RNA. Non-coding RNAs have also been shown to be widely involved in regulating the levels of other genes and may be useful in making treatments for patients with a variety of diseases. The role of non-coding RNAs in plant and animal development is evident, but a deeper understanding of their biology is needed before they can be modulated to enhance features such as yield or disease resistance. Unsurprisingly, aberrant expression of non-coding RNAs has also been implicated in numerous disease states.

Research and innovation in the area of non-coding RNAs, and in molecular biology more generally, are hampered by the lack of an authoritative and complete resource collecting together all known non-coding RNAs. There are over 30 different online databases that contain information about different types of RNA molecules. Each of these resources makes its information available in a different way. The scattered nature of these resources has made it nearly impossible for biologists to discover what is known about non-coding RNAs related to their research area. To address this problem we created a resource called RNAcentral that brings together information from all the different RNA databases in one place. The most important information stored in RNAcentral is the sequence of each RNA. Many existing RNA resources (called RNAcentral Expert Databases) have provided their data to RNAcentral. In this proposal we will add further, more detailed information about the structure and function of RNAs to RNAcentral. We will work closely with one specific expert database, called miRBase, based at the University of Manchester, which will test the system for searching the RNAcentral sequence database on specific subsets of RNAs.

By the end of this project, researchers from around the UK and the rest of the world will have access to an increased set of information about RNAs. This information will be freely available in a variety of ways, including via a website and as a downloadable database. Having access to this information will help researchers integrate RNAs into their work and make new discoveries sooner.

Technical Summary

Under this proposal, we will continue the development of RNAcentral, an international database of non-coding RNA sequences, currently made up of sequence data contributed by 15 member databases. To make RNAcentral more comprehensive, we will import 21 additional ncRNA databases and carry out regular data releases. In addition to the core sequence data, our users care most about functional annotation of ncRNAs. We will therefore focus on incorporating additional types of annotations, such as high-quality secondary structures, inter-molecular interactions, GO and SO terms, and textual annotation from Wikipedia. We will map RNAcentral sequences onto appropriate reference genomes, and provide new functionality such as exploring overlapping sequences in the same species. New visualisations will be developed to display these new data, taking advantage of modern web technologies. In order to increase the sustainability of RNA databases worldwide, we will develop prototype RNAcentral infrastructure elements that we will make available to RNAcentral database contributors. To this end, we will develop an improved sequence search facility in collaboration with the miRBase database, and make it available through a RESTful API so that miRBase can search its own sequence data and display the results on its own website. This functionality will subsequently be made available to other RNAcentral databases. To disseminate information about RNAcentral, we will engage in outreach and training activities by hosting workshops, holding annual SAB meetings, and publishing biennial papers in the NAR Database Issue. RNAcentral, as a comprehensive repository of ncRNAs, will underpin a global effort to unravel the functions of ncRNAs.

Planned Impact

Non-coding RNAs are found in every living organism, and advances in ncRNA research, reflected in and supported by RNAcentral, will contribute to new applications in biotechnology, therapeutics, agriculture, and ecology. RNAcentral, as a comprehensive database of ncRNA sequences, indirectly contributes to all BBSRC strategic objectives: food security, biofuels, industrial biotechnology and human health. RNAcentral will be used by bioinformaticians and wet-lab scientists in both academia and industry working on all aspects of ncRNA biology. As sequencing technologies become more advanced and new RNA structure probing technologies emerge, there is a growing need to maintain a comprehensive and well-annotated collection of all ncRNAs.

By capturing and disseminating this valuable knowledge, we will be addressing the BBSRC's enabling theme of innovation, allowing industrial partners to make more rapid discoveries and inventions of benefit to society. RNAs hold great hope for ever-wider clinical and biotechnological applications. For example, microRNAs have been implicated as diagnostic signatures for cancer, snoRNAs in the major Prader-Willi phenotypes, bacterial small RNAs in pathogenicity, plant small RNAs in hybrid necrosis, and ribozymes in the cleavage of specific target RNAs. Improved annotation of and access to RNA data will aid the discovery and utilization of novel RNA targets for diagnostics and drugs. There is intense research in the field of RNA-based therapeutics, which hold some promise to improve health and welfare internationally. In the area of plant sciences, we expect our annotations to be of use in genome engineering to improve disease resistance and crop yields. In addition, the ability to make RNAs in very large quantities has raised the idea of using RNA directly as a weed and pest control measure through crop spraying.

A number of commercial organisations manufacture experimental resources, for example microarrays, based on up-to-date gene annotation. Some resources have also been made available for specific classes of non-coding RNA gene; for example, several companies make microRNA detection kits. The companies themselves will therefore benefit from improved annotation of non-coding RNAs, and these resources underpin experimental studies in commercial and academic organisations. Along with the more clinical aspects described above, RNAcentral helps to foster wealth creation through innovative application of RNA sequence information.

Non-coding RNAs such as ribosomal RNAs have long been used as a tag to identify species. Application of high-throughput sequencing has opened up opportunities to understand biodiversity on an unprecedented scale. A better understanding of biodiversity and how it is changing will enhance our ability to manage and conserve the world's great natural genetic resources.

Having all known non-coding RNA sequences in a single resource gives a much easier overview of the growth and impact of RNA data. For example, one can compare the number of RNA genes versus protein coding genes in a genome. This will allow policy makers and funders to better gauge the scale of support needed to maximise output compared to other priorities.

Publications

Harrison PW (2019) The European Nucleotide Archive in 2018. in Nucleic acids research

The RNAcentral Consortium (2019) RNAcentral: a hub of information for non-coding RNA sequences. in Nucleic acids research

 
Description RNAcentral provides a comprehensive collection of non-coding RNA sequences that has already been used in two commercial applications. The Era7 Bioinformatics company uses RNAcentral to build a reference sequence database for metagenomics analysis (https://era7bioinformatics.com/en/page.cfm?id=464&title=microbiomes:-mg7). In addition, ThermoFisher Scientific uses RNAcentral for designing the human Clariom D microarrays and interpreting the results obtained with them (https://assets.thermofisher.com/TFS-Assets/LSG/brochures/EMI07313-2_DS_Clariom-D_solutions_HMR.pdf).
First Year Of Impact 2018
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Title RNAcentral 
Description RNAcentral is a database of non-coding RNA sequences. Since its launch in September 2014 it has had 4 public releases and now stores over 8 million non-coding RNA sequences. 
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact More than 20,000 unique users from 85 countries have used the resource since its first public release in September 2014. The Gene Ontology consortium has adopted RNAcentral identifiers as a standard to which gene ontology terms are annotated. In addition, the IntAct database of molecular interactions has adopted RNAcentral identifiers for non-coding RNA molecules. 
URL http://rnacentral.org/
 
Description Dictybase 
Organisation Northwestern University
Country United States 
Sector Academic/University 
PI Contribution RNAcentral integrated the dictyBase set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution dictyBase provided a set of ncRNA sequences and annotations for Dictyostelium discoideum, which is an important model organism.
Impact PMID:25352543
Start Year 2015
 
Description FlyBase 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution RNAcentral integrated the FlyBase set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution The FlyBase database provided RNAcentral with a set of ncRNA sequences and annotations from several Drosophila species.
Impact This will be described in a new RNAcentral paper published in the 2019 Database Issue of NAR.
Start Year 2017
 
Description GreenGenes 
Organisation University of Colorado
Country United States 
Sector Academic/University 
PI Contribution RNAcentral integrated the GreenGenes set of rRNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution The GreenGenes database provided RNAcentral with a set of rRNA sequences.
Impact PMID:25352543
Start Year 2015
 
Description LNCipedia 
Organisation University of Ghent
Country Belgium 
Sector Academic/University 
PI Contribution RNAcentral integrated the LNCipedia set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution LNCipedia provided RNAcentral with a set of human lncRNA sequences and annotations.
Impact PMID:25352543
Start Year 2015
 
Description Modomics 
Organisation International Institute of Molecular and Cell Biology
Country Poland 
Sector Public 
PI Contribution RNAcentral integrated the MODOMICS set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral. The locations of modified nucleotides are visualised on the RNAcentral pages, for example http://rnacentral.org/rna/URS00003833FF/9606
Collaborator Contribution MODOMICS provided RNAcentral with a set of rRNA and tRNA sequences containing modified nucleotides.
Impact PMID:25352543
Start Year 2015
 
Description NONCODE 
Organisation Chinese Academy of Sciences
Department Institute of Biophysics
Country China 
Sector Academic/University 
PI Contribution RNAcentral integrated the NONCODE set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution The NONCODE database provided a set of lncRNA sequences and annotations to RNAcentral.
Impact PMID:25352543
Start Year 2015
 
Description RDP 
Organisation Michigan State University
Country United States 
Sector Academic/University 
PI Contribution RNAcentral integrated the RDP set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution Ribosomal Database Project (RDP) provided RNAcentral with a high-quality subset of rRNA sequences.
Impact PMID:25352543
Start Year 2014
 
Description RefSeq 
Organisation National Center for Biotechnology Information (NCBI)
Country United States 
Sector Public 
PI Contribution RNAcentral integrated the RefSeq set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution RefSeq is a database of reference sequences maintained at NCBI. Since 2014 RefSeq has been providing RNAcentral with a set of non-coding RNA sequences and literature annotations.
Impact PMID:25352543
Start Year 2014
 
Description SGD 
Organisation Stanford University
Country United States 
Sector Academic/University 
PI Contribution RNAcentral integrated the SGD set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution The Saccharomyces Genome Database (SGD, https://www.yeastgenome.org) is the community resource for the budding yeast Saccharomyces cerevisiae. SGD provided RNAcentral with ncRNA sequences and annotations.
Impact PMID:25352543
Start Year 2015
 
Description SILVA 
Organisation Jacobs University Bremen
Country Germany 
Sector Academic/University 
PI Contribution RNAcentral integrated the SILVA set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution The SILVA database provided RNAcentral with a set of rRNA sequences and annotations.
Impact PMID:25352543
Start Year 2015
 
Description TAIR 
Organisation Phoenix Bioinformatics Corporation
PI Contribution RNAcentral integrated the TAIR set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution TAIR provided RNAcentral with a set of ncRNA sequences from Arabidopsis thaliana, which is an important model organism.
Impact PMID:25352543
Start Year 2015
 
Description snoPY 
Organisation University of Miyazaki
Country Japan 
Sector Academic/University 
PI Contribution RNAcentral integrated the snoPY set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution snoPY provided RNAcentral with snoRNA sequences and annotations for multiple species.
Impact PMID:25352543
Start Year 2015
 
Description tmRNA Website 
Organisation Sandia Laboratories
Country United States 
Sector Private 
PI Contribution RNAcentral integrated the tmRNA Website set of non-coding RNAs, assigned unique identifiers and made them searchable in RNAcentral.
Collaborator Contribution The tmRNA Website is a specialist database providing high-quality annotations of transfer-messenger RNAs. The tmRNA Website provided RNAcentral with a set of ncRNA sequences.
Impact PMID:25352543
Start Year 2014
 
Description Booth at RNA Society in Prague 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We hosted an exhibition booth where we engaged with users and conducted interactive demos of the RNAcentral website using an iPad. This activity is useful for raising awareness of RNAcentral and getting feedback directly from users.
Year(s) Of Engagement Activity 2017
 
Description Meet the Scientist event organised by Social Mobility Foundation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Dr Anton Petrov participated in a Meet the Scientist event organised by the Social Mobility Foundation. The activity reached ~20 school students, who learned about non-coding RNA and careers in research.
Year(s) Of Engagement Activity 2018
 
Description RNAcentral ISMB talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Anton Petrov gave a talk about RNAcentral at the ISMB meeting in Prague.
Year(s) Of Engagement Activity 2017
 
Description RNAcentral PAG talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Blake Sweeney gave a talk about RNAcentral at the PAG conference in San Diego, CA.
Year(s) Of Engagement Activity 2018
 
Description RNAcentral RNAtion workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Dr Blake Sweeney led an interactive workshop about RNAcentral at the RNAtion conference in Poznan, Poland.
Year(s) Of Engagement Activity 2017
 
Description RNAcentral poster at ISMB in Chicago 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We presented a poster at a major international conference in Chicago which helped us engage with our users in the USA.
Year(s) Of Engagement Activity 2018
 
Description RNAcentral poster at RNA Society in Berkeley 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The poster presentation helped us engage with our users and tell them about the latest RNAcentral functionality.
Year(s) Of Engagement Activity 2018
 
Description RNAcentral poster at RNA UK 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact We presented a poster about the recent RNAcentral developments at the RNA UK meeting which brings together scientists working on RNA from across the UK.
Year(s) Of Engagement Activity 2018
 
Description RNAcentral talk at Non-coding Genome 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Blake Sweeney gave a talk at the Non-coding Genome conference in Heidelberg, Germany.
Year(s) Of Engagement Activity 2017
 
Description RNAcentral talk in Benasque 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We presented a talk about RNAcentral at an international meeting of RNA scientists and PIs held once every three years in Benasque, Spain. The talk helped us reach an expert audience and get valuable feedback.
Year(s) Of Engagement Activity 2018
 
Description Twitter 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The RNAcentral Twitter account is an important channel for engagement with users and the general public. Launched in 2014, the account has attracted hundreds of followers.
Year(s) Of Engagement Activity 2014,2015,2016
URL https://twitter.com/rnacentral