Rfam: the community resource for RNA families

Lead Research Organisation: University of Manchester
Department Name: School of Biological Sciences

Technical Summary

Established in 2002, Rfam is a database of RNA families that contains manually curated multiple sequence alignments and covariance models that can be used to find RNAs in genomic sequences. Rfam data has been widely used by the RNA community for genome annotation and algorithm development. In this proposal, we will develop Rfam to address the needs of three important and diverse user communities. First, we will enhance the annotations of all RNA families with known 3D structures by incorporating more accurate consensus secondary structures and pseudoknots based on experimentally determined structures. We will also employ the newly released R-scape software to improve secondary structures based on covariation analysis, even in the absence of 3D data. Second, we will collaborate with the miRBase microRNA database to develop a comprehensive set of microRNA precursor families. This will enable miRBase to use Rfam to maintain the microRNA family classification, and Rfam will be able to annotate sequences with microRNA families from miRBase. Third, we will work with the European Viral Bioinformatics Center to expand the coverage of conserved viral RNA structures. By creating a comprehensive set of viral RNA families, we will enable scientists to detect viral sequences, particularly in metagenomic datasets, and improve our understanding of viral recombination. Altogether, these efforts will expand the number of families by over 80%. The improvements gained by collaborating closely with these three user communities will benefit Rfam users overall. We will disseminate information about the latest Rfam developments through outreach and training activities, including Docker-based tutorials that simplify access to the Rfam software. Together, these developments will enable Rfam to spearhead a global effort aimed at understanding the biological functions and roles of ncRNAs.

Planned Impact

Rfam is a resource that supports researchers working across all BBSRC strategic priorities, primarily data-driven biology and systems approaches to the biosciences. Rfam will be used extensively by the life sciences community, including bioinformaticians, wet-lab researchers, and clinicians. The huge growth in data produced by new sequencing technologies means that it is now more important than ever to give researchers tools and resources that help them interpret their data and produce a complete listing of all biological entities found within it.

Rfam is the only resource currently capable of identifying a wide range of non-coding RNA homologs in sequence data, which will be of great benefit to scientists analysing newly sequenced genomes and to all model organism databases, from FlyBase and PomBase to the more comprehensive Ensembl and Ensembl Genomes. Moreover, a subset of Rfam models is also used in metagenomics for annotating rRNAs at scale (e.g. MGnify). Many of the resources benefiting from the Rfam data are based in the UK, thus contributing to the UK's international reputation as a leader in bioscience.

In addition to benefiting all Rfam users by continuing the development of a widely used community resource, the specific changes proposed in this project will have a beneficial impact on three specialised Rfam user communities. First, Rfam is used for developing and testing new algorithms for RNA 2D and 3D structure prediction. Improving Rfam annotations with information from RNA 3D structures will translate into improved accuracy of the software developed using Rfam. Second, thousands of miRBase users will benefit from an enhanced classification of microRNAs powered by Rfam. In addition, the new Rfam microRNA annotations will be used by the resources that rely on Rfam for genome annotation, such as the Ensembl, Ensembl Genomes, and NCBI Eukaryotic Gene Annotation pipelines. Third, the expansion of viral RNA families in Rfam will benefit the European Viral Bioinformatics Center, including its UK members, and the rest of the virology research community. Conserved viral RNA structures are essential for various stages of the viral life cycle, for protection against exonucleases, and for avoiding the immune response (for example, an alternatively folding RNA structure in the 3'-UTR of dengue virus modulates immune reactions in both humans and insects). Having a comprehensive library of viral families in Rfam will enable the detection of these RNA structures in viral and metagenomic sequences.

Rfam data can give scientists a more complete picture of the "parts list" involved in constructing each genome and help them better understand the roles that ncRNAs play in gene regulation. We have only recently begun to understand the role that ncRNAs play in health and disease. For example, microRNAs are deregulated in cancer and snoRNAs are silenced in Prader-Willi syndrome, while plant microRNAs play important roles in immune responses against viruses. There are also significant research efforts into RNA-based therapeutics, which are promising tools to improve health and welfare. The Innovate UK Medicines Discovery Catapult has an ongoing project to identify novel therapeutic targets, one of which is specifically aimed at RNA families. Rfam is a crucial resource for such studies, allowing similarities in RNAs between organisms to be studied and providing researchers with search tools to identify previously unknown ncRNA homologs.

Description: This award funded work to improve entries in the Rfam database to represent a class of RNA molecules called microRNAs. MicroRNAs are curated by a second database, called miRBase. miRBase and Rfam have worked together under this award to improve and synchronise their data, and to produce systems to automatically keep both resources up-to-date.
Exploitation Route: The entries corresponding to microRNA families are available to all through the Rfam website (https://rfam.xfam.org/), and will soon be available through miRBase (https://mirbase.org/).
Sectors: Pharmaceuticals and Medical Biotechnology

Title: Rfam database of RNA families
Description: Rfam is a collection of multiple sequence alignments and covariance models representing families of non-protein coding RNA sequences.
Type Of Material: Database/Collection of data
Provided To Others?: Yes
Impact: Rfam is a core RNA bioinformatics resource, used by thousands of RNA researchers around the world.
URL: https://rfam.org/