miRBase: the microRNA database

Lead Research Organisation: University of Manchester
Department Name: School of Biological Sciences

Abstract

MicroRNAs are tiny RNA sequences found in the cells of all animals and plants. Their role is to regulate the production of proteins from other genes in the genome. For example, the 2500 known microRNAs in the human genome are predicted to regulate around two thirds of the 20000-25000 human genes. MicroRNAs have been shown to be important in essentially all functions in the cell and in a large number of diseases, including cancer and neurodegenerative disorders.

It was only in 2001 that microRNAs were discovered to be so widespread. The miRBase database was established soon after to catalog microRNAs as they are discovered, and to provide a range of services to the microRNA community. For example, miRBase assigns names to new microRNA discoveries, to ensure that the same microRNA is referred to by the same name in different publications. miRBase also acts as the central resource from which the most up-to-date set of microRNAs can always be retrieved, together with their locations in the genome, the methods by which they were found, the scientific publications that describe them, and links to other databases that contain related information. These functions have helped the microRNA field to grow at a tremendous rate. In 2013 alone, there were 8400 scientific publications about microRNAs. miRBase currently contains entries on 35828 microRNAs in 223 species.

miRBase is used by essentially every microRNA researcher in the world. These users include those who study the roles of microRNAs in disease and other cellular processes, as well as companies that make kits and resources allowing experimental biologists to detect and manipulate microRNAs. The miRBase website is used by around 40000-50000 different users each month, and miRBase has been mentioned in over 8000 scientific publications.

The work that will be carried out under this proposal will ensure the continued availability of the essential functions of the miRBase database. It will also provide for the next phase of development of the resource. In particular, we will develop web tools that allow microRNA researchers to access and use the huge quantities of microRNA data that are being produced by so-called deep sequencing methods. We will also expand the database to include information about the genes that are regulated by microRNAs, again from deep sequencing data.

Technical Summary

miRBase is an online database that catalogs all published microRNA sequences, and is responsible for the assignment of gene names to novel microRNA discoveries. The database (then called the MicroRNA Registry) was founded in 2002 to provide these and other functions to the growing microRNA community. It quickly became the essential resource for all microRNA researchers, providing the gold standard set of microRNA gene annotations, access to the underlying deep sequencing datasets that support those annotations, genome coordinates, and links to other resources and the primary literature. The enormous growth of the microRNA field has been made possible by miRBase's role in organising and facilitating access to the wealth of available microRNA data, and the community adoption of miRBase as the arbiter of gene names and the primary source of microRNA sequence data has been absolute. Many derived microRNA databases and resources exist, and a number of commercial organisations produce experimental tools such as microRNA arrays, qPCR assays, and microRNA mimics and inhibitors; all depend on the miRBase database.

This proposal provides for the continued availability and maintenance of the miRBase core functions, and the next phase of development. In particular, we will expand our use of aggregated publicly available deep sequencing datasets to allow users to perform analyses of microRNA expression, for example in disease and healthy tissues, and between different species or developmental stages. We will also re-purpose our existing deep sequencing analysis pipelines to analyse and make available datasets from new sequencing techniques (CLIPseq) that experimentally identify microRNA target sites. We will adopt a versioning system that allows users to more easily track and understand changes to microRNA annotations over time, and investigate the use of text mining approaches to automatically collect microRNA functional information from the scientific literature.

Planned Impact

The primary impact of the miRBase database derives from its core activities: assigning gene names to novel microRNA discoveries, and providing a complete and up-to-date catalog of all published microRNA sequences. As such, the miRBase database directly or indirectly impacts all microRNA work world-wide, including derived databases and resources, gene nomenclature bodies, vendors of microRNA experimental tools, and down-stream clinical applications, in addition to academic research groups.

Commercial organisations

A number of companies (we track over 20) make and sell experimental resources for the detection and manipulation of microRNAs, including qPCR assays, microarrays, and microRNA mimics and inhibitors. These companies include Life Technologies, Exiqon, LC Sciences, Sigma-Aldrich, CBC, and Agilent. All use miRBase as the source of sequences from which to build their products. The companies themselves therefore directly benefit from the availability of the complete microRNA set from miRBase. Many companies sell off-the-shelf products for microRNAs that are in miRBase at a lower price than custom products for microRNAs that are not. For example, Life Technologies' TaqMan assays cost around £120 for sequences in miRBase, and £220 for sequences not in miRBase. Curation and regular release of microRNA sequences in the miRBase database therefore reduces costs for the customers.

Several pharmaceutical companies have active research streams investigating the use of microRNAs as biomarkers, and drugs that target microRNAs are showing clinical promise. For example, miRagen Therapeutics has drugs for chronic heart failure (miR-208), post-myocardial infarction remodelling (miR-15/195) and cardiometabolic disease (miR-378) in pre-clinical development. Santaris Pharma has a drug in a phase IIa clinical trial (miravirsen) that acts by inhibiting miR-122. Again, miRBase provides the reference annotation from which the development of any microRNA clinical application starts.

Influencing policy

The maintenance of the microRNA gene nomenclature system influences other publicly funded bodies. For example, miRBase is recognised as the arbiter of microRNA gene names by the Human (HGNC) and Mouse (MGI) Gene Nomenclature Committees, by the RefSeq database at NCBI, and by the International Union of Pharmacology (NC-IUPHAR).

Public impact

The public impact of the miRBase database is indirect. However, the development of every down-stream application of microRNA technologies, including the clinical applications described above, can ultimately be traced back to the presence of a microRNA sequence in miRBase. The public will therefore benefit in the long term from the advances enabled by the miRBase database, including in quality of life and health. We will introduce a "for the public" section of the website to describe these links and the importance of both microRNA technologies and biological sequence databases in down-stream applications.

Publications

Description: We have developed new methods and interfaces to extract functional information about microRNAs from the scientific literature. Sentences that contain microRNA gene names are extracted and scored for functional information, and can be viewed and sorted on the miRBase website. These developments are described in Kozomara et al., Nucleic Acids Res., 2019.
Exploitation Route: Functional information about microRNAs is useful to all microRNA researchers, including in academic, clinical, pharmaceutical, and other commercial settings.
Sectors: Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology

URL: http://mirbase.org/
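
The sentence extraction and scoring described in the record above can be illustrated with a minimal Python sketch. This is not the miRBase pipeline: the name pattern, the keyword list, and the function names are all assumptions made for illustration, and the published method (Kozomara et al., 2019) should be consulted for the actual approach. The sketch only shows the general idea of keeping sentences that mention a microRNA name and ranking them by a crude functional-keyword count.

```python
import re

# Assumed pattern for names such as "miR-122", "hsa-miR-208a" or "let-7";
# it is illustrative only and does not cover the full nomenclature.
MIRNA_NAME = re.compile(
    r"\b(?:[a-z]{3,4}-)?(?:miR|mir|let|lin)-\d+[a-z]?(?:-\d+)?(?:-[35]p)?\b"
)

# Hypothetical keyword list standing in for a real functional-information score.
FUNCTION_WORDS = {"regulates", "targets", "inhibits", "represses", "promotes", "suppresses"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence):
    """Count occurrences of the assumed functional keywords in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(1 for w in words if w in FUNCTION_WORDS)


def extract_scored_sentences(text):
    """Return (score, microRNA names, sentence) tuples, highest score first."""
    hits = []
    for sentence in split_sentences(text):
        names = MIRNA_NAME.findall(sentence)
        if names:  # keep only sentences that mention a microRNA name
            hits.append((score_sentence(sentence), sorted(set(names)), sentence))
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits


if __name__ == "__main__":
    example = (
        "Miravirsen inhibits miR-122. The weather was fine. "
        "miR-208 is expressed in the heart."
    )
    for score, names, sentence in extract_scored_sentences(example):
        print(score, names, sentence)
```

A real system would need a more robust sentence splitter and a scoring model trained on annotated text; the keyword count here is only a placeholder for that step.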
 
Description: The miRBase database is used by >20000 researchers each month, from both the academic and commercial sectors. The papers describing miRBase have been cited >16000 times (Google Scholar). The uses are wide-ranging, but include commercial development of kits and tools for microRNA research, and investigation of clinical applications of microRNA work. All microRNA studies start from miRBase sequence data.
First Year Of Impact: 2003
Sector: Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
Impact Types: Economic; Policy & public services

 
Title: miRBase
Description: miRBase is the primary database of microRNA sequences and annotation. http://mirbase.org/
Type Of Material: Database/Collection of data
Provided To Others?: Yes
Impact: miRBase is used by microRNA researchers worldwide. The website has more than 20000 unique users per month, and the papers describing miRBase have been cited more than 15000 times.
URL: http://mirbase.org/