Towards a new genomic standard for describing our complete genome collection

Lead Research Organisation: NERC Centre for Ecology and Hydrology
Department Name: Hails


There is increasing recognition that the scientific community at large would benefit from the development of a new standard to capture a richer set of information describing complete genomes. This would ensure that those generating the genomes contribute to the quality and quantity of information (metadata) available. Rapidly evolving high-throughput genomic sequencing technologies are generating data at an exponentially increasing rate. This presents new opportunities, but also new challenges. The traditional approach to genomic sequencing has been on a per 'species' basis. However, an increasing number of 'genomes' are now being sequenced that represent not only individuals of cultivated and uncultivated organisms, but also populations and communities from environmental samples. Clearly, for adequate interpretation of this type of data, recording only the most basic information is no longer sufficient. The time to act is now, as this deluge of data is only set to increase, especially with the emergence of ultra-high-throughput sequencing capabilities. Community-driven standards have the best chance of success if developed under the auspices of international working groups. The Genomic Standards Consortium (GSC) was recently formed (September 2005) to tackle this issue. Participants in the GSC include biologists, computer scientists, those building genomic databases and conducting large-scale comparative genomic analyses, and those with experience of building community-based standards. The mission of the GSC is to work with the wider community towards: (1) the implementation of a new genomic standard; (2) methods of capturing and exchanging metadata; and (3) harmonization of metadata collection and analysis efforts across the wider genomics community. Consensus-building activities are most powerful when combined with the ability to implement the resulting recommendations of a group.
In this proposal we are seeking funding to begin implementation activities on behalf of the GSC community. Specifically, we are requesting funding for a programmer to build the GSC's proposed Genome Catalogue and populate it with genome descriptions compliant with the 'Minimal Information about a Genome Sequence' specification. We also aim to simplify metadata capture and exchange by working towards defining a rich vocabulary for the description of genomes (in the form of contributions to existing and novel ontologies). Finally, we will design and implement a Genomic Metadata Exchange Format (GnoME) and use it to improve our existing GenomeMine database, so that it can serve as a community archive of contributed datasets of curated and calculated metadata describing one or more genomes. Combined, these efforts will pave the way towards the creation of significant new research tools and will facilitate future comparative genomic and, in particular, comparative eco-genomic studies.


