Building a genome analytic resource for the lepidopteran community

Lead Research Organisation: University of York
Department Name: Biology

Technical Summary

Top-tier databases such as ENSEMBL Genomes do not have the resources, domain-specific expertise or reach to nurture high-quality databasing of emerging genomes. It is therefore proposed that focused Tier 2 databases be established to act as community aggregative databases, delivering targeted support to their user groups and feeding quality-controlled data up to central aggregative databases such as ENSEMBL Genomes. Here we propose the establishment of a Tier 2 database for Lepidoptera, LepBase, that will capitalize on the leading position of UK research groups (largely funded by BBSRC) in the rapidly expanding field of lepidopteran genomics.

We will develop a range of tools and resources that will benefit wider research communities using Lepidoptera as model species, or wherever better-organised lepidopteran genomic data can make a difference. Code and pipelines developed during the project are likely to be of much wider utility, and LepBase will serve as a model for Tier 2 aggregative genome databases.

We will first install and test the ENSEMBL code base, and develop 'standard' ENSEMBL instances for Heliconius melpomene and other published lepidopteran genomes. We will use both the community supplied annotations and standardized optimal annotation pipelines within ENSEMBL to deliver richly annotated genome portals.

We will use these first genomes as exemplars to write, test and deploy ENSEMBL code for several novel data modalities, including population genetic measures, geospatial analyses, and clade-specific orthology and synteny.

We will install and deploy a community annotation portal (CAP) that will allow experts in the community to comment on, modify and add annotations.

We will expand into additional genomes as they become available to us, and promote the resource to the lepidopteran community and interested external stakeholders (public and industrial) through meetings, visits, training workshops and web-based media.

Planned Impact

This proposal aims to deliver a common internet-access portal onto the many lepidopteran genomes being generated, and to develop new tools to interrogate these genomes in the wider context of the whole order.
The main beneficiaries will be lepidopteran genomics researchers, who will have a unified portal in which to contextualise and analyse their own data. We will engage this community through direct communication, encouraging groups to submit data to the project and to help us get their genome sequences represented. This will be achieved by attending community meetings (the Kansas Arthropod Genomics Workshop) and through the project's blog and Twitter feed. The project blog will describe the architecture of the site, the decisions made during development, and our successes, problems and prospects. The corporate Twitter feed will be used to announce database updates and improvements, and to pass on important news from the world of lepidopteran genomics.

A second group of beneficiaries will be lepidopteran biologists in general. A wide range of specialisms, from neurobiology through evolutionary genetics to behavioural ecology, use Lepidoptera as target organisms. We will engage this community by making the portal easy to use for non-genomics specialists and by providing data summaries useful to research teams focused on one or a few genes or pathways. Again, our blog and Twitter feed will keep this community informed, and we will ensure that our team is represented at the key meetings and workshops where lepidopteran research is presented (for example, the annual Evolution meetings).

A third key stakeholder group is the companies and research teams developing new tools to combat lepidopteran pests. Our database will be useful in defining possible drug and biocontrol targets, and in revealing the diversity and conservation of these targets across the order. Similarly, the biotechnology industry has a keen interest in biomaterials from Lepidoptera, such as silks and new semiochemicals. Pathway-oriented annotations and the breadth of species collected in LepBase will permit more rational selection of lead enzymes and products. We will keep these organisations and individuals informed of developments through the blog and Twitter feed, and through presentations at relevant meetings.

Arthropod genomics is burgeoning, and the i5k initiative aims to coordinate the sequencing of 5000 arthropod genomes in the first instance. The wider arthropod genomics community will benefit directly from using LepBase, and also from our pushing LepBase genomes and updates into ENSEMBL Genomes. We will keep the i5k site and the wider arthropod genomics community informed through the blog, the Twitter feed and direct emails to interest groups.

As a model Tier 2 database, LepBase will be of interest to those developing similar systems for their own taxa of interest. We will open our code development and ideas to colleagues running similar initiatives worldwide, and keep abreast of their work. We hope our code will be integrated into the core ENSEMBL codebase; meanwhile (and in addition), we will make it available on GitHub.

We will strive to publish the database in the annual Nucleic Acids Research (NAR) Database issue, highlighting updates and enhancements. Other publications in open access journals will also reach our key audiences.

The general public has a strong interest in butterflies and moths as charismatic species. We will maintain general-interest pages on the LepBase website describing our work, and provide downloadable factsheets describing each species and the core biology its genome is revealing. These will be made available to butterfly farms, natural history museums and other interested parties. The database will be publicised at open days and science fairs at the three home institutions as opportunities arise.

 
Description We have developed, and continue to develop, LepBase. The current publicly available version is Lepbase release 4. Lepbase is a platform that integrates genome data from published lepidopteran genomes. It focuses on the specific needs of the lepidopteran research community, opening up this diverse clade to comparative analysis.
Exploitation Route Lepidopteran comparative genomics.
Sectors Agriculture, Food and Drink, Education, Environment

URL http://lepbase.org/
 
Title LepBase 
Description LepBase is a multi-genome database for the Lepidoptera. Its aim is to develop and deploy a genome analysis and interrogation environment for Lepidoptera. LepBase will support the lepidopteran community by integrating their data across species. It will make lepidopteran genome data accessible to all biologists, and facilitate secure long-term archiving of the rich accumulated functional annotations in ENSEMBL Genomes.
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact LepBase is now publicly available. Lepbase release 2 went live on 13th February 2016 with 21 annotated assemblies across 17 species. 
URL http://lepbase.org/
 
Description Heliconius Consortium 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution Providing biological samples. Setting up and running Royal Society Summer Exhibition. Resequencing of Heliconius genomes. Access to field station in Peru.
Collaborator Contribution Providing biological samples. Setting up and running Royal Society Summer Exhibition. Resequencing of Heliconius genomes. Provision of RAD adapters.
Impact Royal Society Summer Exhibition 2014: From Jungles To Genomes.
Keightley PD, Pinharanda A, Ness RW, Simpson F, Dasmahapatra KK, Mallet J, Davey JW, Jiggins CD (in press) Estimation of the spontaneous mutation rate in Heliconius melpomene. Molecular Biology and Evolution.
Rosser N, Dasmahapatra KK, Mallet J (in press) Stable Heliconius butterfly hybrid zones are correlated with a local rainfall peak at the edge of the Amazon basin. Evolution.
Martin SH, Dasmahapatra KK, Nadeau NJ, Salazar C, Walters JR, Simpson F, Blaxter ML, Manica A, Mallet J, Jiggins CD (2013) Genome-wide evidence for speciation with gene flow in Heliconius butterflies. Genome Research 23: 1817-1828.
Briscoe AD, Muños AM, Kozak KM, Walters JR, Yuan F, Jamie GA, Martin SH, Dasmahapatra KK, Ferguson LC, Mallet J, Jacquin-Joly E, Jiggins CD (2013) Female behaviour drives expression and evolution of gustatory receptors in butterflies. PLOS Genetics 9(7): e1003620.
Supple MA, Hines HM, Dasmahapatra KK, Lewis JL, Nielsen DM, Lavoie C, Ray DA, Salazar C, McMillan WO, Counterman BA (2013) Genomic architecture of adaptive color pattern divergence and convergence in Heliconius butterflies. Genome Research 23: 1248-1257.
Mérot C, Mavárez J, Evin A, Dasmahapatra KK, Mallet J, Lamas G, Joron M (2013) Genetic differentiation without mimicry shift in a pair of hybridising Heliconius species (Lepidoptera: Nymphalidae). Biological Journal of the Linnean Society 109: 830-847.
Nadeau NJ, Martin SH, Kozak KM, Salazar C, Dasmahapatra KK, Davey JW, Baxter SW, Blaxter ML, Mallet J, Jiggins CD (2013) Genome-wide patterns of divergence and gene flow across a butterfly radiation. Molecular Ecology 22: 814-826.
Start Year 2012
 
Description 10th International Heliconius meeting in Panama
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A 1.5-hour workshop on Lepbase, providing an overview of why Lepbase was started and of our goals, a walkthrough of the browser-based features and tools, and a demonstration of how to use the Perl API to mine the Ensembl MySQL database underpinning Lepbase. Participants gave a great deal of feedback on how they imagined using Lepbase and on other features they would like to see.
Year(s) Of Engagement Activity 2015
 
Description Edinburgh Doors Open Day 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Information on lepidopteran genetics and genomics, and a chance to look at pinned specimens. The purpose was to inform the general public about our research using LepBase.
Year(s) Of Engagement Activity 2015
URL http://www.doorsopendays.org.uk/opendays/area_programmes.aspx?areaid=16