WormBase ParaSite

Lead Research Organisation: EMBL - European Bioinformatics Institute
Department Name: Ensembl Genomes

Abstract

Parasitic worms (helminths) cause a massive economic burden, with agricultural losses in the UK exceeding £100 million per annum. Across the globe, helminths are also responsible for long-term, chronic diseases in humans. The UK is a leader in research and development targeting helminths, despite global investment being disproportionately low compared with the impact of infections.

Helminths are diverse - the term covers both roundworms and flatworms - and no single model species can capture the range of disease-causing mechanisms involved. Researchers are therefore inherently interested in making comparisons between species. The genomes of more than 30 species are now published and many more are available. Alongside their genomes, large-scale functional genomics datasets have been produced describing key life cycle transitions for more than 10 species.

To drive helminth research into the genomic era, we established WormBase ParaSite in 2014. The resource now contains more than 100 draft genomes of helminths. Genes and genomes can be explored, enabling a greater understanding of helminth biology to accelerate the development of new strategies for helminth control.

In its first two years, there have been 8 public releases of the resource and in 2015 the website was accessed by 29000 unique users. In addition to accessing gene structures and functional annotation for draft genomes, users are able to examine evolutionary relationships between genes and look for the differences and similarities between species that may underpin differences and similarities in helminth biology. The resource provides fast and intuitive interfaces for browsing and searching and contains an interface for extracting custom datasets. Several workshops have been organised to provide training in its use.

This proposal will fund the maintenance and improvement of WormBase ParaSite. We intend to incorporate all publicly available nematode and flatworm genome assemblies as they become available. Due to changes in sequencing technology, research groups will produce new and better versions of existing gene sequences. However, these sequences will in many cases not be annotated, so we will provide an automated way to annotate naked genomes with consistent gene structures and functional descriptions.

Defining gene families will remain a critically important activity. However, we will increase the speed, accuracy and scalability with which evolutionary histories can be inferred. We will also greatly improve the way in which data from large-scale studies on gene expression or genome variation are incorporated into the resource. In particular, a new Gene Expression Atlas will be included for interactive exploration of gene expression data. To help identify new drug targets, or opportunities to repurpose existing drugs, WormBase ParaSite will include links to target and chemistry data (by linking to the ChEMBL database). We will also enable users to query available phenotypic data.

In addition to the new features, we will frequently update the site to provide rapid access to new data. We will continue to provide training on the use of the resource and maintain a live and responsive helpdesk.

Technical Summary

WormBase ParaSite is a database that provides rapid access to new high-throughput genomic and related data from parasitic flatworms and roundworms (helminths). These data include genome sequence, gene expression data, and regulatory data, and are generally produced using massively parallel nucleotide sequencing strategies, and need to be integrated and interpreted to inform parasitology. A major challenge is to provide structural and functional annotation on the genome assemblies, to automatically update this as new experimental evidence becomes available and maintain tracking between successive versions such that researchers can continue their work as the reference data sets improve.

ParaSite is mostly implemented through the re-use and, where necessary, extension of database technologies developed elsewhere. These include the MAKER pipeline (and other tools such as RepeatMasker and Rfam) for genome annotation; tools derived from lepbase for representation of genome quality; the Ensembl software stack for genome data management and preparation; and the BioMart data warehousing tool, which provides high-performance data discovery and retrieval for common use cases centred on genes. Both Ensembl and BioMart provide a web interface written in the Perl programming language, embedded in an Apache web server via mod_perl, while utilising MySQL (a common relational database management system) as the underlying data store. Increasingly, we are supporting the direct incorporation of data stored in binary, indexed file formats (e.g. BAM and CRAM for sequence alignments), simplifying the database build process and improving performance. We are using the emerging Track Hub technology to organise these files so that users can locate and filter data of interest appropriately.
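As an illustration of how indexed alignment files might be organised for a track hub, the sketch below renders trackDb-style stanzas for a list of BAM files. It is not taken from the ParaSite code base: the `AlignmentTrack` class, the function name and the example URL are hypothetical, and only the widely used `track`, `bigDataUrl`, `shortLabel`, `longLabel` and `type` settings are emitted. CRAM is left out of the sketch because its settings differ between browsers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AlignmentTrack:
    """One indexed BAM file to expose through a track hub (hypothetical model)."""
    name: str         # short identifier without spaces
    url: str          # location of the .bam file (its index is assumed to sit alongside)
    description: str  # human-readable label shown in the browser


def render_trackdb(tracks: list[AlignmentTrack]) -> str:
    """Render one trackDb stanza per BAM file, separated by blank lines."""
    stanzas = []
    for t in tracks:
        if not t.url.lower().endswith(".bam"):
            # This sketch only covers BAM; other formats need their own settings.
            raise ValueError(f"unsupported alignment format: {t.url}")
        stanzas.append("\n".join([
            f"track {t.name}",
            f"bigDataUrl {t.url}",
            f"shortLabel {t.name[:17]}",  # short labels are conventionally <= 17 characters
            f"longLabel {t.description}",
            "type bam",
        ]))
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    example = [
        AlignmentTrack(
            name="adult_rnaseq",
            url="https://example.org/hub/adult_rnaseq.bam",  # placeholder URL
            description="Adult stage RNA-seq alignments",
        )
    ]
    print(render_trackdb(example), end="")
```

Keeping the alignment data in the files themselves, with only a small text description per track, is what allows new datasets to be added without rebuilding the relational database.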

Planned Impact

Across the globe, parasitic worms (helminths) cause a massive economic burden and are responsible for long-term, chronic diseases. Helminths are therefore studied with the aim of killing or controlling them. For pathogens with smaller genomes, particularly viruses, bacteria and protozoa, access to genome data has transformed the way research is conducted, yielding major insights into the spread of infections and drug resistance and leading to the development of new drugs and vaccine candidates. A similar transformation is starting to take place in helminth research; rapid changes in sequencing technologies have driven down costs, and large-scale data on genomes and gene expression are becoming available. WormBase ParaSite was established in 2014 to enable the helminth research field to accelerate by exploiting the rapid growth in available data. Through assisting helminth researchers, ParaSite will impact governments, NGOs and companies with an interest in disease control.

Amongst the downstream beneficiaries of helminth research will be those suffering from infections - more than a billion people worldwide. Human infections, mainly amongst the poorest communities, can result in abdominal pain, anaemia, stunted growth and impaired mental development, malnutrition, fatigue, disfigurement, blindness, circulatory disorders, or liver and bladder pathologies. Some anthelmintic drugs do exist, but with an over-reliance on a small repertoire, the development and spread of drug resistance is an ever-present danger.

The global agriculture industry will also benefit from new helminth control measures. In the UK, potato farming is badly affected by potato cyst nematode, and livestock are affected by gastrointestinal nematodes and liver flukes.

WormBase ParaSite was launched to exploit the rapid increase in available helminth sequence data (genomes and gene expression data). Through the organisation, analysis and dissemination of these data, WormBase ParaSite aims to: (i) provide a clear, annotated representation of the functional regions of genome sequences; (ii) transfer knowledge from well-annotated to less well-annotated genomes; and (iii) allow comparisons between helminths so that differences between genomes can be correlated with the evolution of pathogenic traits. Automatic pipelines integrate new data to ensure that users can access an up-to-date interpretation of all available data, and the use of standard data query and retrieval interfaces reduces time that would otherwise be wasted in finding and re-formatting data to make them interoperable.

The new application will up-scale WormBase ParaSite to ensure that the expected flood of new data (more numerous and more contiguous genome assemblies; new expression and variation data) can be processed and made useful to helminth researchers. Another objective is to ensure rapid releases such that these data are quickly disseminated to the community; another is to provide training, in situ at prominent nodes of helminth research, to maximise the familiarity of researchers with the available data and tools.

A new portal within ParaSite will be aimed directly at researchers developing drug treatments. We will use sequence similarity to identify homologues of known drug targets from other species (as curated in the ChEMBL resource). We will provide filters to allow users to select genes from parasites whose homologues have properties such as inhibition by a drug that has reached clinical trials but has no known toxicology warnings, or aggregated scores that reflect physico-chemical properties of a compound or drug. To predict new, exploitable target-compound combinations, users will be able to combine their results with relevant gene expression data (e.g. expression in a mammalian-infective stage) and with the absence of an orthologue in the parasite's host.
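The filtering described above can be pictured as a conjunction of simple per-gene criteria. The following is a minimal sketch under stated assumptions, not the portal's implementation: the `CandidateTarget` record, its field names, the gene identifiers and the `min_phase` threshold are all hypothetical, and real values would come from ChEMBL, expression datasets and orthology predictions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTarget:
    """A parasite gene with evidence gathered for its homologue (hypothetical fields)."""
    gene_id: str
    max_clinical_phase: int             # highest trial phase of a drug acting on the homologue
    has_toxicity_warning: bool          # any known toxicology warning for that drug
    expressed_in_infective_stage: bool  # e.g. expressed in a mammalian-infective stage
    has_host_orthologue: bool           # an orthologue exists in the parasite's host


def select_targets(candidates: list[CandidateTarget], min_phase: int = 1) -> list[CandidateTarget]:
    """Keep genes that satisfy every criterion at once."""
    return [
        c for c in candidates
        if c.max_clinical_phase >= min_phase
        and not c.has_toxicity_warning
        and c.expressed_in_infective_stage
        and not c.has_host_orthologue
    ]


if __name__ == "__main__":
    # Toy records with made-up identifiers, for illustration only.
    genes = [
        CandidateTarget("geneA", 3, False, True, False),
        CandidateTarget("geneB", 2, True, True, False),   # dropped: toxicity warning
        CandidateTarget("geneC", 0, False, True, False),  # dropped: no clinical-stage drug
        CandidateTarget("geneD", 4, False, True, True),   # dropped: host orthologue present
    ]
    for hit in select_targets(genes):
        print(hit.gene_id)
```

Expressing each criterion as an independent field makes it straightforward to relax one filter (for example, allowing host orthologues) without touching the others.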

Description We have developed and improved WormBase ParaSite (http://parasite.wormbase.org), a resource currently providing access to 202 genomes from 163 nematode and flatworm species. Available data include genome assemblies, annotations, comparative genomics and functional analyses, alongside a range of query interfaces and tools, including genome browsers and a data mining platform. During the funded period we made 6 releases of the resource and added a number of new features to the platform, including a mechanism for capturing free-text comments on genes from the research community, and a sub-portal for exploring helminth gene expression data. We also provided training for the resource (with interactive training workshops) at national and international conferences.
Exploitation Route For those studying parasite-mediated pathologies, ParaSite provides an organised and efficient way to access the data, together with information about similarities and differences between genes and between species that could inform the development of new strategies for control and treatment. A variety of interfaces (interactive and programmatic) are provided to facilitate data access.
Sectors Agriculture, Food and Drink, Chemicals, Pharmaceuticals and Medical Biotechnology

URL http://parasite.wormbase.org
 
Description Ensembl 
Organisation Ensembl
Country United Kingdom 
Sector Academic/University 
PI Contribution WormBase ParaSite have deployed the Ensembl software and infrastructure, with the specific goal of enabling genomics for the helminth research community.
Collaborator Contribution The Ensembl team at EMBL-EBI develop software and infrastructure for the storage and display of genomic data for selected species.
Impact Continued WormBase ParaSite releases are reliant on Ensembl software.
Start Year 2014
 
Description WormBase consortium 
Organisation WormBase (Biology and Genome of C. elegans)
Country United States 
Sector Charity/Non Profit 
PI Contribution The WormBase Consortium is led by Paul Sternberg of Caltech, Kevin Howe of the EBI, Matt Berriman of the Wellcome Sanger Institute, and Lincoln Stein of the Ontario Institute for Cancer Research. The consortium runs a model organism database containing data from research on C. elegans and other nematodes. WormBase ParaSite provides searching and data access capabilities that are not available through the WormBase website.
Collaborator Contribution WormBase curates reference genomes, which are then imported into WormBase ParaSite and provide important functional information for understanding the genomes of comparator species.
Impact Provision of annotated genomes for C. elegans and Brugia malayi
Start Year 2014
 
Description Presentation and training at annual "Molecular and Cellular Biology of Helminth Parasites" meeting 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Poster presentation, extensive demonstration and feedback gathering with a cohort of ~100 helminth researchers.
Year(s) Of Engagement Activity 2018
 
Description Wellcome Advanced Course in Helminth Genomics - WormBase ParaSite component 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact In collaboration with Wellcome Genome Campus Connecting Science and the Wellcome Sanger Institute Parasite genomics group, we developed a week-long comprehensive training course on helminth genomics, covering topics such as genome assembly and annotation, and population genomics. A significant component of the course (1.5 days) was devoted to WormBase ParaSite. We delivered the course for the first time in September 2019 to a cohort of African helminth biologists in Ghana.
Year(s) Of Engagement Activity 2019
 
Description Workshop at the British Society for Parasitology Spring meetings 2018 and 2019 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Practical workshop demonstrating common use-cases for WormBase ParaSite tools.
Year(s) Of Engagement Activity 2018, 2019