WormBase ParaSite

Lead Research Organisation: Wellcome Sanger Institute
Department Name: Pathogen Variation

Abstract

Parasitic worms (helminths) cause a massive economic burden, with agricultural losses in the UK exceeding £100 million per annum. Across the globe, helminths are also responsible for long-term, chronic diseases in humans. The UK is a leader in research and development targeting helminths, despite global investment being disproportionately low compared with the impact of infections.

Helminths are diverse (the term covers both roundworms and flatworms), and no single model species can capture the range of disease-causing mechanisms involved. Researchers are therefore inherently interested in making comparisons between species. The genomes of more than 30 species have now been published, and many more are available. Alongside these genomes, large-scale functional genomics datasets describing key life cycle transitions have been produced for more than 10 species.

To drive helminth research into the genomic era, we established WormBase ParaSite in 2014. The resource now contains more than 100 draft genomes of helminths. Genes and genomes can be explored, enabling a greater understanding of helminth biology to accelerate the development of new strategies for helminth control.

In its first two years, there were 8 public releases of the resource, and in 2015 the website was accessed by 29,000 unique users. In addition to accessing gene structures and functional annotation for draft genomes, users can examine evolutionary relationships between genes and look for differences and similarities between species that may underpin differences and similarities in helminth biology. The resource provides fast and intuitive interfaces for browsing and searching, as well as an interface for extracting custom datasets. Several workshops have been organised to provide training in its use.

This proposal will fund the maintenance and improvement of WormBase ParaSite. We intend to incorporate all publicly available nematode and flatworm genome assemblies as they become available. Owing to changes in sequencing technology, research groups will produce new and better versions of existing genome sequences. However, in many cases these sequences will not be annotated, so we will provide an automated way to annotate such "naked" genomes with consistent gene structures and functional descriptions.

Defining gene families will remain a critically important activity, and we will increase the speed, accuracy and scalability with which evolutionary histories can be inferred. We will also greatly improve the way in which data from large-scale studies of gene expression or genome variation are incorporated into the resource. In particular, a new Gene Expression Atlas will be included for interactive exploration of gene expression data. To help identify new drug targets, or opportunities to re-use existing drugs, WormBase ParaSite will include links to target and chemistry data in the ChEMBL database. We will also enable users to query available phenotypic data.

In addition to the new features, we will frequently update the site to provide rapid access to new data. We will continue to provide training on the use of the resource and maintain a live and responsive helpdesk.

Technical Summary

WormBase ParaSite is a database that provides rapid access to new high-throughput genomic and related data from parasitic flatworms and roundworms (helminths). These data, which include genome sequences, gene expression data and regulatory data, are generally produced using massively parallel nucleotide sequencing, and need to be integrated and interpreted to inform parasitology. A major challenge is to provide structural and functional annotation of the genome assemblies, to update it automatically as new experimental evidence becomes available, and to track changes between successive versions so that researchers can continue their work as the reference datasets improve.

ParaSite is mostly implemented through the re-use and, where necessary, extension of technologies developed elsewhere: the MAKER pipeline (together with other tools such as RepeatMasker and Rfam) for genome annotation; tools derived from Lepbase for representing genome quality; the Ensembl software stack for genome data management and preparation; and the BioMart data warehousing tool, which provides high-performance data discovery and retrieval for common gene-centred use cases. Both Ensembl and BioMart provide web interfaces written in Perl and served through mod_perl embedded in an Apache web server, with MySQL (a common relational database management system) as the underlying data store. Increasingly, we are supporting the direct incorporation of data stored in binary, indexed file formats (e.g. BAM and CRAM for sequence alignments), which simplifies the database build process and improves performance. We are using the emerging track hub technology to organise these files so that users can locate and filter data of interest.
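To make the track hub idea concrete, the sketch below writes the three plain-text files of a UCSC-style track hub (hub.txt, genomes.txt and a per-assembly trackDb.txt) describing a set of indexed BAM alignments. It is a minimal illustration assuming the public UCSC track hub layout; the function name write_track_hub, the hub and assembly names and all URLs are hypothetical, and this is not a description of how WormBase ParaSite builds its own hubs.

```python
"""Minimal sketch: describe a set of alignment files as a UCSC-style track hub.
All hub names, assembly names and URLs used here are hypothetical."""

import tempfile
from pathlib import Path


def write_track_hub(out_dir, hub_name, assembly, email, bam_urls):
    """Write hub.txt, genomes.txt and <assembly>/trackDb.txt under out_dir.

    bam_urls maps a short track name to the URL of an indexed BAM file
    (the .bai index is expected to sit next to the BAM at the same URL).
    Returns the path of hub.txt, which is what a genome browser is given.
    """
    out = Path(out_dir)
    (out / assembly).mkdir(parents=True, exist_ok=True)

    # hub.txt: top-level description that points at genomes.txt
    (out / "hub.txt").write_text(
        f"hub {hub_name}\n"
        f"shortLabel {hub_name}\n"
        f"longLabel {hub_name} alignments\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n"
    )

    # genomes.txt: one stanza per assembly, pointing at its trackDb file
    (out / "genomes.txt").write_text(
        f"genome {assembly}\n"
        f"trackDb {assembly}/trackDb.txt\n"
    )

    # trackDb.txt: one stanza per BAM file; stanzas are separated by a blank line
    stanzas = []
    for name, url in sorted(bam_urls.items()):
        stanzas.append(
            f"track {name}\n"
            f"bigDataUrl {url}\n"
            f"shortLabel {name}\n"
            f"longLabel RNA-seq alignments: {name}\n"
            "type bam\n"
        )
    (out / assembly / "trackDb.txt").write_text("\n".join(stanzas))
    return out / "hub.txt"


if __name__ == "__main__":
    # Hypothetical demo: two life-stage alignments for a made-up assembly
    hub = write_track_hub(
        tempfile.mkdtemp(), "demo_hub", "demo_asm", "help@example.org",
        {"L3": "https://example.org/L3.bam", "adult": "https://example.org/adult.bam"},
    )
    print((hub.parent / "demo_asm" / "trackDb.txt").read_text())
```

Because each alignment file stays where it is and is referenced only by URL (bigDataUrl), adding a dataset means appending one stanza rather than loading records into a relational database, which is the kind of simplification the paragraph above describes.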

Planned Impact

Across the globe, parasitic worms (helminths) cause a massive economic burden and are responsible for long-term, chronic diseases. Helminths are therefore studied with the aim of killing or controlling them. For pathogens with smaller genomes, particularly viruses, bacteria and protozoa, access to genome data has transformed the way research is conducted, leading to major insights into the spread of infections and drug resistance, and to the development of new drugs and vaccine candidates. A similar transformation is starting to take place in helminth research: rapid changes in sequencing technologies have driven down costs, and large-scale data on genomes and gene expression are becoming available. WormBase ParaSite was established in 2014 to help the helminth research field accelerate by exploiting the rapid growth in available data. By assisting helminth researchers, ParaSite will have an impact on governments, NGOs and companies with an interest in disease control.

Amongst the downstream beneficiaries of helminth research will be those suffering from infections: more than a billion people worldwide. Human infections, mainly amongst the poorest communities, can result in abdominal pain, anaemia, stunted growth and impaired mental development, malnutrition, fatigue, disfigurement, blindness, circulatory disorders, or liver and bladder pathologies. Some anthelmintic drugs exist, but because control relies heavily on a small repertoire of them, the development and spread of drug resistance is an ever-present danger.

The global agriculture industry will also benefit from new helminth control measures. In the UK, potato farming is badly affected by potato cyst nematode, and livestock are affected by gastrointestinal nematodes and liver flukes.

WormBase ParaSite was launched to exploit the rapid increase in available helminth sequence data (genomes and gene expression data). Through the organisation, analysis and dissemination of these data, WormBase ParaSite aims to: (i) provide a clear, annotated representation of the functional regions of genome sequences; (ii) transfer knowledge from well-annotated to less well-annotated genomes; and (iii) allow comparisons between helminths so that differences between genomes can be correlated with the evolution of pathogenic traits. Automatic pipelines integrate new data so that users can access an up-to-date interpretation of all available data, and the use of standard data query and retrieval interfaces reduces the time that would otherwise be wasted in finding and re-formatting data to make it interoperable.

The new application will scale up WormBase ParaSite, ensuring that the expected flood of new data (more numerous and more contiguous genome assemblies; new expression and variation data) can be processed and made useful to helminth researchers. A further objective is to make releases rapid, so that these data are quickly disseminated to the community; another is to provide training in situ at prominent nodes of helminth research, to maximise researchers' familiarity with the available data and tools.

A new portal within ParaSite will be aimed directly at researchers developing drug treatments. We will use sequence similarity to identify homologues of known drug targets from other species (as curated in the ChEMBL resource). We will provide filters allowing users to select parasite genes whose homologues have properties such as inhibition by a drug that has reached clinical trials but has no known toxicology warnings, or aggregated scores that reflect the physico-chemical properties of a compound or drug. To predict new, exploitable target-compound combinations, users will be able to combine their results with relevant gene expression data (e.g. expression in a mammalian-infective stage) and with the absence of an orthologue in the parasite's host.
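The filtering described above amounts to a conjunction of conditions over candidate parasite genes. The Python sketch below shows that logic under stated assumptions: the Candidate record, its field names, the 0 to 1 compound score and the prioritise function are all hypothetical, and "reached clinical trials" is taken to mean a maximum clinical phase of at least 1 (ChEMBL records a maximum phase for each molecule). It does not describe the portal's actual data model.

```python
"""Hypothetical sketch of drug-target prioritisation filters.
The schema and thresholds are illustrative assumptions only."""

from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """One parasite gene paired with a homologous drug target (hypothetical schema)."""
    parasite_gene: str
    chembl_target: str                   # homologous target from another species
    drug_max_phase: int                  # highest clinical phase of an inhibiting drug (0 = none)
    has_tox_warning: bool                # any known toxicology warning for that drug
    compound_score: float                # aggregated physico-chemical score, assumed 0..1
    expressed_in_infective_stage: bool   # e.g. expressed in a mammalian-infective stage
    has_host_orthologue: bool            # an orthologue exists in the parasite's host


def prioritise(cands: Iterable[Candidate], min_phase: int = 1,
               min_score: float = 0.0) -> List[str]:
    """Return parasite genes passing every filter, highest compound score first."""
    kept = [
        c for c in cands
        if c.drug_max_phase >= min_phase        # drug has reached clinical trials
        and not c.has_tox_warning               # no known toxicology warnings
        and c.compound_score >= min_score       # physico-chemical threshold
        and c.expressed_in_infective_stage      # relevant expression evidence
        and not c.has_host_orthologue           # absent from the host
    ]
    kept.sort(key=lambda c: c.compound_score, reverse=True)
    return [c.parasite_gene for c in kept]
```

Treating each filter as an independent boolean test keeps the conditions composable: a user can relax one (for example, by lowering min_phase to 0 to include preclinical compounds) without changing the others.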
 
Description WormBase: expanding the reference resource for helminth research
Amount £889,457 (GBP)
Funding ID MR/S000453/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 09/2018 
End 08/2023
 
Title WormBase ParaSite Release 14 
Description Total: 173 genomes, representing 142 distinct species. New species added: Schmidtea mediterranea (PRJNA379262), Schistosoma bovis (PRJNA451066), Opisthorchis felineus (PRJNA413383), Ditylenchus dipsaci (PRJNA498219), Halicephalobus mephisto (PRJNA528747), Mesorhabditis belari (PRJEB30104). Updated genome for Steinernema carpocapsae (PRJNA202318). Updated annotation for Mesocestoides corti (PRJEB510). Gene expression data added: 201 RNA-seq studies across 48 species; S. mediterranea studies are aligned to the new assembly (ASM260089v1).
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact WormBase ParaSite had an average of ~7000 unique users per month in 2019 
URL https://parasite.wormbase.org
 
Title WormBase ParaSite Release 15 
Description The site now contains a total of 197 genomes, representing 161 distinct species. New species added: Panagrolaimus davidi (PRJEB32708), Panagrolaimus es5 (PRJEB32708), Propanagrolaimus ju765 (PRJEB32708), Panagrolaimus ps1159 (PRJEB32708), Panagrolaimus superbus (PRJEB32708), Caenorhabditis becei (PRJEB28243), Caenorhabditis bovis (PRJEB34497), Caenorhabditis panamensis (PRJEB28259), Caenorhabditis parvicauda (PRJEB12595), Caenorhabditis quiockensis (PRJEB11354), Caenorhabditis sulstoni (PRJEB12601), Caenorhabditis tribulationis (PRJEB12608), Caenorhabditis uteleia (PRJEB12600), Caenorhabditis waitukubuli (PRJEB12602), Caenorhabditis zanzibari (PRJEB12596), Setaria digitata (PRJEB479729), Fasciola gigantica (PRJNA230515), Paragonimus westermani (PRJNA454344), Echinococcus oligarthrus (PRJEB31222). Updated or alternative genomes added for Steinernema carpocapsae (PRJNA202318) (previous assembly version restored alongside the current version), Angiostrongylus cantonensis (PRJEB350391), Hymenolepis diminuta (PRJEB30942), Steinernema feltiae (PRJEB353610), Schistosoma haematobium (PRJNA78265), Schistosoma japonicum (PRJEB520774). Updated annotation added for Parascaris univalens (PRJNA386823), Fasciola hepatica (PRJEB25283), Haemonchus contortus (PRJEB506), Ancylostoma ceylanicum (PRJNA72583). Caenorhabditis sp34 renamed Caenorhabditis inopinata. The site now contains 164 RNA-seq studies from 48 species and 1 RNAi study.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact Usage stats not yet analysed 
URL https://parasite.wormbase.org
 
Title WormBase ParaSite Release 16 
Description Version: WBPS16 (September 2021). The site now contains 202 genomes, representing 163 species. Changes in this release: addition of six new genome assemblies; annotation updates for 12 genomes; addition of phenotype data for C. elegans genes, imported from WormBase; addition of gene name synonyms for a set of Strongyloides stercoralis genes; introduction of an archiving service; deprecation of CEGMA and introduction of BUSCO as an annotation quality metric; new repeat feature libraries for all genomes, generated with RepeatModeler2.
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact Usage stats not yet analysed 
URL https://parasite.wormbase.org
 
Description Ensembl 
Organisation Ensembl
Country United Kingdom 
Sector Academic/University 
PI Contribution WormBase ParaSite have deployed the Ensembl software and infrastructure, with the specific goal of enabling genomics for the helminth research community.
Collaborator Contribution The Ensembl team at EMBL-EBI develop software and infrastructure for the storage and display of genomic data for selected species.
Impact Continued WormBase ParaSite releases are reliant on Ensembl software.
Start Year 2014
 
Description WormBase consortium 
Organisation WormBase (Biology and Genome of C. elegans)
Country United States 
Sector Charity/Non Profit 
PI Contribution The WormBase Consortium is led by Paul Sternberg of Caltech, Kevin Howe of EMBL-EBI, Matt Berriman of the Wellcome Sanger Institute, and Lincoln Stein of the Ontario Institute for Cancer Research. The consortium runs a model organism database containing data from research on C. elegans and other nematodes. WormBase ParaSite provides searching and data access capabilities that are not available through the WormBase website.
Collaborator Contribution WormBase curates reference genomes, which are imported into WormBase ParaSite and provide important functional information for understanding the genomes of comparator species.
Impact Provision of annotated genomes for C. elegans and Brugia malayi
Start Year 2014
 
Description Advanced Course in Helminth Bioinformatics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Three members of the WormBase ParaSite team acted as instructors in a 5-day workshop on Helminth Bioinformatics, hosted by the West African Centre for Cell Biology of Infectious Pathogens, Accra, Ghana. The workshop had approximately 20 participants, all postgraduate or postdoctoral researchers based in institutions across Africa. Material covered included use of the WormBase ParaSite database, an introduction to the Linux command line, transcriptomics, genome assembly, and population genetics. The main goal was to increase the genomics and bioinformatics capacity of helminth researchers in Africa. The course received excellent formal feedback, with many participants voting WormBase ParaSite one of the most useful sections.
Year(s) Of Engagement Activity 2019
URL https://coursesandconferences.wellcomegenomecampus.org/our-events/helminth-bioinformatics-ghana-2019...
 
Description BSP drop-in helpdesk
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact We ran a drop-in desk at the annual British Society for Parasitology Spring Meeting, where we answered questions from the user community and demonstrated features of the resource.
Year(s) Of Engagement Activity 2019
 
Description BioMart webinar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A webinar (comprising presentation and question and answer session) aimed at worm researchers, covering data mining strategies in BioMart. The webinar remains available on the WormBase YouTube channel.
Year(s) Of Engagement Activity 2021
URL https://www.youtube.com/watch?v=IM2j7-OPmtQ
 
Description Helminth Bioinformatics (Asia) (Virtual) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Three members of the team participated as trainers in a bioinformatics course targeted at helminth researchers in Asia. The course curriculum included extensive use of WormBase ParaSite.
Year(s) Of Engagement Activity 2021
URL https://coursesandconferences.wellcomeconnectingscience.org/event/helminth-bioinformatics-asia-virtu...
 
Description Helminth genomics workshop, Shanghai 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A WormBase ParaSite staff member acted as lead instructor on a 2-day course in Helminth Bioinformatics in Shanghai, China. The course had approximately 40 participants, largely postgraduate students, from across China. Material covered included basic concepts in bioinformatics and use of the WormBase ParaSite database. Formal feedback was very positive.
Year(s) Of Engagement Activity 2019
 
Description KS2 seminar/workshop at local primary school 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Gave an assembly to ~120 Key Stage 2 pupils at a local primary school. The assembly included live microscopy and a science talk.
Year(s) Of Engagement Activity 2019
 
Description Poster presentation at the Parasitic Helminths: New Perspectives in Biology and Infection meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A WormBase ParaSite staff member presented a poster on the WormBase ParaSite resource at the Parasitic Helminths: New Perspectives in Biology and Infection conference (Hydra, Greece). Users were informed of new features and given the chance to ask questions on use of the resource.
Year(s) Of Engagement Activity 2019
 
Description Presentation and Workshop at annual "Molecular and Cellular Biology of Helminth Parasites" meeting 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The workshop and tutorial increased awareness and understanding of how to use WormBase ParaSite, and generated ideas for further development.
Year(s) Of Engagement Activity 2017,2018
 
Description Short talk in "Bridging the Divide" workshop, International C. elegans Meeting, UCLA
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A WormBase ParaSite staff member gave a short talk on the resource in a parasitology session at the International C. elegans Meeting. The audience included both parasitologists and C. elegans researchers.
Year(s) Of Engagement Activity 2019
 
Description Virtual workshop at British Society for Parasitology online conference
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Operated a virtual booth at a three-day conference, which included a 2-hour open workshop session on the last day.
Year(s) Of Engagement Activity 2021
URL https://bsp.uk.net/2020/07/24/parasites-online-2021/
 
Description WormBase ParaSite webinar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Two WormBase ParaSite staff members ran a webinar on the use of the resource, aimed at all helminth researchers. The webinar received positive feedback, and remains available online as training material.
Year(s) Of Engagement Activity 2019
URL https://www.ebi.ac.uk/training/online/course/introduction-wormbase-parasite-resources