WormBase-ParaSite

Lead Research Organisation: Wellcome Sanger Institute
Department Name: Wellcome Trust Genome Campus

Abstract

Flatworms and roundworms are diverse groups of organisms and include those responsible for serious human, veterinary and plant diseases. Their global impact is hard to measure, but annual human morbidity is estimated to be equivalent to at least 50 million productive years of life (cf. 85 million for HIV/AIDS), and agricultural losses from plant-parasitic nematodes can be measured in hundreds of millions of dollars. Despite their impact on human health, a recent report highlighted that parasitic helminths attract only $77 million per annum in research expenditure (cf. $1.1 billion for HIV/AIDS). Because of the wide range of pathology they cause in various host species, no single model species can capture the range of disease-causing mechanisms involved. Parasitologists are therefore inherently interested in making comparisons between many different species. Access to genome-scale datasets has revolutionised molecular and cell biological studies of protozoan pathogens, advancing basic and applied research, but this is only now starting to happen for parasitic worms. Major sequencing programmes are now underway and large-scale functional genomics datasets are beginning to emerge (e.g. RNA-Seq is becoming commonplace). Studies of genomic variation in multiple isolates, to address clinical, epidemiological or applied agricultural questions, are the obvious next steps.

While the emergence of new genome-scale datasets is immensely exciting, new genomes are often in a relatively poor state of assembly and annotation compared to existing reference genomes. Moreover, the prevailing paradigm for organising genomic information (essentially, the genome browser and the data models that underlie it) is poorly suited to exploring hundreds of highly fragmented genomes with limited functional characterisation. In this application, we propose the creation of a new resource to organise, classify and allow the exploration of hundreds of worm genomes, facilitating the exploitation of sequence-based data for understanding and ultimately controlling worm-induced pathology.
The proposed resource will be called WormBase-ParaSite and will be strategically aligned with WormBase (the main resource for the model nematode Caenorhabditis elegans). Specifically, the resource will provide:

(a) Gene structures and functional annotation for unannotated worm genomes.
(b) Comparative genomic analysis, visualisation and querying.
(c) Methods for exploring and mining the complete dataset through an intuitive query-building interface accessible to research scientists.
(d) A platform for the visualisation of the results of high-throughput sequencing experiments in the context of other annotation that enables functional genomics and variation studies.
(e) An infrastructure for accepting and integrating functional annotation submitted from the community engaged in worm research.

The project will complement the existing scope (and leverage the existing content) of WormBase, which has supplied biologists working on the model worm Caenorhabditis elegans with an invaluable information resource since the genome of this species was one of the first to be deciphered. It will provide additional capacity and tools for handling a greatly increased number of genomes, with a clear focus on the information that is most relevant to parasitologists. The resource will also provide a home for data from the flatworms, such as flukes and tapeworms, which are outside the scope of WormBase.

Technical Summary

WormBase-ParaSite will be a new database and user interface focused on parasitic helminths, i.e. roundworms and flatworms. Large numbers of these genomes are currently being sequenced, and the new resource will solve the problem of badly organised and inconsistently annotated genomes, which, if unaddressed, would make these valuable data hard for researchers to use effectively. Over a period of 3 years, we propose to structurally and functionally annotate at least 200 genomes using well-documented, state-of-the-art approaches (e.g. Augustus/MAKER, Rfam, InterPro), and to perform comparative analyses based on selective pairwise and multiple DNA alignment, HMM clustering of protein sequences, and evolutionary analysis of protein families. We will develop new data-mining tools (based on established data-warehousing infrastructure such as BioMart or InterMine) to allow users to efficiently extract information from the database, offering queries relating to, e.g., genetic variation, gene content and taxonomic distribution. The user interface will be based on the Ensembl platform and will support comparison of reference and other data stored in the resource with user-generated data in standard file formats (e.g. BAM for alignments, VCF for variants) via easy-to-use upload and visualisation tools. In addition to this interactive interface, the resource will provide programmatic access to the data. Frequent data releases will ensure the prompt availability of data and analysis results to the community. To allow ongoing improvement of the annotation, we will deploy a community curation tool, and we will work closely with WormBase, the database for the model nematode C. elegans, which is expected to provide an eventual home for some of the genomes of highest interest. Finally, we will run an open workshop aimed at training scientists to use the database and its visualisation and analysis tools efficiently.

Planned Impact

Parasitic helminths are studied with the aim of killing or controlling them. The proposed resource will significantly facilitate the exploitation and application of sequence-based data towards this aim. For pathogens with smaller and simpler genomes such as viruses and bacteria, genomic insights are already being translated into tangible benefits for medicine such as the ability to track pathogen transmission and the monitoring of drug resistance. Yet it is clear that the application of genomic science towards medical, veterinary and agricultural improvements is still in its infancy, and that many more and bigger benefits will be realized in the long term.

By analogy, sequencing-based research is expected to deliver significant benefits in the fight against the diverse helminthiases afflicting humans, animals and plants. Downstream beneficiaries will first and foremost be people directly suffering from helminth infections, including about 2 billion people infected with soil-transmitted helminths alone (WHO 2012). Advances in drug treatment, transmission reduction or vaccination could improve the lives of many people who may otherwise suffer from serious gastrointestinal disease, stunted growth and impaired mental development, malnutrition and fatigue, disfigurement, blindness, or liver and bladder pathologies. Although some effective anthelminthics exist, the available arsenal of drugs is limited. This makes the development and spread of drug resistance a real danger, especially as mass drug administration is a predominant tool for helminthiasis control in developing countries. Furthermore, large-scale improvements in the treatment and control of helminthiases are likely to bring substantial socio-economic benefits to some of the least developed countries.

In addition to the direct health improvements from a reduction in helminth infections, people in endemic areas could also benefit indirectly, e.g. through an improved response to vaccinations, reduced transmission, or improved outcomes for other diseases such as tuberculosis, malaria and HIV/AIDS, as co-infection with helminths has been shown to have potentially adverse effects (see Elliott and Yazdanbakhsh, 2012, and other articles in the same journal issue for recent reviews).

In the UK, a major impact of helminth-related diseases is on agricultural production, especially in the potato and the sheep and goat farming industries. Improved interventions and control measures against helminths such as Globodera, Teladorsagia and Haemonchus spp. could therefore greatly benefit UK farmers and related agricultural and pharmaceutical industries. Globally, species of the genus Heterodera are significant nematode pests of various crops, including cereals and soybean. The proposed resource therefore fits squarely within the BBSRC's strategic research priority of 'Food security' and will contribute to the areas of 'Crop science', 'Animal health' and 'Livestock production'. In addition, novel or improved helminth interventions could benefit pet owners and their companion animals and could help reduce the environmental impact of the large-scale application of nematicides in crop production. The proposed resource could also ultimately contribute to improved use of beneficial nematodes, e.g. entomopathogenic nematodes used to control pine weevils.

Looking further into the future, a thorough understanding of helminths and their interactions with the human immune system may lead to fundamental new insights that allow much more sophisticated manipulation of the human immune system for medical purposes. Similarly, scientists are just beginning to uncover the complex interactions between the gut microflora (i.e. bacteria), the macrofauna (i.e. helminths) and human immunity. Together, such knowledge may ultimately be exploited for the effective treatment of allergies and other (autoimmune) diseases.

 
Description In year 1, we released the first version of a new web-based database providing access to more than 80 draft genomes of parasitic helminths. By year 3, we had produced our eighth release, including 131 draft genomes. The organisms included cover parasites of medical, veterinary and agricultural significance, plus several species that are used primarily as experimental models across the research community.

The resource allows users to search for genes and browse genome data. In all, the site contains about 1.5 million new genes, which have been organised into families so that researchers can compare versions of the same gene (homologues) across different parasite species. It also allows researchers to begin to identify genes that are found only in specific species or groups of species and that may therefore account for important (i.e. exploitable) biological differences.

By its fifth release, the site had expanded to contain 108 genomes, together with transcriptome datasets with carefully annotated metadata. We now operate release cycles that are synchronised with those of the WormBase database.
Exploitation Route We aim for the website to become an important discovery tool for researchers working on parasitic worms. To reach as much of the relevant research community as possible, a news feature on the Sanger Institute and EBI websites is being prepared. This will be accompanied by Twitter announcements and an email to approximately 100 principal investigators.

On the site we also include some basic information about each of the 112 species that are included. Although the site is not targeted at a general audience, it should be clear to any user what information is contained.

To maximise utilisation by the research community, we have demonstrated the project at an international helminth meeting and have started roadshow workshops visiting UK hubs of helminth research. To date, two such workshops have been held and two more are scheduled.
Sectors Agriculture, Food and Drink, Pharmaceuticals and Medical Biotechnology

URL http://parasite.wormbase.org
 
Description Bioinformatics and Biological Resources Fund
Amount £500,000 (GBP)
Funding ID BB/P024610/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 07/2017 
End 01/2021
 
Title WormBase ParaSite Release 14 
Description Total: 173 genomes, representing 142 distinct species. New species added: Schmidtea mediterranea (PRJNA379262), Schistosoma bovis (PRJNA451066), Opisthorchis felineus (PRJNA413383), Ditylenchus dipsaci (PRJNA498219), Halicephalobus mephisto (PRJNA528747), Mesorhabditis belari (PRJEB30104). Updated genome for Steinernema carpocapsae (PRJNA202318). Updated annotation for Mesocestoides corti (PRJEB510). Gene expression data added: 201 RNA-Seq studies across 48 species; S. mediterranea studies are aligned to the new assembly (ASM260089v1).
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact WormBase ParaSite had an average of ~7000 unique users per month in 2019 
URL https://parasite.wormbase.org
 
Title WormBase ParaSite Release 15 
Description The site now contains a total of 197 genomes, representing 161 distinct species. New species added: Panagrolaimus davidi (PRJEB32708), Panagrolaimus es5 (PRJEB32708), Propanagrolaimus ju765 (PRJEB32708), Panagrolaimus ps1159 (PRJEB32708), Panagrolaimus superbus (PRJEB32708), Caenorhabditis becei (PRJEB28243), Caenorhabditis bovis (PRJEB34497), Caenorhabditis panamensis (PRJEB28259), Caenorhabditis parvicauda (PRJEB12595), Caenorhabditis quiockensis (PRJEB11354), Caenorhabditis sulstoni (PRJEB12601), Caenorhabditis tribulationis (PRJEB12608), Caenorhabditis uteleia (PRJEB12600), Caenorhabditis waitukubuli (PRJEB12602), Caenorhabditis zanzibari (PRJEB12596), Setaria digitata (PRJEB479729), Fasciola gigantica (PRJNA230515), Paragonimus westermani (PRJNA454344), Echinococcus oligarthrus (PRJEB31222). Updated or alternative genomes added for Steinernema carpocapsae (PRJNA202318) (previous assembly version restored alongside the current assembly version), Angiostrongylus cantonensis (PRJEB350391), Hymenolepis diminuta (PRJEB30942), Steinernema feltiae (PRJEB353610), Schistosoma haematobium (PRJNA78265), Schistosoma japonicum (PRJEB520774). Updated annotation added for Parascaris univalens (PRJNA386823), Fasciola hepatica (PRJEB25283), Haemonchus contortus (PRJEB506), Ancylostoma ceylanicum (PRJNA72583). Caenorhabditis sp34 renamed Caenorhabditis inopinata. The site now contains 164 RNA-Seq studies from 48 species and 1 RNAi study.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact Usage stats not yet analysed 
URL https://parasite.wormbase.org
 
Title WormBase ParaSite Release 16 
Description Version: WBPS16 (September 2021). The site now contains 202 genomes, representing 163 species. Changes include: addition of six new genome assemblies; annotation updates for 12 genomes; addition of phenotype data for C. elegans genes, imported from WormBase; addition of gene name synonyms for a set of Strongyloides stercoralis genes; introduction of an archiving service; deprecation of CEGMA and introduction of BUSCO as an annotation quality metric; new repeat feature libraries for all genomes, generated with RepeatModeler2.
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact Usage stats not yet analysed 
URL https://parasite.wormbase.org
 
Title WormBase ParaSite Release 17 
Description An update to the open-access database WormBase ParaSite, containing: integration of AlphaFold 3D protein structures for 8 species; addition of 11 new genome assemblies, 6 of which are for new species; annotation updates for 2 genomes; gene-phenotype associations.
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Usage stats not yet analysed 
URL https://parasite.wormbase.org
 
Title WormBase-ParaSite 
Description WormBase-ParaSite is a database of parasitic worm (platyhelminth and nematode) genomes. The first release, in September 2014, contained 81 annotated genomes, more than 50 of which were new, unpublished drafts. The database allows users to search for specific genes and to identify gene families based on their phylogenetic distributions.
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact Access to the site has increased to about 1000 users per month. 
URL http://parasite.wormbase.org
 
Description Ensembl 
Organisation Ensembl
Country United Kingdom 
Sector Academic/University 
PI Contribution WormBase ParaSite has deployed the Ensembl software and infrastructure, with the specific goal of enabling genomics for the helminth research community.
Collaborator Contribution The Ensembl team at EMBL-EBI develops software and infrastructure for the storage and display of genomic data for selected species.
Impact Continued WormBase ParaSite releases are reliant on Ensembl software.
Start Year 2014
 
Description WormBase consortium 
Organisation WormBase (Biology and Genome of C.Elegans)
Country United States 
Sector Charity/Non Profit 
PI Contribution The WormBase Consortium is led by Paul Sternberg of Caltech, Kevin Howe of the EBI, Matt Berriman of the Wellcome Sanger Institute, and Lincoln Stein of the Ontario Institute for Cancer Research. The consortium runs a model organism database containing data from research on C. elegans and other nematodes. WormBase ParaSite provides search and data-access capabilities that are not available through the WormBase website.
Collaborator Contribution WormBase curates reference genomes, which are imported into WormBase ParaSite and provide important functional information for understanding the genomes of comparator species.
Impact Provision of annotated genomes for C. elegans and Brugia malayi
Start Year 2014
 
Description Advanced Course in Helminth Bioinformatics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Three members of the WormBase ParaSite team acted as instructors in a 5-day workshop on Helminth Bioinformatics, hosted by the West African Centre for Cell Biology of Infectious Pathogens, Accra, Ghana. The workshop had approximately 20 participants, all postgraduate or postdoctoral researchers based in institutions across Africa. Material covered included use of the WormBase ParaSite database, an introduction to the Linux command line, transcriptomics, genome assembly, and population genetics. The main goal was to increase the genomics/bioinformatics capacity of helminth researchers in Africa. The course received excellent formal feedback, with many participants voting WormBase ParaSite one of the most useful sections.
Year(s) Of Engagement Activity 2019
URL https://coursesandconferences.wellcomegenomecampus.org/our-events/helminth-bioinformatics-ghana-2019...
 
Description Annual workshop at the British Society for Parasitology spring meeting 2016 and 2017
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Practical workshop demonstrating common use-cases for WormBase ParaSite tools
Year(s) Of Engagement Activity 2016,2017
 
Description BSP drop-in helpdesk
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact We ran a drop-in desk at the annual British Society for Parasitology Spring Meeting, where we answered questions from the user community and demonstrated features of the resource.
Year(s) Of Engagement Activity 2019
 
Description Genome Decoders project 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact In 2017 we started a collaborative project with the Institute for Research in Schools, in which we work directly with school students (primarily at A-level) to computationally analyse the genome of a parasitic worm. The project is ongoing and currently involves 50 schools. A launch day held in September was attended by 200 students and teachers and included seminars and workshop activities.
In 2018, the continuing project earned the team the "Engaged Team Prize" on the Wellcome Genome Campus.
Year(s) Of Engagement Activity 2017
URL http://publicengagement.wellcomegenomecampus.org/genome-decoders
 
Description Helminth Bioinformatics (Asia) (Virtual) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Three members of the team participated as trainers in a bioinformatics course targeting helminth researchers in Asia. The course curriculum included extensive use of WormBase ParaSite.
Year(s) Of Engagement Activity 2021
URL https://coursesandconferences.wellcomeconnectingscience.org/event/helminth-bioinformatics-asia-virtu...
 
Description Helminth genomics workshop, Shanghai 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A WormBase ParaSite staff member acted as lead instructor on a 2-day course in Helminth Bioinformatics in Shanghai, China. The course had approximately 40 participants, largely postgraduate students, from across China. Material covered included basic concepts in bioinformatics and use of the WormBase ParaSite database. Formal feedback was very positive.
Year(s) Of Engagement Activity 2019
 
Description Poster presentation at the Parasitic Helminths: New Perspectives in Biology and Infection meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A WormBase ParaSite staff member presented a poster on the WormBase ParaSite resource at the Parasitic Helminths: New Perspectives in Biology and Infection conference (Hydra, Greece). Users were informed of new features and given the chance to ask questions on use of the resource.
Year(s) Of Engagement Activity 2019
 
Description Presentation and Workshop at annual "Molecular and Cellular Biology of Helminth Parasites" meeting 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Workshop/tutorial resulted in increased awareness and understanding of how to use WormBase ParaSite, and ideas for further development
Year(s) Of Engagement Activity 2017,2018
 
Description Short talk in "Bridging the Divide" workshop, International C. elegans meeting, UCLA
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A WormBase ParaSite staff member gave a short talk on the resource in a parasitology session at the International C. elegans Meeting. The audience included both parasitologists and C. elegans researchers.
Year(s) Of Engagement Activity 2019
 
Description WormBase ParaSite webinar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Two WormBase ParaSite staff members ran a webinar on the use of the resource, aimed at all helminth researchers. The webinar received positive feedback, and remains available online as training material.
Year(s) Of Engagement Activity 2019
URL https://www.ebi.ac.uk/training/online/course/introduction-wormbase-parasite-resources
 
Description Practical workshop at Molecular Helminthology: An Integrated Approach
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Practical workshop presented to an audience of scientists
Year(s) Of Engagement Activity 2017
URL https://www.elsevier.com/events/conferences/molecular-helminthology-an-integrated-approach