WormBase ParaSite
Lead Research Organisation:
European Bioinformatics Institute
Department Name: Ensembl Genomes
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Technical Summary
WormBase ParaSite is a database that provides rapid access to new high-throughput genomic and related data from parasitic flatworms and roundworms (helminths). These data include genome sequence, gene expression data, and regulatory data, and are generally produced using massively parallel nucleotide sequencing strategies, and need to be integrated and interpreted to inform parasitology. A major challenge is to provide structural and functional annotation on the genome assemblies, to automatically update this as new experimental evidence becomes available and maintain tracking between successive versions such that researchers can continue their work as the reference data sets improve.
ParaSite is mostly implemented through the re-use and, where necessary, the extension of database technologies developed elsewhere. These include the MAKER pipeline (and other tools such as RepeatMasker and Rfam) for genome annotation, tools derived from Lepbase for the representation of genome quality, the Ensembl software stack for genome data management and preparation, and the BioMart data warehousing tool, which provides high-performance data discovery and retrieval for common use cases centred on genes. Both Ensembl and BioMart provide a web interface through mod_perl, which embeds the Perl programming language in an Apache web server, and use MySQL (a common relational database management system) as the underlying data store. Increasingly, we are supporting the direct incorporation of data stored in binary, indexed file formats (e.g. BAM and CRAM for sequence alignments), which simplifies the database build process and improves performance. We are using the emerging Track Hub technology to arrange these files so that users can locate and filter data of interest appropriately.
Planned Impact
Across the globe, parasitic worms (helminths) cause a massive economic burden and are responsible for long-term, chronic diseases. Helminths are therefore studied with the aim of killing or controlling them. For pathogens with smaller genomes, particularly viruses, bacteria, and protozoa, access to genome data has transformed the way research is conducted, yielding major insights into the spread of infections and drug resistance and leading to the development of new drugs and vaccine candidates. A similar transformation is starting to take place in helminth research: rapid changes in sequencing technologies have driven down costs, and large-scale data on genomes and gene expression are becoming available. WormBase ParaSite was established in 2014 to enable the helminth research field to accelerate by exploiting the rapid growth in available data. Through assisting helminth researchers, ParaSite will impact governments, NGOs and companies with an interest in disease control.
Amongst the downstream beneficiaries from helminth research will be those suffering from infections - more than a billion people worldwide. Human infections, mainly amongst the poorest communities, can result in abdominal pain, anaemia, stunted growth and mental development, malnutrition, fatigue, disfigurement, blindness, circulatory disorders, or liver and bladder pathologies. Some anthelmintic drugs do exist, but with over-reliance on a small repertoire, the development and spread of drug resistance is an ever-present danger.
The global agriculture industry will also benefit from new helminth control measures. In the UK, potato farming is badly affected by potato cyst nematode, and livestock are affected by gastrointestinal nematodes and liver flukes.
WB-PS was launched to exploit the rapid increase in available helminth sequence data (genomes and gene expression data). Through the organisation, analysis and dissemination of these data, WormBase ParaSite aims to: (i) provide a clear, annotated representation of the functional regions of genome sequences; (ii) transfer knowledge from well-annotated to less well-annotated genomes and (iii) allow comparisons between helminths so that differences between genomes can be correlated with the evolution of pathogenic traits. Automatic pipelines integrate new data to ensure that users can access an up-to-date interpretation of all available data, and the use of standard data query and retrieval interfaces reduces time that would otherwise be wasted in finding and re-formatting data to make it interoperable.
The new application will up-scale WormBase ParaSite to ensure that the expected flood of new data (more numerous and more contiguous genome assemblies; new expression and variation data) can be processed and made useful to helminth researchers. Another objective is to ensure rapid releases, so that these data are quickly disseminated to the community; another is to provide training, in situ at prominent nodes of helminth research, to maximise the familiarity of researchers with the available data and tools.
A new portal within ParaSite will be aimed directly at researchers developing drug treatments. We will use sequence similarity to identify homologues of known drug targets from other species (as curated in the ChEMBL resource). We will provide filters to allow users to select genes from parasites whose homologues have properties such as inhibition by a drug that has reached clinical trials but has no known toxicology warnings, or aggregated scores that reflect physico-chemical properties of a compound or drug. To predict new, exploitable target-compound combinations, users will be able to combine their results with relevant gene expression data (e.g. expression in a mammalian-infective stage) and with the absence of an orthologue in the parasite's host.
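The filtering described above can be illustrated with a minimal sketch. It is not the portal's implementation: the record type, its field names, the gene identifiers and the phase threshold are all hypothetical, chosen only to show how the criteria (a homologue inhibited by a drug that has reached clinical trials, no toxicology warning, expression in a mammalian-infective stage, no host orthologue) might be combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-gene summary; every field name is an assumption."""
    gene_id: str
    max_trial_phase: int               # highest clinical phase of a drug inhibiting the homologue (0 = none)
    has_toxicology_warning: bool       # any known warning for that drug
    expressed_in_infective_stage: bool # expressed in a mammalian-infective stage
    has_host_orthologue: bool          # an orthologue exists in the parasite's host


def shortlist(candidates, min_phase=1):
    """Return IDs of genes that pass every filter, in input order."""
    return [
        c.gene_id
        for c in candidates
        if c.max_trial_phase >= min_phase
        and not c.has_toxicology_warning
        and c.expressed_in_infective_stage
        and not c.has_host_orthologue
    ]


# Made-up records: each of B to E fails exactly one criterion.
genes = [
    Candidate("gene_A", 3, False, True, False),
    Candidate("gene_B", 2, True, True, False),   # toxicology warning
    Candidate("gene_C", 0, False, True, False),  # no drug in clinical trials
    Candidate("gene_D", 4, False, True, True),   # orthologue present in host
    Candidate("gene_E", 1, False, False, False), # not expressed in infective stage
]

print(shortlist(genes))  # ['gene_A']
```

In this sketch the filters are combined with a logical AND; a real interface might instead let users toggle each criterion independently.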
Publications
Yates AD
(2022)
Ensembl Genomes 2022: an expanding genome resource for non-vertebrates.
in Nucleic acids research
Wood V
(2020)
Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns.
in Open biology
Mallick R
(2022)
Accelerated variant curation from scientific literature using biomedical text mining.
in microPublication biology
International Helminth Genomes Consortium
(2019)
Comparative genomics of the major parasitic worms.
in Nature genetics
Howe KL
(2020)
Ensembl Genomes 2020-enabling non-vertebrate genomic research.
in Nucleic acids research
Howe KL
(2021)
Ensembl 2021.
in Nucleic acids research
Harris TW
(2020)
WormBase: a modern Model Organism Information Resource.
in Nucleic acids research
Gene Ontology Consortium
(2021)
The Gene Ontology resource: enriching a GOld mine.
in Nucleic acids research
Cunningham F
(2022)
Ensembl 2022.
in Nucleic acids research
Description | We have developed and improved WormBase ParaSite (http://parasite.wormbase.org), a resource currently providing access to 210 genomes from 130 nematode and 39 flatworm species. Available data include genome assemblies, annotations, comparative genomics, and functional analysis, alongside a range of query interfaces and tools, including genome browsers and a data mining platform. During the funded period we made 6 releases of the resource and added a number of new features to the platform, including a mechanism for the capture of free-text comments on genes from the research community, and a sub-portal for exploring helminth gene expression data. We also provided training for the resource (with interactive training workshops) at national and international conferences. The 2017 WormBase ParaSite article (PMID: 27899279) has 236 citations in PubMed. |
Exploitation Route | For those studying parasite-mediated pathologies, ParaSite provides an organised, efficient way to access the data, together with information about similarities and differences between genes and between species that could inform new strategies for control and treatment. A variety of interfaces (interactive and programmatic) are provided to facilitate data access. |
Sectors | Agriculture, Food and Drink; Chemicals; Pharmaceuticals and Medical Biotechnology |
URL | http://parasite.wormbase.org |
Description | The target audience of WBPS is primarily academia, where our resource has grown to be accessed by tens of thousands of users per year, with >300 citations of the 2017 paper recorded in PubMed. The most widely accessed species include medically and agronomically important species such as blood flukes, tapeworms and soil-transmitted helminths. |
First Year Of Impact | 2024 |
Sector | Agriculture, Food and Drink; Healthcare |
Impact Types | Economic |
Description | WormBase: expanding the reference resource for helminth research |
Amount | £889,457 (GBP) |
Funding ID | MR/S000453/1 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 08/2018 |
End | 03/2022 |
Title | Supplementary material from "Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns" |
Description | Table S1 |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://rs.figshare.com/articles/dataset/Supplementary_material_from_Term_Matrix_a_novel_Gene_Ontolo... |
Title | Supplementary material from "Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns" |
Description | Table S2 |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://rs.figshare.com/articles/dataset/Supplementary_material_from_Term_Matrix_a_novel_Gene_Ontolo... |
Title | Supplementary material from "Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns" |
Description | Table S3 |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://rs.figshare.com/articles/dataset/Supplementary_material_from_Term_Matrix_a_novel_Gene_Ontolo... |
Title | Supplementary material from "Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns" |
Description | Table S4 |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://rs.figshare.com/articles/dataset/Supplementary_material_from_Term_Matrix_a_novel_Gene_Ontolo... |
Title | Supplementary material from "Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns" |
Description | Table S5 |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://rs.figshare.com/articles/dataset/Supplementary_material_from_Term_Matrix_a_novel_Gene_Ontolo... |
Title | Supplementary material from "Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns" |
Description | Table S6 |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://rs.figshare.com/articles/dataset/Supplementary_material_from_Term_Matrix_a_novel_Gene_Ontolo... |
Title | WormBase ParaSite |
Description | WormBase ParaSite is aimed at researchers engaged in parasitic worm genomics, encompassing flatworms as well as nematodes, and provides genome sequence, genome browsers, semi-automatic annotation and comparative genomics data for >160 species. Additional tools include a cross species data mining platform, protein and nucleotide sequence search, and a variant effect predictor to enable the analysis of different strain/isolate genomes in the context of the reference. |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | The 2017 WormBase ParaSite article (PMID: 27899279) has 236 citations in PubMed. |
URL | https://parasite.wormbase.org |
Description | Ensembl |
Organisation | Ensembl |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | The Ensembl team at EMBL-EBI develop software and infrastructure for the storage and display of genomic data for selected species. WormBase ParaSite have deployed their software and infrastructure, with the specific goal of enabling genomics for the helminth research community. |
Collaborator Contribution | The Ensembl team at EMBL-EBI develop software and infrastructure for the storage and display of genomic data for selected species. WormBase ParaSite have deployed their software and infrastructure, with the specific goal of enabling genomics for the helminth research community. |
Impact | Continued WormBase ParaSite releases are reliant on Ensembl software. |
Start Year | 2014 |
Description | WormBase consortium |
Organisation | WormBase (Biology and Genome of C.Elegans) |
Country | United States |
Sector | Charity/Non Profit |
PI Contribution | The WormBase Consortium is led by Paul Sternberg of Caltech, Kevin Howe of the EBI, Matt Berriman of the Wellcome Sanger Institute, and Lincoln Stein of the Ontario Institute for Cancer Research. The consortium runs a model organism database containing data from research on C. elegans and other nematodes. WormBase ParaSite provides searching and data access capabilities that are not available through the WormBase website. |
Collaborator Contribution | WormBase curates reference genomes, which are then imported into WormBase ParaSite and provide important functional information for understanding the genomes of comparator species. |
Impact | Provision of annotated genomes for C. elegans and Brugia malayi |
Start Year | 2014 |
Description | Presentation and training at annual "Molecular and Cellular Biology of Helminth Parasites" meeting |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Poster presentation, extensive demonstration and feedback gathering with a cohort of ~100 helminth researchers. |
Year(s) Of Engagement Activity | 2018 |
Description | WBPS Helminth Bioinf Thailand |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Delivered a training workshop in Thailand for practical hands-on training in helminth genome analysis, organised with Wellcome Connecting Science and Khon Kaen University, Thailand. The course concluded with a public outreach event in a local museum. |
Year(s) Of Engagement Activity | 2023 |
URL | https://coursesandconferences.wellcomeconnectingscience.org/event/helminth-bioinformatics-asia-20230... |
Description | WBPS Hydra 2023 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | WormBase ParaSite training was delivered at the Parasitic Helminths conference in Hydra, Greece. |
Year(s) Of Engagement Activity | 2023 |
URL | https://helminthconference.org/wp-content/uploads/2023/08/Hydra-Summary-2023.pdf |
Description | WBPS schisto Brazil |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Delivered a short course titled 'Using WormBase ParaSite to boost your research' at the XVI International Symposium on Schistosomiasis, in Ouro Preto, Brazil. |
Year(s) Of Engagement Activity | 2023 |
URL | http://www.vppcb.fiocruz.br/16symposium-schisto/pages/programacao_en |
Description | Wellcome Advanced Course in Helminth Genomics - WormBase ParaSite component |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | In collaboration with Wellcome Genome Campus Connecting Science and the Wellcome Sanger Institute Parasite genomics group, we developed a week-long comprehensive training course on helminth genomics, covering topics such as genome assembly and annotation, and population genomics. A significant component of the course (1.5 days) was devoted to WormBase ParaSite. We delivered the course for the first time in September 2019 to a cohort of African helminth biologists in Ghana. |
Year(s) Of Engagement Activity | 2019 |
Description | Workshop at the British Society of Parasitology Spring meetings 2018 and 2019 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Practical workshop demonstrating common use-cases for WormBase ParaSite tools |
Year(s) Of Engagement Activity | 2018,2019 |