WGS-aqua: Capacity building for the widespread adoption of whole genome sequencing (WGS) for the molecular epidemiology of aquaculture pathogens.

Lead Research Organisation: University of Bath
Department Name: Biology and Biochemistry


The rapidly increasing global population, combined with mounting environmental pressures and resource limitation, means that the sustainable production and distribution of adequate quantities of healthy, safe food for everyone is set to become one of mankind's biggest challenges. Due to poor global regulation and management, many of our natural resources have been hopelessly over-exploited, particularly so over the last few decades. The fishing industry is a perfect example, where catastrophic collapses of whole fisheries have resulted from decades of short-termism. This has inevitably resulted in an increasing reliance on farming (aquaculture), and this industry globally now accounts for more seafood (fish and shellfish) consumed than the capture sector.

A major cause of commercial losses in aquaculture is infectious disease, and intensive farming practices in particular will increase the risk. Rearing stressed animals at high densities greatly increases the probability of disease outbreaks on a farm, and pathogens may spread to other farms or even to wild fish. Moreover, the international trade in eggs and live fish increases the likelihood of global disease spread, and the introduction of exotic pathogens into vulnerable native species. In addition to parasites such as lice, serious diseases are also caused by microbes such as bacteria and viruses. In order to detect and manage these infections more efficiently, we urgently require more data on the genetics of the pathogens, why they cause outbreaks when they do, and how they can transmit geographically or between different species of fish.

Many of these challenges are analogous to those posed by human diseases: public health infectious disease epidemiologists are likewise faced with understanding why a new strain of a "superbug" (such as MRSA) has emerged, and how likely it is to spread. Fortunately, the last few years have witnessed a huge technological advance which provides the means to address these problems with much more confidence. This technology makes it possible to decode the entire genetic content (genome) of different strains of bacteria and viruses very quickly and relatively cheaply. Tiny variations in the genome make it possible to track the transmission of these pathogens, and by identifying all the genes present in the genome it is possible to predict whether a given strain will be highly virulent, or difficult to treat due to antibiotic resistance.

This project will exploit the advances in the generation and analysis of genome data for human pathogens, and will apply the same, or similar, techniques to aquaculture pathogens. By doing so, it is hoped that other academics, stakeholders and companies will recognize the benefits of the approach and that it will become widely adopted. One of the major challenges with the new sequencing technology is that the vast amounts of data quickly become unwieldy and difficult to manage and analyse efficiently. To address this, a major focus of the project will be the modification of intuitive software tools originally developed for genome data from public health pathogens. To make sure the system is as useful as possible, we will first hold a workshop which will bring together experts in different fields, both public health and aquaculture, to identify the key requirements of such a system. It is very important that an easy-to-use yet powerful system like this is developed now, so that data from different studies can be combined efficiently, and we can have a truly global picture of the emergence and spread of different strains. Once we have optimized the software, we will illustrate its usefulness by generating genome data for three serious aquaculture pathogens (two bacterial species and one virus) and uploading the data to the system. This will show the relationships between the different strains, where they are distributed on a map, and which disease and resistance genes they contain.

Technical Summary

Recent years have seen a rapidly increasing reliance on aquaculture as a global food source. Significant challenges need to be overcome in order to ensure this means of food production is sustainable. Infectious disease management represents one such challenge. International trade in eggs and live fish increases the risk of pathogen spread, and there is a danger of disease spillover from farmed to wild stocks. Antibiotics might be effective in the short term but, as with human pathogens, resistance quickly emerges. The advent of next-generation whole genome sequencing (WGS) of pathogens has greatly advanced our understanding of disease emergence and spread, and much of the methodology is transferable from public health to aquaculture. The adoption of WGS for targeted studies and, ultimately, routine surveillance of aquaculture pathogens, would represent a critical turning point in ensuring the long-term sustainability of aquaculture as a global food source.

The over-arching aim of this project is to facilitate the widespread adoption of WGS for aquaculture disease management. This will be achieved by optimizing and implementing community-oriented WGS database infrastructure and software tools designed for intuitive data management and visualization of pathogen spread, and housing these resources under a single site, wgs-aqua.net. The project will bring together bioinformaticians, modellers, statisticians and population genomicists working at the forefront of infectious disease epidemiology in the public health arena, with key stakeholders and academics in the aquaculture sector. We will demonstrate the broad utility of WGS, and of our software, by sequencing a total of ~250 genomes of Flavobacterium psychrophilum, Vibrio anguillarum and Koi Herpes Virus as exemplars of commercially important pathogens. These data will shed light on transmission, host adaptation, resistance and virulence.

Planned Impact

A key deliverable of this project is the development and provision of a web-based platform for the intuitive storage, sharing and visualization of whole genome sequence (WGS) data for aquaculture pathogens. Taking a strong lead from work carried out for public health pathogens, this system will provide the means for efficient epidemiological surveillance and detailed molecular, evolutionary and ecological analysis and modelling. This will have a wide range of benefits both for disease management and for basic bioscience. Key beneficiaries will include researchers in the academic and industrial sectors working on novel intervention and containment strategies for aquaculture disease, such as the development of vaccines, novel treatment options (eg phage therapy), and diagnostic kits. The surveillance tools will also provide important data on the spread of pathogens that have acquired resistance to antibiotics, thus informing optimal stewardship strategies. Researchers working on basic bioscience questions, for example relating to pathogenesis or genome dynamics, will also find the databases invaluable for gene mining and understanding spatial patterns of diversification.

The provision of a central surveillance system will allow key stakeholders (eg Cefas) and industry to identify and provide early warning of novel pathogen variants that may be particularly virulent or transmissible. Moreover, the data may also provide evidence concerning management options (eg through identification of resistance genes), or evidence that spread is linked to a potential control point in the rearing process. Developments are ongoing to link pathogen databases (eg wgsa.net; developed by DA, a Co-I on the project) with the collection of metadata in the field via a smartphone app (www.epicollect.net). Linking genome sequence data with extensive metadata in a central resource will make it possible to develop detailed models of pathogen emergence and spread, and to evaluate potential intervention strategies in biological (evolutionary and ecological) and economic contexts. Such advances will ultimately benefit policy makers in re-evaluating guidelines for disease control measures.

This project has a strong emphasis on community engagement including academics, stakeholders (Cefas, Defra), and industry. Broad academic support is highlighted elsewhere; we also have key support from three major industrial partners, Novartis, Ridgeway Biologicals and Dawnfresh farming, who highlight the importance of pathogen tracking and of utilizing the WGS data for vaccine development (see letters of support). As stated by Spencer Russell, research manager at Novartis, in his letter of support:

"Bacterial and viral diseases will be a general and on-going problem in aquaculture and the need to develop methods for tracking the spread of current and emerging diseases using advanced molecular epidemiological methods is critical to the future of aquaculture disease management"

We will seek further industrial participation, primarily through the workshop at the start of the project, and will provide additional community benefits through resources housed on our portal site, wgs-aqua.net.

Aquaculture is the fastest growing food-producing sector in the world, and the societal benefits of improved disease management for sustainable aquaculture are profound. Given the pressing challenges facing global food security and the increasing reliance on aquaculture, the development of measures to ensure fish health and protect against disease spread is of paramount importance. Aquaculture is set to play a central role in maintaining the world's food supply, but the risks and losses due to infectious disease will increase as the sector expands and becomes even more globalized. The development of the infrastructure required for central surveillance and WGS data storage is therefore critical and timely.


Description We have generated genome data on seven aquaculture pathogens (the original grant only listed three of these). These data have provided detailed information on the molecular epidemiology and evolution of these species, eg with respect to movement via trade, host adaptation, antibiotic resistance, and ecological adaptation. For example, one manuscript currently under review describes the introduction and spread of Renibacterium salmoninarum in Chile. This work identifies at least four introductions of this pathogen into Chile, three of which were followed by extensive spread within the country. One paper under review describes a 500 kb recombination event in Vibrio anguillarum, the first such large-scale event described for Vibrio, and of relevance for understanding host specialisms. Another paper ready for submission describes the population structure of Yersinia ruckeri and the dynamics of global spread. We have also described host adaptation in the ecological generalist Lactococcus garvieae - in this case genome sequencing has revealed a genomic region of high gene content variability that is associated with outbreaks in trout. Other papers in an advanced state include studies on the diversity of Koi Herpes Virus in the UK and on the genomic variability and geographic structuring of Flavobacterium psychrophilum. Genome data from six bacterial pathogens have been housed on a private database using the BIGSdb framework. We have also published a comprehensive review in Frontiers in Microbiology which describes the use of whole genome sequencing for aquaculture. Comparative genomics of the Vibrio anguillarum genome data has revealed a large-scale homologous recombination event, the first such event described for Vibrio. Analysis of V. aestuarianus and V. genome sequences has revealed evidence concerning the emergence of outbreaks in oyster farms in Ireland.
Exploitation Route As more of the data are published, they will be made available to the community via the BIGSdb platform and the ENA.
Sectors Agriculture, Food and Drink, Environment

Description The data are being used as reference genome data sets, as intended, and to provide baseline evidence concerning diversity and population structure. The techniques and pipelines developed have been used and further developed for pathogen management in aquaculture.
First Year Of Impact 2018
Sector Agriculture, Food and Drink
Impact Types Economic

Description GW4 SWBio Doctoral Training Partnership (DTP)
Amount £50,000 (GBP)
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 09/2017 
End 10/2021
Description University of Bath Alumni Fund
Amount £35,000 (GBP)
Organisation University of Bath 
Sector Academic/University
Country United Kingdom
Start 09/2016 
End 10/2019
Description Yersiniose - utredning av økende forekomst hos norsk oppdrettslaks i sjøfasen (Yersiniosis - investigation of the increasing occurrence of marine Yersiniosis in Norwegian farmed salmon)
Amount kr 9,097,193 (NOK)
Funding ID FHF 901505 
Organisation Norwegian Seafood Council 
Sector Public
Country Norway
Start 07/2018 
End 09/2021
Title Genome sequences for F. psychrophilum
Description Over 200 genome sequences generated from salmon and trout outbreaks, and from diverse host / environmental sources. Global collection. 
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact Manuscript currently in preparation 
Title Genome sequences for Koi-Herpes Virus 
Description Over 30 genome sequences of Koi-Herpes virus have been generated. 
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact Manuscript in preparation 
Title Genome sequences for Lactococcus garvieae 
Description Over 50 genome sequences for L. garvieae, including from major trout outbreaks and from diverse environmental sources. 
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact Manuscript in preparation 
Title Genome sequences for Vibrio anguillarum, Vibrio aestuarianus and Vibrio splendidus
Description Over 70 genome sequences for V. anguillarum, including pre-vaccine historical isolates (now published in Microbial Genomics), and 45 genomes of Vibrio aestuarianus and Vibrio splendidus.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact V. anguillarum data now published. The Vibrio aestuarianus paper is in preparation and describes the origin of strains responsible for summer mortality syndrome in Irish oyster farms. 
URL https://www.frontiersin.org/articles/10.3389/fmicb.2020.01430/full
Title Genome sequences for Yersinia ruckeri 
Description Over 100 genome sequences of Y. ruckeri, predominantly from Europe and North America. This study has been expanded to be more global in scope, has been written up, and will be submitted shortly.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact Manuscript ready for submission 
Title PIRATE: pipeline for exploring the pan-genome 
Description A new bioinformatics pipeline for exploring bacterial pan-genomes using different thresholds of sequence identity to define gene homology groups. 
Type Of Material Data analysis technique 
Provided To Others? No  
Impact Manuscript in preparation 
Title Supporting data for "PIRATE: A fast and scalable pangenomics toolbox for clustering diverged orthologues in bacteria" 
Description Cataloguing the distribution of genes within natural bacterial populations is essential for understanding evolutionary processes and the genetic basis of adaptation. Here we present a pangenomics toolbox, PIRATE (Pangenome Iterative Refinement And Threshold Evaluation), which identifies and classifies orthologous gene families in bacterial pangenomes over a wide range of sequence similarity thresholds. PIRATE builds upon recent scalable software developments to allow for the rapid interrogation of thousands of isolates. PIRATE clusters genes (or other annotated features) over a wide range of amino-acid or nucleotide identity thresholds and uses the clustering information to rapidly identify paralogous gene families and putative fission/fusion events. Furthermore, PIRATE orders the pangenome using a directed graph, provides a measure of allelic variation and estimates sequence divergence for each gene family. We demonstrate that PIRATE scales linearly with both number of samples and computation resources, allowing for analysis of large genomic datasets, and compares favorably to other popular tools. PIRATE provides a robust framework for analysing bacterial pangenomes, from largely clonal to panmictic species. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
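
The idea of clustering genes over a range of identity thresholds, as described for PIRATE above, can be illustrated with a minimal sketch. This is not PIRATE's implementation: the identity metric (fraction of matching positions in pre-aligned, equal-length sequences), the single-linkage grouping, the function names and the toy sequences are all assumptions made purely for illustration.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Toy metric (assumption): fraction of matching positions."""
    if len(a) != len(b):
        raise ValueError("toy metric expects pre-aligned, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_at_threshold(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: two genes are linked if identity >= threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Sort for a deterministic order of clusters.
    return sorted(groups.values(), key=lambda g: sorted(g))


def cluster_over_thresholds(seqs, thresholds):
    """Repeat the clustering at each threshold to see how families split."""
    return {t: cluster_at_threshold(seqs, t) for t in thresholds}


# Hypothetical toy sequences, not real gene data.
genes = {
    "geneA_1": "ATGGCTAAGCTT",
    "geneA_2": "ATGGCTAAGCTA",
    "geneA_3": "ATGGCAAAGCAA",
    "geneB_1": "TTTCCCGGGAAA",
}
result = cluster_over_thresholds(genes, [0.5, 0.9, 0.95])
for threshold, clusters in result.items():
    print(threshold, clusters)
```

In this toy example a loose threshold groups the three geneA variants into one family, and stricter thresholds progressively split them, which is the kind of behaviour that lets diverged orthologues be distinguished from closely related alleles.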
Title Genome sequences for Piscirickettsia salmonis
Description Over 20 genome sequences for Piscirickettsia salmonis from salmon farms in Chile
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact Manuscript in preparation 
Title Genome sequences for Renibacterium salmoninarum
Description 30 genome sequences for R. salmoninarum from Chile 
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact Manuscript in preparation 
Description ARCH-UK network 
Organisation University of Stirling
Department Division of Communications, Media and Culture
Country United Kingdom 
Sector Academic/University 
PI Contribution This BBSRC/NERC funded network involves the key project partners (Stirling, Cefas). It began on 1/3/17 so has not yet made a direct contribution to the project.
Collaborator Contribution Only began on 1/3/17
Impact None so far.
Start Year 2017
Description Collaboration between E Feil and Jie Feng (Chinese Academy of Sciences, Beijing) 
Organisation Chinese Academy of Sciences
Country China 
Sector Public 
PI Contribution E Feil was invited to spend 2 weeks in Beijing to analyse data on an Aeromonas salmonicida outbreak in a large salmon farm in China. Over 30 genome sequences were generated, and a manuscript is in an advanced state of preparation.
Collaborator Contribution Prof Feng provided the strains and expertise; Feil and Bayliss provided bioinformatics support and guidance on evolutionary aspects.
Impact Manuscript in preparation
Start Year 2016
Description Yersiniosis in Norwegian Farmed Salmon project 
Organisation Norwegian School of Veterinary Science
Country Norway 
Sector Academic/University 
PI Contribution Sponsor: Norwegian Seafood Council (FHF 901505). Project: Yersiniose - utredning av økende forekomst hos norsk oppdrettslaks i sjøfasen (Yersiniosis - investigation of the increasing occurrence of marine Yersiniosis in Norwegian farmed salmon). Aug 2018 - Sept 2021. Total 10M NOK (£923,000). PI: Duncan Colquhoun (Norwegian Veterinary Institute, Oslo). Co-I: Heidrun Wergeland (University of Bergen). Co-Is: E. Feil and Sion Bayliss (University of Bath).
Collaborator Contribution Bioinformatics support for whole genome sequence data analysis. Advice on evolutionary analysis.
Impact Multidisciplinary project looking at the source and spread of Yersiniosis (Yersinia ruckeri) in Norwegian salmon. Generated large volumes of sequence data and phenotypic data on large strain collection. Challenge assays generated data on virulence.
Start Year 2018
Description Workshop
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact The workshop was organised as an integral part of WP1 for the wgs-aqua.net project funded by the BBSRC/NERC sustainable aquaculture call. The project will initially focus on three exemplar pathogens: Vibrio anguillarum (Va), Flavobacterium psychrophilum (Fp) and Koi Herpes Virus (KHV). More details of the project are here (http://wgs-aqua.net/), including a full list of workshop participants (http://wgs-aqua.net/2015/06/). The aim of the workshop was to bring together experts from aquaculture with those from other fields (predominantly public health) for knowledge exchange and resource sharing, with the ultimate aim of making the fullest possible use of whole genome sequencing of aquaculture pathogens for disease management.
Year(s) Of Engagement Activity 2014
URL http://wgs-aqua.net/category/conferences/