Characterization & exploitation of copy number variation in Atlantic salmon

Lead Research Organisation: University of Aberdeen
Department Name: College of Life Sci and Med Graduate Sch

Abstract

Atlantic salmon aquaculture is worth more than 1 billion pounds sterling annually to the UK economy and represents a rapidly expanding global industry facing major challenges to ensure sustainable growth. There is currently a major opportunity to develop genetic tools that allow the breeding and production of salmon with favourable traits, including high resistance to disease and parasites and the capacity to convert environmentally sustainable diets into fast-growing and nutritionally-valuable flesh. The objective of this studentship is to characterize an unstudied aspect of genetic variation in Atlantic salmon called copy number variation (CNV) and to exploit this knowledge in aquaculture, focussing on improved flesh growth and quality characteristics.

CNVs are duplicated or deleted regions of chromosomal DNA and one of the main forms of genomic variation distinguishing individuals within animal species. They are widely associated with functional genes and much scientific data links them to a range of diseases or other important biological variation in humans and other species. However, despite their recognised importance, CNVs have never been researched or applied in farmed finfish species. This represents a major untapped resource considering the large potential gains that might be made using CNVs as markers for valuable aquaculture traits. Our project will exploit deep DNA sequencing technology and the latest computational methods to detect CNVs in Atlantic salmon from many genetic backgrounds, including fish showing large variation in fillet yield (the percentage of edible flesh), flesh gaping (damage to the flesh arising during production) and flesh texture (the firmness of edible flesh). These traits are strongly linked to the profitability of aquaculture, impacting many parts of the production chain that ultimately leads to the consumer at the supermarket shelf.

Thus, we will characterize salmon CNVs spanning the genome, determining those affecting protein-coding genes. We will use independent technologies to thoroughly validate the CNVs and to establish their effects on gene expression - an important measure of functional importance for the animal. The final step of our project will involve using statistical methods to associate a characterized panel of CNVs with trait data taken from a much larger group of salmon. The objective here will be to identify specific CNV markers for flesh traits which have high economic value.

The project will be led by the University of Aberdeen (UoA) with Xelect Ltd - a BBSRC supported company - as its industrial partner. Xelect will provide a range of salmon populations, expertise in genetics and trait identification, plus access to a growing database of trait variation. The main business of Xelect is to develop genetic markers for use in the salmon industry: the company currently licenses markers to salmon egg producers across the world. Therefore, the project is likely to lead to the development and licensing of CNV markers globally, both for the traits mentioned already, and potentially many others in the future. The UoA will benefit from a major advancement in understanding of the Atlantic salmon genome, which, in addition to having applied value, has strong relevance in many other academic research contexts. The UoA may also benefit from a contractual agreement with Xelect to share a portion of financial royalties coming from licensing of CNV markers. Finally, as CNVs are very poorly characterized in fish generally, the project will provide proof-of-concept on the feasibility of CNV detection and exploitation, which can be applied in other species of societal importance.


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
BB/M016455/1 | | | 01/10/2015 | 30/09/2019 |
1721959 | Studentship | BB/M016455/1 | 01/10/2015 | 30/09/2019 | Alicia Bertolotti
 
Description Atlantic salmon (Salmo salar) is of considerable economic, cultural and environmental importance. For these reasons, the genetics of this species has been studied extensively using well-characterized markers, particularly single nucleotide polymorphisms (SNPs) and microsatellites, advancing our understanding of population structure and the genetic basis of numerous traits. However, larger genetic variants had not previously been studied in any salmonid. Structural variants (SVs) are a major component of genetic variation in all species and include duplications, deletions and inversions ranging from hundreds to millions of DNA base pairs in length. SVs can influence phenotypic variation both within and between populations by altering gene function and expression, but to date they have been well characterized in only a few model species. To address this knowledge gap, I developed a bioinformatic pipeline to detect SVs in the Atlantic salmon genome, which can also be applied to other species. The approach detects SVs from paired-end whole genome sequencing (WGS) data. Pre-calling steps include a series of easy-to-implement filters that reduce the false positives inherent to SV detection in complex genomes with imperfect reference assemblies. SV calling was carried out using smoove, a wrapper around the probabilistic LUMPY program, which detects SVs by integrating signals from paired-end and split-read mapping along with read depth. SV calls were then genotyped within smoove using the Bayesian program SVtyper, and their potential impact was annotated using snpEff. A tool called SV-plaudit was incorporated into the workflow for efficient visual curation of SV calls. To validate the curated SVs, a long-read sequencing approach was developed using the Oxford Nanopore MinION platform, allowing the false positive rate to be established.
15,625 high-confidence SVs were detected in n=493 individuals from populations covering much of the species range, including both wild and farmed fish. The overall true positive SV call rate was ~15%, underlining the importance of filtering and visual curation. Among the SVs identified, 90% were deletions, 8% duplications and 2% inversions; many overlapped genes, and ~6% were predicted to have a major impact on gene function. I show that the global SV dataset recaptures the population structure known from past studies. Interestingly, gene enrichment and expression analyses suggest that SVs differentiated between wild and farmed fish may have been involved in salmon domestication. The SV detection workflow and SV dataset presented here will be useful for ongoing research supporting Atlantic salmon conservation and breeding.
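The report does not describe how MinION long-read calls were matched to short-read calls when estimating the true positive rate. The Python sketch below assumes one common convention: a short-read call counts as confirmed when a long-read call of the same type overlaps it by at least 50% reciprocally. The 50% threshold, the tuple layout and the toy calls are illustrative assumptions, not the project's data or criterion.

```python
from typing import List, Tuple

# (chromosome, start, end, SV type); 0-based half-open coordinates (assumed layout)
SV = Tuple[str, int, int, str]

def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions of a and b (0.0 if disjoint)."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))

def true_positive_rate(short_calls: List[SV], long_calls: List[SV],
                       min_ro: float = 0.5) -> float:
    """Fraction of short-read calls confirmed by a same-type long-read call.

    The 50% reciprocal-overlap default is an assumed convention, not the
    criterion used in the project.
    """
    if not short_calls:
        raise ValueError("no short-read calls to evaluate")
    confirmed = sum(
        any(s[3] == lr[3] and reciprocal_overlap(s, lr) >= min_ro for lr in long_calls)
        for s in short_calls
    )
    return confirmed / len(short_calls)

# Toy calls (hypothetical): only the first deletion is confirmed.
short_read = [("chr1", 100, 200, "DEL"), ("chr1", 1000, 1500, "DUP"),
              ("chr2", 50, 80, "DEL"), ("chr2", 500, 900, "INV")]
long_read = [("chr1", 110, 210, "DEL"), ("chr2", 500, 600, "INV")]
print(true_positive_rate(short_read, long_read))  # 0.25
```

The inversion in the toy data overlaps its long-read counterpart fully on one side but covers only 25% of the short-read call, so reciprocal overlap rejects it; a one-sided overlap test would have accepted it.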
Exploitation Route In the immediate term, the structural variation (SV) pipeline developed in this project is being used in ongoing BBSRC-funded research, including the project "AquaLeap: Innovation in Genetics and Breeding to Advance UK Aquaculture Production" (Award Reference BB/S004181/1). To date, the method has been applied in the AquaLeap project to call and genotype SVs in a commercial Atlantic salmon population, with a view to dissecting the role of SVs in the genetic basis of disease resistance (work in progress).

The key findings from this project were published in Nature Communications (https://doi.org/10.1038/s41467-020-18972-x). Other groups and projects have started to use the SV characterization pipeline in salmonids (e.g. https://www.frontiersin.org/articles/10.3389/fgene.2021.639355/full) and in other non-model species to improve the understanding and exploitation of genetic information.

The findings and resources generated in the project have also been used as leverage to apply for further funding, establish new industry collaborations and maintain ongoing academic collaborations.
Sectors Education, Manufacturing, including Industrial Biotechnology, Other

URL https://doi.org/10.1038/s41467-020-18972-x
 
Description Competitive international PhD studentship on Structural Variation in Farmed Animal Genomes
Amount £146,000 (GBP)
Organisation University of Edinburgh 
Department Royal (Dick) School of Veterinary Studies
Sector Academic/University
Country United Kingdom
Start 09/2021 
End 03/2025
 
Description SAIC funding embedded within BBSRC/NERC Aquaculture Initiative Consortium grant: 'AquaLeap: Innovation in Genetics and Breeding to Advance UK Aquaculture Production' (not reported in that award)
Amount £1,700,000 (GBP)
Organisation Scottish Aquaculture Innovation Centre 
Sector Multiple
Country United Kingdom
Start 01/2019 
End 12/2022
 
Description UK Aquaculture Initiative: Collaborative Research and Innovation Projects (large consortia grant) "AquaLeap: Innovation in Genetics and Breeding to Advance UK Aquaculture Production"
Amount £1,700,000 (GBP)
Funding ID BB/S004181/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 01/2019 
End 12/2021
 
Title SV calling pipeline 
Description A comprehensive and accurate pipeline has been developed that allows novel structural variants (SVs) to be identified from whole genome sequencing data in Atlantic salmon, using the ICSASG_V2 reference genome (Lien et al., 2016). This pipeline will be made available on GitHub. It aims to overcome several key challenges faced when calling SVs in complex genomes from non-model organisms.
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? No  
Impact Even well-mapped, high-coverage, well-aligned sequencing reads can be problematic when identifying SVs such as insertions, deletions, rearrangements and inversions. Distinguishing true structural changes from sequencing artefacts remains an outstanding challenge in WGS analysis. Many existing approaches rely not only on heavy filtering of SVs but also on prior data (previously validated SVs, parentage information). For species with no prior SV information, developing filters against false positive calls is particularly challenging and often not accurate enough. This pipeline provides a step-by-step approach to accurately calling and curating a large set of novel SVs using a purely bioinformatic approach:
Step 1 - gap regions: Because of coverage bias (notably in AT- or GC-rich areas) and repetitive segments of varying length and copy number, many sequences of varying length are not successfully anchored to the reference chromosomes. These are referred to as gap regions. Gap regions need to be identified and excluded from variant discovery, as they contain no information and lead many programmes to mistake them for homozygous deletions.
Step 2 - high depth regions: As with gap regions, highly repetitive or TE-rich regions of the genome can cause read pile-up during alignment. With standard paired-end sequencing, reads with highly repetitive content are often too short to be distinguished from other reads carrying the same repeat, so they are often aligned arbitrarily to one part of the assembly, piling up in one location. SV calling programmes often wrongly identify these high depth regions as duplications. To identify high depth regions, per-base coverage data is produced using mosdepth (v.0.2.4). All regions with coverage more than five times the average and present in more than 10% of individuals can be merged into a .bed file and used as a filter when calling variants.
Step 3 - read alignment: Reads are aligned to the reference genome using the Burrows-Wheeler Aligner (BWA) with default parameters. The alignments are then converted from the generic Sequence Alignment/Map (SAM) format to the Binary Alignment/Map (BAM) format for downstream analysis. Coverage and alignment quality, as well as batch effects and sample errors, are checked using indexcov.
Step 4 - SV calling: The indexed BAM files are used as input for the SV calling programme smoove (https://github.com/brentp/smoove), which wraps LUMPY. This tool integrates multiple detection signals, including read-pair, split-read and read-depth (Layer, Chiang, Quinlan, & Hall, 2014).
SV validation: The novel step of the pipeline, and the one that secures call quality, is the manual curation of SV calls using SV-plaudit (https://github.com/jbelyeu/SV-plaudit) (Belyeu et al., 2018). Every SV is rapidly curated by eye to form a final set of high-confidence, high-quality SV calls. This step removes the need for heavy (and often inaccurate) filters and allows high-confidence SVs to be detected in species and genomes with no prior SV data.
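The two exclusion filters from Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline's code: gap regions are taken to be runs of N in the reference sequence; per-sample depth is assumed to be already summarised into fixed-size windows (for example from mosdepth output); each sample's mean over the supplied windows stands in for its genome-wide average; and all function names are hypothetical. The 5x and 10% thresholds follow the text.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open, as in BED

def gap_regions(chrom: str, seq: str) -> List[Interval]:
    """Step 1 (sketch): report runs of N in the reference as intervals."""
    gaps: List[Interval] = []
    start = None
    for i, base in enumerate(seq):
        if base in "Nn":
            if start is None:
                start = i
        elif start is not None:
            gaps.append((chrom, start, i))
            start = None
    if start is not None:
        gaps.append((chrom, start, len(seq)))
    return gaps

def high_depth_windows(depths: Dict[str, List[float]], chrom: str, window: int,
                       fold: float = 5.0, min_frac: float = 0.10) -> List[Interval]:
    """Step 2 (sketch): windows with depth > fold x the sample mean in > min_frac of samples.

    depths maps sample -> mean depth per fixed-size window along `chrom`
    (a hypothetical pre-summarised layout, e.g. derived from mosdepth output).
    """
    n_windows = len(next(iter(depths.values())))
    hits = [0] * n_windows
    for values in depths.values():
        mean = sum(values) / len(values)  # stands in for the genome-wide average
        for w, d in enumerate(values):
            if d > fold * mean:
                hits[w] += 1
    return [(chrom, w * window, (w + 1) * window)
            for w in range(n_windows) if hits[w] / len(depths) > min_frac]

def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals into one exclusion set."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def filter_calls(calls: List[Interval], exclude: List[Interval]) -> List[Interval]:
    """Drop any SV call that overlaps an excluded interval."""
    return [c for c in calls
            if not any(c[0] == x[0] and c[1] < x[2] and x[1] < c[2] for x in exclude)]

# Toy 100 bp chromosome with a 10 bp gap; 2 of 10 samples pile up in the last window.
ref = "A" * 20 + "N" * 10 + "C" * 70
depths = {f"s{i}": [10.0] * 10 for i in range(8)}
depths.update({f"s{i}": [10.0] * 9 + [200.0] for i in (8, 9)})
exclude = merge(gap_regions("chr1", ref) + high_depth_windows(depths, "chr1", window=10))
calls = [("chr1", 5, 15), ("chr1", 25, 60), ("chr1", 60, 85), ("chr1", 88, 95)]
print(exclude)                       # [('chr1', 20, 30), ('chr1', 90, 100)]
print(filter_calls(calls, exclude))  # [('chr1', 5, 15), ('chr1', 60, 85)]
```

In practice the merged exclusion set would be written out as a .bed file and passed to the caller, rather than applied to calls afterwards as in this toy example.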
 
Title Atlantic salmon SVs 
Description Nine farmed Atlantic salmon (Salmo salar) individuals were sequenced using whole genome sequencing approaches (NCBI accession number: PRJNA378201). The SV landscape was characterized for the first time in the Atlantic salmon genome.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact Using the pipeline described in the section "Research Tools & Methods", 6,598 high-confidence SVs were detected (6,150 deletions, 385 duplications and 63 inversions), ranging from 101 bp to 11,941 bp in length, with an average size of 1,345 bp. These variants are distributed throughout the 29 chromosomes of the Atlantic salmon reference genome (ICSASG_V2).
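As a quick arithmetic check on the counts above, the Python snippet below tallies SV types and their percentages. The three counts sum to the reported 6,598, and the deletion share (about 93%) is close to the ~90% reported for the larger population-scale dataset described earlier. The input format (one type label per SV) is an assumption for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def summarise_sv_types(sv_types: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Return {type: (count, percentage of all SVs to 1 d.p.)}."""
    counts = Counter(sv_types)
    total = sum(counts.values())
    return {t: (n, round(100 * n / total, 1)) for t, n in counts.items()}

# Counts reported above for the nine farmed fish (ICSASG_V2), one label per SV.
reported = ["DEL"] * 6150 + ["DUP"] * 385 + ["INV"] * 63
summary = summarise_sv_types(reported)
print(summary)  # {'DEL': (6150, 93.2), 'DUP': (385, 5.8), 'INV': (63, 1.0)}
```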
 
Title Genome re-sequencing of salmon with different resistance to gill disease 
Description 20 genomes (10 susceptible and 10 resistant to amoebic gill disease according to genomic estimated breeding values, gEBVs) were sequenced at 15-20x coverage and used to characterize structural variation.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact Identification of candidate functional variants for disease resistance 
 
Title Structural Variation in the Atlantic salmon genome 
Description A novel bioinformatic workflow was developed for reliable detection of structural variation in salmonid genomes. It was used to characterize SVs in the 492 Atlantic salmon reported in Bertolotti et al. (2020).
Type Of Material Data analysis technique 
Year Produced 2020 
Provided To Others? Yes  
Impact The data and bioinformatic workflow are being used in ongoing projects and in collaborations with industry partners.
URL https://www.nature.com/articles/s41467-020-18972-x
 
Description CIGENE (NMBU) 
Organisation Norwegian University of Life Sciences (NMBU)
Country Norway 
Sector Academic/University 
PI Contribution Development of the SV calling pipeline to identify novel SVs in 454 sequenced salmon
Collaborator Contribution Sharing of large whole-genome sequencing dataset (454 Atlantic salmon individuals, sequenced at 10-20x coverage). The collaboration aims to combine the analyses of structural variants and SNPs at a population genomics scale across the geographical distribution of Atlantic salmon
Impact Novel SVs in wild Atlantic salmon genomes
Start Year 2015
 
Description Collaboration on Atlantic salmon structural variation landscape 
Organisation Norwegian University of Life Sciences (NMBU)
Country Norway 
Sector Academic/University 
PI Contribution Led the development of structural variation detection and characterisation in Atlantic salmon genomes
Collaborator Contribution NMBU (CIGENE lab led by Prof. Sigbjorn Lien) provided raw sequencing data for 453 Atlantic salmon genomes. Xelect Ltd supported the associated studentship as iCASE partner. University of Colorado (Dr Ryan Layer) supported advanced bioinformatics.
Impact Bertolotti, A.C., Layer, R.M., Gundappa, M.K. et al. The structural variation landscape in 492 Atlantic salmon genomes. Nat Commun 11, 5176 (2020). https://doi.org/10.1038/s41467-020-18972-x
Start Year 2017
 
Description Collaboration on Atlantic salmon structural variation landscape 
Organisation University of Colorado Boulder
Country United States 
Sector Academic/University 
PI Contribution Led the development of structural variation detection and characterisation in Atlantic salmon genomes
Collaborator Contribution NMBU (CIGENE lab led by Prof. Sigbjorn Lien) provided raw sequencing data for 453 Atlantic salmon genomes. Xelect Ltd supported the associated studentship as iCASE partner. University of Colorado (Dr Ryan Layer) supported advanced bioinformatics.
Impact Bertolotti, A.C., Layer, R.M., Gundappa, M.K. et al. The structural variation landscape in 492 Atlantic salmon genomes. Nat Commun 11, 5176 (2020). https://doi.org/10.1038/s41467-020-18972-x
Start Year 2017
 
Description Collaboration on Atlantic salmon structural variation landscape 
Organisation Xelect Ltd
Country United Kingdom 
Sector Private 
PI Contribution Led the development of structural variation detection and characterisation in Atlantic salmon genomes
Collaborator Contribution NMBU (CIGENE lab led by Prof. Sigbjorn Lien) provided raw sequencing data for 453 Atlantic salmon genomes. Xelect Ltd supported the associated studentship as iCASE partner. University of Colorado (Dr Ryan Layer) supported advanced bioinformatics.
Impact Bertolotti, A.C., Layer, R.M., Gundappa, M.K. et al. The structural variation landscape in 492 Atlantic salmon genomes. Nat Commun 11, 5176 (2020). https://doi.org/10.1038/s41467-020-18972-x
Start Year 2017
 
Description Layer Lab - UC Boulder
Organisation University of Colorado Boulder
Country United States 
Sector Academic/University 
PI Contribution Detection of novel Atlantic salmon SVs
Collaborator Contribution Actively provided help and novel programmes for SV calling, and added crucial steps to the experimental design, especially during the SV calling and validation steps.
Impact High confidence, visually validated SVs for Atlantic salmon
Start Year 2018
 
Description Xelect Ltd
Organisation Xelect Ltd
Country United Kingdom 
Sector Private 
PI Contribution Six months of the PhD involved industrial placements with Xelect Ltd, where skills useful for scientific research in industry were developed. Laboratory and data analysis support was provided.
Collaborator Contribution Xelect Ltd is a privately owned aquaculture genetics company founded in 2012 as a spin-out from the University of St Andrews. As the industrial partner to the PhD project, Xelect has actively collaborated by providing DNA samples for whole genome sequencing, valuable support in building the structural variant calling pipeline, and laboratory space.
Impact Nine farmed Atlantic salmon (Salmo salar) individuals were sequenced using whole genome sequencing approaches (NCBI accession number: PRJNA378201).
Start Year 2015
 
Description Alicia Bertolotti: Copy number variation in the Atlantic salmon (Salmo salar) genome. Genome 10K and Genome Science Conference, Norwich, UK. 29 Aug - 1 Sept 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This was a presentation at an international genomics conference to an audience of around 200, comprising leading academics, research fellows, postgraduates, undergraduates, industry representatives and the media. Findings from the award were disseminated and discussed, altering the views of prominent scientists on the complexities of salmonid genome evolution and its impact on population-level genomic variation.
Year(s) Of Engagement Activity 2017
 
Description Coordinated meeting of the international 'Functional Annotation of All Salmonid Genomes' initiative 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact This meeting was attended by 30 salmonid biologists linked to the international FAASG initiative (https://www.faasg.org/). Key discussions were held on the future of the initiative, its links to UK infrastructure (EMBL-EBI) and future funding priorities, influencing funders in attendance (Norwegian Research Council and Genome Canada).
Year(s) Of Engagement Activity 2019
URL https://icisb.org/faasg-meeting/
 
Description Daniel Macqueen: Whole genome duplication and the evolution of salmonid fish: the state of the art. Genome 10K and Genome Science Conference, Norwich, UK. 29 Aug - 1 Sept 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This was a presentation at an international genomics conference, to an audience of around 200, comprising leading academics, research fellows, post-graduates, undergraduates, industry, and the media. Findings from the award were disseminated and discussed, altering the views of prominent scientists in terms of complexities of salmonid genome evolution.
Year(s) Of Engagement Activity 2017
 
Description Invited departmental seminar, University of Bristol 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact I gave a talk titled 'Genome duplication and diversification: insights served with salmon' to an academic audience interested in genome evolution, which led to discussions around new collaborations and sparked a range of questions and stimulating discussions.
Year(s) Of Engagement Activity 2019
URL http://www.bris.ac.uk/biology/events/2019/research-seminar-16122019.html
 
Description Invited talk at the international workshop 'Functional annotation of the Atlantic salmon genome, translation to improved health and performance in aquaculture': 'Advancing aquaculture by genome functional annotation', Memorial University, Canada, 26-27 Aug 2019.
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Gave an invited talk at the international workshop 'Functional annotation of the Atlantic salmon genome, translation to improved health and performance in aquaculture': 'Advancing aquaculture by genome functional annotation', Memorial University, Canada, 26-27 Aug 2019. The outcomes were an increased mutual understanding of research and collaborative activity with international collaborating scientists.
Year(s) Of Engagement Activity 2019
 
Description Lead coordinator of the fourth International Conference on the Integrative Biology of Salmonids (https://icisb.org/). 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact The fourth International Conference on Integrative Salmonid Biology (ICISB 2019) followed on from previous meetings in 2012 (Oslo, Norway), 2014 (Vancouver, Canada) and 2016 (Puerto Varas, Chile). The ICISB meetings have been core-funded and organized by the International Cooperation to Sequence the Atlantic Salmon Genome (ICSASG), a trilateral effort between Canada, Chile and Norway. The theme of ICISB 2019 was 'Beyond the genome: taking leaps forward in salmonid biology', reflecting the staggering progress in genomic resource development and exploitation since the Atlantic salmon reference genome was published in 2016.

There was an audience of ~200, representing a mix of researchers ranging from leading professors in the field to undergraduate students. Many international collaborations and opportunities for further research, funding and meetings were explored with a range of stakeholders, including funders, media and industry.
Year(s) Of Engagement Activity 2019
URL https://icisb.org/
 
Description Presentation at Aquaculture Europe 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A presentation titled "THE STRUCTURAL VARIATION LANDSCAPE IN ATLANTIC SALMON AND ITS POTENTIAL CONTRIBUTION TO DISEASE RESISTANCE" was given at Aquaculture Europe 2019 (October 2019, Berlin). The purpose was to disseminate findings from AquaLeap and other research projects focused on structural variation. The talk sparked several scientific questions.
Year(s) Of Engagement Activity 2019
URL https://www.aquaeas.eu/images/stories/Meetings/AE2019/AE19_Blue_9-30_003.pdf
 
Description Presentation for SBS PGR Conference & Symposium 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact Talk summarising PhD project for the University of Aberdeen's Postgraduate conference.
Year(s) Of Engagement Activity 2018
 
Description Salmonid Genomics Workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Presentation for BBSRC partnering meeting for Functional annotation of the Atlantic salmon genome, translation to improved health and performance in Aquaculture
Year(s) Of Engagement Activity 2018
 
Description School visit 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact A talk for school pupils about PhDs and careers in science
Year(s) Of Engagement Activity 2018
 
Description Seminar at the Roslin Institute: "Farmed Fish Integrative Genomics" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact I gave a seminar at the Roslin Institute, which was an overarching overview of the major research projects in my lab. The seminar was linked to a job vacancy, so it was pitched towards the relevance of my work to the BBSRC/UKRI remit and the interests and remit of the Roslin Institute. I was successful in getting the position (Reader, University of Edinburgh), so a major impact followed this seminar.
Year(s) Of Engagement Activity 2018
 
Description Seminar, Daniel Macqueen: 'Linking evolution and function through genomes - Fishing for insights'. Roslin Institute, University of Edinburgh. Hosted by Prof. Ross Houston. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact This was a seminar, where findings from this award were described. The audience were academics, research scientists, PhD students and undergraduate students. The seminar served to disseminate knowledge and understanding generated during the award, and led to fruitful discussions about future collaborations.
Year(s) Of Engagement Activity 2017