15AGRITECHCAT3 Innovative NextGen pig breeding using DNA sequence data

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute

Abstract

Summary
In this project, we will develop a new technology we call NextGen Breeding, which is based on the collection and utilization of very large quantities of sequence data, and will enable us to dramatically accelerate the rate of genetic improvement in our pig populations. This new technology will help us deliver real improvements to pig industry productivity and sustainability in the UK and around the world. The project involves collaboration between two world-class UK partners: PIC (part of Genus plc), the world's leading pig breeding company, and The Roslin Institute (RI), the world's leading research centre in the application of genomics and quantitative genetics to farm animal breeding. The project requires whole genome sequencing of samples on an unprecedented scale, and even though our innovative approach dramatically reduces the costs compared with the conventional paradigm, the risk and costs are still considerable. Sharing the costs with Innovate UK would enable us to undertake this project.

Planned Impact

Impact summary
(i) Animal breeding companies, breed societies, and levy boards. A successful outcome of the project is expected to quantify the value of NextGen Breeding, quantify the volume of data required to drive it, demonstrate how to implement it, and provide the tools required to implement it. NextGen Breeding is expected to increase the precision, efficacy and sustainability of animal breeding. By demonstrating and enabling NextGen Breeding, the project will enable animal breeding companies, breed societies and levy boards to deliver a higher quality product more quickly and cheaply to their customers. PIC will further benefit from being able to directly commercialise NextGen Breeding rapidly.
(ii) The entire chain of users of pig products. The entire chain of users of pig products, including meat packers, processors, retailers and consumers, will benefit from a higher quality product that costs them less and is more environmentally friendly, healthier and better suited to their individual requirements.
(iii) NextGen Breeding is highly applicable to plant breeders, who are increasingly adopting genomic selection. Therefore the benefits to plant breeding organisations, in the developed and developing world, will be similar to those outlined for animal breeding companies, breed societies, and levy boards. These benefits will also pertain to managers of breeding programs for companion animal populations.
(iv) UK plc will benefit from increased tax revenues through increased profitability of PIC, the pork supply chain, and other agricultural users should they adopt the method.
(v) The academic community. Scientifically, the project constitutes a novel approach for generating and utilizing huge volumes of sequence data. This will enable larger and hence more powerful experiments than are currently feasible, which, together with the analysis of the data set to be generated in this project, will answer fundamental questions in animal genetics and, more broadly, in the genetics of quantitative traits, which are of interest to human geneticists, plant geneticists, and evolutionary biologists. The methods developed and utilised in this project will be applicable to several fields concerned with the generation and analysis of huge volumes of data (genetics, meteorology, engineering, etc.). The size of the data generated will spawn new research into methods of analysis, which will benefit researchers in many fields.
(vi) Commercial sequence and genotype providers. Companies providing SNP or sequence data will be able to open up a completely new market based on low cost provision of huge volumes of sequence data.
(vii) Society. All members of society who work to improve or depend upon the competitiveness and sustainability of agriculture will benefit from the downstream practical applications outlined above. The application of NextGen Breeding by breeding organisations will lead to faster and more sustainable genetic progress, leading to healthier food, and food production that is more resource efficient and affordable. Increased efficiencies in agriculture have direct societal benefits in greater food security with less environmental impact.
(viii) UK science base. The proposed methods and data set will provide a platform for increased R&D capabilities in the UK, maintaining its scientific reputation and associated institutions, with increased capability for sustainable agricultural production.
(ix) Training. The proposed research will be embedded within training courses that the PI is regularly invited to give, and the post-docs working on the project will have the opportunity to be trained at a world-class institute in a cutting edge area of research while interacting with a leading commercial partner.
(x) Policy. Sequence data is expensive, but the research and practical benefits are potentially large. Therefore much investment will be made in sequence data in the coming years. The methods and outcomes from this project will guide this.

Publications

 
Description The aim of the project is to increase the efficiency of genomic selection in pigs. Specifically, we are working with the pig breeding company PIC on the collection and utilization of very large quantities of sequence data, which will enable us to dramatically accelerate the rate of genetic improvement in pig populations. We have developed the AlphaImpute software package for imputing and phasing genotype data in populations with pedigree information available, and the AlphaSeqOpt software package for determining the optimum distribution of sequencing resources within a population of interest. These tools are routinely used in our lab to identify the most effective sequencing strategy for a population within budget constraints.
In addition, we have produced one of the largest genetic variation datasets ever created in pigs.
Exploitation Route AlphaSeqOpt and AlphaImpute are freely available on the AlphaGenes website.
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Environment

URL http://www.alphagenes.roslin.ed.ac.uk/alphasuite-softwares/alphaseqopt/
 
Description This project has attracted great interest from the genomic selection community, and Prof. Hickey has been approached by plant breeders to form new research partnerships, which should bring new funding streams into the Institute. This project has yielded one of the largest genetic variation datasets in pigs. In the next phases, these data will be used in association studies with traits relevant to pig breeding. This will enable the identification of genomic regions that will become important breeding targets in the future, with substantial economic impact and a positive impact on animal welfare.
First Year Of Impact 2016
Sector Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Environment
Impact Types Societal, Economic

 
Description Newton Fund Workshop Brazil
Amount £52,000 (GBP)
Funding ID 228949780 
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 04/2016 
End 09/2016
 
Description Newton Fund Workshop Mexico
Amount £37,550 (GBP)
Funding ID 2016-RLWK7-10399 
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 04/2017 
End 03/2018
 
Title Development of LCSeq 
Description LCSeq is a system that we are developing which enables sequence information to be generated for animal breeding populations at low cost. LCSeq recognises that in livestock populations individuals are highly related and thus share long genome segments (haplotypes). Rather than sequencing individuals at high coverage, LCSeq aims to sequence the haplotypes that are present in the population by spreading the sequencing resources at low coverage across many individuals.
Type Of Material Data analysis technique 
Provided To Others? No  
Impact The research is still ongoing; however, one of the software packages that will contribute to this system is AlphaSeqOpt, which is freely available in the AlphaSuite.
URL http://www.alphagenes.roslin.ed.ac.uk/alphasuite-softwares/alphaseqopt/
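The trade-off described for LCSeq can be illustrated with some back-of-the-envelope arithmetic. The sketch below is not part of LCSeq. It assumes a diploid genome, reads falling evenly on an individual's two haplotypes, and a fixed total budget expressed in units of genome coverage (x). The function names and the example numbers are hypothetical.

```python
def individuals_sequenced(total_budget_x: float, coverage_per_individual: float) -> int:
    """Number of individuals that a fixed coverage budget can pay for."""
    return int(total_budget_x // coverage_per_individual)


def accumulated_haplotype_coverage(coverage_per_individual: float, carrier_copies: int) -> float:
    """Expected coverage pooled on one shared haplotype.

    Assumes each sequenced carrier contributes half of its reads to each
    of its two haplotypes (an illustrative simplification).
    """
    return coverage_per_individual / 2 * carrier_copies


# Hypothetical budget of 1000x in total.
few_deep = individuals_sequenced(1000, 30)      # high coverage, few animals
many_shallow = individuals_sequenced(1000, 1)   # low coverage, many animals

# A haplotype carried once by each of 50 animals sequenced at 1x.
pooled = accumulated_haplotype_coverage(1.0, 50)
```

Under these assumptions, the low-coverage design reaches far more individuals, while a haplotype shared by many of them can still accumulate substantial pooled coverage.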
 
Description NextGen Breeding project 
Organisation Genus plc
Country United Kingdom 
Sector Private 
PI Contribution The AlphaSuite is a collection of software that we have developed to perform many of the common tasks in animal breeding, plant breeding, and human genetics including genomic prediction, breeding value estimation, variance component estimation, GWAS, imputation, phasing, optimal contributions, simulation, field trial designs, and various data recoding and handling tools.
Collaborator Contribution PIC is providing the DNA sequencing data from approximately 14,000 individuals from their genetic nucleus.
Impact At this stage of the collaboration the outputs have not been generated.
Start Year 2015
 
Title AlphaImpute 
Description Imputation can cost-effectively generate high-density genotypes of many individuals. Typical genotyping strategies involve genotyping a small number of individuals with expensive high-density marker panels, and a large number of individuals with cheaper low-density panels. Imputation is then used to infer the un-typed high-density markers in the individuals genotyped at low density. AlphaImpute is a flexible tool that imputes genotypes and alleles accurately and quickly for datasets with large pedigrees and large numbers of genotyped markers. It combines basic rules of Mendelian inheritance, probabilistic inferences of genotypes, phasing of long stretches of haplotypes, and imputation of genotypes from a haplotype library. AlphaImpute consists of a single program; however, it calls both AlphaPhase1.1 and GeneProbForAlphaImpute. All information on the model of analysis, the input files and their layout is specified in a single parameter file.
Type Of Technology Software 
Year Produced 2016 
Impact The AlphaImpute package is freely available in the AlphaSuite and includes a supporting manual and access to technical support, with the aim of benefiting the academic research community in animal breeding. The program has been downloaded over 200 times in recent years, attracting users from a number of different academic institutions internationally. AlphaImpute has supported collaboration with a number of industrial partners. One such example is the Innovate UK funded project in collaboration with PIC. This project has accelerated the rate of genetic gain by 35% in pigs, enabled by AlphaImpute. Major emphasis has been put on making AlphaImpute more computationally efficient and accessible to small animal breeding operations and/or academic institutions, and we have succeeded in improving the computing time by 75%.
URL http://www.alphagenes.roslin.ed.ac.uk/alphasuite-softwares/
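To make the "basic rules of Mendelian inheritance" step concrete, here is a minimal sketch of one such rule: when both parents are homozygous at a marker, the offspring genotype is fully determined. This is an illustration only, not AlphaImpute's code or file format. The 0/1/2 genotype coding, the missing-value code 9 and the function names are assumptions made for the example.

```python
MISSING = 9  # hypothetical code for an un-typed genotype


def impute_from_parents(sire: int, dam: int) -> int:
    """Infer an offspring genotype (0, 1 or 2 copies of the alternate allele).

    Only the fully determined case is handled: both parents homozygous.
    Any other combination is left as MISSING for later, probabilistic steps.
    """
    if sire in (0, 2) and dam in (0, 2):
        # Each homozygous parent transmits a known allele: 0 -> 0, 2 -> 1.
        return (sire + dam) // 2
    return MISSING


def impute_offspring(offspring, sire, dam):
    """Fill un-typed offspring markers where the parents determine them."""
    return [
        g if g != MISSING else impute_from_parents(s, d)
        for g, s, d in zip(offspring, sire, dam)
    ]


# Four markers: three un-typed in the offspring, one already genotyped.
filled = impute_offspring([9, 9, 1, 9], [0, 2, 1, 1], [2, 2, 0, 2])
```

Markers that this rule cannot resolve, such as the last one where the sire is heterozygous, stay missing; in the description above those are the cases left to probabilistic inference, phasing and the haplotype library.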
 
Title AlphaPhase 
Description The use of phased sequencing data has been shown to significantly increase the accuracy of imputation. AlphaPhase has been used as part of an imputation pipeline. Existing phasing programs have generally scaled poorly to large datasets, placing a long and expensive burden on the available computational resources. Additionally, the increasing production of large bundles of sequencing data and their heterogeneity complicate the phasing process. The current version of AlphaPhase implements methods to determine phase using an extended Long Range Phasing and Haplotype Library Imputation.
Type Of Technology Software 
Year Produced 2016 
Impact The AlphaPhase package is freely available in the AlphaSuite and includes a supporting manual and access to technical support, with the aim of benefiting the academic research community in animal breeding. Since its recent publication in the AlphaSuite, AlphaPhase has been downloaded 5 times. The AlphaPhase program is closely related to AlphaImpute, and is playing a key role in the Innovate UK funded project in collaboration with PIC, the Innovate UK Aviagen project, and the ICBF project.
URL http://www.alphagenes.roslin.ed.ac.uk/alphasuite-softwares/
 
Title AlphaSeqOpt 
Description With improving technologies and decreasing costs, it is now possible and much more informative to collect genomic data by whole genome sequencing. However, sequencing all individuals at high coverage in a large population is not feasible. Instead, we can harness the fact that individuals within a population are related and thus share sections of the genome. If we can identify and sequence individuals that share more of their genome with a large number of individuals in the population then we can pass on the generated sequence data to other individuals that share the same regions of the genome as the sequenced individual, a process known as imputation. AlphaSeqOpt is a software tool that enables researchers and breeders to define a minimal set of individuals that share more of their genomes with a large number of individuals in the population. AlphaSeqOpt also provides the sequencing investment required for a key individual in order to generate accurate and high quality sequence data that can be used to impute sequence for other individuals in the population. 
Type Of Technology Software 
Year Produced 2016 
Impact The AlphaSeqOpt package is freely available in the AlphaSuite and includes a supporting manual and access to technical support, with the aim of benefiting the academic research community in animal breeding. We expect to have the associated publication accepted in the near future. AlphaSeqOpt is a key element in the Innovate UK funded project with PIC, the Innovate UK Aviagen project, and the ICBF project.
URL http://www.alphagenes.roslin.ed.ac.uk/alphasuite-softwares/alphaseqopt/
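The idea of choosing a small set of individuals whose genomes are widely shared can be sketched as a greedy selection. The code below is an illustrative stand-in and not AlphaSeqOpt's algorithm. It assumes haplotypes have already been identified and labelled for one genome region, and the names `select_focal_individuals`, `carried` and `budget` are hypothetical.

```python
from collections import Counter


def select_focal_individuals(carried, budget):
    """Greedily pick up to `budget` individuals to sequence.

    `carried` maps an individual ID to the haplotype labels it carries.
    At each step the individual whose not-yet-covered haplotypes account
    for the most haplotype copies in the population is chosen.
    """
    # Number of copies of each haplotype present in the population.
    freq = Counter(h for haps in carried.values() for h in haps)
    covered = set()
    chosen = []
    for _ in range(budget):
        best, best_gain = None, 0
        for ind in sorted(carried):  # sorted for deterministic tie-breaking
            if ind in chosen:
                continue
            gain = sum(freq[h] for h in set(carried[ind]) - covered)
            if gain > best_gain:
                best, best_gain = ind, gain
        if best is None:  # nothing left to gain
            break
        chosen.append(best)
        covered |= set(carried[best])
    return chosen, covered


# Toy population of five diploid individuals at one genome region.
population = {
    "A": ("h1", "h2"),
    "B": ("h1", "h3"),
    "C": ("h2", "h2"),
    "D": ("h4", "h1"),
    "E": ("h3", "h5"),
}
chosen, covered = select_focal_individuals(population, budget=2)
```

In this toy example, sequencing two individuals covers four of the five haplotypes, which is the kind of saving that makes imputing sequence for the remaining individuals attractive.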
 
Title AlphaSim 
Description One of the fundamental questions in population dynamics is how changes in the current structure and environment affect the structure and composition of a population in both the short and long term. Plant and animal breeding programs benefit from having a tool to evaluate the potential of different selection strategies or new emerging technologies to improve population performance. Empirical datasets to assess the effect of different factors on one population are difficult to collect, since they require substantial financial and time investments and are subject to noise and error. Simulation is a key tool for both researchers and breeders to assess the impact of different factors given a known historical and current population structure prior to implementation within a real-life setting. AlphaSim is a fast and flexible software tool that enables researchers and breeders to do this. Unlike other simulation tools, AlphaSim has the functionality to manipulate fine details of the population structure in order to simulate realistic scenarios and provides detailed outputs for use in downstream analyses.
Type Of Technology Software 
Year Produced 2016 
Impact AlphaSim is a freely available software package that simulates genetic populations and can assess breeding programs. The AlphaSim package includes a manual, tutorial, and access to technical support with the aim of benefiting the academic research community in animal breeding. This software package has already attracted users from a number of different academic institutions and has supported a number of peer-reviewed academic publications. These publications include: Potential of gene drives with genome editing to increase genetic gain in livestock breeding programs. 2017. Gonen, S, J. Jenko, G. Gorjanc, A.J. Mileham, C.B.A. Whitelaw, J.M. Hickey. Genetics Selection Evolution, 49:3. AlphaSim: Software for Breeding Program Simulation. 2016. Faux A. M., G. Gorjanc, R. C. Gaynor, M. Battagin, S. M. Edwards, D. L. Wilson, Sarah J. Hearne, S. Gonen, and J. M. Hickey. The Plant Genome vol. 9, no. 3. AlphaSim is not only used in academic research, but has also attracted industrial collaborations. One such example is our recently awarded Innovate UK grant in collaboration with Driscoll's.
URL http://www.alphagenes.roslin.ed.ac.uk/alphasuite-softwares/alphasim/
 
Description Big Data in Agriculture, Part of the DuPont Pioneer Plant Sciences Symposia Series, at Roslin Institute, 14-15 May 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Symposium held at the Roslin Institute, organised by members of my group and sponsored by third parties from the breeding industry.
Year(s) Of Engagement Activity 2018
 
Description Contribution to the New York Times article: Open Season Is Seen in Gene Editing of Animals
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Open Season Is Seen in Gene Editing of Animals was a feature article on gene editing by Amy Harmon. Professor John Hickey was interviewed as a specialist in the field of quantitative genetics.
Year(s) Of Engagement Activity 2016
URL https://www.nytimes.com/2015/11/27/us/2015-11-27-us-animal-gene-editing.html?_r=0
 
Description Cows in the Computer 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Members of the Hickey group attended the Roslin Open Day and set up an interactive activity called 'Cows in the Computer'. This was a basic simulation representing a small portion of the work the group undertakes.
Year(s) Of Engagement Activity 2017
 
Description John Hickey Guest in Farming Today (BBC Radio 4) 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact On Monday 26th September, BBC Radio 4's Farming Today featured Professor John Hickey as a specialist scientist on the subject of breeding programs and scientific impact.
Year(s) Of Engagement Activity 2016
URL http://www.bbc.co.uk/programmes/b07w5xxq
 
Description Lush Visiting Professors and Short Courses in Animal Breeding and Genetics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Dr. John Hickey from the Roslin Institute (https://www.ed.ac.uk/roslin/about/contact-us/staff/john-hickey) will be a Lush Visiting Professor at Iowa State University in April and May, 2018.

Dr. Hickey will give a 2-day short course on May 10 and 11 entitled "Plant and animal breeding - exploiting new technologies in different ways and at different scale".
Year(s) Of Engagement Activity 2018
URL https://www.bcb.iastate.edu/animal-breeding-and-genetics-announces-lush-visiting-professors-and-shor...
 
Description Modern plant and animal applied genomics driven by genotype and sequence data, Universitat Politècnica de Valencia
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Visiting teaching activity, delivering an advanced course in plant and animal breeding.
Year(s) Of Engagement Activity 2018
 
Description Modern plant and animal applied genomics driven by genotype and sequence data, University of Zagreb, Croatia, 17-19 July 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Workshop organised and given by me and two other members of my group.
Year(s) Of Engagement Activity 2018
 
Description Researcher Links workshop at CNRG, INIFAP, Tepatitlán and Guadalajara, Mexico, 3-7 February 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Workshop organised and given by me and the members of my group
Year(s) Of Engagement Activity 2018
 
Description Short course in Evolutionary Quantitative Genetics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact The Evolutionary Quantitative Genetics course was a comprehensive review of modern concepts in evolutionary quantitative genetics. The course covered basic statistics, population genetics, quantitative genetics, evolutionary response in quantitative traits, estimating the fitness of traits, and mixed models and their extensions. The instructor was Dr Bruce Walsh, Department of Ecology and Evolutionary Biology, University of Arizona, and co-author of Genetics and Analysis of Quantitative Traits. The course was hosted by Professor John Hickey at the Roslin Institute.
Year(s) Of Engagement Activity 2016
URL http://www.alphagenes.roslin.ed.ac.uk/bruce-walsh-visit/
 
Description Short course: The Search for Selection, Roslin Institute, University of Edinburgh 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Description
Biologists are obsessed (indeed, seduced) by the search for signatures of selection in organismal features of interest, ranging from specific traits to genome-wide signatures. A vast number of approaches have been suggested in this search for selection, including genomic-based signatures of recent or ongoing selection, tests based on either excessive amounts or nonrandom patterns of divergence (in both fossil sequences and functional genomics data) and the more classical Lande-Arnold fitness estimates (direct association of phenotypic values with fitness estimates) and their modern extensions (such as aster models). Given the breadth of such searches, a large amount of machinery has been developed, but is rarely presented in a unified fashion. This course presents an integrated overview of all these approaches, highlighting common themes and divergent assumptions.

The goal of this course is to expose investigators from all branches of biology to this rich menagerie of tests, applicable for population geneticists, genome biologists, evolutionary ecologists, paleontologists, functional morphologists, and just about any biologist who ponders on how to formally demonstrate that a feature (or features) of interest might have been shaped by selection.

Intended Audience.
The intended audience is advanced graduate students, postdocs, and faculty with an interest in searching for targets of selection, be they particular genomic sequences or specific traits. Given the breadth of this topic, the material is of interest to students from functional genomics, population and evolutionary genetics, ecology, paleobiology, functional morphology, and statistics (as well as other fields). Background required: some basic introduction to population and/or quantitative genetics.
Year(s) Of Engagement Activity 2018
URL https://wheat.pw.usda.gov/GG3/node/695
 
Description Spoke on BBC Today Show 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Spoke about animal breeding and new technologies on BBC Radio 4's Today programme.
Year(s) Of Engagement Activity 2018
 
Description Teaching course: Next Generation Plant and Animal Breeding Programs, Animal Science Department, University of Nebraska, Lincoln. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Series of lectures and workshops on Plant and Animal Breeding Programs exploring current practices and future areas of research. The course was designed and delivered by John Hickey and key members of his team.
Year(s) Of Engagement Activity 2016
URL http://animalscience.unl.edu/next-generation-plant-and-animal-breeding-programs
 
Description The Expert Working Group on Wheat Breeding Methods and Strategies 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Expert Working Group (EWG) on Wheat Breeding Methods and Strategies seeks to exchange research information on breeding methods, as well as germplasm, in order to build expert capacity and support in wheat breeding programs, with more efficient breeding methods consistent with the latest scientific advances. The EWG is working on activities such as workshops, training courses, communications, and sharing of germplasm and information to reach a larger pool of wheat breeders trained in state-of-the-art breeding methods.
Year(s) Of Engagement Activity 2015,2016,2017
URL http://www.wheatinitiative.org/activities/expert-working-groups/wheat-breeding-methods-and-strategie...