Analysis of quantitative genetic traits in a huge data set

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute

Abstract

The genetic basis of quantitative trait variation and covariation is central to human genetics, evolutionary biology, and plant and animal breeding. In medical genetics many diseases, including schizophrenia, heart disease and cancer, are complex traits with continuous phenotypes and liabilities, which have multiple genome variants contributing genetic variance. In evolutionary biology fitness is largely due to such quantitative traits (e.g., fecundity, longevity). In plant and animal breeding most of the economically important traits are quantitative traits (e.g., milk, meat, and grain yields, environmental footprint, fecundity).

Huge datasets are needed for statistical genomics because many variants (probably thousands), which can be clustered together, contribute to any individual quantitative trait and their effects can combine in complex ways (additive, dominant, epistatic). Moreover, important portions of the genetic variance of quantitative traits are controlled by variants that are rare, have small effect sizes or are highly correlated with other variants. The effects of such quantitative trait variants can only be separated when very powerful statistical models are used in very large data sets.

We will analyse the genetic basis of 25 quantitative traits at the molecular level by creating and analysing a dataset containing genome sequences, pedigrees and trait records of 325000 pigs from the world's biggest commercial breeding programme. The dataset will be created and analysed using imputation and analysis algorithms based on those that we developed to support the breeding programme.

The size of the dataset and the quality of the data will allow us to address three big questions:-
1. Which genome variants control which quantitative traits, how do they control them and how do the multiple variants that control a single trait interact?
2. What kinds of mechanisms cause traits to co-vary? To what extent do pleiotropy and linkage disequilibrium contribute? What is the distribution of the magnitude and sign of joint effects of genomic regions on pairs of traits?
3. To what extent do huge data sets help us address these questions? For the first time we have the technology to generate genome sequence data for hundreds of thousands of individuals at low cost and the computer power to store and analyse such data.

The aim of this project is to harvest scientific benefits from a 15-year, billion-dollar pig breeding program. Our previous projects asked how statistical genomics helps animal breeding; this project asks how animal breeding helps statistical genomics.

Technical Summary

This project aims to harvest scientific benefits from a 15-year, billion-dollar pig breeding program. We will analyse the genome sequences, pedigrees and phenotypes of 325,000 pigs, in order to:-
- Analyse the genetic basis of 25 quantitative traits at the molecular level.
- Explain the covariance between traits.
- Quantify the extent to which huge datasets help us answer these questions.
Our objectives are:-
(1) The genetic basis of quantitative traits
For each of the 25 traits we will count the number of quantitative trait variants that can be mapped, analyse how they are distributed across the genome (e.g. randomly or in clusters), which types of variant (e.g., indels or SNPs) control them; and which kinds of genome element (e.g. coding versus noncoding) contain the variants.
We will analyse the interactions between the mapped variants that control each trait to quantify the degree to which they show additivity, dominance, or epistasis. We will also quantify the joint distribution of allele frequencies, ages, and effect sizes, and quantify how and by how much the genetic variation changes across many generations. We will quantify the degree to which the contributions to genetic variation differ among the 11 related populations and the 19 generations of our data set and the extent to which they compare with what is known of other species.
(2) What kinds of mechanisms cause traits to co-vary?
We will measure the correlation between traits (locally and genome wide) and identify the extent to which pleiotropy and linkage disequilibrium contribute, the distribution of the magnitude and sign of joint effects of genomic regions on pairs of traits and the degree to which these variants are new or old, common or rare, lie in each type of functional region, and have large or small effect sizes.
(3) We will quantify the extent to which huge data sets, with and without functional annotation information, help us with addressing these questions.
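
As a minimal illustration of what "additivity" and "dominance" mean at a single mapped variant, the sketch below computes the textbook additive effect and dominance deviation from the mean phenotype of each genotype class. It is not the project's analysis pipeline: the function name, the 0/1/2 genotype coding and the toy data are assumptions made only for this example, and a real analysis would fit many variants jointly in a mixed model.

```python
from statistics import mean


def additive_dominance(genotypes, phenotypes):
    """Return (a, d) for one biallelic locus.

    genotypes  : allele counts coded 0, 1 or 2 (hypothetical coding)
    phenotypes : one trait record per individual
    a = half the difference between the two homozygote means
    d = heterozygote mean minus the midpoint of the homozygote means
    """
    groups = {0: [], 1: [], 2: []}
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    if any(len(values) == 0 for values in groups.values()):
        raise ValueError("all three genotype classes must be observed")
    m0, m1, m2 = (mean(groups[k]) for k in (0, 1, 2))
    a = (m2 - m0) / 2          # additive effect
    d = m1 - (m0 + m2) / 2     # dominance deviation (0 means purely additive)
    return a, d


# Toy data, invented for illustration only.
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_phenotypes = [10.0, 12.0, 15.0, 17.0, 16.0, 18.0]
a, d = additive_dominance(toy_genotypes, toy_phenotypes)
print(f"additive effect a = {a}, dominance deviation d = {d}")
```

A dominance deviation of zero would indicate a purely additive variant; epistasis would require modelling pairs of loci and is not covered by this sketch.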

Planned Impact

(i) The academic community. Scientifically, the project constitutes a step change in genetics research because this is the first data set of this scale with whole genome sequence data. As outlined in the section "Academic Beneficiaries" there are several benefits that will accrue to the academic community (animal, plant, human and evolutionary geneticists and other fields that develop and utilise large scale computational methods). This impact will be delivered via publication in journals, presentations at conferences, seminars, and by making data and software available.

(ii) Animal breeding companies, breed societies, and levy boards. The biological insights about quantitative traits will guide these organisations in their efforts to turn genetic variance in traits into response to selection in a way that is sustainable. The quantification of the power of a huge data set combined with functional annotation will guide them in their investments in data for the coming years. The software and scripts that we will use to generate and analyse the data in this project will be made available to these organisations.

(iii) The entire chain of users of pig products. The entire chain of users of pig products, including meat packers, processors, retailers and consumers will benefit because the knowledge generated will equip PIC and other pig breeding companies with tools to deliver a higher quality product, which costs less, and is more environmentally friendly, healthier and suited to individual requirements of stakeholders in the supply chain.

(iv) Plant breeding organisations. The methods, data sets of this scale, and biological insights are also highly relevant to plant breeding organisations. Therefore the benefits to plant breeding organisations, in the developed and developing world, will be similar to those outlined for animal breeding companies, breed societies, and levy boards.

(v) Commercial sequence and genotype providers. Companies providing SNP or sequence data will be able to open up a completely new market based on low cost provision of huge volumes of sequence data.

(vi) UK Treasury will benefit from increased tax revenues through increased profitability of PIC, the pork supply chain, other UK agricultural users should they adopt the method, and UK based sequence and genotype providers.

(vii) UK science infrastructure and capacity. The proposed methods and data set will provide a platform for increased R&D capabilities in the UK, maintaining its scientific reputation and associated institutions, with increased capability for sustainable agricultural production. The proposed research will be embedded within training courses that the PI is regularly invited to give, and the post-docs working on the project will have the opportunity to be trained at a world-class institute in a cutting edge area of research while interacting with a leading commercial partner.

(viii) Policy. Sequence data is expensive, but the research and practical benefits are potentially large. Therefore much investment will be made in sequence data in the coming years. The outcomes from this project will guide these investments. This will be particularly relevant for projects such as the Genomics England project which is spending >£300 million to sequence 100,000 individuals.

(ix) Society. All members of society who work to improve or depend upon the competitiveness and sustainability of agriculture will benefit from the downstream practical applications outlined above. The application of the outcomes by breeding organisations will lead to faster and more sustainable genetic progress, leading to healthier food, and food production that is more resource efficient and affordable. Increased efficiencies in agriculture have direct societal benefits in greater food security with less environmental impact. The knowledge will feed into educational programs.

Publications

Whalen A (2019) Parentage assignment with genotyping-by-sequencing data. in Journal of animal breeding and genetics = Zeitschrift fur Tierzuchtung und Zuchtungsbiologie

 
Description Newton Fund Workshop Brazil
Amount £52,000 (GBP)
Funding ID 228949780 
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 04/2016 
End 09/2016
 
Description Newton Fund Workshop Mexico
Amount £37,550 (GBP)
Funding ID 2016-RLWK7-10399 
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 04/2017 
End 03/2018
 
Title Development of LCSeq 
Description LCSeq is a system that we are developing which enables sequence information to be generated for animal breeding populations at low cost. LCSeq recognises that in livestock populations individuals are highly related and thus share long genome segments (haplotypes). Rather than sequencing individuals at high coverage, LCSeq aims to sequence the haplotypes that are present in the population by spreading the sequencing resources at low coverage across many individuals.
Type Of Material Data analysis technique 
Provided To Others? No  
Impact The research is still ongoing; however, one of the software packages that will contribute to this system is AlphaSeqOpt, which is freely available in the AlphaSuite.
URL http://www.alphagenes.roslin.ed.ac.uk/alphasuite-softwares/alphaseqopt/
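
The idea behind LCSeq described above can be illustrated with some simple arithmetic. The sketch below is not the LCSeq or AlphaSeqOpt implementation; it assumes, purely for illustration, that read depth is Poisson distributed, that each carrier holds exactly one copy of the shared haplotype, and that an individual's reads fall equally on its two haplotypes. The function names are hypothetical.

```python
import math


def haplotype_coverage(n_carriers, coverage_per_individual):
    """Expected accumulated coverage of one shared haplotype.

    Assumes each carrier has one copy of the haplotype and that half of
    an individual's reads come from each of its two haplotypes.
    """
    return n_carriers * coverage_per_individual / 2


def prob_site_unobserved(n_carriers, coverage_per_individual):
    """Probability that a site on the haplotype receives no reads,
    assuming Poisson-distributed read depth."""
    return math.exp(-haplotype_coverage(n_carriers, coverage_per_individual))


# One individual sequenced at 1x versus 20 carriers each sequenced at 1x.
for carriers in (1, 20):
    cov = haplotype_coverage(carriers, 1.0)
    miss = prob_site_unobserved(carriers, 1.0)
    print(f"{carriers:>2} carrier(s): haplotype coverage {cov:.1f}x, "
          f"P(site unobserved) = {miss:.2e}")
```

Under these assumptions, low coverage per individual can still accumulate to high coverage per haplotype when the haplotype is shared by many relatives, which is the motivation given in the description.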
 
Description NextGen Breeding project 
Organisation Genus plc
Country United Kingdom 
Sector Private 
PI Contribution The AlphaSuite is a collection of software that we have developed to perform many of the common tasks in animal breeding, plant breeding, and human genetics including genomic prediction, breeding value estimation, variance component estimation, GWAS, imputation, phasing, optimal contributions, simulation, field trial designs, and various data recoding and handling tools.
Collaborator Contribution PIC is providing the DNA sequencing data from approximately 14,000 individuals from their genetic nucleus.
Impact At this stage of the collaboration the outputs have not been generated.
Start Year 2015
 
Title AlphaSim 
Description One of the fundamental questions in population dynamics is how changes in the current structure and environment affect the structure and composition of a population in both the short and long term. Plant and animal breeding programs benefit from having a tool to evaluate the potential of different selection strategies or newly emerging technologies to improve population performance. Empirical datasets to assess the effect of different factors on one population are difficult to collect, since they require substantial financial and time investments and are subject to noise and error. Simulation is a key tool for both researchers and breeders to assess the impact of different factors given a known historical and current population structure prior to implementation within a real-life setting. AlphaSim is a fast and flexible software tool that enables researchers and breeders to do this. Unlike other simulation tools, AlphaSim has the functionality to manipulate fine details of the population structure in order to simulate realistic scenarios and provides detailed outputs for use in downstream analyses.
Type Of Technology Software 
Year Produced 2016 
Impact AlphaSim is a freely available software package that simulates genetic populations and can assess breeding programs. The AlphaSim package includes a manual, tutorial, and access to technical support with the aim of benefiting the academic research community in animal breeding. This software package has already attracted users from a number of different academic institutions and has supported a number of peer-reviewed academic publications. These publications include: Potential of gene drives with genome editing to increase genetic gain in livestock breeding programs. 2017. Gonen, S, J. Jenko, G. Gorjanc, A.J. Mileham, C.B.A. Whitelaw, J.M. Hickey. Genetics Selection Evolution, 49:3. AlphaSim: Software for Breeding Program Simulation. 2016. Faux A. M., G. Gorjanc, R. C. Gaynor, M. Battagin, S. M. Edwards, D. L. Wilson, Sarah J. Hearne, S. Gonen, and J. M. Hickey. The Plant Genome vol. 9, no. 3. AlphaSim is not only used in academic research, but has also attracted industrial collaborations. One such example is our recently awarded Innovate UK grant in collaboration with Driscoll's.
URL http://www.alphagenes.roslin.ed.ac.uk/alphasuite-softwares/alphasim/
 
Title AlphaSim GUI 
Description In an effort to improve the accessibility and usability of our AlphaSim software, we have developed a graphical user interface (GUI), which uses the Java runtime environment (JRE). By increasing the usability of our software, we hope that the impact of these programs will be even greater, especially for people whose available resources are at a premium. The AlphaSim GUI is freely available on the AlphaGenes webpage, and includes video tutorials, practical exercises and support.
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact There are no noticeable impacts yet, as the GUI has only been available for a few months.
URL http://www.alphagenes.roslin.ed.ac.uk/alphasuite-softwares/alphasim/
 
Title AlphaSimR 
Description AlphaSimR is a next generation software package in the line of our successful earlier package AlphaSim. The new package is accessible in a user-friendly way via an interface in the public domain environment R. The package is used for stochastic simulations of breeding programs to the level of DNA sequence for every individual. Contained is a wide range of functions for modeling common tasks in a breeding program, such as selection and crossing. These functions allow for constructing simulations of highly complex plant and animal breeding programs via scripting in the R software environment. Such simulations can be used to evaluate overall breeding program performance and conduct research into breeding program design, such as implementation of genomic selection. Included is the 'Markovian Coalescent Simulator' ('MaCS') for fast simulation of biallelic sequences according to a population demographic history [Chen et al. (2009)]. 
Type Of Technology Software 
Year Produced 2018 
Impact This package has rapidly expanded our possibilities to apply breeding simulation in research projects, both in academic research projects and for the breeding industry (most notably Driscoll's and Bayer). Several graduate students used the package for their internship projects.
URL https://alphagenes.roslin.ed.ac.uk/wp/software/alphasimr/
 
Description Big Data in Agriculture, Part of the DuPont Pioneer Plant Sciences Symposia Series, at Roslin Institute, 14-15 May 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Symposium held at the Roslin Institute, organised by members of my group and sponsored by third parties from the breeding industry.
Year(s) Of Engagement Activity 2018
 
Description Contribution to the New York Times article: Open Season Is Seen in Gene Editing of Animals
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Open Season Is Seen in Gene Editing of Animals was a feature article on gene editing by Amy Harmon. Professor John Hickey was interviewed as a specialist in the field of quantitative genetics.
Year(s) Of Engagement Activity 2016
URL https://www.nytimes.com/2015/11/27/us/2015-11-27-us-animal-gene-editing.html?_r=0
 
Description John Hickey Guest in Farming Today (BBC Radio 4) 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact On Monday 26th September, BBC Radio 4's Farming Today featured Professor John Hickey as a specialist scientist on the subject of breeding programs and scientific impact.
Year(s) Of Engagement Activity 2016
URL http://www.bbc.co.uk/programmes/b07w5xxq
 
Description Lush Visiting Professors and Short Courses in Animal Breeding and Genetics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Dr. JOHN HICKEY from the Roslin Institute (https://www.ed.ac.uk/roslin/about/contact-us/staff/john-hickey) will be a Lush Visiting Professor at Iowa State University in April and May, 2018.

Dr. Hickey will give a 2-day short course May 10 and 11 entitled "Plant and animal breeding - exploiting new technologies in different ways and at different scale". A course synopsis is given at the end of this message.
Year(s) Of Engagement Activity 2018
URL https://www.bcb.iastate.edu/animal-breeding-and-genetics-announces-lush-visiting-professors-and-shor...
 
Description Modern plant and animal applied genomics driven by genotype and sequence data, University of Zagreb, Croatia, 17-19 July 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Workshop organised and given by me and two other members of my group.
Year(s) Of Engagement Activity 2018
 
Description Researcher Links workshop at CNRG, INIFAP, Tepatitlán and Guadalajara, Mexico, 3-7 February 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Workshop organised and given by me and the members of my group
Year(s) Of Engagement Activity 2018
 
Description Short course in Evolutionary Quantitative Genetics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact The Evolutionary Quantitative Genetics course was a comprehensive review of modern concepts in evolutionary quantitative genetics. The course covered basic statistics, population genetics, quantitative genetics, evolutionary response in quantitative traits, estimating the fitness of traits, and mixed models and their extensions. The instructor was Dr Bruce Walsh, Department of Ecology and Evolutionary Biology, University of Arizona, and co-author of Genetics and Analysis of Quantitative Traits. The course was hosted by Professor John Hickey at the Roslin Institute.
Year(s) Of Engagement Activity 2016
URL http://www.alphagenes.roslin.ed.ac.uk/bruce-walsh-visit/
 
Description Short course: The Search for Selection, Roslin Institute, University of Edinburgh 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Description
Biologists are obsessed (indeed, seduced) by the search for signatures of selection in organismal features of interest, ranging from specific traits to genome-wide signatures. A vast number of approaches have been suggested in this search for selection, including genomic-based signatures of recent or ongoing selection, tests based on either excessive amounts or nonrandom patterns of divergence (in both fossil sequences and functional genomics data) and the more classical Lande-Arnold fitness estimates (direct association of phenotypic values with fitness estimates) and their modern extensions (such as aster models). Given the breadth of such searches, a large amount of machinery has been developed, but is rarely presented in a unified fashion. This course presents an integrated overview of all these approaches, highlighting common themes and divergent assumptions.

The goal of this course is to expose investigators from all branches of biology to this rich menagerie of tests, applicable for population geneticists, genome biologists, evolutionary ecologists, paleontologists, functional morphologists, and just about any biologist who ponders on how to formally demonstrate that a feature (or features) of interest might have been shaped by selection.

Intended Audience.
The intended audience is advanced graduate students, postdocs, and faculty with an interest in searching for targets of selection, be they particular genomic sequences or specific traits. Given the breadth of this topic, the material is of interest to students from functional genomics, population and evolutionary genetics, ecology, paleobiology, functional morphology, and statistics (as well as other fields). Background required: some basic introduction to population and/or quantitative genetics.
Year(s) Of Engagement Activity 2018
URL https://wheat.pw.usda.gov/GG3/node/695
 
Description Teaching course: Next Generation Plant and Animal Breeding Programs, Animal Science Department, University of Nebraska, Lincoln. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Series of lectures and workshops on Plant and Animal Breeding Programs exploring current practices and future areas of research. The course was designed and delivered by John Hickey and key members of his team.
Year(s) Of Engagement Activity 2016
URL http://animalscience.unl.edu/next-generation-plant-and-animal-breeding-programs
 
Description The Expert Working Group on Wheat Breeding Methods and Strategies 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Expert Working Group on Wheat Breeding Methods and Strategies seeks to exchange breeding methods research information and germplasm in order to build capacity and support in wheat breeding programs, with more efficient breeding methods consistent with the latest scientific advances. The EWG is working on activities such as workshops, training courses, communications, and sharing of germplasm and information to reach a larger pool of wheat breeders trained in state-of-the-art breeding methods.
Year(s) Of Engagement Activity 2015,2016,2017
URL http://www.wheatinitiative.org/activities/expert-working-groups/wheat-breeding-methods-and-strategie...