Development of a high-throughput pipeline to identify causal variants and its demonstration in pig muscle

Lead Research Organisation: University of Edinburgh

Department Name: Roslin Institute

Abstract

This project will increase the effectiveness of commercial livestock breeding programmes by developing a method for identifying causal genomic variants: the individual genome elements that control the traits breeders aim to improve. Traits such as muscling, the example used in this project, are controlled by thousands of causal genomic variants. Current breeding selections rely on identifying genome regions that contain a preponderance of beneficial causal variants, without identifying the individual variants. With a method for identifying causal genomic variants, breeders could make more accurate and precise selections, and in the future they would be able to use genome editing to accelerate improvement while protecting genetic diversity.

Our method will work as a framework of stages that identifies causal variants by evaluating information from different sources. The first stage uses historical breeding information to identify genome regions containing millions of variants, each with an equal probability of being beneficial to the trait and an equal, but lower, probability of being deleterious. Each subsequent stage brings in a new source of information and uses it to adjust the two probabilities for each variant. As the stages proceed, a decreasing number of variants emerges with an increasing probability of being causal and beneficial for the trait. Early stages use information that is already available or easy to collect, so that most variants can be rejected before they reach stages where information is expensive to collect.
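The abstract does not state how each stage adjusts the two probabilities. The following is a minimal sketch, assuming each stage supplies a pair of likelihood ratios per variant (one for "beneficial", one for "deleterious", both relative to "not causal") and that variants are rejected below a fixed threshold. The class, the function names, and all numbers are hypothetical illustrations, not the project's implementation.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    p_beneficial: float   # P(variant is causal and beneficial for the trait)
    p_deleterious: float  # P(variant is causal and deleterious for the trait)


def update(variant: Variant, lr_beneficial: float, lr_deleterious: float) -> Variant:
    """Re-weight both hypotheses by stage-specific likelihood ratios (relative to
    'not causal') and renormalise so the three probabilities still sum to one."""
    p_not_causal = 1.0 - variant.p_beneficial - variant.p_deleterious
    w_beneficial = variant.p_beneficial * lr_beneficial
    w_deleterious = variant.p_deleterious * lr_deleterious
    total = w_beneficial + w_deleterious + p_not_causal
    return Variant(variant.name, w_beneficial / total, w_deleterious / total)


def run_stage(variants, evidence, threshold):
    """Apply one stage of the filter: update every variant, then reject those
    whose probability of being causal and beneficial falls below `threshold`."""
    kept = []
    for variant in variants:
        # Variants without information at this stage keep their probabilities.
        lr_beneficial, lr_deleterious = evidence.get(variant.name, (1.0, 1.0))
        updated = update(variant, lr_beneficial, lr_deleterious)
        if updated.p_beneficial >= threshold:
            kept.append(updated)
    return kept


# Stage 1 output (hypothetical numbers): every variant in a region starts with the
# same prior, with P(beneficial) higher than P(deleterious).
priors = [Variant(name, 0.002, 0.001) for name in ("v1", "v2", "v3")]

# Stage 2 evidence (hypothetical likelihood ratios, e.g. from functional annotation).
stage2 = {"v1": (50.0, 1.0), "v2": (0.2, 1.0)}

survivors = run_stage(priors, stage2, threshold=0.001)
for v in survivors:
    print(v.name, round(v.p_beneficial, 4))
```

Here v2 is rejected because its evidence lowers P(beneficial) below the threshold, while v3, with no information at this stage, passes unchanged to the next stage. Cheap stages can therefore discard many variants without ever collecting expensive data on them.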

In the project we propose to develop the framework and to integrate and test four stages, including gene editing of muscle cells in culture. In the future, the framework can be expanded to include new sources of information as they become available.

To be successful, the project needs to solve three problems:
1. We need a computational framework to integrate information from different sources and identify putative causal variants.
2. We need to test putative causal variants by gene-editing muscle cells in culture.
3. We need to evaluate the framework in a real breeding program.

The project will develop an "Allele Testing" framework for breeding programmes by integrating:
- Sequence data and phenotypes on 375,000 pigs from a recently concluded project of ours;

- Functional genomic and expression data that is publicly available, or which we have generated in a Roslin-funded Pump Priming Project or will collect in this proposed project;

- Data from gene-editing of cultured muscle cells to be collected in the proposed project.

The project has three objectives, as follows:
1. We will develop a genomics pipeline that integrates genome-wide association studies (GWAS), expression quantitative trait loci (eQTL) and functional annotation into a ranked list of putative causal variants, using a suite of statistical and bioinformatic methods.

2. We will use gene editing to introduce putative causal genomic variants into a pig in vitro cell system for detection of a cell phenotype.

3. We will validate the "Allele Testing" framework by predicting genomic breeding values for a set of validation pigs with and without information on the putative causal genomic variants it discovers, and then comparing the accuracy of the two sets of genomic breeding values by correlating each with the progeny test records of the validation pigs.
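Objective 3 measures accuracy as the correlation between predicted genomic breeding values and progeny test records. A minimal sketch of that comparison follows; the function names and the values for four validation pigs are made up for illustration and are not project results.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def compare_accuracy(gebv_without, gebv_with, progeny_records):
    """Accuracy of each set of genomic breeding values, measured as its
    correlation with progeny test records, plus the gain from adding variants."""
    acc_without = pearson(gebv_without, progeny_records)
    acc_with = pearson(gebv_with, progeny_records)
    return acc_without, acc_with, acc_with - acc_without


# Made-up values for four validation pigs, for illustration only.
progeny = [1.0, 2.0, 3.0, 4.0]
gebv_without = [2.0, 1.0, 3.0, 4.0]   # model without putative causal variants
gebv_with = [1.1, 1.9, 3.2, 3.9]      # model including putative causal variants

acc_without, acc_with, gain = compare_accuracy(gebv_without, gebv_with, progeny)
print(f"without: {acc_without:.3f}  with: {acc_with:.3f}  gain: {gain:.3f}")
```

Breeding programmes sometimes divide such a correlation by the square root of the reliability of the progeny records. That adjustment is omitted here.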

Technical Summary

This project will increase the effectiveness of commercial livestock breeding programmes by developing an "Allele-Testing" framework to identify causal genomic variants that control economically important traits, using pig muscling as an exemplar trait. The framework will act as a multi-stage filter that identifies causal variants by integrating information from different sources. As the stages proceed, a decreasing number of variants emerges with an increasing probability of being causal and beneficial for the trait. In the project we propose to develop the framework and to integrate and test four stages, including genome editing in vitro. In the future, the framework can be expanded to include new sources of information as they become available.

The project will develop an "Allele Testing" framework for breeding programmes by integrating:

- Sequence data and phenotypes on 375,000 pigs from a recently concluded project of ours;

- Functional genomic and expression data that is publicly available, or which we have generated in a Roslin-funded Pump Priming Project or will collect in this proposed project;

- Data from gene-editing of cultured muscle cells to be collected in the proposed project.

The project has three objectives, as follows:

1. Develop a genomics pipeline that integrates genome-wide association studies (GWAS), expression quantitative trait loci (eQTL) and functional annotation into a ranked list of putative causal variants, using a suite of statistical and bioinformatic methods.

2. Use gene editing to introduce putative causal genomic variants into a pig in vitro cell system for detection of a cell phenotype.

3. Validate the "Allele Testing" framework by predicting genomic breeding values for a set of validation pigs with and without information on the putative causal genomic variants it discovers, and then comparing the accuracy of the two sets of genomic breeding values by correlating each with the progeny test records of the validation pigs.
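Objective 1 does not specify how GWAS, eQTL and annotation evidence will be combined into a ranking. As a minimal illustration only, the sketch below assumes a simple additive score: each variant's weighted -log10 p-values are summed and scaled by a functional-annotation weight. The scoring rule, weights, variant names and p-values are all hypothetical and are not the project's method.

```python
import math


def variant_score(gwas_p, eqtl_p, annotation_weight, w_gwas=1.0, w_eqtl=1.0):
    """Hypothetical combined score: weighted -log10 p-values from GWAS and eQTL
    analyses, scaled by a functional-annotation weight (e.g. higher for
    variants in regulatory or coding regions)."""
    evidence = w_gwas * -math.log10(gwas_p)
    if eqtl_p is not None:  # not every variant is tested in the eQTL study
        evidence += w_eqtl * -math.log10(eqtl_p)
    return evidence * annotation_weight


def rank_variants(records):
    """Return (name, score) pairs sorted from most to least promising."""
    scored = [(name, variant_score(g, e, a)) for name, g, e, a in records]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# (variant, GWAS p-value, eQTL p-value or None, annotation weight); made-up values.
records = [
    ("varA", 1e-8, 1e-4, 1.0),
    ("varB", 1e-8, None, 2.0),
    ("varC", 1e-6, 1e-6, 0.5),
]
ranking = rank_variants(records)
for name, score in ranking:
    print(name, round(score, 2))
```

An additive score is easy to inspect but treats the sources as independent. A probabilistic combination, such as a staged update of each variant's probability of being causal, would be a natural alternative.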

Planned Impact

The primary goal of this project is to increase the effectiveness of livestock breeding programmes. Our direct links with the pig breeding industry mean that the outcomes of this project, if successful, will be immediately translated into practice, with a positive economic impact on 25% of the "technified" global pork industry.

There will also be a downstream beneficial impact for the scientific community via the tools and knowledge developed within the project, particularly contributions to the fields of animal breeding, plant genetics, medical science, quantitative genetics, computational genomics, gene editing and cell biology. All of these fields would benefit from tools to discover causal variants for quantitative traits. Finally, the general public and policy makers will benefit from improved efficiency and sustainability of commercial pig production, and the project is likely to have wider ranging impacts on the production of other livestock species.

Animal breeding companies, breed societies, and levy boards: Tools that increase the efficiency of livestock breeding programmes will make it possible to breed better production animals that are healthier and have better welfare. The software and scripts that we will use to generate and analyse data, the causal genomic variants that we identify in this project, and the editing tools and cell systems will all be made available to these organisations.

Users of animal products: The entire chain of users of pig products, including meat packers, processors, retailers and consumers, will benefit from Genus and other breeding companies being equipped with tools to increase the effectiveness of commercial livestock breeding programmes. These tools will allow them to deliver a lower-cost, higher-quality product that is more environmentally friendly, healthier and suited to the individual requirements of stakeholders in the supply chain.

UK Treasury: The Treasury will benefit from increased tax revenues through the increased profitability of Genus and other UK adopters, the pork supply chain, other UK agricultural users (should they adopt downstream products of breeding), and UK-based providers of sequencing, genome editing, and cell biology technology.

UK science infrastructure and capacity: The proposed methods and data set will provide a platform for increased R&D capability in the UK, maintaining the scientific reputation of the UK and its institutions and increasing the capability to enhance sustainable agricultural production. The proposed research will be embedded within external training courses that the PI is regularly invited to give, and the post-doc working on the project will have the opportunity to be trained at a world-class institute in a cutting-edge area of research while interacting with a leading commercial partner.

Plant genetics, medical genetics and other fields of genetics: Plant genetics (for breeding), medical genetics (to aid drug discovery and personalised medicine) and other fields of genetics such as evolutionary biology (to understand the evolution of natural populations) would all benefit from knowledge of causal variants, which have historically been very difficult to identify. The Allele Testing framework, if successful, could help each of these fields.

Society and education: All members of society who work to improve or depend upon the competitiveness and sustainability of agriculture will benefit from the downstream practical applications outlined above. The application of the outcomes by breeding organisations will lead to faster and more sustainable genetic progress, leading to healthier food and to food production that is more resource-efficient and affordable. Increased efficiency in agriculture has direct societal benefits: greater food security with less environmental impact. The knowledge will feed into local undergraduate and graduate programmes and public engagement programmes at the Easter Bush Outreach Centre.

Funded Value:

£746,468

Funded Period:

Jan 21 - Aug 24

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/T014067/1

Principal Investigator:

Gregor Gorjanc

Richard Mellanby

John Hickey

Research Subject:

Animal science (99%)

Research Topic:

Livestock production (99%)


People	ORCID iD
Gregor Gorjanc (Principal Investigator)	http://orcid.org/0000-0001-8008-2787
Richard Mellanby (Principal Investigator)
John Hickey (Principal Investigator)
Francesc Donadeu (Co-Investigator)
Melissa Kaye Jungnickel (Researcher Co-Investigator)

Publications


Baumdicker F (2022) Efficient ancestry and mutation simulation with msprime 1.0. in Genetics

Desire S (2023) A genome-wide association study for loin depth and muscle pH in pigs from intensely selected purebred lines. in Genetics, selection, evolution : GSE

Fortuna G (2024) Accounting for the nuclear and mito genome in dairy cattle breeding - a simulation study. in JDS Communications

Gozalo-Marcilla M (2021) Genetic architecture and major genes for backfat thickness in pig lines of diverse genetic backgrounds. in Genetics, selection, evolution : GSE

Johnsson M (2021) Genetic variation in recombination rate in the pig. in Genetics, selection, evolution : GSE

Johnsson M (2021) Evidence for and localization of proposed causative variants in cattle and pig genomes. in Genetics, selection, evolution : GSE

Johnsson M (2024) Building in vitro tools for livestock genomics: chromosomal variation within the PK15 cell line. in BMC genomics

Lehmann B (2025) On ARGs, pedigrees, and genetic relatedness matrices

Mafra Fortuna G (2023) Short communication: Accounting for nuclear and mito genome in dairy cattle breeding - a simulation study

Pocrnic I (2022) Optimisation of the core subset for the APY approximation of genomic relationships

Key Findings
Further Funding
Collaboration
Software and Technical Products
Engagement Activities


Description	We have analysed associations between phenotypic and genetic variation along the pig genome and, in doing so, mapped genome regions of interest. We have further fine-mapped genetic variation in these regions to highlight potential genomic variants (alleles) that could be genome-edited to test causality and then used in selective breeding. Because of substantial linkage disequilibrium in these regions, many variants remain potential candidates. To complement this work, we have pursued two further lines of work. First, using muscle tissue routinely collected in the collaborators' breeding programme, we have obtained gene expression data and conducted an eQTL study. Second, we have genome-edited the PK15 pig cell line in the most promising region to find promising or causal loci affecting complex traits in pigs related to muscle and fat growth and efficiency.
Exploitation Route	We are collaborating with leading pig breeding programmes and with other public and private animal and plant breeding organisations working on other species. The results of this project can therefore pave the way towards practical application through our academic and industry engagement.
Sectors	Agriculture Food and Drink
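The findings above do not say which fine-mapping method was used. As an illustration only, one common approach assumes a single causal variant per region, computes Wakefield approximate Bayes factors from GWAS effect estimates and standard errors, converts them to posterior inclusion probabilities (PIPs), and reports a 95% credible set. The prior effect standard deviation (0.2), the function names and the summary statistics below are hypothetical. The example shows how two variants in strong LD end up sharing the posterior, which is why many candidates remain.

```python
import math


def log_abf(beta, se, prior_sd=0.2):
    """Log Wakefield approximate Bayes factor (association vs no association)
    for one variant, given its estimated effect and standard error."""
    v = se ** 2
    w = prior_sd ** 2            # assumed prior variance of the causal effect
    r = w / (v + w)
    z2 = (beta / se) ** 2
    return 0.5 * math.log(1.0 - r) + 0.5 * z2 * r


def credible_set(variants, coverage=0.95, prior_sd=0.2):
    """PIPs and a credible set, assuming exactly one causal variant in the
    region and equal prior probability for every variant."""
    log_bfs = {name: log_abf(beta, se, prior_sd) for name, beta, se in variants}
    top = max(log_bfs.values())
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = {name: math.exp(lbf - top) for name, lbf in log_bfs.items()}
    total = sum(weights.values())
    pips = {name: wt / total for name, wt in weights.items()}

    # Add variants in decreasing PIP order until the coverage is reached.
    chosen, cumulative = [], 0.0
    for name, pip in sorted(pips.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(name)
        cumulative += pip
        if cumulative >= coverage:
            break
    return pips, chosen


# Made-up summary statistics (variant, effect, standard error) for one region;
# "a" and "b" are in strong LD and have similar association signals.
region = [("a", 0.30, 0.05), ("b", 0.29, 0.05), ("c", 0.10, 0.05), ("d", 0.0, 0.05)]
pips, chosen = credible_set(region)
print({name: round(p, 3) for name, p in pips.items()}, chosen)
```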


Description	Genetics and breeding of Taurine-Indicine crossbred dairy cattle
Amount	£105,000 (GBP)
Organisation	East of Scotland BioScience (EastBio)
Sector	Charity/Non Profit
Country	United Kingdom
Start	08/2021
End	08/2025


Description	Pig breeding with Genus PIC
Organisation	Genus plc
Country	United Kingdom
Sector	Private
PI Contribution	Optimising breeding programme
Collaborator Contribution	Supplying data and knowledge
Impact	Collaboration has just begun
Start Year	2018


Title	AlphaSuite of software for data science, genetics, and breeding
Description	AlphaSuite of software for data science, genetics, and breeding, available from https://github.com/AlphaGenes. The major tools include:
* AlphaSimR for simulation of breeding programmes https://github.com/AlphaGenes/AlphaSimR
* AlphaBayes for estimation of SNP effects on phenotype https://github.com/AlphaGenes/AlphaBayes
* AlphaAssign for finding progeny-parent (pedigree) relationships https://github.com/AlphaGenes/AlphaAssign
* AlphaPhase for phasing and imputation of SNP array genotype data https://github.com/AlphaGenes/AlphaPhase
* AlphaImpute for phasing and imputation of SNP array genotype data https://github.com/AlphaGenes/AlphaImpute
* AlphaImpute2 for phasing and imputation of SNP array genotype data (version 2) https://github.com/AlphaGenes/AlphaImpute2
* AlphaPeel for genotype calling, phasing, and imputation in pedigreed populations https://github.com/AlphaGenes/AlphaPeel
* AlphaFamImpute for genotype calling, phasing, and imputation in families https://github.com/AlphaGenes/AlphaFamImpute
* AlphaPlantImpute for phasing and imputation in plant populations https://github.com/AlphaGenes/AlphaPlantImpute
* AlphaPlantImpute2 for phasing and imputation in plant populations (version 2) https://github.com/AlphaGenes/AlphaPlantImpute2
* AlphaMate for balancing selection and management of genetic diversity in breeding programmes https://github.com/AlphaGenes/AlphaMate
* AlphaPart for analysing trends in genetic means and variances https://github.com/AlphaGenes/AlphaPart
Type Of Technology	Software
Year Produced	2018
Open Source License?	Yes
Impact	AlphaSuite is used by leading public and private animal and plant breeding programmes that supply genetics worldwide in the Global North and Global South.
URL	https://github.com/AlphaGenes


Description	Data-Driven Breeding and Genetics course (2 weeks) on-line
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The principles of animal and plant breeding are increasingly coalescing due to advances in technology and increasing demands and opportunities for agriculture. This two-week graduate-level course of integrated lectures and practicals is designed to equip students, academics, and practitioners with theoretical and applied knowledge, skills and tools to design, optimise, and deploy Data Driven Breeding and Genetics techniques for Animals and Plants. It was jointly delivered by scientists and teachers from the University of Edinburgh and colleagues from the Swedish University of Agricultural Sciences and the CGIAR's Excellence in Breeding Platform, with guest lectures from various academic and industry collaborators. Due to the pandemic, the course took place in virtual format from 20 September to 1 October 2021. The course lectures were pre-recorded to enable asynchronous worldwide delivery. Course participants engaged with the lectures and practicals at their own pace, and with course instructors and other participants via Slack and daily Zoom sessions (one in the UK morning and one in the UK afternoon).
Day 1 - Introduction to breeding: Welcome and Introduction (Gregor Gorjanc); Introduction to breeding programme modelling (Gregor Gorjanc); AlphaSimR MOOC - Introduction (Gregor Gorjanc); AlphaSimR MOOC - Relationship between DNA & traits (Gregor Gorjanc); R crash course on using ggplot and tidyverse (Thiago Paula Oliveira); The role of livestock in global food security (Geoff Simm)
Day 2 - Breeding programme design: AlphaSimR MOOC - DNA lottery (Gregor Gorjanc); AlphaSimR MOOC - Response to selection (Gregor Gorjanc); AlphaSimR MOOC - Modelling complex breeding programmes (Gregor Gorjanc); How does a major multinational animal breeding programme operate in the 21st century (Andreas Kranis); How does a major multinational plant breeding programme operate in the 21st century (Brian Gardunia)
Day 3 - Genomic data in breeding: Genomic data, SNP array genotyping and sequencing, and Strategies to generate genomic data in breeding programmes (Gregor Gorjanc); Phasing genomic data with heuristic and probabilistic methods (Gregor Gorjanc); Imputation of genomic data (Gregor Gorjanc); AlphaPeel practical - probabilistic genotype calling, phasing, and imputation of genomic data in pedigreed populations (Jana Obsteter); AlphaImpute2 practical - fast phasing and imputation (Jana Obsteter); AlphaFamImpute practical - genotype calling, phasing, and imputation algorithm for large full-sib families (Jana Obsteter); AlphaAssign practical - parentage assignment (Jana Obsteter); Breeding in aquaculture (Ross Houston); Tea breeding and a genomic selection outlook (Nelson Lubanga)
Day 4 - Modelling phenotype data to estimate environmental effects: Introduction to experimental design of field trials (Daniel Tolhurst); Introduction to linear mixed models for plant breeding (Daniel Tolhurst); Analysis of phenotype data, including data collected from i) single field trials (with spatial) and ii) field trials across multiple (Daniel Tolhurst); ASReml practicals (Daniel Tolhurst & Thiago Paula Oliveira); Overview of forest tree breeding (Jaroslav Klapste); Genomic selection provides new opportunities for intercrop breeding (Jon Bancic)
Day 5 - Population and Quantitative genetics for breeding: Introduction to population and quantitative genetics for breeding (Martin Johnsson); Change in frequencies with drift (Martin Johnsson); Change in frequencies with mutation, migration and selection (Martin Johnsson); Additive effects (Martin Johnsson); Non-additive effects (Martin Johnsson); Inbreeding depression and heterosis (Martin Johnsson); Practicals (Martin Johnsson); Genetic evaluation in a multinational plant breeding programs AND/OR CGIAR Excellence in Breeding platform (Eduardo Covarrubias-Pazaran); Roadmap for black soldier fly breeding (Leticia de Castro Lara)
Day 6 - Quantitative genetics for breeding II: Variance, covariance, correlation and heritability (Eduardo Covarrubias-Pazaran); Correlated response to selection (Eduardo Covarrubias-Pazaran); Recurrent selection strategies (Eduardo Covarrubias-Pazaran); Practicals (Eduardo Covarrubias-Pazaran); National breeding programme for the Norwegian Red dairy cattle (Janez Jenko); Breeding a man's best friend (Joanna Ilska)
Day 7 - Modelling phenotype data to estimate genetic effects: Genetic evaluations with focus on pedigree-based BLUP (Ivan Pocrnic); Introduction to genome-wide association studies (Ivan Pocrnic); Genomic evaluations (Ivan Pocrnic); Practicals (Ivan Pocrnic); A multipart breeding strategy for introgression of exotic germplasm in elite breeding programs using genomic selection (Irene Breider); Population genetics tools with perspective in dog research (Mateja Janes)
Day 8 - Sustainable breeding: Breeders' dilemma; Optimal contribution selection; Optimal cross selection; AlphaMate practical - optimising selection, management of diversity, and mate allocation in breeding programs; A walk-through of three examples; AlphaPart - quantifying the drivers of genetic change (Jana Obsteter & Thiago Paula Oliveira); Recursive models in animal breeding (Maria Martinez Castillero); Economic objectives in animal and plant breeding (Cheryl Quinton)
Day 9 - Exploiting modern technologies in breeding programmes: The role of reproductive technologies to boost animal breeding (Gabriela Mafra Fortuna & Gerson Oliveira); Breeding for disease resistance in animals (Andrea Doeschel-Wilson); Editing livestock genomes (Simon Lillico); Evaluating the use of gene drives to limit the spread of invasive populations (Nicky Faber); The potential of genome editing and gene drives for improving complex traits (Gregor Gorjanc)
Day 10 - Open-ended work on topics of participants' interest
Year(s) Of Engagement Activity	2021


Description	HighlanderLab Twitter channel
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The HighlanderLab updates the scientific community and a broader audience with news about our research group, its scientific output and engagement activities on the management and improvement of populations using data science, genetics, and breeding.
Year(s) Of Engagement Activity	2019,2020,2021,2022
URL	https://twitter.com/HighlanderLab


Description	HighlanderLab website
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The HighlanderLab updates the scientific community and a broader audience with news about our research group, its scientific output and engagement activities on the management and improvement of populations using data science, genetics, and breeding.
Year(s) Of Engagement Activity	2021,2022
URL	http://www.ed.ac.uk/roslin/HighlanderLab


Description	Massive Online On-demand Course on Modelling breeding programmes using AlphaSimR
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Breeding programmes are key to the genetic improvement of plant varieties and animal breeds used in agriculture. This unique course shows how to model an existing or new breeding programme and how to evaluate alternative breeding scenarios. The course is free and lasts for 5 weeks. https://www.edx.org/course/breeding-programme-modelling-with-alphasimr
Year(s) Of Engagement Activity	2022,2023
URL	https://www.edx.org/course/breeding-programme-modelling-with-alphasimr