Effect and interaction of mutation and recombination in the dynamics of genetic variation in modern chickens

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute

Abstract

Resolving how genetic variation is maintained in closed populations under directional selection is one of the fundamental, yet still open, questions in quantitative genetics. Addressing this question also has vital implications for the sustainability of the genetic improvement programmes that underpin global staple food production systems. Considerable research effort has focused on studying how selection reduces genetic variation and how its footprint can be identified in the genome. Far less effort has gone into studying how genetic variation is created de novo and shuffled across generations. The reason is that, until now, technological and economic constraints made it infeasible to track mutation, the evolutionary force creating new genetic variation, on the large scale that the analysis requires. The only option available to geneticists was to rely on low-precision estimates of the mutation rate, most of which arose from small-scale studies in experimental populations. These parameters would then be used in models aiming to describe the dynamics of genetic variation, which rested on many theoretical assumptions because data on the observed rate of mutation were lacking. The uncertainty of the parameters used, combined with our limited understanding of the interplay of forces acting on genetic variation, reduces the power of our current models to capture the underlying processes and therefore explains why "there is still no clear resolution on the evolutionary forces responsible for the maintenance of variation", as one of the gold-standard textbooks in quantitative genetics (Lynch and Walsh, 2018) emphatically states.

The main hypothesis of this proposal is that the rate of creation and shuffling of genetic variation can counterbalance the effects of selection, genetic drift and inbreeding in finite closed populations, such as elite breeding populations. Our goal is to take a data-driven approach to quantify, with unprecedented accuracy, the mutation and recombination processes that could explain how genetic variation appears to be maintained in closed populations under intense directional selection. To achieve this aim, we will use a unique genomic repository of hundreds of thousands of birds (either sequenced at high depth or genotyped with SNP arrays), extending over 25 generations with full pedigree and phenotypic information, to study how genetic variation is maintained over time. This dataset gives us the power to precisely estimate important parameters, such as how many mutational events occur per generation, where they tend to occur across the genome, and how the creation of new genetic variation by mutation balances its removal. As our data extend over many generations, we will be able to track both mutations and chromosomal segments through the pedigree in order to remove false-positive observations that can inflate estimates. Our study will also quantify recombination, the force that shuffles genetic variation, and explore its interplay with mutation. Finally, we will apply our more precise estimates to update theoretical expectations on where selection limits may lie, and thus aim to reconcile theoretical predictions with observations. Beyond the potential to address the key question of how genetic variation is maintained, our findings could be directly applicable across animal and plant breeding organisations to improve the resilience and sustainability of their genetic improvement programmes.

Technical Summary

The question of what determines the observed levels of genetic variation is central to quantitative genetics, yet it remains unresolved. According to theory, directional selection reduces this variation, yet observations from populations selected over a long time suggest that variation is largely maintained.

This project aims to precisely estimate the effect of the forces that generate (mutation) and modify (recombination) genetic variation, using a dataset comprising hundreds of thousands of sequenced or genotyped chickens from a commercial breeding programme. Because there is evidence that mutation and recombination rates are higher in birds than in mammals, focusing on these forces can help to explain how they counterbalance the reduction in genetic variation caused by directional selection.

The overarching goal of the project is to reconcile the observed high levels of genetic variation with theoretical expectations. To that end, we will establish the rate of mutation, the force that creates new genetic variation, and the variance it explains, using fully connected trios of sequenced animals spanning five generations on average. We will then phase all genotyped animals to estimate the recombination rate with high precision and to pinpoint the genetic architecture of its control. We will use recombination profiles to track haplotypes over fifteen generations to investigate and monitor how their preservation is related to changes in genetic variation. Finally, we will estimate the effect of selection on genetic variation and focus on specific genomic regions to monitor changes in diversity, with the ultimate objective of updating theoretical expectations on where selection limits may lie.

This project has clear potential to produce a step change in addressing fundamental questions in genetics, and its outcomes can be applied across animals and plants to improve the resilience and sustainability of their genetic improvement programmes.

Planned Impact

This research is relevant to a variety of stakeholders in the academic, commercial and public sectors, as outlined below.

1. The UK and international academic community.
Scientists from both across and within disciplines such as:
1.1 Quantitative genetics and animal and plant breeding
1.2 Wider genetics and genomics community, including evolutionary, conservation and human genetics, genome biology, poultry science
1.3 Bioinformatics, Data Science and Computational Biology
2. Animal breeding companies and societies, and levy boards such as HolsteinUK, EFFAB
3. The entire broiler supply chain including multipliers, farmers, processors, retailers and consumers
4. Other livestock and plant supply chains, e.g. ducks, turkeys, salmon, dairy
6. Policy makers, including regulators, domestic and international development agencies (e.g. FAO)
7. Third sector such as organisations and charities for preserving endangered species
8. General Public

Our research will drive future innovation and societal impact, by delivering benefits, including:

1. Field-advancing knowledge. This proposal aims to address previously intractable questions on the fundamental mechanisms involved in the maintenance of genetic diversity. If successful, we expect its outcomes to precisely estimate and predict how frequent mutation and recombination are, where in the genome they are more likely to occur, and how much they contribute to the maintenance of genetic diversity in chickens.

2. Applied research. Our approach could revolutionise breeding. With the improved understanding of the forces involved in the creation and shuffling of genetic diversity, it would be possible to refine control of gene flow to secure diversity without compromising genetic progress in order to balance short and long term benefits.

3. UK company productivity. This project has the potential to bring operational changes to breeding programmes and improve their efficiency. This will enable UK-based breeding companies, such as Aviagen and Cherry Valley Farms, to maintain their world-leading position by strengthening their R&D investment and opening up new opportunities in the knowledge economy.

4. Initiate collaborations and strengthen the UK science base. We will strive to set up a collaborative platform between breeding companies working in different species and a UK-based research centre that has pioneered the adoption of genomics in livestock. Successful outcomes of this project can drive a culture of open innovation, attract investment from the private sector and increase R&D capabilities in the UK, maintaining its scientific reputation and increasing its capability for sustainable agricultural production.

5. Sustainability and resilience of food supply chains. By better managing and exploiting their genetic resources, breeders will be able to accelerate genetic progress, tailor their products to the diverse requirements of their customers arising from different environmental, market and societal conditions, and adapt to ever-changing conditions. Consequently, the downstream actors of the supply chain will directly benefit from higher-quality products that cost less, have a smaller environmental footprint and are overall better suited to their individual requirements. Most importantly, the optimal utilisation of genetic resources in short-term selection will enhance the long-term sustainability of the supply chain.

6. Evidence-based policy. Equipped with insights on the management of diversity, policymakers can make informed decisions that influence regulations.

7. Broader societal benefits including food security and environmental preservation, e.g. review protocols for conservation programmes to reflect the insights on the dynamics of diversity.

8. Training. The PDRA will be trained in a cutting edge area of research, while interacting with other scientists in a world-leading research environment.
 
Description One of our hypotheses is that mutation contributes to the maintenance of genetic variance over time. The results of our analysis provided evidence in support of this hypothesis. More specifically, they show that although the contribution of new mutations to the total genetic variance is small in any one generation, these mutations accumulate, so their cumulative effect becomes considerable over a longer period. Our results show that during the period of study (25 generations), although the additive genetic variance decreased, this was compensated by a corresponding increase in mutational variance, so the overall genetic variance remained stable across the whole period.
Exploitation Route The contribution of mutation over time could be an important factor to consider for the sustainability of breeding programmes. Our results support the hypothesis that mutation can effectively compensate for loss of variation.
Sectors Agriculture, Food and Drink

 
Description BBSRC FTMA: Towards integrating population genetics analysis of breeding cohorts with functional annotation of genomic features
Amount £23,500 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 11/2020 
End 03/2021
 
Title Correction of genotyping errors 
Description Although the error rate of genotyping using SNP arrays is low (e.g. 1-2%), it may have an impact on phasing accuracy. Such phasing errors can increase the number of detected cross-overs and thus inflate the estimated recombination rate. However, in pedigreed populations, some of the genotyping errors can be corrected by using the rules of Mendelian inheritance and genotype information from the ancestors and progeny of focal individuals. We have therefore developed a new algorithm to highlight inconsistent SNP genotypes and, where possible, to replace errors with the most likely genotype. 
Type Of Material Improvements to research infrastructure 
Year Produced 2022 
Provided To Others? Yes  
Impact Fixing genotyping errors prior to phasing and imputation can improve both their accuracy and the detection of cross-overs. Combining this tool with cross-over detection can considerably reduce the number of falsely detected recombination events and the number of outliers. 
URL https://github.com/andreaskranis/genofix
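The Mendelian check described in this entry can be illustrated with a minimal sketch. It is not the genofix algorithm itself: it only flags SNPs where a sire-dam-offspring trio is inconsistent, and it assumes autosomal biallelic SNPs coded as 0, 1 or 2 copies of the alternative allele. The function names are hypothetical.

```python
def gametes(genotype):
    """Alleles (0 = reference, 1 = alternative) a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def possible_offspring(sire, dam):
    """All offspring genotypes compatible with Mendelian inheritance at one SNP."""
    return {a + b for a in gametes(sire) for b in gametes(dam)}


def flag_inconsistent(sire_gt, dam_gt, child_gt):
    """Return the indices of SNPs where the trio violates Mendelian inheritance."""
    return [
        i
        for i, (s, d, c) in enumerate(zip(sire_gt, dam_gt, child_gt))
        if c not in possible_offspring(s, d)
    ]


# Illustrative genotypes only (not real data).
sire = [0, 2, 1, 2]
dam = [0, 2, 1, 0]
child = [1, 2, 2, 0]
inconsistent = flag_inconsistent(sire, dam, child)
```

A flagged SNP only says that at least one of the three genotypes is wrong. Deciding which one to correct needs further relatives, which is why the entry above draws on both ancestors and progeny of the focal individual.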
 
Title Detection of hotspots 
Description Reliable detection of recombination hotspots is pivotal to quantifying recombination accurately. This is a key milestone of the project, as refining the methodology and the precision of localising crossovers will enable us to explore the genetic basis of recombination. We have used extensive simulations to fine-tune the sensitivity of the detection algorithm and to benchmark its performance in the presence of phasing and genotyping errors. The next step is to use real data to accurately quantify crossovers. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? No  
Impact We aim to make the algorithm available as an open-source Python library. The program will accept phased genotypes and the relevant pedigree as input and will output the number of crossovers observed in each progeny. This will give the wider community an accessible tool to quantify recombination using genomic datasets. 
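As a rough illustration of counting crossovers from phased data, the sketch below compares the haplotype a parent transmitted to one progeny with that parent's two phased haplotypes. It is a simplified assumption of how such a count could work, not the project's detection algorithm, and the function name and 0/1 allele coding are hypothetical.

```python
def count_crossovers(parent_hap_a, parent_hap_b, transmitted):
    """Count switches of parental origin along one transmitted haplotype.

    Only sites where the parent is heterozygous are informative; at each
    such site the transmitted allele is assigned to haplotype A (0) or B (1).
    """
    origins = []
    for a, b, t in zip(parent_hap_a, parent_hap_b, transmitted):
        if a == b:
            continue  # parent is homozygous: origin cannot be told apart
        if t == a:
            origins.append(0)
        elif t == b:
            origins.append(1)
        # otherwise the allele is missing or matches neither haplotype: skip it
    # A crossover is inferred wherever consecutive informative sites change origin.
    return sum(1 for prev, cur in zip(origins, origins[1:]) if prev != cur)


# Illustrative haplotypes only (not real data).
hap_a = [0, 0, 1, 1, 0, 1]
hap_b = [1, 0, 0, 1, 1, 0]
gamete = [0, 0, 1, 1, 1, 0]
n_crossovers = count_crossovers(hap_a, hap_b, gamete)
```

In this naive form a single mis-called or mis-phased site appears as two adjacent switches, which is one way genotyping and phasing errors inflate crossover counts. A practical tool would need to filter such isolated switches.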
 
Title Estimation of heritability of recombination 
Description One of the main hypotheses of this grant was that recombination in chickens is under genetic control. We performed a large-scale analysis and were able to validate this hypothesis. Our results suggest a moderate to high heritability for the number of crossovers (>0.2). Further analyses will focus on identifying the genomic loci that control this variation and on multi-trait analyses to determine genetic correlations with other traits. This will help us to address the key question of how genetic variance is maintained despite the finite effective population size and the intense directional selection. 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact An abstract was submitted to the EAAP Annual Meeting. A manuscript is in preparation for submission to a peer-reviewed journal. 
 
Title Estimation of mutational variance 
Description We performed an analysis of unprecedented scale to estimate the contribution of mutations to genetic variance in chickens over a period of 25 generations. Due to the size of the data, custom programs were written to construct the mutational relationship matrix. The analysis has confirmed that genetic variance is maintained, and has also shown how the accumulation of mutations compensates for the loss of additive genetic variance. 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact Our analysis is the first in chicken breeding populations that are under strong directional selection. The results provide evidence to support our hypothesis that mutation plays an important role in the maintenance of genetic variation. One of the barriers to conducting such large-scale analyses was the complexity of constructing the mutational relationship matrix. To overcome this, we will make our custom programs available. To this end, a manuscript is in preparation and will be submitted to a peer-reviewed journal. 
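For intuition on how small per-generation mutational input can accumulate, the sketch below evaluates the textbook neutral drift-mutation recursion V(t+1) = V(t)(1 - 1/(2Ne)) + Vm. This is not the mixed-model analysis used in the project and it ignores selection entirely. The function name and all parameter values are illustrative assumptions, not estimates from the project.

```python
def expected_variance(v0, vm, ne, generations):
    """Closed-form solution of V(t+1) = V(t) * (1 - 1/(2*ne)) + vm.

    Returns (standing, mutational): the part of the initial variance v0
    that survives drift, and the variance accumulated from new mutations.
    """
    decay = (1.0 - 1.0 / (2.0 * ne)) ** generations
    standing = v0 * decay
    mutational = 2.0 * ne * vm * (1.0 - decay)
    return standing, mutational


# Hypothetical values: initial variance 1.0, mutational variance 0.001 per
# generation, effective population size 50, followed for 25 generations.
standing, mutational = expected_variance(v0=1.0, vm=0.001, ne=50, generations=25)
total = standing + mutational
```

Under this recursion the mutational component rises towards 2·Ne·Vm while the standing component decays, so the two trends move in opposite directions, qualitatively like the pattern described in this entry.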
 
Title Implementation and evaluation of approaches to probabilistic pedigree correction by genotyping 
Description A reliable pedigree is a prerequisite for precise phasing (haplotype assignment) and for the accurate identification of mutation and recombination sites. Commercial chickens are bred in cohorts, which means that each offspring has a constrained pool of possible sires and dams. We implemented a Python testing framework to evaluate common approaches to the challenging problem of pedigree correction by genotype, using both simulated and real pedigrees. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? No  
Impact It informs our quality control process and will improve the quality and reliability of the downstream analysis. It will also be of wider interest to the community for pedigree QC. 
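One widely used exclusion-based check for parentage is counting opposing homozygotes between an offspring and each candidate parent. The sketch below ranks a cohort's candidate sires this way. It is an assumption for illustration only: the entry does not state which approaches were evaluated, and the function names, animal labels and 0/1/2 genotype coding are hypothetical.

```python
def opposing_homozygotes(parent_gt, child_gt):
    """Count SNPs where parent and child are homozygous for different alleles.

    Such sites are impossible for a true parent-offspring pair, so a high
    count points to a pedigree error (or to genotyping errors).
    """
    return sum(
        1
        for p, c in zip(parent_gt, child_gt)
        if (p == 0 and c == 2) or (p == 2 and c == 0)
    )


def rank_candidates(child_gt, candidates):
    """Sort candidate parents (name -> genotypes) by ascending conflict count."""
    scored = [(name, opposing_homozygotes(gt, child_gt)) for name, gt in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Illustrative genotypes only (not real data).
child = [0, 2, 1, 0, 2]
cohort_sires = {
    "sireA": [0, 2, 1, 1, 2],
    "sireB": [2, 0, 1, 0, 0],
}
ranking = rank_candidates(child, cohort_sires)
```

Because genotyping errors also produce a few opposing homozygotes, a real pipeline would compare counts against an error-tolerant threshold or a likelihood rather than demanding zero conflicts.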
 
Title Insight on the genomic structure of the studied population 
Description Using the software LDMAP (https://www.southampton.ac.uk/genomicinformatics/research/ld.page), linkage disequilibrium (LD) maps were generated from 50K SNP array data. The LD maps give insights into the patterns of recombination in the studied population. The general patterns agree with previous studies using broiler data (Pengelly et al. (2016) Heredity, 117 (5), 375-382). These results will be useful for the next phases of the project, in which a detailed recombination map will be generated. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? No  
Impact The results ensure that the analysis can be done in an efficient manner. This could be a useful tool to monitor changes over time in select populations. 
 
Title Simulation program for testing phasing strategies 
Description A simulation program was written to generate haplotype data. The simulated data will be used for evaluating and testing different phasing programs. An important feature of the program is that it can use the same SNPs as real-life arrays. Furthermore, the program can use a real pedigree to remain as close as possible to the real data. This has been confirmed by comparing the simulated and real genotypes (allelic frequencies and genomic relationships were the same). 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? No  
Impact Once the testing phase to accommodate other species is complete, the source code will become available on GitHub under a permissive licence. 
 
Title Directed knowledge graph of public livestock genomics data 
Description We built an integrated knowledge graph for livestock genomics around common use cases (postGWAS, RNASeq, ChIPSeq etc.). Our use case required fast query and traversal over large graphs (tens of millions of nodes), and we identified that RedisGraph 2.0, a directed-graph implementation built on the Redis NoSQL database, could support this. Data were semantically modelled to support linkage and query traversal between genetics and genomics (e.g. genetic variation, quantitative genetics, epigenetics, transcriptomics, QTL traits) and molecular functional annotation (e.g. GO terms, pathways, protein interactions). 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? No  
Impact The knowledge base supports our postGWAS analysis pipelines and is being incorporated into our ongoing quantitative genetics and genomics research. 
 
Description Research Agreement with Aviagen 
Organisation Aviagen
Country United Kingdom 
Sector Private 
PI Contribution The collaboration agreement between the University of Edinburgh and Aviagen Ltd has been signed. The collaboration determines the management of IP and ensures the appropriate management of foreground IP.
Collaborator Contribution Both partners have agreed the optimal terms to ensure a robust framework for the continuation of the project without barriers.
Impact Project is continuing.
Start Year 2020
 
Title capridb: A knowledge-graph database to Collect And PrioRitise Information 
Description This is Python software for integrating genetics and genomics data into a knowledge graph. The knowledge graph can be used to support genomics analyses such as post-GWAS functional annotation. 
Type Of Technology Software 
Year Produced 2020 
Impact capridb gives us a flexible environment in which to integrate and query data, such as quantitative livestock traits, functional genomics and pathways, that are pertinent to the impact of genetic variation and recombination. 
 
Title fixPed: a Python library for correcting pedigrees with genotyping data 
Description A Python library for the statistical evaluation of pedigree accuracy based on genotyping data. The library can simulate diploid genotypes and pedigrees as well as evaluate real pedigree data. 
Type Of Technology Software 
Year Produced 2020 
Impact The library was developed to enable quality control of breeding pedigrees in order to improve phasing accuracy and allow more reliable identification of mutation and recombination sites. 
 
Title quickgsim: a Python library to simulate genome evolution within a breeding programme 
Description The program makes it possible to track recombination (optionally under genetic control). It has specific provisions to incorporate real data (genotypes, haplotypes, breeding values, pedigree) to bring the simulation closer to specific real-life scenarios. The library is designed to be flexible and extensible. As the software is written in Python, a popular and widely used programming language, users can create highly customisable scenarios with little effort. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact Used to develop various simulation scenarios for genofix, our program to correct genotyping errors in pedigreed populations (a paper was submitted to the World Congress of Quantitative Genetics in Livestock). The library is also used to support an ongoing MSc thesis supervised by Andreas Kranis in collaboration with the University of Valencia. 
URL https://github.com/andreaskranis/quickgsim