Improving livestock production through high-throughput identification of functional regulatory variation
Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute
Abstract
Over the past several decades there has been substantial global investment in mapping the regions of livestock genomes that control production, disease tolerance and welfare phenotypes. The ultimate aim of mapping these DNA regions is to enable researchers to use advanced genomics and breeding approaches to improve the production and welfare of livestock more rapidly. Although many important loci have been mapped, in most cases we do not know which precise genetic changes in these regions are linked to the observed differences in phenotypes, which makes it more difficult to apply advanced approaches such as gene editing to improve these traits. However, studies in cattle have estimated that variants that alter downstream phenotypes are over 18 times more likely than expected by chance to do so by changing transcription, i.e. the expression level of genes (Nat Genet 50, 362-367 (2018)). If we can map which variants directly impact expression levels, we can determine which genetic changes are most likely driving the observed changes in key traits. This will in turn substantially increase the rate at which important livestock phenotypes can be improved.
In this project we will apply a high-throughput approach that directly tests the impact of millions of genetic changes on gene expression at the same time. This will allow us to generate a catalogue of cattle functional variants that are directly linked to changes in transcription and may therefore underlie loci linked to important traits. We will also take this further by testing the impact of human genetic changes in cattle cells, and vice versa. Some species are much better annotated than others, with richer datasets, and we will use these data to determine which features are linked to genetic variants that impact gene regulation across species. Using these data and machine learning approaches, we will develop statistical models that allow researchers to predict which genetic changes are likely to have an impact across species. This will allow researchers to exploit the data from better characterised species to improve less well annotated ones, further accelerating livestock improvement efforts and potentially also informing, for example, human disease studies that are based on animal models.
Consequently, this project is expected to substantially improve the understanding of both cattle and human phenotypes by mapping regulatory variants and developing statistical models for predicting variants that impact transcription across species.
Technical Summary
Livestock research benefits from the fact that if functional variants can be identified they can be readily acted upon via breeding and genome editing. Modern massively parallel reporter assays (MPRA) have the potential to bridge the current substantial gaps between mapping genomic loci and identifying actionable variants. By cataloguing the impact of genetic variants on transcription levels on a genome-wide scale, it is possible to identify variants with a direct impact on transcription, something generally not possible with traditional eQTL studies. In this project we propose to apply the SuRE MPRA approach to cattle for the first time, to assess the potential impact of millions of variants on transcription and identify thousands of regulatory variants potentially driving downstream phenotypes.
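As a rough illustration of how a reporter assay can attribute an expression change to a single variant, the sketch below compares the reporter activity of the reference and alternative alleles. It is a generic, simplified calculation and not the project's SuRE analysis: the function names, the pseudocount and the read counts are all assumptions made for this example.

```python
import math


def activity(rna_count, dna_count, pseudocount=1.0):
    """Log2 ratio of reporter RNA reads to input DNA reads for one allele.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return math.log2((rna_count + pseudocount) / (dna_count + pseudocount))


def allelic_effect(ref, alt, pseudocount=1.0):
    """Difference in log2 activity (alt minus ref) for one variant.

    `ref` and `alt` are (rna_count, dna_count) tuples.
    """
    return activity(*alt, pseudocount) - activity(*ref, pseudocount)


# Hypothetical read counts, invented purely for illustration.
counts = {
    "variant_A": {"ref": (99, 49), "alt": (399, 49)},
    "variant_B": {"ref": (199, 99), "alt": (199, 99)},
}

effects = {
    name: allelic_effect(alleles["ref"], alleles["alt"])
    for name, alleles in counts.items()
}

for name, effect in effects.items():
    print(f"{name}: log2 allelic effect = {effect:.2f}")
```

In this toy input the alternative allele of variant_A drives four times the reporter activity of the reference allele (a log2 effect of 2), while variant_B shows no difference. A real analysis would also need replicates and a statistical test before calling a variant regulatory.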
Recent studies have shown not only that there is considerable overlap in the loci linked to the same traits across species, but also that gene regulation is well conserved across mammals, to the extent that models for predicting distal regulatory elements work well across species. This suggests that findings from one species can potentially be lifted over to another. To investigate this potential, we will test which human genetic variants impact transcription in cattle cells, and vice versa, and determine which variants have conserved impacts across species. Using these data alongside relevant annotations, we will develop machine learning models for predicting which genetic changes affect transcription and have conserved impacts on gene expression across mammals. This will enable the statistical prediction of functional variants and the potential lifting over and exploitation of findings across species. We will validate these models and predictions using CRISPR/Cas9 editing, by introducing human variants predicted to have conserved impacts across species into cattle cells and assessing their impacts on transcription.
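One simple way to picture "conserved impacts across species" is to compare the effect of the same variant when it is measured in human cells and in cattle cells. The sketch below labels each variant by whether its effect is present and in the same direction in both. The threshold, the label names and the effect values are assumptions made for this example and do not come from the project. Labels of this kind could then serve as targets for a predictive model.

```python
def classify_conservation(human_effect, cattle_effect, threshold=0.5):
    """Label a variant by comparing its log2 allelic effect in two cell types.

    `threshold` is a hypothetical cut-off on the absolute effect size; a real
    analysis would use a statistical test rather than a fixed cut-off.
    """
    human_active = abs(human_effect) >= threshold
    cattle_active = abs(cattle_effect) >= threshold
    if human_active and cattle_active:
        same_direction = (human_effect > 0) == (cattle_effect > 0)
        return "conserved" if same_direction else "discordant"
    if human_active or cattle_active:
        return "species-specific"
    return "no effect"


# Hypothetical (human cells, cattle cells) effects, invented for illustration.
measurements = {
    "var1": (1.2, 0.9),
    "var2": (0.8, -0.7),
    "var3": (1.0, 0.1),
    "var4": (0.1, -0.2),
}

labels = {
    name: classify_conservation(human, cattle)
    for name, (human, cattle) in measurements.items()
}

for name, label in labels.items():
    print(f"{name}: {label}")
```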
Publications
Zhao R (2022) The conservation of human functional variants and their effects across livestock species, in Communications Biology
| Description | We have successfully generated the first genome-wide massively parallel reporter assay (MPRA) dataset in cattle. This dataset covers both sub-species (Bos indicus and Bos taurus) and has matching human data, with all three MPRA libraries tested across human and cattle cells. This unique resource has allowed us not only to define cattle regulatory variants at base pair resolution, but also to identify genetic variants whose effects depend on their cellular environment. We are using these data alongside other omics data with bioinformatics and machine learning approaches to gain new insights into the genetics of gene regulation in cattle and how it has evolved across mammals. |
| Exploitation Route | We have already demonstrated that these data can be used to identify functional variants underlying important phenotypes, and we expect that we and others can use the data to improve the estimation of genomic estimated breeding values (gEBVs) and so improve cattle production and health phenotypes. |
| Sectors | Agriculture, Food and Drink |
| Description | AI accelerated genomic improvement in LMIC livestock |
| Amount | $1,499,106 (USD) |
| Funding ID | INV-076519 |
| Organisation | Bill and Melinda Gates Foundation |
| Sector | Charity/Non Profit |
| Country | United States |
| Start | 01/2025 |
| End | 12/2026 |
| Title | Cattle PhastCons and PhyloP conservation scores |
| Description | This dataset provides conservation scores for cattle (ARS-UCD1.2/bosTau9), including PhastCons and PhyloP scores in BigWig format. Please cite the following paper if you use the cattle conservation scores: Zhao, R., Owen, R., Marr, M., Siddharth, J., Hong, N. C., Talenti, A., Hassan, M. A., & Prendergast, J. G. D. (2024). The potential of regulatory variant prediction AI models to improve cattle traits (p. 2024.08.01.606140). bioRxiv. https://doi.org/10.1101/2024.08.01.606140 |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| URL | https://zenodo.org/doi/10.5281/zenodo.13332540 |
| Title | Cattle PhastCons and PhyloP conservation scores |
| Description | This dataset provides conservation scores for cattle (ARS-UCD1.2/bosTau9), including PhastCons and PhyloP scores in BigWig format. Please cite the following paper if you use the cattle conservation scores: Zhao, R., Owen, R., Marr, M., Siddharth, J., Hong, N. C., Talenti, A., Hassan, M. A., & Prendergast, J. G. D. (2024). The potential of regulatory variant prediction AI models to improve cattle traits (p. 2024.08.01.606140). bioRxiv. https://doi.org/10.1101/2024.08.01.606140 |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| URL | https://zenodo.org/doi/10.5281/zenodo.13332541 |
| Title | Cattle tissue-specific and cross-tissue AI models for functional variant prediction |
| Description | AI models associated with the grant output preprint "The potential of AI models to identify regulatory variants underlying cattle traits" (Zhao R, Owen R, Marr M, Jayaraman S, Chue Hong N, Talenti A, Hassan MA, Prendergast JGD). |
| Type Of Material | Computer model/algorithm |
| Year Produced | 2025 |
| Provided To Others? | Yes |
| Impact | The models have been shown to be effective at distinguishing functional regulatory variants from other variants in the genome. Ongoing work is testing how they can be used to improve genomic estimated breeding values and so increase the rate of genetic gain in cattle. |
| URL | https://doi.org/10.5281/zenodo.14901001 |
| Title | A machine learning pipeline using CatBoost to predict functional variants |
| Description | This repository contains a machine learning pipeline for training and evaluating a CatBoost classifier with Bayesian hyperparameter optimization, as described in this preprint: https://www.biorxiv.org/content/10.1101/2024.08.01.606140v1 |
| Type Of Technology | Software |
| Year Produced | 2025 |
| Open Source License? | Yes |
| Impact | See preprint https://www.biorxiv.org/content/10.1101/2024.08.01.606140v1 |
| URL | https://www.biorxiv.org/content/10.1101/2024.08.01.606140v1 |
| Title | Cross-species variant annotation Nextflow pipeline |
| Description | A cross-species variant annotation pipeline built with Nextflow. The pipeline offers five categories of annotations: sequence conservation, variant position properties, VEP annotations, sequence context, and predicted functional genomic scores using the Enformer deep learning sequence-based model. |
| Type Of Technology | Software |
| Year Produced | 2024 |
| Open Source License? | Yes |
| Impact | Used to derive the functional variant prediction AI models; see the preprint for further details (https://www.biorxiv.org/content/10.1101/2024.08.01.606140v1). |
| URL | https://www.biorxiv.org/content/10.1101/2024.08.01.606140v1 |
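
To give a sense of what a "sequence context" annotation for a variant might look like, the sketch below derives a few simple features from the bases flanking a variant position. It is only an illustration and not the Nextflow pipeline's implementation: the function name, the feature set, the flank size and the example sequence are all assumptions.

```python
def sequence_context(sequence, position, flank=5):
    """Return simple context features for the base at `position` (0-based).

    Features (all hypothetical choices for this sketch):
      - window: the variant base plus up to `flank` bases on each side
      - gc_content: fraction of G/C bases in that window
      - in_cpg: whether the variant base is part of a CpG dinucleotide
    """
    sequence = sequence.upper()
    start = max(0, position - flank)
    end = min(len(sequence), position + flank + 1)
    window = sequence[start:end]
    gc_content = sum(base in "GC" for base in window) / len(window)

    base = sequence[position]
    prev_base = sequence[position - 1] if position > 0 else ""
    next_base = sequence[position + 1] if position + 1 < len(sequence) else ""
    in_cpg = (base == "C" and next_base == "G") or (base == "G" and prev_base == "C")

    return {"window": window, "gc_content": gc_content, "in_cpg": in_cpg}


# Toy reference sequence, invented for illustration.
features = sequence_context("ATATACGTATA", position=5, flank=2)
print(features)
```

Features like these can be computed from the reference sequence alone, so they are equally available for well-annotated and poorly annotated species.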
