Identification of non-coding variants regulating gene expression

Lead Research Organisation: University of Cambridge
Department Name: Public Health and Primary Care

Abstract

Background
Common chronic diseases have complex, multifactorial aetiologies that involve the interplay of genetic susceptibility and environmental risk factors. Genome-wide association studies (GWAS) have been helpful in discovering the general genomic regions of susceptibility for complex traits. However, identifying the causal variant within a locus and the mechanism by which it exerts its effect remains a challenge. The identified variants are often not causal themselves but rather proxies for nearby causal variants, which may be present at rare or intermediate frequencies. Furthermore, the majority of GWAS variants are non-coding, suggesting that they may exert their effect through the regulation of gene expression. To make use of non-coding DNA variation in a precision medicine context, we need a better understanding of the causal variants affecting gene expression.

The INTERVAL cohort is a genomic bioresource of 50,000 blood donors recruited into a randomized trial to assess the frequency of blood donation. At the Wellcome Sanger Institute, we are currently generating RNA-sequencing data for 5,000 of these individuals in collaboration with AstraZeneca. Integrating these data with the whole-exome sequencing and genotyping data already available will allow us to associate genetic variants with changes in gene expression through expression quantitative trait locus (eQTL) mapping.

We will use the INTERVAL cohort of healthy volunteers to conduct an in-depth analysis of non-coding variants regulating gene expression in normal human physiology. This information can then be applied to cohorts of disease patients to understand how non-coding variants contribute to disease. As an initial case study to demonstrate the utility of this approach, we will focus on respiratory disease, and specifically on asthma, as this is an area of particular interest at AstraZeneca. AstraZeneca have suitable clinical cohorts available in which we can conduct the analysis described in the experimental plan below.


Aim
To identify genetic variants across the full allele frequency spectrum (rare, intermediate and common) that might explain variability in gene expression.

Proposed experimental plan
1. Fine mapping of eQTL effects.
The size of the INTERVAL cohort will give us sufficient power to assess a wider range of variant allele frequencies (>0.5%) when mapping eQTLs. By integrating whole-exome sequencing data, we can additionally identify rare variants that are associated with aberrant expression of a gene. Together, these two approaches will give us the full allelic spectrum of variants in a gene locus and their contributions to the observed variation in gene expression.
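As an illustration of the single-variant association test that underlies eQTL mapping, the sketch below regresses a gene's expression on allele dosage (0, 1 or 2 copies of the alternative allele) by ordinary least squares. It is a minimal example on made-up data, not the project's pipeline: the function name `eqtl_slope` and the toy values are assumptions, and a real analysis would also adjust for covariates and correct for multiple testing.

```python
from math import sqrt


def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage.

    Returns (beta, standard_error, t_statistic). Hypothetical helper
    for illustration only; no covariates are modelled.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matching inputs with at least 3 samples")

    mean_g = sum(dosages) / n
    mean_y = sum(expression) / n

    # Sum of squares of the genotype and its cross-product with expression.
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, expression))

    beta = sxy / sxx
    intercept = mean_y - beta * mean_g

    # Residual variance gives the standard error of the slope.
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, expression))
    se = sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se if se > 0 else float("inf")
    return beta, se, t_stat


# Toy data (invented): expression rises with each extra allele copy.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
beta, se, t_stat = eqtl_slope(toy_dosages, toy_expression)
```

The slope estimates the change in expression per additional allele copy. For low-frequency variants the genotype sum of squares is small, so the standard error grows, which is why a large cohort is needed to test them.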

2. Contribution of variance QTLs to gene expression variation.
In addition to the mean expression, the variance of a gene's expression can also depend on genotype. Mapping variance QTLs will identify alleles that are associated with tight control of expression variance and those where this control has been disrupted, giving further insight into the mechanisms of gene expression regulation.
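One common way to test whether expression variance differs between genotype groups is a Brown-Forsythe test: take absolute deviations from each group's median, then compare them with a one-way ANOVA F statistic. The sketch below assumes this test purely for illustration; the source does not say which variance QTL method will be used, and the function name and toy values are invented.

```python
from statistics import median


def brown_forsythe_f(groups):
    """Brown-Forsythe F statistic for equality of variances.

    `groups` is a list of lists, one list of expression values per
    genotype class. Hypothetical helper for illustration only.
    """
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")

    # Absolute deviation of each value from its own group's median.
    z = []
    for g in groups:
        med = median(g)
        z.append([abs(v - med) for v in g])

    n_total = sum(len(g) for g in z)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(sum(g) for g in z) / n_total
    group_means = [sum(g) / len(g) for g in z]

    # One-way ANOVA on the deviations.
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)
    if within == 0:
        raise ValueError("no within-group variation in the deviations")
    return (between / (k - 1)) / (within / (n_total - k))


# Toy data (invented): same mean expression, very different spread.
tight = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8]
loose = [3.0, 7.0, 4.0, 6.0, 2.0, 8.0]
f_spread = brown_forsythe_f([tight, loose])

# Groups that differ only in mean give no variance signal.
f_shift = brown_forsythe_f([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]])
```

Because the deviations are taken from each group's own median, a pure shift in mean expression (an ordinary eQTL effect) does not inflate the statistic. Only a change in spread does.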

3. Imputation of pathway activity.
Aims 1 and 2 will generate a resource representing the association of genetic variants with gene expression under normal human physiology. We can use this information to impute the expression not only of a single gene but also of a pathway in a disease cohort for which genetic data are available. This will highlight individuals with overexpression or suppression of a particular pathway relative to the population average, which will be informative for a more precise approach to treatment.
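To make the imputation idea concrete, the sketch below predicts each gene's expression as a weighted sum of allele dosages, standardises the predictions across the cohort, and averages the per-gene z-scores into a pathway score per individual. This is one simple scheme assumed for illustration, not the method the project will use: the variant and gene identifiers, the weights and the averaging rule are all hypothetical.

```python
from statistics import mean, pstdev


def predict_expression(dosages, weights):
    """Genetically predicted expression: sum of weight * dosage.

    `dosages` maps variant id -> allele dosage for one individual;
    `weights` maps variant id -> eQTL effect size for one gene.
    Variants missing from `dosages` are treated as dosage 0.
    """
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


def pathway_scores(genotypes, gene_weights, pathway_genes):
    """Mean z-score of predicted expression over the genes in a pathway.

    `genotypes` maps individual -> {variant id: dosage}.
    `gene_weights` maps gene -> {variant id: weight}.
    """
    per_gene_z = {}
    for gene in pathway_genes:
        predicted = {
            ind: predict_expression(dos, gene_weights[gene])
            for ind, dos in genotypes.items()
        }
        mu = mean(predicted.values())
        sd = pstdev(predicted.values())
        # A gene with no predicted variation contributes 0 for everyone.
        per_gene_z[gene] = {
            ind: (x - mu) / sd if sd > 0 else 0.0 for ind, x in predicted.items()
        }
    return {
        ind: mean(per_gene_z[gene][ind] for gene in pathway_genes)
        for ind in genotypes
    }


# Toy inputs (all identifiers and weights are invented).
genotypes = {
    "donor_A": {"variant_1": 0, "variant_2": 0},
    "donor_B": {"variant_1": 1, "variant_2": 1},
    "donor_C": {"variant_1": 2, "variant_2": 2},
}
gene_weights = {
    "GENE_X": {"variant_1": 0.5},
    "GENE_Y": {"variant_2": 0.3},
}
scores = pathway_scores(genotypes, gene_weights, ["GENE_X", "GENE_Y"])
```

Individuals with strongly positive or negative scores would be those predicted to have pathway activity above or below the cohort average. Averaging z-scores treats every gene as acting in the same direction, which is a simplification that a real analysis would need to revisit.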

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
BB/V509425/1                                   01/10/2020  30/09/2024
2445119            Studentship   BB/V509425/1  01/10/2020  30/09/2024  Thomas Vanderstichele