Determining the causal links and clinical significance of rare genetic variants

Lead Research Organisation: University of Edinburgh
Department Name: MRC Human Genetics Unit


Genome-wide association studies (GWAS) have identified many common genetic variants associated with various traits and diseases. However, the individual effects of most common variants at the trait level are very small, requiring very large sample sizes (tens of thousands) for detection, and the associations found provide only low-accuracy predictions of an individual's liability to disease or outcome after treatment. Genetic effects of common variants on intermediate phenotypes, such as gene expression or protein concentrations, are often much larger than those on traits and diseases, and such associations are thus often detectable in smaller samples of hundreds or thousands of individuals. Combining information from large studies of disease outcomes with smaller transcriptomic or proteomic studies in a two-sample Mendelian randomisation study can provide evidence for a causal path from DNA variation through gene expression to a disease outcome. Even so, the relatively small effects of common variants on phenotypic traits make them difficult to study in the small functional studies feasible in the laboratory.
Some genetic variants have larger effects than the common variants detected by GWAS, but such variants are often kept at low frequency, and confined to individual pedigrees, by natural selection because of their larger effects on individual fitness. Such variants are hence difficult to detect in cosmopolitan studies of unrelated individuals, but may become detectable in studies of pedigreed populations, especially where a small founder population and genetic drift may have raised the frequency of otherwise rare variants. Nonetheless, such variants are unlikely to be in linkage disequilibrium (LD) with, and hence captured by, the SNPs on standard genotyping arrays. Rare variants of large effect are most likely to be located within or close to expressed genes. Hence, sequencing the exome and adjacent regions is a good strategy for capturing such variants.
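The enrichment argument can be made concrete with simple arithmetic. The sketch below assumes Hardy-Weinberg proportions (an approximation in isolates with elevated kinship) and uses hypothetical allele frequencies, not estimates from these cohorts, to compare the expected number of carriers of a rare allele in a sample of 8,000 individuals when the allele is ten times more frequent in an isolate than in a cosmopolitan population.

```python
def carrier_probability(maf: float) -> float:
    """Probability that an individual carries at least one copy of an allele
    with frequency `maf`, assuming Hardy-Weinberg proportions."""
    if not 0.0 <= maf <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - maf) ** 2


def expected_carriers(maf: float, n: int) -> float:
    """Expected number of carriers among n individuals."""
    return n * carrier_probability(maf)


if __name__ == "__main__":
    n = 8000
    # Hypothetical frequencies: rare in a cosmopolitan sample,
    # ten-fold enriched in an isolate after founder effects and drift.
    for label, maf in [("cosmopolitan", 0.0005), ("isolate", 0.005)]:
        print(f"{label}: ~{expected_carriers(maf, n):.1f} carriers")
```

With these illustrative numbers the isolate yields roughly 80 expected carriers against roughly 8 in the cosmopolitan sample, which is the difference between an association that can be tested and one that cannot.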
In this project, we propose to link proteomic data with exome variants to detect locally acting (cis) genetic effects on protein concentrations.
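As an illustration of what "cis" means operationally, the sketch below labels a variant-protein pair as cis when the variant lies on the same chromosome as the gene encoding the protein and within a fixed window of the gene body. The ±1 Mb window (a choice often made in expression and protein QTL studies), the class names and the example coordinates are hypothetical; the project description does not specify how cis regions will be defined.

```python
from dataclasses import dataclass

# Hypothetical cis window of +/- 1 Mb around the gene body; the project
# does not state which window it will use.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int  # base-pair position


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # gene start (bp), start <= end
    end: int    # gene end (bp)


def classify(variant: Variant, gene: Gene, window: int = CIS_WINDOW_BP) -> str:
    """Return 'cis' if the variant is on the gene's chromosome and within
    `window` bp of the gene body, otherwise 'trans'."""
    same_chrom = variant.chrom == gene.chrom
    if same_chrom and gene.start - window <= variant.pos <= gene.end + window:
        return "cis"
    return "trans"


if __name__ == "__main__":
    gene = Gene("GENE_X", "1", 5_000_000, 5_010_000)  # made-up coordinates
    print(classify(Variant("1", 4_500_000), gene))  # cis
    print(classify(Variant("1", 7_000_000), gene))  # trans
    print(classify(Variant("2", 5_005_000), gene))  # trans
```

Restricting tests to cis pairs keeps the number of variant-protein tests per protein small, which matters when the variants are rare and the multiple-testing burden would otherwise be large.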

Technical Summary

Large studies of disease outcomes combined with studies of intermediate phenotypes (transcriptome, proteome) in two-sample Mendelian randomisation (MR) can provide evidence for a causal path from DNA variation through gene expression to disease outcome. Even so, their small phenotypic effect sizes make common genetic variants detected by GWAS in cosmopolitan populations difficult to follow up functionally. Genetic variants of larger effect size are rare in populations of largely unrelated individuals, but may become detectable in isolate populations with higher levels of kinship. Such variants are unlikely to be captured by standard SNP genotyping arrays and so require sequence-level analysis.
This project will identify novel and rare variants in whole-exome sequence data that have cis-acting effects on protein abundance in 8000 individuals from isolate populations in the Northern Isles of Scotland and in Croatia. Evidence for the role of protein expression in disease aetiology will be explored using two-sample MR. These studies will include re-contact of cohort participants carrying rare variants with large predicted phenotypic effects, for more detailed phenotypic assessment and generation of tractable biological samples. Further work will exploit proteomic and other 'omic data to build graphical predictive models of individual traits or diseases that incorporate multiple loci.
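The two-sample MR step can be illustrated with the standard Wald ratio and fixed-effect inverse-variance weighted (IVW) estimators, which need only summary statistics: per-variant effects on the exposure (here, protein abundance) from one sample and on the outcome from another. This is a minimal sketch, not the project's analysis pipeline; it assumes independent instruments and ignores uncertainty in the exposure effects (the usual first-order approximation), and the effect sizes in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float,
               se_outcome: float) -> Tuple[float, float]:
    """Single-instrument causal estimate and its first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw_estimate(beta_exposure: Sequence[float], beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect IVW estimate across independent instruments:
    sum(bx * by / sy^2) / sum(bx^2 / sy^2), with SE sqrt(1 / sum(bx^2 / sy^2))."""
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("input sequences must have equal length")
    num = 0.0
    den = 0.0
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        w = 1.0 / sy ** 2  # inverse variance of the outcome association
        num += bx * by * w
        den += bx ** 2 * w
    return num / den, math.sqrt(1.0 / den)


if __name__ == "__main__":
    # Hypothetical cis variants: effects on protein level and on disease risk.
    bx = [0.5, 0.4]
    by = [0.1, 0.08]
    sy = [0.02, 0.02]
    print(wald_ratio(bx[0], by[0], sy[0]))
    print(ivw_estimate(bx, by, sy))
```

With a single strong cis variant the Wald ratio is the natural estimator; when several independent cis variants are available, IVW pools them, and its validity rests on none of the variants affecting the outcome other than through the protein.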
Description Lecture on "Omics" as part of the Genetic Epidemiology course for the Master of Public Health
Geographic Reach Local/Municipal/Regional 
Policy Influence Type Influenced training of practitioners or researchers
Title Interactive tool for exploring genetic regulation of Immunoglobulin G glycosylation 
Description Interactive tool for exploring genome-wide associations with IgG glycosylation, together with secondary data generated for the manuscript on the genetic regulation of IgG glycosylation. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact This tool enables quick and easy querying of genetic associations with IgG glycosylation. It is aimed at helping wet-lab scientists, glycomics enthusiasts and the general public to look up which genetic variants are involved in IgG glycosylation and what their potential roles in diseases and complex traits are. 
Title Summary statistics of genome-wide association studies of Immunoglobulin G glycosylation 
Description The majority of proteins undergo post-translational glycosylation, in which complex carbohydrates are attached to the surface of proteins. These can affect protein structure and function, as is the case with Immunoglobulin G, whose effector functions are regulated by the composition of the carbohydrate. Aberrant glycosylation of IgG has been observed in many diseases, but little is understood about the mechanisms behind these changes. This dataset contains summary-level statistics of the largest genome-wide association study of IgG N-glycosylation to date (N=8,090). 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact This dataset will enable other scientists to perform Mendelian randomisation analyses assessing the causal role of IgG glycosylation in various diseases and complex traits, genetic correlation analyses, meta-analyses and any other analyses suitable for summary-level association data. 
Description Genetic regulation of protein glycosylation 
Organisation Genos Glycoscience Laboratory
Country Croatia 
Sector Private 
PI Contribution I am collaborating with this company on research into the genetic regulation of protein glycosylation. I advise on the statistical cleaning of the glycome data and share my knowledge of and experience in genome-wide association studies. Together with their data analyst, we defined new glycosylation traits that have a more direct biological interpretation and higher potential for translation into clinical practice. Genome-wide association analyses of these new traits were performed on the University's high-performance computing cluster, Eddie.
Collaborator Contribution Genos Glycoscience Laboratory is a research-intensive SME that specialises in high-throughput glycosylation studies. They provided new glycosylation datasets that complement the omics data already available in the group, making our cohorts among the richest in terms of omics data. The lead data analyst on the glycomics genome-wide association studies project is an employee of the company.
Impact doi: 10.1093/hmg/ddz054; several other manuscripts are in preparation.
Start Year 2018
Description Rare exonic variants in Scottish and Croatian genetic isolates 
Organisation Regeneron Pharmaceuticals, Inc.
Country United States 
Sector Private 
PI Contribution The overall goal of the collaboration is to elucidate the contribution of rare exonic variants to complex traits of public health importance. My role in the project is to prepare and share phenotype data, discover novel rare variants and carry out association analyses. We also provide expertise regarding the phenotype data.
Collaborator Contribution Regeneron Genetics Center provided exome sequencing data on 4000 individuals from our cohorts and expertise in cleaning and analysing these data.
Impact No specific outputs at this stage.
Start Year 2018
Description Talk at the Science Festival (Orkney) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Around 100 people attended the Orkney Science Festival talk on DNA sequencing and what we can learn from it. The audience was very engaged, and discussions continued after the talk.
Year(s) Of Engagement Activity 2018