Linking Regulatory Elements Harboring Common Disease-Associated Variants to Their Target Genes

Lead Research Organisation: Babraham Institute
Department Name: Nuclear Dynamics

Abstract

Scientists and clinicians have compared the DNA sequences of millions of healthy and sick individuals and found that particular traits, diseases or disease symptoms often correspond with small variations in an individual's DNA sequence. These variable regions are quite small, and when located within a gene they can lead to inappropriate gene expression, which in turn can cause disease. Knowing which gene underlies a patient's disease is important because it allows clinicians to treat individuals appropriately, for example with specific therapies or medicines designed to compensate for the mutation and alleviate symptoms or cure the disease. A major problem is that most of the variable regions found to be associated with disease lie far from any gene in the genomic sequence, and how they contribute to disease remains a mystery. In these cases clinicians cannot identify a disease gene, so the patient is likely to receive more generalized treatment rather than one designed to overcome the specific genetic fault. Thus a great deal of potentially useful genetic information cannot be used in the fight against disease. Many genes are known to require short regions of DNA sequence called enhancers for normal, healthy expression. Enhancers can be located at considerable distances along the DNA from the genes that they control; in fact they can be so far away that it is often not possible to identify which gene they control. We have developed a new method that allows us to identify all distal enhancers in the genome and assign them to the genes that they control. When we compared our preliminary list of enhancers with a list of genetic variations associated with disease, we found a highly significant overlap.
This means that many of the distant disease-associated variable regions are likely to be long-range enhancers, and because we know which enhancer controls which gene, we can identify the potential disease-causing gene. Using the information gained from the proposed experiments, we expect to identify hundreds to thousands of potential new disease genes. This information will open the door to hundreds or thousands of new clinical treatments and medicines designed to target the specific causes of disease, thereby significantly decreasing patient suffering and improving the success of clinical care.

Technical Summary

Numerous genome-wide association studies (GWAS) have identified thousands of regions of the human genome containing structural variants or single nucleotide polymorphisms (SNPs) that are associated with various common diseases and traits. The vast majority of GWAS SNPs lie in non-coding regions, and more than half are located at considerable distances from nearby genes, making it very difficult to predict whether the associated variant is causal or to identify a potential causal gene. Thus a large amount of potentially useful information on genetic variability and its links with disease currently cannot be exploited. A recent study (Maurano et al., 2012) shows that many GWAS SNPs map to DNase I hypersensitive sites (DHS), which are a characteristic feature of gene regulatory elements such as enhancers. Enhancers can be up to a megabase away from the genes they regulate, often skipping over several intervening genes, and can be located within one gene while regulating another. Non-coding variants may therefore contribute to disease by disrupting transcription factor binding sites in enhancers or other regulatory elements that control distal genes, leading to gene mis-regulation. Our preliminary findings strongly support this hypothesis. In this application we propose to systematically identify all long-range genomic elements that interact with all human gene promoters in a panel of primary human cell types, using a new method that we have developed called promoter capture Hi-C. Our preliminary data show that the long-range elements we identify through interaction with promoters are enriched in active histone marks and transcription factor binding sites characteristic of enhancers. These genomic regions are also significantly enriched in disease-associated SNPs. This work will allow us to identify hundreds to thousands of new potential disease-causing genes, creating new biomarkers and opening the door to new drug targets and avenues of clinical intervention.
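The claim that promoter-interacting regions are "significantly enriched" in disease-associated SNPs can be illustrated with a standard one-sided hypergeometric test. This is a minimal sketch, not the analysis behind the preliminary data, which the summary does not describe. It assumes the genome has been divided into N restriction fragments, of which K interact with a promoter and n contain at least one GWAS SNP, with k fragments falling in both sets; the function names and counting scheme are hypothetical. A real analysis would also need to account for linkage disequilibrium and for the non-random genomic distribution of both SNPs and interactions, for example by comparing against matched background sets.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    Hypothetical framing: N fragments in total, K promoter-interacting
    fragments, n fragments carrying a GWAS SNP, k fragments in both sets.
    Exact integer arithmetic; math.comb returns 0 for impossible terms.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def overlap_enrichment(k, N, K, n):
    """Return (fold enrichment, one-sided p-value) for the observed overlap."""
    # Fraction of SNP-carrying fragments that interact with a promoter,
    # relative to the genome-wide fraction of interacting fragments.
    fold = (k / n) / (K / N)
    return fold, hypergeom_upper_tail(k, N, K, n)


if __name__ == "__main__":
    # Toy, made-up counts purely for illustration.
    print(overlap_enrichment(120, 10_000, 1_000, 600))
```

For genome-scale counts the exact integer sums become slow; `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same upper tail.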

Planned Impact

Patients and clinicians

Many diseases have an identified genetic component associated with one or more SNPs or structural variants. However, as the majority of these variants map to non-coding regions, identifying target genes and pathways for therapeutic intervention has been difficult. Our project aims to bridge this gap by identifying associations of non-coding regions with gene promoters, which will facilitate a better understanding of disease etiology and potentially lead to new treatments. Importantly, since we aim to identify all long-range interactions with gene promoters in multiple cell types, this analysis will help the interpretation not only of currently known non-coding polymorphisms, but also of those that will be identified as clinically relevant in the future. First findings in this regard are likely to emerge within 3 years (the duration of this grant). However, developing and applying these findings by testing therapies in pre-clinical and early-phase clinical studies will likely take another decade. We will partner with our clinical collaborators on the project (including Profs Ouwehand and Smith, who are practising consultant physicians at Addenbrooke's Hospital) to maximize the clinical benefit of the data produced in this project.

Babraham Institute, MRC and biotech/pharma industry

New treatments can potentially be developed based on new target genes identified through their long-range interactions with clinically significant non-coding variants. The proposed work therefore has the potential to generate intellectual property, which is likely to benefit the Babraham Institute and the MRC through licensing new treatments to biotech companies or founding our own start-up company to exploit our findings. The timescale for generating new intellectual property is likely to be 3 years (i.e., within the duration of this grant), with the development of disease therapeutics likely to take a further decade.

Academic community

Data resulting from our work will be of interest to a broad range of academic researchers, from clinical and population geneticists to basic scientists investigating gene regulation and the biology of the analysed cell types. In addition to publishing our research in peer-reviewed journals and presenting it at scientific conferences, we will produce a public online service to query target gene associations for all clinically relevant SNPs and structural variants as well as for any other loci of interest.
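In its simplest form, a query service of the kind described could map a variant position to the promoter-interacting fragment that contains it and report the genes whose promoters that fragment contacts. This is a minimal sketch under stated assumptions: interactions are supplied as hypothetical `(chrom, start, end, gene)` records with half-open coordinates, and interacting fragments on a chromosome do not overlap, as is the case for fragments from a single restriction digest. The class and method names are illustrative and are not part of any existing tool.

```python
from bisect import bisect_right
from collections import defaultdict


class TargetGeneIndex:
    """Hypothetical index from promoter-interacting fragments to target genes."""

    def __init__(self, interactions):
        # interactions: iterable of (chrom, start, end, gene), half-open [start, end).
        # Several genes can share one interacting fragment, so group them first.
        genes_by_frag = defaultdict(set)
        for chrom, start, end, gene in interactions:
            genes_by_frag[(chrom, start, end)].add(gene)

        self._frags = defaultdict(list)
        for (chrom, start, end), genes in genes_by_frag.items():
            self._frags[chrom].append((start, end, frozenset(genes)))

        # Sort fragments by start once so each lookup is a binary search.
        self._starts = {}
        for chrom, frags in self._frags.items():
            frags.sort(key=lambda f: f[0])
            self._starts[chrom] = [f[0] for f in frags]

    def target_genes(self, chrom, pos):
        """Return sorted target genes for a variant at (chrom, pos), or []."""
        starts = self._starts.get(chrom)
        if not starts:
            return []
        # Assumes non-overlapping fragments: the only candidate is the
        # last fragment starting at or before pos.
        i = bisect_right(starts, pos) - 1
        if i < 0:
            return []
        start, end, genes = self._frags[chrom][i]
        return sorted(genes) if start <= pos < end else []


# Made-up example records for illustration only.
index = TargetGeneIndex([
    ("chr1", 1000, 5000, "GENE_A"),
    ("chr1", 1000, 5000, "GENE_B"),
    ("chr1", 9000, 12000, "GENE_C"),
])
print(index.target_genes("chr1", 2500))  # ['GENE_A', 'GENE_B']
```

Sorting fragment starts once and using `bisect_right` keeps each lookup logarithmic in the number of fragments per chromosome; if interacting regions could overlap, an interval tree would be the more appropriate structure.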

 
Description Blood Cell Promoter Capture Hi-C collaboration 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution Promoter Capture Hi-C performed by us in 17 primary human haematopoietic cell types.
Collaborator Contribution Cell types provided by collaborators. Bioinformatic, statistical and genetic analyses performed jointly with collaborators.
Impact Multi-disciplinary. Molecular Biology, Bioinformatics, Statistics, Population Genetics
Start Year 2014
 
Description Blood Cell Promoter Capture Hi-C collaboration 
Organisation NHS Blood and Transplant (NHSBT)
Country United Kingdom 
Sector Public 
PI Contribution Promoter Capture Hi-C performed by us in 17 primary human haematopoietic cell types.
Collaborator Contribution Cell types provided by collaborators. Bioinformatic, statistical and genetic analyses performed jointly with collaborators.
Impact Multi-disciplinary. Molecular Biology, Bioinformatics, Statistics, Population Genetics
Start Year 2014
 
Description Blood Cell Promoter Capture Hi-C collaboration 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution Promoter Capture Hi-C performed by us in 17 primary human haematopoietic cell types.
Collaborator Contribution Cell types provided by collaborators. Bioinformatic, statistical and genetic analyses performed jointly with collaborators.
Impact Multi-disciplinary. Molecular Biology, Bioinformatics, Statistics, Population Genetics
Start Year 2014
 
Description Cambridge Science Festival 2015 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Participated in a Cambridge Science Festival evening. Answered questions and engaged in general discussion about our science with the general public at the Babraham stand.
Year(s) Of Engagement Activity 2015
URL http://www.sciencefestival.cam.ac.uk/about/past-festivals/2015-cambridge-science-festival
 
Description Schools Day 2015 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Ten pupils visited my lab for a morning or afternoon of hands-on experiments. There were lots of questions and discussions, and pupils had a chance to meet real scientists. The school reported increased interest in the subject area.
Year(s) Of Engagement Activity Pre-2006, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016