Integrating Capture-HiC with omic time course data to uncover the regulatory interactions modulated by genetic variation in disease

Lead Research Organisation: University of Manchester
Department Name: School of Health Sciences

Abstract

Rheumatoid arthritis (RA) is one of the most common chronic inflammatory diseases. A number of genetic differences related to RA have been discovered through genome-wide association studies (GWAS), in which large genetic datasets are analysed to statistically associate genetic changes in the human population (usually differences at single nucleotides, called SNPs) with disease risk. To make further progress, it is now important to understand how these disease-associated SNPs affect the functions of cells and tissues, so that we can better understand the disease mechanism and ultimately develop more effective medicines. A major obstacle is that many of these SNPs lie far from the main control regions of protein-coding genes (known as promoters) in the linear DNA sequence of the genome. There is strong evidence that some of these SNPs lie in enhancers: distal control regions of DNA that come into contact with promoters through looping of the DNA, so that they are distant in sequence but may be close in physical space. One of the applicants, Peter Fraser, is a leading expert in how DNA folds and has developed novel experimental methods to test such contacts. These methods allow us to investigate how the DNA regions associated with RA interact with and control genes in human cells. In this project we will combine this technique with measurements of gene activity and enhancer activity in human cells over time. We will look specifically at stimulated T-cells, immune cells known to be important determinants of RA disease progression. We will use these data to build mathematical models describing how T-cells regulate gene expression through enhancer activity and enhancer-promoter interaction.
These models will allow us to better understand how regulatory proteins, called transcription factors, bind to the DNA at enhancers and promoters to turn genes on or off. Finally, we will carry out experiments on T-cells from healthy human volunteers for whom genetic data (SNP calls) are already available, to test whether the natural genetic variation at SNPs identified through our analysis has a strong effect on enhancer activity or enhancer-promoter interactions. This would provide strong evidence for how these SNPs regulate specific genes, and we will investigate the downstream cellular pathways to which these genes belong.

The project brings a leading RA genetics group, a leading molecular biology group and a leading mathematical modeller together to work closely and use the most up-to-date knowledge to gain insight into the function of genes that cause RA.

Technical Summary

It has become clear that T-cells are of fundamental importance in rheumatoid arthritis (RA). In this project we will use time-course data from stimulated primary human T-cells to uncover how RA-associated enhancers regulate the expression of their target genes. Capture-HiC data enriched for promoters and disease-associated enhancers will be used to identify physical interactions between enhancers and promoters and how these change through time. Nuclear RNA-Seq will be used to quantify nascent transcription. ATAC-Seq will be used to profile open chromatin and identify likely transcription factor (TF) binding events. We will collect data from these three assays on the same primary T-cell population over a 24-hour period after stimulation. We will then use these data to develop a regulatory network model, using non-parametric Gaussian process methods to infer TF activities and mathematical models to describe the effect of TF binding on promoter activity and nascent RNA production rates. We will score alternative regulatory network models through Bayesian model comparison to identify the best-supported model of regulation for each target gene. Computational methods will be scaled up from previous smaller-scale applications to handle more complex enhancer-driven regulation involving large numbers of regulatory TFs. Methods will be published as open-source computational tools for the community. Finally, we will validate our predictions through follow-up experiments on human samples, comparing subjects who differ at SNPs implicated by our model-based analysis. We will use targeted ChIP-PCR to examine the effect of these SNPs on epigenetic marks and TF binding, and targeted 3C to confirm enhancer-promoter interactions and test whether these are altered by genetic differences.
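As a minimal illustration of how Gaussian process (GP) models and Bayesian model comparison fit together on a time course, the sketch below scores one gene's profile under a "constant" model and a "time-varying" GP model by their log marginal likelihoods; the difference is a log Bayes factor. This is not the project's regulatory network model (which infers TF activities and models promoter activity and transcription rates). Assumptions, all hypothetical: a squared-exponential kernel with fixed, hand-picked hyperparameters (no optimisation), made-up sampling times and values, and the helper names `covariance` and `log_bayes_factor`.

```python
import math

def rbf(t1, t2, variance=1.0, lengthscale=4.0):
    """Squared-exponential covariance between two time points (hours)."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def log_marginal_likelihood(y, cov):
    """log N(y | 0, cov), computed from the Cholesky factor of cov."""
    n = len(y)
    L = cholesky(cov)
    z = []  # forward substitution: solve L z = y, so y^T cov^-1 y = z^T z
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * sum(v * v for v in z) - 0.5 * log_det - 0.5 * n * math.log(2 * math.pi)

def covariance(times, time_varying, offset_var=1.0, noise_var=0.01):
    """Constant offset + i.i.d. noise, plus an RBF term if the model is time-varying."""
    n = len(times)
    cov = [[offset_var + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    if time_varying:
        for i in range(n):
            for j in range(n):
                cov[i][j] += rbf(times[i], times[j])
    return cov

def log_bayes_factor(times, y):
    """log p(y | time-varying GP) - log p(y | constant), hyperparameters held fixed."""
    return (log_marginal_likelihood(y, covariance(times, True))
            - log_marginal_likelihood(y, covariance(times, False)))

times = [0, 1, 2, 4, 6, 8, 12, 24]                         # hypothetical sampling times (h)
response = [0.0, 0.8, 1.4, 1.6, 1.0, 0.5, 0.1, 0.0]        # made-up transient induction
flat = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, 0.0]  # made-up noise around zero
print(log_bayes_factor(times, response) > 0)  # time-varying model preferred
print(log_bayes_factor(times, flat) < 0)      # constant model preferred
```

The extra RBF term raises the log-determinant of the covariance, so the time-varying model is penalised unless the data show structured change; this automatic Occam factor is what makes marginal likelihoods usable for comparing alternative models.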

Planned Impact

Our vision is to leverage the power of a systems biology approach to better understand the pathogenesis of RA and to translate this into clinical benefit. Building on our established expertise in genetics, functional genomics, bioinformatics, statistical modelling and chromatin biology, we will for the first time integrate chromatin interaction, chromatin accessibility and gene expression data generated from time-course experiments in stimulated CD4+ T-cells, a cell type known to be crucial in disease development, to determine how disease genes interact to cause RA. This will provide unprecedented insight into how biological pathways are regulated by combinations of disease-associated genetic regions, and will allow patients to be subgrouped by the primary pathway disrupted. In addition, the study will provide a molecular profile, based on the time-course transcriptomic data, defining how a CD4+ T-cell responds to stimulus. We will then analyse how this expression profile is affected by different RA risk genetic backgrounds. In turn, this will inform the stratified medicine agenda by allowing better targeting of available therapies according to the primary pathway mediating disease. We will focus on RA, initially in CD4+ T-cells, but this approach has the potential to inform the study of all diseases with a genetic component to susceptibility, and our strategy and tools can be adapted to a diverse set of cell types and stimulation regimes.

Potential non-academic beneficiaries of the research include:

Pharmaceutical companies seeking drug targets to prioritise for RA and other complex or immune disorders and patients who may ultimately benefit from the resulting medicines (time-frame: 10-15 years from lead to product)

Pharmaceutical companies or health service providers interested in targeting existing treatments as personalised therapies who will benefit from better stratification of RA and other complex diseases, and patients who may ultimately benefit (time-frame: 1-3 years)

Health service providers carrying out genetic risk profiling, who would benefit from better risk assessment and prevention strategies based on genetic variation data, and patients and their families who may ultimately benefit (time-frame: 1-3 years)

Research staff employed on the project will also be beneficiaries as they will be provided with comprehensive research training. As detailed in 'Pathways to Impact', the researchers will benefit from professional and career development opportunities at the University of Manchester. Past experience from the applicants' laboratories has indicated that the skills acquired by the researchers have been, and will continue to be, transferable to careers in research, industry, the NHS, medical writing, education and funding agencies (beneficiaries: researchers employed on the programme; time-frame: ongoing; prospective employers; time-frame: 3-5 years).

Publications

 
Description Fellowship
Amount £160,000 (GBP)
Organisation Versus Arthritis 
Sector Charity/Non Profit
Country United Kingdom
Start 10/2016 
End 07/2020
 
Description Research Grant
Amount £196,000 (GBP)
Organisation Versus Arthritis 
Sector Charity/Non Profit
Country United Kingdom
Start 12/2016 
End 01/2020
 
Description Versus Arthritis Centre for genetics and genomics
Amount £1,999,950 (GBP)
Funding ID 21754 
Organisation Versus Arthritis 
Sector Charity/Non Profit
Country United Kingdom
Start 07/2018 
End 07/2023
 
Title Accompanied data files used in the paper "Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis" 
Description Lists of source files used in the paper "Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis" by Jing Yang, Amanda McGovern, Paul Martin, Kate Duffus, Xiangyu Ge, Peyman Zarrineh, Andrew P Morris, Antony Adamson, Peter Fraser, Magnus Rattray & Stephen Eyre. The paper has been accepted by Nature Communications.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://zenodo.org/record/3890171
 
Title Accompanied data files used in the paper "Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis" 
Description Lists of source files used in the paper "Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis" by Jing Yang, Amanda McGovern, Paul Martin, Kate Duffus, Xiangyu Ge, Peyman Zarrineh, Andrew P Morris, Antony Adamson, Peter Fraser, Magnus Rattray & Stephen Eyre. The paper has been accepted by Nature Communications.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://zenodo.org/record/3840593
 
Title Accompanied data files used in the paper "Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis" 
Description Lists of source files used in the paper "Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis" by Jing Yang, Amanda McGovern, Paul Martin, Kate Duffus, Xiangyu Ge, Peyman Zarrineh, Andrew P Morris, Antony Adamson, Peter Fraser, Magnus Rattray & Stephen Eyre. The paper has been accepted by Nature Communications.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://zenodo.org/record/3840592
 
Title Accompanied data files used in the paper "Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis" 
Description Lists of source files used in the paper "Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis" by Jing Yang, Amanda McGovern, Paul Martin, Kate Duffus, Xiangyu Ge, Peyman Zarrineh, Andrew P Morris, Antony Adamson, Peter Fraser, Magnus Rattray & Stephen Eyre. The paper has been accepted by Nature Communications.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://zenodo.org/record/3899030
 
Title DEtime package 
Description DEtime is an R package for two-sample time series analysis using Gaussian process methods. It implements a Gaussian process regression framework for inferring the perturbation time point in a two-sample case. The paper describing this package is available at DOI: https://doi.org/10.1093/bioinformatics/btw329 and arXiv: http://arxiv.org/abs/1602.01743. Please refer to the Jupyter notebook DEtime_illustration.ipynb for R code showing how to run the package.
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact This software forms the basis of the later BGP package for single-cell data analysis. 
URL https://github.com/ManchesterBioinference/DEtime
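To illustrate the idea of perturbation time inference in a two-sample setting, the Python sketch below (not DEtime's R code, and not its exact covariance function) assumes both samples share one underlying GP function, plus a sample-specific deviation GP that is switched on only after a candidate perturbation time τ. Each τ on a grid is scored by its log marginal likelihood and, under a uniform prior over the grid, normalised into a posterior. The kernel form, fixed hyperparameters, candidate grid, made-up data and helper names (`two_sample_cov`, `perturbation_posterior`) are all hypothetical.

```python
import math

def sq_exp(t1, t2, variance, lengthscale):
    """Squared-exponential covariance between two time points."""
    return variance * math.exp(-0.5 * ((t1 - t2) / lengthscale) ** 2)

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def log_marginal_likelihood(y, cov):
    """log N(y | 0, cov) via Cholesky and forward substitution."""
    n = len(y)
    L = cholesky(cov)
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * sum(v * v for v in z) - 0.5 * log_det - 0.5 * n * math.log(2 * math.pi)

def two_sample_cov(points, tau, noise_var=0.01):
    """points: list of (time, sample_id). Shared GP for both samples, plus a
    per-sample deviation GP that is non-zero only at times after tau."""
    cov = []
    for t1, s1 in points:
        row = []
        for t2, s2 in points:
            c = sq_exp(t1, t2, 1.0, 2.0)                 # shared component
            if s1 == s2 and t1 > tau and t2 > tau:
                c += sq_exp(t1, t2, 4.0, 2.0)            # sample-specific deviation
            row.append(c)
        cov.append(row)
    for i in range(len(points)):
        cov[i][i] += noise_var                           # observation noise
    return cov

def perturbation_posterior(times, y_a, y_b, candidates):
    """Posterior over candidate perturbation times (uniform prior on the grid)."""
    points = [(t, 0) for t in times] + [(t, 1) for t in times]
    y = list(y_a) + list(y_b)
    lml = [log_marginal_likelihood(y, two_sample_cov(points, tau)) for tau in candidates]
    top = max(lml)                                       # subtract max for stability
    weights = [math.exp(v - top) for v in lml]
    total = sum(weights)
    return [w / total for w in weights]

times = list(range(9))                                   # hypothetical sampling times
control = [0.0] * 9                                      # made-up control profile
treated = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.8, 2.4, 2.8]  # made-up: diverges after t = 4
candidates = [t + 0.5 for t in range(8)]
posterior = perturbation_posterior(times, control, treated, candidates)
tau_map = candidates[posterior.index(max(posterior))]
print(tau_map)
```

Because the deviation term is the elementwise product of three positive semi-definite kernels (time kernel, same-sample indicator, and the rank-one "after τ" switch), the joint covariance stays valid for every τ; a smoother switch or learned hyperparameters would be natural refinements.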
 
Title HiChIP-Peaks 
Description This package finds enriched peak regions in HiChIP datasets; these peaks can then be used as input to available loop-calling tools or for differential peak analysis. It takes the HiC-Pro output and converts it to a map at restriction-site resolution. It then selects reads within a specified number of restriction sites from the diagonal (default = 2), models the background as a negative binomial distribution, and calls peak regions that significantly exceed the background. The output is a list of peaks with their properties and a bedgraph at restriction-site resolution describing the reads per site. The differential analysis command creates a consensus peak set and then identifies differentially bound regions across samples. Results from this package can be used for further analysis and as a peak dataset input for various loop-calling software.
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact HiChIP is a powerful tool for interrogating 3D chromatin organization. Current tools for analysing chromatin looping with HiChIP data require loop anchors to be identified in order to work properly. However, current approaches for discovering these anchors from HiChIP data are unsatisfactory, having either a very high false discovery rate or a strong dependence on sequencing depth. Moreover, these tools do not allow quantitative comparison of peaks across samples, so they do not fully exploit the information available in HiChIP datasets. We developed a new tool, based on a representation of HiChIP data centred on re-ligation sites, to identify peaks from HiChIP datasets; these peaks can then be used in other tools for loop discovery. This increases the reliability of these tools and improves their recall at reduced sequencing depth. We also provide a method to count reads mapping to peaks across samples, which can be used for differential peak analysis of HiChIP data.
URL https://www.biorxiv.org/content/10.1101/682781v1.abstract
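The peak-calling step described above (negative binomial background, then sites that significantly exceed it) can be sketched roughly as follows. This is an illustration of the general technique, not HiChIP-Peaks' implementation. Assumptions, all hypothetical: per-restriction-site read counts are already aggregated; the background is fitted by the method of moments on the lowest 90% of counts, so that true peaks do not inflate the fit; p-values are upper-tail probabilities corrected with Benjamini-Hochberg; adjacent significant sites are merged into (start, end) index ranges. The data and function names are made up.

```python
import math

def fit_negative_binomial(counts):
    """Method-of-moments NB fit; returns (r, p) with mean r(1-p)/p."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    if var <= mean:
        raise ValueError("counts are not overdispersed; NB fit undefined")
    r = mean ** 2 / (var - mean)
    return r, r / (r + mean)

def nb_sf(k, r, p):
    """P(X >= k) for X ~ NB(r, p), computed as 1 - CDF(k - 1)."""
    def log_pmf(j):
        return (math.lgamma(j + r) - math.lgamma(r) - math.lgamma(j + 1)
                + r * math.log(p) + j * math.log1p(-p))
    cdf = sum(math.exp(log_pmf(j)) for j in range(k))
    return max(0.0, 1.0 - cdf)

def benjamini_hochberg(pvals):
    """BH-adjusted q-values, in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q

def call_peaks(counts, alpha=0.05, keep_fraction=0.9):
    """Return (start, end) index ranges of adjacent sites with q < alpha."""
    background = sorted(counts)[: int(keep_fraction * len(counts))]
    r, p = fit_negative_binomial(background)
    q = benjamini_hochberg([nb_sf(c, r, p) for c in counts])
    peaks, start = [], None
    for i, qi in enumerate(q):
        if qi < alpha and start is None:
            start = i
        elif qi >= alpha and start is not None:
            peaks.append((start, i - 1))
            start = None
    if start is not None:
        peaks.append((start, len(q) - 1))
    return peaks

block = [1, 2, 3, 5, 8, 2, 4, 6, 1, 3, 7, 2, 4, 9, 3, 5]  # made-up background counts
counts = block * 2 + [30, 45, 35] + block * 2              # made-up enriched sites 32-34
print(call_peaks(counts))
```

Trimming the highest counts before fitting is one simple way to keep the background estimate from absorbing the enrichment it is meant to detect; an iterative fit that excludes called peaks and refits would serve the same purpose.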