Discovery of low-frequency ulcerative colitis risk variants using whole-genome sequencing

Lead Research Organisation: Wellcome Sanger Institute
Department Name: Human Genome

Abstract

Ulcerative colitis (UC) and Crohn's disease (CD) are the two most common forms of inflammatory bowel disease (IBD). Together they affect approximately 400 in every 100,000 people in the UK. They are characterised by chronic and painful inflammation of the gastrointestinal tract that is thought to be caused by an over-reactive immune response to normal intestinal microbes in genetically susceptible individuals. Current therapies are expensive, don't always work, and can have severe side effects that necessitate regular clinical monitoring. There is currently an urgent clinical need for a more personalized approach to IBD therapy coupled with the development of more efficacious and cost-effective treatments.

Data from family studies have demonstrated that there is a large heritable component to IBD risk. Siblings of individuals with UC have a 7-17 fold increased risk of disease compared to individuals without an affected sibling; for CD the increased risk to siblings is 15-42 fold. Large-scale surveys of common genetic variants have identified 99 regions of the genome that explain a proportion of this increased risk of IBD. Scrutiny of the genes that lie within these regions has further developed our understanding of IBD biology and highlighted biological pathways that are potential targets for novel therapeutic interventions.

That further genetic regions remain to be correlated with IBD risk is beyond doubt because only around 20% of the increased sibling risk is explained by those confirmed to date. Some of the unexplained risk variation is likely to be underpinned by low-frequency genetic variants of intermediate effect size, as these have not been well surveyed by existing genome-wide scans. Recent technological advances in high-throughput DNA sequencing make these variants accessible for the first time on a scale that allows them to be thoroughly surveyed for association with common diseases such as IBD.

Here, I propose to conduct the first large-scale search for low-frequency variants associated with UC using whole-genome sequencing. DNA from 2000 UC patients of UK ancestry will be whole-genome sequenced and compared to sequences from 4000 individuals from the general UK population. Where significant differences are observed we will genotype these genetic variants in an additional 3000 UC cases and controls to confirm they are associated with UC and are not artefacts. Furthermore, we will conduct similar tests to search within UC sequences for genetic variants that underpin clinically relevant disease subclasses such as response to therapy, disease severity and age at onset. Data from the study will also be combined with that from a similar study in CD, underway at the Wellcome Trust Sanger Institute, to enable powerful searches for regions of the genome underlying IBD more broadly. This combined data will also allow us to better understand the genetic differences between the two common forms of IBD.
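As a rough illustration of the case-control comparison described above, the sketch below runs a simple allelic chi-square test on a 2x2 table of allele counts. It is not the study's analysis pipeline: the function name and the allele counts are hypothetical, and a real analysis of sequence data would typically also account for genotype uncertainty and population structure.

```python
import math


def allelic_association(case_alt, case_total, ctrl_alt, ctrl_total):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    *_alt   : number of alternate alleles observed in the group
    *_total : total number of alleles in the group (2 x number of people)
    Returns (odds_ratio, chi_square, p_value).
    """
    a, b = case_alt, case_total - case_alt   # cases: alt, ref
    c, d = ctrl_alt, ctrl_total - ctrl_alt   # controls: alt, ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, chi2, p_value


# Hypothetical low-frequency variant: 3% allele frequency in 2000 cases,
# 2% in 4000 controls (illustrative numbers, not results from the study).
odds_ratio, chi2, p_value = allelic_association(
    case_alt=120, case_total=2 * 2000,
    ctrl_alt=160, ctrl_total=2 * 4000,
)
print(f"OR={odds_ratio:.2f} chi2={chi2:.2f} p={p_value:.2e}")
```

With these made-up counts the odds ratio is about 1.5, yet the p-value falls well short of the conventional genome-wide significance threshold of 5e-8. This is one way to see why signals for low-frequency variants need follow-up genotyping in additional cases and controls.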

The identification of novel genetic regions is likely to shed further light on the biology underlying the disease and generate further hypotheses regarding disease aetiology. In time, the identified biological pathways may become targets for novel therapeutics and preventions. More accurate assessments of IBD risk, based on the markers identified in the study, may also be possible depending on the proportion of disease susceptibility variation explained by the identified regions. We will use both clinical and genetic markers of disease risk to build, and test, predictive algorithms in the hope of developing a clinically useful test for IBD status and/or disease severity and response to therapy.

Technical Summary

Ulcerative colitis (UC) and Crohn's disease (CD) are the two most common forms of inflammatory bowel disease (IBD), which is characterised by chronic inflammation of the GI tract. In the UK, IBD affects around 400:100,000 people. Current therapies are expensive, don't always work, and can have severe side effects that necessitate regular clinical monitoring. Genome-wide association studies have identified 47 independent loci associated with UC but much of the disease heritability remains unexplained. In an effort to identify additional risk loci, gain further insights into the genetic and molecular architecture of IBD and move towards more personalised therapy, a whole-genome sequence based association study of 2000 UC cases and 4000 UK population controls will be carried out.

The main objectives of the proposed research are:
1) Using whole-genome sequencing and genotype imputation, conduct highly powered association tests to identify novel low-frequency variants of intermediate penetrance associated with UC.
2) Combine UC and CD whole-genome sequencing data to identify novel loci associated with IBD and better define the molecular architecture of the two diseases.
3) Utilise rich subphenotype data available across the UC samples to search for genetic markers of clinically relevant subphenotypes such as treatment response and disease severity.
4) Conduct fine-mapping of known loci to better identify causal variants.
5) Build and test predictive algorithms of disease status and severity using confirmed UC loci and clinical measures.
6) Address key issues concerning the design, analysis and interpretation of whole-genome sequence based association studies.

All sequencing data and variant calls will be deposited in the European Genome-Phenome Archive to empower both methodological and applied research in other labs.

Planned Impact

The main objective of this research is to identify novel loci underlying UC risk and further elucidate the biological aetiology of the disease. Pharmaceutical companies developing drugs will be interested in any findings that highlight novel, druggable targets for IBD treatments. The development of novel therapeutics based on the results of this study is likely to take some time (>10 years) but existing drugs that target pathways not previously implicated in IBD risk may have a more immediate clinical impact. For example, GWAS in Crohn's disease first highlighted autophagy as a key process in Crohn's disease pathogenesis. Sirolimus (rapamycin) is a drug used to upregulate autophagy in cell cultures, and in clinical practice to prevent organ transplant rejection. A recent study presented the case of a woman with severe refractory colonic and perianal Crohn's disease whose symptoms were greatly reduced by administration of sirolimus (Massey et al, Gut, 2008).

Often drugs can fail to reach market because they are not efficacious across a heterogeneous pool of patients or cause severe side effects in a small fraction of individuals. The identification of genetic markers that correlate with treatment response and adverse reactions raises the possibility of a DNA test that can accurately stratify patients as positive or negative responders, and this will be of immediate and direct interest to the pharmaceutical industry. The work will also be central to decisions made by the National Institute for Health and Clinical Excellence (NICE) regarding IBD treatment procedures.

In addition to financial benefits within the pharmaceutical industry, the proposed research has the potential to improve the nation's health and wealth. The research will hopefully, in time, lead to a more personalized approach to IBD therapy and aid the development of more efficacious and cost-effective therapeutics. These will improve clinical management of IBD and significantly improve the quality of life for people suffering from a painful disease that often requires surgical intervention, including complete bowel resection. Furthermore, these developments would likely reduce the cost of IBD treatment to the NHS and get young people back to work; IBD represents a significant cause of morbidity in economically active young people and is a significant burden on healthcare resources, with an estimated annual cost to the NHS of £720 million.

The FTE who works on this project will be excellently placed to stay in academia or move into industry when the post comes to an end. They would be well positioned to become a future leader in the field of complex trait genetics, attract funding support and build their own research team. They will also have developed a wide range of skills including data analysis, project management, computer programming and software development expertise that they could then apply across a broad range of employment sectors.
 
Description scRNA-seq of PBMCs from IBD patients
Amount £2,500,000 (GBP)
Organisation The Wellcome Trust Sanger Institute 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2015 
End 11/2024
 
Title 2000 UC WGS 
Description 2000 ulcerative colitis individuals whole genome sequenced to a mean depth of 2X. The data is available for others to use via the EGA as of 2013. 
Type Of Material Biological samples 
Year Produced 2012 
Provided To Others? Yes  
Impact Added to the Haplotype Reference Consortium panel 
URL https://www.ebi.ac.uk/ega/studies/EGAS00001000329
 
Description Haplotype Reference Consortium 
Organisation University of Michigan
Country United States 
Sector Academic/University 
PI Contribution The Haplotype Reference Consortium is a collaborative effort between the Sanger Institute (Durbin), the Wellcome Trust Centre for Human Genetics (Marchini) and the University of Michigan (Abecasis), who are leading efforts to combine low-coverage whole-genome sequencing datasets from around the world in order to facilitate accurate imputation of common, low-frequency and rare genetic variants. The ulcerative colitis data generated by funding from this award have been contributed to this effort. This effort will ensure that the whole community of complex disease geneticists benefits from the sequencing of the samples generated using this grant.
Collaborator Contribution The partners are driving the creation of the haplotype reference panel, working out ways to combine the data across many different sequencing strategies and cohorts, and creating the tools that will enable external research groups to impute genotypes using this haplotype reference panel.
Impact A prototype imputation server that uses cloud computing has been developed by Christian Fuchsberger and Goncalo Abecasis at Michigan. http://imputationserver.sph.umich.edu/
Start Year 2013
 
Description Haplotype Reference Consortium 
Organisation University of Oxford
Department Wellcome Trust Centre for Human Genetics
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution The Haplotype Reference Consortium is a collaborative effort between the Sanger Institute (Durbin), the Wellcome Trust Centre for Human Genetics (Marchini) and the University of Michigan (Abecasis), who are leading efforts to combine low-coverage whole-genome sequencing datasets from around the world in order to facilitate accurate imputation of common, low-frequency and rare genetic variants. The ulcerative colitis data generated by funding from this award have been contributed to this effort. This effort will ensure that the whole community of complex disease geneticists benefits from the sequencing of the samples generated using this grant.
Collaborator Contribution The partners are driving the creation of the haplotype reference panel, working out ways to combine the data across many different sequencing strategies and cohorts, and creating the tools that will enable external research groups to impute genotypes using this haplotype reference panel.
Impact A prototype imputation server that uses cloud computing has been developed by Christian Fuchsberger and Goncalo Abecasis at Michigan. http://imputationserver.sph.umich.edu/
Start Year 2013