Estimation of the genetic correlation among human cancers and identification of pleiotropic cancer loci

Lead Research Organisation: University of Edinburgh

Department Name: The Roslin Institute

Abstract

Over the last twenty-five years or so, geneticists have been trying to identify those genes that, when mutated, increase the risk of developing disease. Some of these diseases are rare in the population and in many cases the main gene that increases risk has been identified. Other diseases are common in the population and scientists call them 'common diseases'. These include, among others, most cancers, diabetes and heart disease. Understanding the genetics of common diseases is tricky because they are likely to be caused by the aggregate effect of thousands of mutations in the genome, each increasing risk by very little. Technological developments since the completion of the Human Genome Project have made it possible to try to identify the regions of the genome that make some people more prone to developing common diseases than others. The strategy has worked to some extent and has identified large numbers of regions for different diseases. Despite that success, we know that the identified loci cannot explain all the recurrent risk of disease that we observe within families. There are numerous reasons why the current approach has had limited success. One of them is that the statistical methods applied are not able to model the structure of the genome properly. For instance, there could be multiple mutations in the same region whose combined effect makes one more prone to disease. At the moment the statistical methods applied only investigate one mutation at a time. We will develop more complex statistical methods to solve this problem.
Another limitation of the current strategy is that scientists study one disease at a time. This is quite unsatisfactory because we know that some people tend to have multiple diseases, either simultaneously or at different times. For example, being overweight increases one's risk of developing cancer, and this could be because the same genes that increase susceptibility to being overweight also increase the chances of having cancer. We believe that by taking a more general approach and modelling two or more diseases simultaneously we could be more successful in identifying those elusive genomic regions that current methods cannot locate. We propose to develop methodology that can do that.
To make our research more useful we will also apply our methods to different cancers. Different cancers often develop within families and there is a good chance that this is because susceptibility genes are shared among cancers. However, we know very little about this. We wish to learn more because this could be used for predicting cancer risk more accurately or to find new uses for available drugs (i.e. use drugs developed to treat one cancer to treat another one).

Technical Summary

This project proposes to develop methodology for the estimation of the genetic correlation between two traits. These traits could be two diseases, a clinically relevant intermediate trait and a disease, or two clinically relevant continuous traits. The methods are designed to exploit samples of unrelated individuals from the population. We will mainly focus on the most challenging scenario, where the two traits are not observed/measured on the same individuals. The methods will provide estimates of the degree to which diseases share susceptibility loci and gene pathways represented on current genotyping arrays. The project will set up a general analytical framework for complex diseases and an approach to combine different sources of information where missing data is a problem, for instance when the clinical intermediate phenotype has been measured only in a subset of samples or when repeated measures are difficult to obtain (e.g. measuring cognitive ability in childhood and old age). We will use cancer as a paradigm of correlated disease. The incontrovertible evidence for genetically determined susceptibility to cancer has spurred a large number of genome-wide association studies (GWAS) to identify susceptibility loci. Despite its arguable success, the current approach is unsatisfactory in two respects: it tests 'one SNP at a time', which might miss combinations of common and/or rare variants at particular loci, and it models 'one trait at a time', thereby missing vital information about the correlation structure of diseases and clinical traits. Our methodology has the potential to overcome these limitations and identify some of the large proportion of genetic variation that remains unexplained. We will use restricted maximum likelihood (REML) in a mixed-model framework to estimate the genetic correlation captured by SNP arrays at a global and local genomic level.
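
The summary does not write out the model. As a sketch only, one standard bivariate mixed-model formulation consistent with the description is given below; the notation (y, X, b, g, e, A, and the variance symbols) is assumed here for illustration and is not taken from the proposal. Subscripts 1 and 2 index the two traits, which may be measured on different sets of individuals, and A is a SNP-derived genomic relationship matrix partitioned accordingly.

```latex
% Illustrative bivariate mixed model (assumed notation, not from the proposal)
\begin{align}
  \mathbf{y}_1 &= \mathbf{X}_1\mathbf{b}_1 + \mathbf{g}_1 + \mathbf{e}_1, &
  \mathbf{y}_2 &= \mathbf{X}_2\mathbf{b}_2 + \mathbf{g}_2 + \mathbf{e}_2
\end{align}

% Genetic effects covary across traits through the relationship matrix A
\begin{equation}
  \operatorname{Var}\begin{pmatrix}\mathbf{g}_1\\ \mathbf{g}_2\end{pmatrix}
  = \begin{pmatrix}
      \mathbf{A}_{11}\,\sigma^2_{g_1} & \mathbf{A}_{12}\,\sigma_{g_{12}}\\
      \mathbf{A}_{21}\,\sigma_{g_{12}} & \mathbf{A}_{22}\,\sigma^2_{g_2}
    \end{pmatrix}
\end{equation}

% With no individual measured for both traits, there is no information on a
% residual covariance, so the residuals are taken as uncorrelated across traits
\begin{equation}
  \operatorname{Var}\begin{pmatrix}\mathbf{e}_1\\ \mathbf{e}_2\end{pmatrix}
  = \begin{pmatrix}
      \mathbf{I}\,\sigma^2_{e_1} & \mathbf{0}\\
      \mathbf{0} & \mathbf{I}\,\sigma^2_{e_2}
    \end{pmatrix}
\end{equation}

% Genetic correlation from the REML estimates of the variance components
\begin{equation}
  r_g = \frac{\sigma_{g_{12}}}{\sqrt{\sigma^2_{g_1}\,\sigma^2_{g_2}}}
\end{equation}
```

Under these assumptions, a 'local' estimate would follow the same form with A built only from the SNPs in a given genomic region, while a 'global' estimate would use all SNPs on the array.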

Planned Impact

Who might benefit from the proposed research and how will they benefit?
In the short term, our research will benefit the scientific community working on complex traits and cancer genetics. The community working on complex traits will benefit through the development of new methodology that may help to answer one of the most debated questions in complex trait genetics: where is the genetic variation that we know contributes to variation in disease risk?
The community working on susceptibility to cancer will benefit from the methodology and from the discovery of new susceptibility loci. Identification of a new locus opens numerous downstream lines of research, including but not limited to: replication in other populations or ethnic groups, functional studies to understand the molecular mechanisms underlying disease, fine-mapping and resequencing, expression analyses and development of animal models (e.g. conditional knockout mouse models). The discovery of pleiotropic loci may have twice the impact because it would attract the interest of researchers working on two or more different cancers.
In the mid term, our research could benefit the pharmaceutical industry. Pleiotropic loci would highlight common pathways between diseases that are currently unknown. This offers the possibility of drug repositioning, that is, using a drug developed for one disease to treat another. Drug repositioning not only enlarges the market for the drug but also substantially reduces the costs and time associated with developing drugs, thereby maximising profit. The more profit companies make, the more taxes they pay, which in turn improves government finances.
Finally, in the long term, society will benefit. Patients will benefit through better risk prediction models that could be used to tailor the level of screening to the level of risk. This should reduce incidence and mortality. The National Health Service would benefit through more efficient use of resources (i.e. screening those who need it most) and, if drug repositioning worked, through cheaper drugs. Employers would benefit too. If cancer patients were diagnosed earlier, or even treated while the disease is still premalignant, their recovery would be faster and the length of sick leave would be substantially reduced. At the personal level, it would also reduce the devastating effects cancer has on patients and their families.

Funded Value:

£401,864

Funded Period:

Oct 13 - Oct 16

Funder:

MRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

MR/K014781/1

Principal Investigator:

Albert Tenesa

Health Category:

Unclassified

Organisations

People	ORCID iD
Albert Tenesa (Principal Investigator)
Andrew Law (Co-Investigator)
John Woolliams (Co-Investigator)

Publications

Caballero A (2015) The Nature of Genetic Variation for Complex Traits Revealed by GWAS and Regional Heritability Mapping Analyses. in Genetics

Canela-Xandri O (2015) A new tool called DISSECT for analysing large genomic data sets using a Big Data approach. in Nature Communications

Canela-Xandri O (2016) Improved Genetic Profiling of Anthropometric Traits Using a Big Data Approach. in PloS one

Cheng TH (2015) Meta-analysis of genome-wide association studies identifies common susceptibility polymorphisms for colorectal and endometrial cancer near SH2B3 and TSHZ1. in Scientific reports

Muñoz M (2016) Evaluating the contribution of genetics and familial shared environment to common disease using the UK Biobank. in Nature genetics

Orlando G (2016) Variation at 2q35 (PNKD and TMBIM1) influences colorectal cancer risk and identifies a pleiotropic effect with inflammatory bowel disease. in Human molecular genetics

Rawlik K (2017) Evidence of epigenetic admixture in the Colombian population. in Human molecular genetics

Rawlik K (2016) Imputation of DNA Methylation Levels in the Brain Implicates a Risk Factor for Parkinson's Disease. in Genetics

Tanskanen T (2018) Genome-wide association study and meta-analysis in Northern European populations replicate multiple colorectal cancer risk loci. in International journal of cancer

Tenesa A (2016) Genetic determination of height-mediated mate choice. in Genome biology

Collaboration
Software and Technical Products
Spin Outs
Engagement Activities


Description	National Institute of Cancer in Colombia
Organisation	National Cancer Institute of Bari
Country	Italy
Sector	Public
PI Contribution	Expertise in cancer genetic epidemiology and statistical genetics.
Collaborator Contribution	Provision of colorectal cancer case-control study. Provision of normal colonic tissue and blood for DNA methylation studies.
Impact	Papers. One published (doi: 10.1038/ejhg.2013.310), one in press.
Start Year	2011


Description	UK Biobank Research Analysis Platform
Organisation	UK Biobank
Country	United Kingdom
Sector	Charity/Non Profit
PI Contribution	We were invited by Mark Effingham (Depute CEO of UK Biobank) to be one of the avant-garde teams to access the UK Biobank research analysis platform to adapt and deploy some of the tools we have developed for the analysis of genomic data.
Collaborator Contribution	We are working with UK Biobank and DNAnexus to set up the compute configuration to allow fast genome-wide association studies with array genotypes, imputed genotypes, whole exome and whole genome data.
Impact	No outputs yet.
Start Year	2020


Title	DISSECT
Description	Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ~4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes.
Type Of Technology	Software
Year Produced	2015
Open Source License?	Yes
Impact	It is too soon to know the impact of DISSECT. From our point of view it allowed us to improve the prediction accuracy of complex traits substantially.
URL	http://www.dissect.ed.ac.uk


Company Name	OMECU LIMITED
Description	Software development for analysis of big data.
Year Established	2021
Impact	Received support from the Wellcome iTPA programme, participated in the SETSquared ICURe programme, and received Medical Research Council grants. They also received funding from the University's Data-Driven Entrepreneurship Seed Fund and Fast Track Mentor initiatives, supported by the Scottish Funding Council.
Website	https://www.omecu.com


Description	Media interview
Form Of Engagement Activity	A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Radio interviews with BBC Scotland, La FM (of RCN, national radio of Colombia) and Newstalk (Ireland). Altmetric score of 259.
Year(s) Of Engagement Activity	2016
URL	http://www.biotechniques.com/news/The-Role-of-Genes-in-the-Pursuit-of-Love/biotechniques-363094.html...


Description	School visit (Sciennes Primary in Edinburgh)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Schools
Results and Impact	We talked to P5 pupils about the brain. The main objective of the activity was to demonstrate that different parts of the brain play different roles and that when genes are dysregulated in one part of the brain (for instance, because of cancer) this leads to a specific set of symptoms that depends on the part of the brain affected. We then ran an activity in which the pupils used plasticine to represent different parts of the brain.
Year(s) Of Engagement Activity	2016