Estimation of the genetic correlation among human cancers and identification of pleiotropic cancer loci

Lead Research Organisation: University of Edinburgh
Department Name: The Roslin Institute


Over the last twenty-five years or so, geneticists have been trying to identify the genes that, when mutated, increase the risk of developing disease. Some of these diseases are rare in the population, and in many cases the main gene that increases risk has been identified. Other diseases are common in the population, and scientists call them 'common diseases'. These include, among others, most cancers, diabetes and heart disease. Understanding the genetics of common diseases is tricky because they are likely to be caused by the aggregate effect of thousands of mutations in the genome, each increasing risk by very little. Technological developments after the completion of the Human Genome Project have made it possible to search for the regions of the genome that make some people more prone to common diseases than others. The strategy has worked to some extent and has identified large numbers of regions for different diseases. Despite that success, we know that the regions identified so far cannot explain all of the risk of disease that recurs within families. There are numerous reasons why the current approach has had limited success. One of them is that the statistical methods applied cannot model the structure of the genome properly. For instance, there could be multiple mutations in the same region whose combined effect makes a person more prone to disease, but the statistical methods currently applied investigate only one mutation at a time. We will develop more complex statistical methods to solve this problem.
Another limitation of the current strategy is that scientists study one disease at a time. This is quite unsatisfactory because we know that some people tend to have multiple diseases, either simultaneously or at different times. For example, being overweight increases one's risk of developing cancer, and this could be because the same genes that increase susceptibility to being overweight also increase the chances of having cancer. We believe that by taking a more general approach and modelling two or more diseases simultaneously, we could be more successful in identifying the elusive genomic regions that current methods cannot locate. We propose to develop methodology that can do this.
To make our research more useful we will also apply our methods to different cancers. Different cancers often develop within families and there is a good chance that this is because susceptibility genes are shared among cancers. However, we know very little about this. We wish to learn more because this could be used for predicting cancer risk more accurately or to find new uses for available drugs (i.e. use drugs developed to treat one cancer to treat another one).

Technical Summary

This project proposes to develop methodology for estimating the genetic correlation between two traits. These traits could be two diseases, a clinically relevant intermediate trait and a disease, or two clinically relevant continuous traits. The methods are designed to exploit samples of unrelated individuals from the population. We will focus mainly on the most challenging scenario, in which the two traits are not observed/measured on the same individuals. The methods will provide estimates of the degree to which diseases share susceptibility loci and gene pathways represented on current genotyping arrays. The project will set up a general analytical framework for complex diseases and an approach for combining different sources of information where missing data are a problem; for instance, when the clinical intermediate phenotype has been measured only in a subset of samples, or when repeated measures are difficult to obtain (e.g., measuring cognitive ability in childhood and in old age). We will use cancer as a paradigm of correlated disease. The incontrovertible evidence for genetically determined susceptibility to cancer has spurred a large number of genome-wide association studies (GWAS) to identify susceptibility loci. Despite their arguable success, the current approach is unsatisfactory in two ways: it tests 'one SNP at a time', which might miss combinations of common and/or rare variants at particular loci, and it models 'one trait at a time', thereby missing vital information about the correlation structure of diseases and clinical traits. Our methodology has the potential to overcome these limitations and to identify some of the large proportion of genetic variation that remains unexplained. We will use restricted maximum likelihood (REML) in a mixed-model framework to estimate the genetic correlation captured by SNP arrays at both the global and the local genomic level.
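To make the mixed-model framework above more concrete, the sketch below builds a genomic relationship matrix (GRM) from SNP genotypes, the kind of matrix against which REML software typically fits genetic variance components. It is a minimal illustration, not the project's implementation: it assumes genotypes coded as 0/1/2 copies of the counted allele with no missing calls, and the function name `genomic_relationship_matrix` is hypothetical. Restricting the columns to the SNPs of one genomic region would give the corresponding local matrix.

```python
import numpy as np


def genomic_relationship_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Hypothetical helper: GRM from an (individuals x SNPs) matrix coded 0/1/2.

    Assumes no missing genotypes; monomorphic SNPs are dropped because they
    cannot be standardised.
    """
    g = np.asarray(genotypes, dtype=float)
    # Sample frequency of the counted allele at each SNP.
    p = g.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)
    g, p = g[:, keep], p[keep]
    # Standardise each SNP by its mean (2p) and Hardy-Weinberg variance 2p(1-p).
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Average the cross-products of standardised genotypes over SNPs.
    return z @ z.T / z.shape[1]


# Usage on simulated genotypes: 5 individuals, 1000 SNPs.
rng = np.random.default_rng(1)
geno = rng.binomial(2, 0.3, size=(5, 1000))
A = genomic_relationship_matrix(geno)
print(A.shape)  # (5, 5)
```

In a bivariate analysis, the genetic covariance between two traits is estimated against the same kind of matrix, and the genetic correlation is that covariance scaled by the two genetic variances: r_g = cov_g(1,2) / sqrt(var_g(1) * var_g(2)).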

Planned Impact

Who might benefit from the proposed research, and how will they benefit?
In the short term, our research will benefit the scientific community working on complex traits and cancer genetics. The community working on complex traits will benefit through the development of new methodology that may help to answer one of the most debated questions in complex-trait genetics: where is the genetic variation that we know contributes to variation in disease risk?
The community working on susceptibility to cancer will benefit from the methodology and from the discovery of new susceptibility loci. Identification of a new locus opens numerous downstream lines of research, including but not limited to: replication in other populations or ethnic groups, functional studies to understand the molecular mechanisms underlying disease, fine-mapping and resequencing, expression analyses and development of animal models (e.g. conditional knockout mouse models). The discovery of pleiotropic loci may have twice the impact, because it would raise the interest of researchers working on two or more different cancers.
In the mid term, our research could benefit the pharmaceutical industry. Pleiotropic loci would highlight common pathways between diseases that are currently unknown. This offers the possibility of drug repositioning, that is, using a drug developed for one disease to treat another. Drug repositioning not only enlarges the market niche for the drug but also substantially reduces the costs and time associated with developing drugs, thereby maximising profit. The more profit companies make, the more tax they pay, which in turn improves government finances.
Finally, in the long term, society will benefit. Patients will benefit through better risk-prediction models that could be used to tailor the level of screening to the level of risk. This should reduce incidence and mortality. The National Health Service would benefit through more efficient use of resources (i.e. screening those who need it most) and, if drug repositioning succeeded, through cheaper drugs. Employers would benefit too: if cancer patients were diagnosed earlier, or even treated while the disease is still premalignant, their recovery would be faster and the length of sick leave would be substantially reduced. At the personal level, it would also reduce the devastating effects cancer has on patients and their families.


Description National Institute of Cancer in Colombia 
Organisation National Cancer Institute of Bari
Country Italy 
Sector Public 
PI Contribution Expertise in cancer genetic epidemiology and statistical genetics.
Collaborator Contribution Provision of colorectal cancer case-control study. Provision of normal colonic tissue and blood for DNA methylation studies.
Impact Papers. One published (doi: 10.1038/ejhg.2013.310), one in press.
Start Year 2011
Description Large-scale genetic and genomic data are increasingly available, and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex-trait analysis, we present DISSECT. DISSECT is a new, freely available software tool that exploits the distributed-memory parallel computational architectures of compute clusters to perform a wide range of genomic and epidemiologic analyses that can currently be carried out only on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ~4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes.
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact It is too soon to know the impact of DISSECT. In our own work, it has allowed us to improve the prediction accuracy of complex traits substantially.
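The phenotype prediction described for DISSECT relies on mixed linear models. As a small-scale illustration only (DISSECT distributes these computations across many cores, which this sketch does not attempt), the genomic best linear unbiased prediction (GBLUP) of untested individuals can be computed from a genomic relationship matrix and a heritability value. The function `gblup_predict`, its arguments, and the assumption that heritability is known rather than estimated by REML are all hypothetical simplifications, not DISSECT's interface.

```python
import numpy as np


def gblup_predict(A, y_train, train_idx, test_idx, h2):
    """Hypothetical GBLUP sketch: predict genetic values of test individuals.

    Assumed model: y = mu + g + e, with Var(g) = A * s2g, Var(e) = I * s2e
    and heritability h2 = s2g / (s2g + s2e) treated as known.
    """
    lam = (1.0 - h2) / h2                      # variance ratio s2e / s2g
    A_tt = A[np.ix_(train_idx, train_idx)]     # relationships among training individuals
    A_vt = A[np.ix_(test_idx, train_idx)]      # relationships between test and training
    # Simplification: the intercept mu is estimated by the training mean, not by GLS.
    resid = y_train - y_train.mean()
    # Solve (A_tt + lam * I) alpha = resid instead of forming an explicit inverse.
    alpha = np.linalg.solve(A_tt + lam * np.eye(len(train_idx)), resid)
    return A_vt @ alpha


# Usage on a small simulated trait with h2 = 0.5 (sizes chosen only for speed).
rng = np.random.default_rng(2)
n, m = 200, 500
geno = rng.binomial(2, 0.5, size=(n, m)).astype(float)
z = (geno - geno.mean(axis=0)) / geno.std(axis=0)
A = z @ z.T / m
g = z @ rng.normal(0.0, np.sqrt(0.5 / m), m)   # simulated genetic values
y = g + rng.normal(0.0, np.sqrt(0.5), n)        # simulated phenotypes
train, test = np.arange(150), np.arange(150, n)
pred = gblup_predict(A, y[train], train, test, h2=0.5)
print(np.corrcoef(pred, g[test])[0, 1])         # accuracy on held-out individuals
```

The dense solve above grows with the cube of the training-set size, which is why analyses at the sample sizes quoted for DISSECT call for distributed-memory computation.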
Description Media interview 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Radio interviews with BBC Scotland, La FM (RCN Radio, Colombia) and Newstalk (Ireland).
Altmetric score of 259.
Year(s) Of Engagement Activity 2016
Description School visit (Sciennes Primary in Edinburgh) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact We gave a talk about the brain to P5 pupils. The main objective of the activity was to demonstrate that different parts of the brain play different roles, and that when genes are dysregulated in one part of the brain (for instance, because of cancer), this leads to a set of specific symptoms that depend on the part of the brain affected.
We then ran an activity in which the pupils used plasticine to represent different parts of the brain.
Year(s) Of Engagement Activity 2016