Estimation of the genetic correlation among human cancers and identification of pleiotropic cancer loci
Lead Research Organisation:
University of Edinburgh
Department Name: The Roslin Institute
Abstract
Over the last twenty-five years or so, geneticists have been trying to identify those genes that, when mutated, increase the risk of developing disease. Some of these diseases are rare in the population and in many cases the main gene that increases risk has been identified. Other diseases are common in the population and scientists call them 'common diseases'. These include, among others, most cancers, diabetes and heart disease. Understanding the genetics of common diseases is tricky because they are likely to be caused by the aggregate effect of thousands of mutations in the genome, each increasing risk by very little. Technological developments after the completion of the Human Genome Project have made it possible to try to identify the regions of the genome that make some people more prone to develop common diseases than others. The strategy has worked to some extent and has identified large numbers of regions for different diseases. Despite that success, we know that the identified loci cannot explain all the recurrent risk of disease that we observe within families. There are numerous reasons why the current approach has had limited success. One of them is that the statistical methods applied are not able to model the structure of the genome properly. For instance, there could be multiple mutations in the same region whose combined effect makes one more prone to disease. At the moment the statistical methods applied only investigate one mutation at a time. We will develop more complex statistical methods to solve this problem.
Another limitation of the current strategy is that scientists study one disease at a time. This is quite unsatisfactory because we know that some people tend to have multiple diseases either simultaneously or at different times. For example, being overweight increases one's risk of developing cancer, and this could be because the same genes that increase susceptibility to being overweight also increase the chances of having cancer. We believe that by taking a more general approach and modelling two or more diseases simultaneously we could be more successful in identifying those elusive genomic regions that current methods cannot locate. We propose to develop methodology that can do that.
To make our research more useful we will also apply our methods to different cancers. Different cancers often develop within families and there is a good chance that this is because susceptibility genes are shared among cancers. However, we know very little about this. We wish to learn more because this could be used for predicting cancer risk more accurately or to find new uses for available drugs (i.e. use drugs developed to treat one cancer to treat another one).
Technical Summary
This project proposes to develop methodology for the estimation of the genetic correlation between two traits. These traits could be two diseases, a clinically relevant intermediate trait and a disease, or two clinically relevant continuous traits. The methods are designed to exploit samples of unrelated individuals from the population. We will mainly focus on the most challenging scenario, where the two traits are not observed/measured on the same individuals. The methods will provide estimates of the degree to which diseases share susceptibility loci and gene pathways represented on current genotyping arrays. The project will set up a general analytical framework for complex diseases and an approach to combine different sources of information where missing data is a problem, for instance when the clinical intermediate phenotype has been measured only in a subset of samples or when repeated measures are difficult to obtain (e.g. measuring cognitive ability in childhood and old age). We will use cancer as a paradigm of correlated disease. The incontrovertible evidence for genetically determined susceptibility to cancer has spurred a large number of genome-wide association studies (GWAS) to identify susceptibility loci. Despite their arguable success, the current approach is unsatisfactory in that it uses a 'one SNP at a time' approach, which might miss combinations of common and/or rare variants at particular loci, and in that it models 'one trait at a time', thereby missing vital information about the correlation structure of diseases and clinical traits. Our methodology has the potential to overcome the limitations of the current approach and identify some of the large proportion of genetic variation that remains unexplained. We will use restricted maximum likelihood (REML) in a mixed-model framework to estimate the genetic correlation captured by SNP arrays at a global and local genomic level.
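As an illustration only, since the summary does not specify an implementation, the Python sketch below shows two ingredients that SNP-based mixed-model analyses of this kind commonly rely on: a genomic relationship matrix (GRM) built from array genotypes of unrelated individuals, and the conversion of genetic variance and covariance components into a genetic correlation, r_g = cov_g12 / sqrt(var_g1 * var_g2). The function names, the toy genotypes and the variance-component values are all hypothetical. The REML step that would actually estimate those components is not shown.

```python
import math
import random


def genomic_relationship_matrix(genotypes):
    """Build a GRM from an individuals-by-SNPs table of 0/1/2 allele counts.

    Each SNP is centred by 2p and scaled by sqrt(2p(1-p)), where p is the
    sample allele frequency; the GRM is the average cross-product over SNPs.
    Monomorphic SNPs are skipped because they cannot be standardised.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardised = [[] for _ in range(n)]
    for j in range(m):
        p = sum(row[j] for row in genotypes) / (2 * n)
        if p == 0.0 or p == 1.0:
            continue
        scale = math.sqrt(2 * p * (1 - p))
        for i in range(n):
            standardised[i].append((genotypes[i][j] - 2 * p) / scale)
    used = len(standardised[0])
    if used == 0:
        raise ValueError("no polymorphic SNPs available")
    return [
        [sum(a * b for a, b in zip(standardised[i], standardised[k])) / used
         for k in range(n)]
        for i in range(n)
    ]


def genetic_correlation(var_g1, var_g2, cov_g12):
    """Genetic correlation from (hypothetical) variance-component estimates."""
    if var_g1 <= 0 or var_g2 <= 0:
        raise ValueError("genetic variances must be positive")
    return cov_g12 / math.sqrt(var_g1 * var_g2)


# Toy data: 6 individuals, 200 SNPs, allele frequency 0.3 (made-up values).
rng = random.Random(0)
genotypes = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(200)]
             for _ in range(6)]
grm = genomic_relationship_matrix(genotypes)

# Made-up components, standing in for what a bivariate REML fit would return.
r_g = genetic_correlation(var_g1=0.4, var_g2=0.9, cov_g12=0.3)
```

In a bivariate analysis where the two traits are measured on different individuals, the off-diagonal block of the GRM (relationships between individuals from the two samples) is what carries the information about the genetic covariance. A local estimate would follow the same pattern with the GRM restricted to the SNPs in one genomic region.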
Planned Impact
Who might benefit from the proposed research and how will they benefit?
In the short term, our research will benefit the scientific community working on complex traits and cancer genetics. The community working on complex traits will benefit through the development of new methodology that may help to answer one of the most debated questions in complex-trait genetics: where is the genetic variation that we know contributes to variation in disease risk?
The community working on susceptibility to cancer will benefit from the methodology and from the discovery of new susceptibility loci. Identification of a new locus opens numerous downstream lines of research including, but not limited to: replication in other populations or ethnic groups, functional studies to understand the molecular mechanisms underlying disease, fine-mapping and resequencing, expression analyses and development of animal models (e.g. conditional knockout mouse models). The discovery of pleiotropic loci may have twice the impact because it would raise the interest of researchers working on two or more different cancers.
In the mid-term our research could benefit the pharmaceutical industry. Pleiotropic loci would highlight common pathways between diseases that are currently unknown. This offers the possibility of drug repositioning, that is, using a drug developed for one disease to treat another. Drug repositioning not only widens the market for the drug but also substantially reduces the costs and time associated with developing drugs, thereby maximising profit. The more profit companies make, the more taxes they pay, which in turn improves government finances.
Finally, in the long term, society will benefit. Patients will benefit through better prediction models of risk that could be used to tailor the level of screening to the level of risk. This should reduce incidence and mortality. The National Health Service would benefit through more efficient use of resources (i.e. screening those who need it most) and, if drug repositioning worked, through cheaper drugs. Employers would benefit too. If cancer patients were diagnosed earlier, or even treated while the disease is still premalignant, their recovery would be faster and the length of sick leave would be substantially reduced. At the personal level it would also reduce the devastating effects cancer has on patients and their families.
Publications
Caballero A
(2015)
The Nature of Genetic Variation for Complex Traits Revealed by GWAS and Regional Heritability Mapping Analyses.
in Genetics
Canela-Xandri O
(2015)
A new tool called DISSECT for analysing large genomic data sets using a Big Data approach.
in Nature communications
Canela-Xandri O
(2016)
Improved Genetic Profiling of Anthropometric Traits Using a Big Data Approach.
in PloS one
Kassam I
(2016)
The autosomal genetic control of sexually dimorphic traits in humans is largely the same across the sexes
in Genome Biology
Muñoz M
(2016)
Evaluating the contribution of genetics and familial shared environment to common disease using the UK Biobank.
in Nature genetics
Orlando G
(2016)
Variation at 2q35 (PNKD and TMBIM1) influences colorectal cancer risk and identifies a pleiotropic effect with inflammatory bowel disease.
in Human molecular genetics
Rawlik K
(2017)
Evidence of epigenetic admixture in the Colombian population.
in Human molecular genetics
Rawlik K
(2016)
Imputation of DNA Methylation Levels in the Brain Implicates a Risk Factor for Parkinson's Disease.
in Genetics
Tanskanen T
(2018)
Genome-wide association study and meta-analysis in Northern European populations replicate multiple colorectal cancer risk loci.
in International journal of cancer
Description | National Institute of Cancer in Colombia |
Organisation | National Cancer Institute of Bari |
Country | Italy |
Sector | Public |
PI Contribution | Expertise in cancer genetic epidemiology and statistical genetics. |
Collaborator Contribution | Provision of colorectal cancer case-control study. Provision of normal colonic tissue and blood for DNA methylation studies. |
Impact | Papers. One published (doi: 10.1038/ejhg.2013.310), one in press. |
Start Year | 2011 |
Description | UK Biobank Research Analysis Platform |
Organisation | UK Biobank |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | We were invited by Mark Effingham (Depute CEO of UK Biobank) to be one of the avant-garde teams to access the UK Biobank research analysis platform to adapt and deploy some of the tools we have developed for the analysis of genomic data. |
Collaborator Contribution | We are working with UK Biobank and DNAnexus to set up the compute configuration to allow fast genome-wide association studies with array genotypes, imputed genotypes, whole-exome and whole-genome data. |
Impact | No outputs yet. |
Start Year | 2020 |
Title | DISSECT |
Description | Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ~4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes. |
Type Of Technology | Software |
Year Produced | 2015 |
Open Source License? | Yes |
Impact | It is too soon to know the impact of DISSECT. From our point of view it allowed us to improve the prediction accuracy of complex traits substantially. |
URL | http://www.dissect.ed.ac.uk |
Company Name | Omecu |
Description | Omecu develops a cloud-based platform for the analysis of large-scale genetic and epidemiologic datasets, with the aim of democratising genome data. |
Year Established | 2021 |
Impact | Received support from the Wellcome iTPA programme, participated in the SETSquared ICURe programme, and received Medical Research Council grants. They also received funding from the University's Data-Driven Entrepreneurship Seed Fund and Fast Track Mentor initiatives, supported by the Scottish Funding Council. |
Website | https://omecu.com/ |
Description | Media interview |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Radio interviews with BBC Scotland, La Fm (of the RCN national radio of Colombia) and Newstalk (Ireland). Altmetric score of 259. |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.biotechniques.com/news/The-Role-of-Genes-in-the-Pursuit-of-Love/biotechniques-363094.html... |
Description | School visit (Sciennes Primary in Edinburgh) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | We talked about the brain to P5 pupils. The main objective of the activity was to demonstrate that different parts of the brain play different roles and that when genes become dysregulated in one part of the brain (for instance, because of cancer) this leads to a specific set of symptoms depending on the part affected. We then ran an activity in which the pupils used plasticine to model different parts of the brain. |
Year(s) Of Engagement Activity | 2016 |