Genomic prediction of anthropometric traits using hundreds of thousands of individuals
Lead Research Organisation:
University of Edinburgh
Department Name: The Roslin Institute
Abstract
Most common diseases that afflict humans, and other traits of medical relevance (for instance, blood pressure or cholesterol levels), are determined by the interplay of genetic and environmental factors. Unlike environmental factors, genetic factors can be measured accurately and inexpensively, are constant over time and can be measured as early as birth. Genetic information could therefore help to identify which people are at highest risk of disease, so that preventative strategies can be designed for the individuals who need them most. However, predicting risk or other traits that are determined by thousands of genes has been very challenging. Until now, too few people had both trait and genetic information recorded to yield accurate predictions, and there were no computational tools able to analyse the large volumes of data such predictions require. The UK Biobank (a large epidemiological study) now has around 500,000 individuals who have been genotyped (i.e. have genetic information) and phenotyped, and we have developed computer software to analyse this cohort on ARCHER, the UK national supercomputer. We will develop prediction models for nine exemplar traits (e.g. height or body weight) in this cohort to show that prediction from genetic markers is feasible. If these traits can be accurately predicted, similar predictions could also work for diseases such as colorectal or breast cancer. This would open the way to personalised medicine.
Technical Summary
Genome-wide association studies have identified a large number of genetic variants associated with complex traits. Despite the importance of these discoveries, their translation into medically useful tools that could help tailor disease management to the genetic make-up of the individual has lagged behind. We will use genotypes to predict nine quantitative traits measured in the UK Biobank, which comprises around 500,000 individuals. We have shown by simulation that, for a range of heritabilities and genetic architectures, this sample size should allow us to achieve prediction accuracies that are clinically relevant. We now propose to test this using real data. Both the large sample size of the UK Biobank and the software that we have developed will allow us to develop accurate genetic predictors of these complex phenotypes.
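A widely used deterministic approximation (Daetwyler et al., 2008) offers a quick sanity check on the claim that around 500,000 individuals should give clinically relevant accuracy. The sketch below is not the simulation referred to above. It assumes independent markers, summarised by an effective number of independent chromosome segments, M_e. The value of M_e for unrelated humans is uncertain, so the one used here is purely illustrative.

```python
import math


def expected_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """Expected correlation between predicted and true genetic values.

    Deterministic approximation of Daetwyler et al. (2008):
        r = sqrt(N * h2 / (N * h2 + M_e))
    N   = number of training individuals,
    h2  = heritability captured by the markers,
    M_e = effective number of independent chromosome segments (assumed).
    """
    if n_train <= 0 or not 0.0 < h2 <= 1.0 or m_e <= 0:
        raise ValueError("require n_train > 0, 0 < h2 <= 1 and m_e > 0")
    signal = n_train * h2
    return math.sqrt(signal / (signal + m_e))


def expected_phenotypic_r2(n_train: int, h2: float, m_e: float) -> float:
    """Expected share of phenotypic variance explained by the predictor (r^2 * h2)."""
    r = expected_accuracy(n_train, h2, m_e)
    return r * r * h2


# Illustrative scan: M_e = 60,000 is an assumed value, not an estimate from this project.
for n in (10_000, 100_000, 500_000):
    print(n, round(expected_accuracy(n, h2=0.5, m_e=60_000), 3))
```

The approximation shows why sample size matters: accuracy rises with N·h2 relative to M_e, so traits with lower heritability need proportionally larger training sets to reach the same accuracy.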
We will use mixed linear models to model additive and non-additive sources of genetic variation, and will develop computationally efficient approaches to combine information from multiple datasets when individual-level data cannot be shared across cohorts. Our predictions will be validated using internal and external cross-validation.
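The following Python/NumPy code is a minimal sketch of the additive part of a mixed-linear-model predictor. It computes a genomic relationship matrix (GRM) and obtains GBLUP predictions, treating heritability as known (in practice it would be estimated, e.g. by REML). It then measures accuracy by k-fold internal cross-validation on simulated data. Function names, sample sizes and the simulation are hypothetical. Non-additive effects and the DISSECT software itself are not represented.

```python
import numpy as np


def standardized_grm(genotypes: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix from an (n x m) matrix of 0/1/2 allele counts."""
    freqs = genotypes.mean(axis=0) / 2.0
    keep = (freqs > 0.0) & (freqs < 1.0)           # drop monomorphic markers
    p = freqs[keep]
    z = (genotypes[:, keep] - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def gblup_predict(grm, y, train_idx, test_idx, h2):
    """GBLUP of test individuals' genetic values, treating h2 as known.

    Simplification: the fixed mean is the training-set average rather than
    its GLS estimate, and only additive genetic variance is modelled.
    """
    lam = (1.0 - h2) / h2                           # sigma_e^2 / sigma_g^2
    mu = y[train_idx].mean()
    k_train = grm[np.ix_(train_idx, train_idx)]
    k_test = grm[np.ix_(test_idx, train_idx)]
    alpha = np.linalg.solve(k_train + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return mu + k_test @ alpha


def cross_validated_accuracy(genotypes, y, h2, n_folds=5, seed=0):
    """k-fold cross-validation: correlation of out-of-fold predictions with y."""
    rng = np.random.default_rng(seed)
    grm = standardized_grm(genotypes)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    preds = np.empty(len(y))
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        preds[test_idx] = gblup_predict(grm, y, train_idx, test_idx, h2)
    return float(np.corrcoef(preds, y)[0, 1])


# Toy simulation (hypothetical sizes; unrelated individuals, independent SNPs).
rng = np.random.default_rng(1)
n, m, h2 = 600, 1000, 0.8
X = rng.binomial(2, rng.uniform(0.05, 0.5, size=m), size=(n, m)).astype(float)
g = X @ rng.normal(size=m)
g = (g - g.mean()) / g.std() * np.sqrt(h2)          # genetic values with variance h2
y = g + rng.normal(scale=np.sqrt(1.0 - h2), size=n)
acc = cross_validated_accuracy(X, y, h2)
print(f"5-fold cross-validated accuracy: {acc:.2f}")
```

The dense n × n solve scales as O(n³). This is why analyses of hundreds of thousands of individuals need distributed-memory linear algebra on a supercomputer such as ARCHER rather than a single-machine solve like this one.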
This is a proof-of-principle proposal: if genomic prediction works for exemplar quantitative traits, it is likely to work for other diseases and quantitative traits, provided that large training datasets are available for analysis, and this would take us one step closer to personalised medicine.
Planned Impact
In the short term, our research will benefit the scientific community working on genomic and phenotypic prediction of complex traits. Extending our research to other phenotypes is straightforward, so it will benefit researchers working on a broad variety of traits, provided that sufficiently large numbers of samples are available for those traits or diseases. GWAS meta-analysis consortia currently exist for a wide variety of human traits, so the number of researchers who might benefit from our research is large. This is especially true if our work on meta-analytical methods demonstrates that predictions made by combining results from individual cohorts can still achieve acceptable accuracies. Researchers in human phenotypic prediction will gain insights into the accuracy of prediction achievable in human populations, and will also learn from our method comparison work. Both will direct further methodological and translational research in the field. Translational research stemming from our work could first apply our findings to cancer GWAS and then to cancer screening programs.
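One standard way to combine cohorts without exchanging individual-level data is for each cohort to share per-SNP effect estimates and standard errors. These are then pooled by fixed-effect inverse-variance weighting. The sketch below assumes that approach, which is not necessarily the meta-analytical method the project will develop. It simulates two hypothetical cohorts with independent SNPs and scores a fresh sample with the pooled effects. With real data, linkage disequilibrium between SNPs would need to be accounted for before building a score.

```python
import numpy as np


def marginal_summary_stats(genotypes: np.ndarray, y: np.ndarray):
    """Per-SNP simple linear regression within one cohort.

    Returns (beta, se) for every column of the (n x m) genotype matrix; these
    are the only quantities a cohort would need to share.
    """
    x = genotypes - genotypes.mean(axis=0)
    yc = y - y.mean()
    sxx = (x ** 2).sum(axis=0)
    beta = (x.T @ yc) / sxx
    resid = yc[:, None] - x * beta                  # residuals, one column per SNP
    sigma2 = (resid ** 2).sum(axis=0) / (len(y) - 2)
    return beta, np.sqrt(sigma2 / sxx)


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis across cohorts.

    betas, ses: arrays of shape (n_cohorts, n_snps).
    """
    w = 1.0 / np.asarray(ses, dtype=float) ** 2
    pooled_beta = (w * np.asarray(betas, dtype=float)).sum(axis=0) / w.sum(axis=0)
    pooled_se = np.sqrt(1.0 / w.sum(axis=0))
    return pooled_beta, pooled_se


def polygenic_score(genotypes, weights):
    """Additive score: weighted sum of allele counts (no LD adjustment)."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)


# Two hypothetical cohorts that share only summary statistics.
rng = np.random.default_rng(2)
m = 50
true_beta = rng.normal(scale=0.1, size=m)
stats = []
for n in (2_000, 3_000):
    X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    y = X @ true_beta + rng.normal(size=n)
    stats.append(marginal_summary_stats(X, y))
pooled_beta, pooled_se = ivw_meta([b for b, _ in stats], [s for _, s in stats])

# Score an independent sample and compare with its true genetic values.
X_new = rng.binomial(2, 0.3, size=(500, m)).astype(float)
scores = polygenic_score(X_new, pooled_beta)
accuracy = float(np.corrcoef(scores, X_new @ true_beta)[0, 1])
print(f"correlation of score with true genetic value: {accuracy:.2f}")
```

Because only `beta` and `se` leave each cohort, the pooled estimate is the same one a central analyst would obtain under a fixed-effect model, while genotypes and phenotypes stay local.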
Our research will clarify whether accurate individual prediction of phenotypes is possible, or significantly aided, by using genomic information. Extending this research to other clinically relevant phenotypes is straightforward. The outcome of our research will therefore benefit health-care professionals and policy makers, as in the medium term it will guide practice and policy regarding the use of genome-wide information, either to develop personalized treatments for individual patients or to tailor interventions to specific strata of the population.
The possibility of targeting particular clinical interventions to the individual offers the opportunity of benefiting patients and allows for a more efficient use of health-care resources, which has economic implications.
As well as guiding treatment allocation for patients, accurate genomic prediction could also be beneficial in the context of clinical trials, where patients could be allocated to groups on the basis of their predicted response to drugs or treatments, thereby reducing the costs of the trials.
Our computational tools will be useful for other efforts such as the 100,000 Genomes Project from Genomics England. We will also have an impact on the plant and animal breeding industry. This will come not only through our genomic prediction research but also through the provision of our software DISSECT, which plant and animal breeders can use to predict the genomic values of selection candidates, complementing or replacing current selection tools. Our tools would increase the competitiveness of the UK breeding industry, as they will facilitate the sustained improvement of breeding stock in plants and animals. This in turn would lead to increased sales and market share for breeding companies and improved margins for their customers. It would also benefit consumers through lower food costs, and benefit the government through increased tax revenue from successful companies.
Our research will showcase the benefits of the UK Biobank to the scientific community and the general public (including the half a million participants of the UK Biobank). The UK Biobank is not only a great scientific resource. Partly because of the large number of volunteers who make up the cohort, it also has the potential to be a great tool for science communication and public engagement. It makes clear that the public is involved in, and needed for, research, not only through public funding but also through active participation in building up such resources.
Finally, the post-doctoral researcher employed on the grant will benefit from excellent training and exposure to our industrial and academic collaborators, which will increase her/his opportunities of future employment both within and outside of academia.
People |
Albert Tenesa (Principal Investigator) |
Pau Navarro (Co-Investigator) |
Publications
Canela-Xandri O
(2015)
A new tool called DISSECT for analysing large genomic data sets using a Big Data approach.
in Nature communications
Canela-Xandri O
(2015)
Accurate genetic profiling of anthropometric traits using a big data approach
Caballero A
(2015)
The Nature of Genetic Variation for Complex Traits Revealed by GWAS and Regional Heritability Mapping Analyses.
in Genetics
Canela-Xandri O
(2016)
Improved Genetic Profiling of Anthropometric Traits Using a Big Data Approach.
in PloS one
Rowlatt A
(2016)
The heritability and patterns of DNA methylation in normal human colorectum.
in Human molecular genetics
Rawlik K
(2016)
Evidence for sex-specific genetic architectures across a spectrum of human complex traits.
in Genome biology
Rawlik K
(2016)
Imputation of DNA Methylation Levels in the Brain Implicates a Risk Factor for Parkinson's Disease.
in Genetics
Tenesa A
(2016)
Genetic determination of height-mediated mate choice.
in Genome biology
Muñoz M
(2016)
Evaluating the contribution of genetics and familial shared environment to common disease using the UK Biobank.
in Nature genetics
Sanabria-Salas MC
(2017)
IL1B-CGTC haplotype is associated with colorectal cancer in admixed individuals with increased African ancestry.
in Scientific reports
Title | Database of genetic associations |
Description | This is the largest atlas of genetic associations with complex traits. It includes associations of over 9 million genetic polymorphisms and 778 complex traits. |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | This web site has had over 180,000 queries from around 10,000 researchers from across 100 countries. |
URL | http://geneatlas.roslin.ed.ac.uk |
Description | Collaboration with GSK |
Organisation | GlaxoSmithKline (GSK) |
Country | Global |
Sector | Private |
PI Contribution | We provide the tool for analyses of GWAS and the expertise in mixed linear models. |
Collaborator Contribution | GSK provides curated phenotypic data from UK Biobank |
Impact | None yet. A CDA is being negotiated. |
Start Year | 2017 |
Description | UK Biobank Research Analysis Platform |
Organisation | UK Biobank |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | We were invited by Mark Effingham (Depute CEO of UK Biobank) to be one of the avant-garde teams to access the UK Biobank research analysis platform to adapt and deploy some of the tools we have developed for the analysis of genomic data. |
Collaborator Contribution | We are working with UK Biobank and DNAnexus to set up the compute configuration to allow fast genome-wide association studies with array genotypes, imputed genotypes, whole-exome and whole-genome data. |
Impact | No outputs yet. |
Start Year | 2020 |
Title | UpdateDISSECT |
Description | The software can perform genome-wide association studies in large structured populations. The software was designed with farm animal populations in mind. |
Type Of Technology | Software |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | We used the software in the largest genotype-phenotype dataset publicly available (UK Biobank) as an exemplar. |
Company Name | Omecu |
Description | Omecu develops a cloud-based platform for the analysis of large-scale genetic and epidemiologic datasets, with the aim of democratising genome data. |
Year Established | 2021 |
Impact | Received support from the Wellcome iTPA programme, participated in the SETSquared ICURe programme, and received Medical Research Council grants. They also received funding from the University's Data-Driven Entrepreneurship Seed Fund and Fast Track Mentor initiatives, supported by the Scottish Funding Council. |
Website | https://omecu.com/ |
Description | Maths and biology. James Gillespie's High School |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Schools |
Results and Impact | 20-30 pupils and 3-4 teachers attended presentations from my lab on how numerical skills (mathematics and computing) are applied in biological settings. One of these pupils, now at university, has since visited the Roslin Institute to speak to other researchers. |
Year(s) Of Engagement Activity | 2018 |
Description | Michigan State University |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Michigan State University research seminar, with an audience ranging from animal breeders and quantitative geneticists to medical doctors. |
Year(s) Of Engagement Activity | 2017 |
Description | Participating in Sciennes Primary Science Fair |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Public/other audiences |
Results and Impact | We used balloons and other materials to create cells, and explain the function of each part of the cell. |
Year(s) Of Engagement Activity | 2016 |
Description | Seminar - MRC Centre for Neuropsychiatric Genetics and Genomics |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Part of research institution seminar series |
Year(s) Of Engagement Activity | 2017 |