HSM Polygenic score methodology in the emerging field of Polygenic Epidemiology

Lead Research Organisation: King's College London
Department Name: Social Genetic and Dev Psychiatry Centre

Abstract

Over the last decade, researchers in the field of medical genetics have conducted a huge number of 'genome-wide association studies' (GWAS), which have identified thousands of genetic variants associated with hundreds of diseases, psychiatric disorders and human traits. While this endeavour has been exceptionally successful in highlighting regions of the human genome that contribute to human disease, it has also revealed that most human diseases are influenced by hundreds of genetic variants that each have a small impact on disease risk. This makes it extremely difficult to exploit an individual's genetic profile to predict how likely they are to contract different diseases or suffer adverse reactions to medical treatments, such as pharmaceutical drugs. As a result, genetics has yet to fulfil its greatest promise of initiating an era of stratified medicine: where disease preventions and treatments are individualised according to genetic profile.

However, methods are now being developed and applied to genetic data that take special account of the 'polygenic' nature of how genetics influences our disease risk. These methods are known collectively as 'polygenic score methods'. Based on their theoretical evaluation and practical application so far, early signs indicate that they may justify renewed hope in the potential of genetics to deliver stratified medicine. However, before this is possible, much theoretical work needs to go into the development and testing of these polygenic score methods. A greater understanding of their performance is crucial; they need to be refined to produce more accurate disease risk prediction and extended to solve a wider variety of problems in medical genetics; and a set of strategies needs to be developed for how they can be exploited for stratified medicine. This proposal aims to achieve each of these goals.

This proposal begins with a detailed evaluation and comparison of the performance of different polygenic score methods. The results from this study, based on computer simulation, will offer insights into how these methods perform and what inferences they can make, and will guide researchers on which method to use depending on their scientific question and study. The software tool that we will produce to perform this study will be made freely available as a web application, from which researchers will be able to: inspect results from our study presented in a customisable format, simulate and download their own polygenic data, and download and extend our simulation code to perform polygenic method evaluations and comparisons of their own.

Next, the proposal introduces a set of new polygenic score methods, each one tailored to answer a specific scientific question. For example, one method is designed to assess whether two diseases share a common genetic basis, which is useful for guiding which pharmaceutical drugs could be repurposed for other diseases. Another helps determine whether an observed association between a risk factor and a disease is truly causal, and yet another infers the most likely direction of causality in such a relationship, which could answer key medical questions such as: Do high lipid levels increase the risk of Alzheimer's, or does the onset of Alzheimer's result in higher lipid levels?

Finally, the proposal investigates a number of strategies for using polygenic score methods to aid in stratified medicine. We will outline and test approaches for using polygenic score methods to stratify heterogeneous disorders into sub-sets of more biologically homogeneous disorders, which will help make individual diagnosis, and thus subsequent treatment, more specific. We also describe ways in which polygenic scores on individuals can be used for more nuanced selection of participants in clinical trials, improving their efficacy, reducing their cost, and ultimately leading to the development of drugs and treatments more tailored to individual genetic profile.

Technical Summary

The latest analyses of GWAS data reveal a substantial polygenic component to most common human diseases. This poses a huge challenge to using genetics for phenotype prediction, and thus to effectuating stratified medicine. However, the development of prediction methods tailored to the polygenicity of human disease can transform results.

While the stringent GWAS significance threshold has ensured the discovery of genuine genotype-phenotype associations, it can be a barrier to prediction. Phenotype prediction from genetic profile is typically optimised when applying a far more liberal threshold for SNP selection. While even polygenic prediction is inaccurate at an individual level, its aggregation across samples enables a raft of applications that perform association testing.
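
The effect of the SNP selection threshold can be illustrated with a minimal sketch. It assumes the simplest form of polygenic score, a sum of allele counts weighted by GWAS effect sizes over the SNPs that pass a p-value threshold. The function name and the toy numbers are hypothetical and are not taken from any particular software.

```python
def polygenic_score(dosages, effect_sizes, p_values, p_threshold):
    """Sum effect-weighted allele counts over SNPs with GWAS p-value <= p_threshold.

    dosages:      allele counts (0, 1 or 2) for one individual, one per SNP
    effect_sizes: GWAS effect size estimates, one per SNP
    p_values:     GWAS association p-values, one per SNP
    """
    score = 0.0
    for dosage, beta, p in zip(dosages, effect_sizes, p_values):
        if p <= p_threshold:
            score += beta * dosage
    return score


# Toy data for four SNPs in one individual (illustrative values only).
dosages = [0, 1, 2, 1]
betas = [0.5, -0.2, 0.1, 0.3]
p_values = [1e-9, 0.03, 0.2, 0.6]

# A stringent genome-wide threshold keeps one SNP; liberal thresholds keep more.
for threshold in (5e-8, 0.05, 1.0):
    print(threshold, polygenic_score(dosages, betas, p_values, threshold))
```

In practice the threshold that gives the best prediction is chosen empirically, which is why the score is usually computed over a range of thresholds rather than one.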

The first polygenic score methods have now been proposed and applied to genome-wide SNP data. However, the increasing rate of their application is outpacing careful examination of their relative power and accuracy. To address this, we perform the first comprehensive comparison study of their performance and build a simulation framework, accompanied by a software tool and web application, to enable future benchmarking of polygenic methods across a consistent platform.
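
As a rough illustration of what simulating polygenic data can involve, the sketch below generates genotypes and an additive trait with a chosen heritability. It is an assumption-laden toy (independent SNPs, normally distributed effects, no linkage disequilibrium) and is not the simulation framework described in this proposal. All names are hypothetical.

```python
import random
from math import sqrt
from statistics import pvariance


def simulate_polygenic_trait(n_individuals, n_snps, heritability, seed=0):
    """Simulate independent SNP genotypes and an additive polygenic phenotype."""
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    rng = random.Random(seed)
    # Assumed distributions: uniform allele frequencies, normal SNP effects.
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_snps)]
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    # Each genotype is the sum of two independent allele draws (0, 1 or 2).
    genotypes = [
        [(rng.random() < f) + (rng.random() < f) for f in freqs]
        for _ in range(n_individuals)
    ]
    genetic = [sum(b * g for b, g in zip(effects, row)) for row in genotypes]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_individuals)]
    # Rescale the noise so that genetic variance makes up the requested
    # share of (genetic variance + noise variance) in this sample.
    scale = sqrt(
        pvariance(genetic) * (1.0 - heritability)
        / (heritability * pvariance(noise))
    )
    phenotype = [g + scale * e for g, e in zip(genetic, noise)]
    return genotypes, effects, phenotype


genotypes, effects, phenotype = simulate_polygenic_trait(200, 50, 0.5, seed=1)
```

Fixing the seed makes each simulated data set reproducible, which matters when different methods are to be compared on identical data.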

The polygenic risk score (PRS) approach is the focus for method development here. Unlike the alternatives, PRS both exploits large-scale GWAS summary data and permits individual-level prediction. This combination will be key to realising stratified medicine as GWAS sample sizes grow, and we outline a range of strategies to exploit the PRS approach for stratified medicine.

To date, only a single PRS method has been developed and utilised for a variety of applications. We introduce a suite of PRS methods, each tailored to its application. This sensitivity to the relevant scientific question is expected to yield substantial gains in power. Novel applications of PRS, such as testing direction of causality, are also proposed.

Planned Impact

Drug development companies: Drug development is a huge national and international industry, which in recent years has recognised the lead that genes and genetic variants discovered in genome-wide association studies may provide for drug and therapeutic developments. A new field, pharmacogenetics, which primarily seeks to reduce the side-effects of developed drugs through knowledge of their genetic causes, has been born out of the interest in genetic association studies. The focus of this project on disease risk prediction and on methods for using polygenic risk scores for stratified medicine (Aim 3), which include approaches for disease stratification and potential insights that could inform drug repurposing, should be of huge interest to drug development companies. Drug development companies are also likely to be hugely interested in any refinements of phenotype definition produced in our work on stratified medicine (Aim 3), and pharmaceutical companies could gain immediate utility in particular from our research into using polygenic risk scores to perform screening in clinical trials.

Diagnostic companies: An important part of this project aims to produce new and refined phenotype definitions (Aim 3), which in many cases may necessitate or stimulate new measuring techniques, or an increased volume of previously used techniques, required for diagnosis. There has been a huge growth in molecular diagnostic companies since the emergence of GWAS in 2007, and more recently in the new area of 'companion diagnostics', where tests are performed that inform the efficacy of drugs either after or during drug development. Any additional insights into disease stratification and phenotype definition gained through the development of our polygenic risk score methods could be hugely beneficial to these companies.

Healthcare (NHS) policy makers: As with diagnostic companies, any refinement of phenotype definition or diagnosis produced by this project could have a huge impact on policy decisions within the healthcare system, as well as on governmental public health advice based on epidemiological findings relating to new or refined phenotypes.

Clinicians: The stratified medicine aspect of this project in particular could be extremely interesting to clinicians working in the area of the related phenotypes, and could also benefit them by contributing to more accurate and systematic phenotype definition. More precise and systematic diagnoses can circumvent the problems of subjective clinical judgment in making diagnostic decisions, which can in some cases lead to legislative procedures.


 
Description Brain2Bee - How dopamine affects social and motor ability - from the human brain to the honey bee
Amount £1,266,034 (GBP)
Funding ID 757583 (EU/ERC reference number) 
Organisation European Research Council (ERC) 
Sector Public
Country Belgium
Start 07/2018 
End 06/2023
 
Description Genomic risk in clinic care to promote health equity in New York City patients
Amount $7,000,000 (USD)
Funding ID 1U01HG011176-01 
Organisation National Institutes of Health (NIH) 
Sector Public
Country United States
Start 06/2020 
End 04/2025
 
Description Longitudinal and genetic evaluation of type 2 diabetes and major depression
Amount £1,100,000 (GBP)
Funding ID MR/X009815/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 05/2023 
End 04/2026
 
Description Next-generation, pathway-specific, polygenic risk scores
Amount $3,100,000 (USD)
Funding ID 1R01MH122866-01 
Organisation National Institutes of Health (NIH) 
Sector Public
Country United States
Start 04/2020 
End 03/2025
 
Description Translation of Anorexia Nervosa variants into genes, pathways and tissues
Amount $750,000 (USD)
Organisation The Klarman Family Foundation 
Sector Charity/Non Profit
Country United States
Start 09/2019 
End 09/2022
 
Description Using genetics to stratify patients and improve prediction of clinically relevant outcomes in psychiatry and neurology
Amount £300,000 (GBP)
Funding ID 222811/Z/21/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 10/2021 
End 10/2025
 
Description Who are the depressed patients that have increased inflammation?
Amount £70,101 (GBP)
Organisation Brain & Behaviour Research Foundation 
Sector Charity/Non Profit
Country United States
Start 03/2018 
End 03/2019
 
Description iCASE studentship
Amount £60,000 (GBP)
Organisation UCB Pharma 
Sector Private
Country United Kingdom
Start 10/2022 
End 09/2026
 
Title PRSet: Polygenic Risk Score Pathway Analyses 
Description Here we introduce a new method and software, PRSet, for performing pathway analyses based on polygenic risk scores, extending our PRSice software (www.PRSice.info). While most pathway analysis methods exploit only GWAS summary results, PRSet calculates 'pathway scores' for each individual using the PRS approach, which leverages both GWAS results and individual-level data. The phenotypic variance explained by pathway-specific PRS in a target sample acts as a natural way to compare the contribution of different pathways to disease aetiology. While PRSet is as powerful as the leading pathway methods for causal pathway identification, it can additionally exploit many of the applications offered by polygenic scores. For example, overlap in pathway aetiology among different diseases can be easily tested. We are currently testing the performance of PRSet via simulation and application to the UK Biobank data, and plan to submit a paper on the method/software within the next month. 
Type Of Material Computer model/algorithm 
Year Produced 2018 
Provided To Others? Yes  
Impact Only recently made available, so none so far. 
URL https://choishingwan.github.io/PRSice/quick_start_prset/
 
Title Supporting data for "PRSice-2: Polygenic Risk Score Software for Large-Scale Data" 
Description Polygenic Risk Score (PRS) analyses have become an integral part of biomedical research, exploited to gain insights into shared aetiology among traits, to control for genomic profile in experimental studies, and to strengthen causal inference, among a range of applications. Substantial efforts are now devoted to biobank projects to collect large genetic and phenotypic data, providing unprecedented opportunity for genetic discovery and applications. To process the large-scale data provided by such biobank resources, highly efficient and scalable methods and software are required. Here we introduce PRSice-2, an efficient and scalable software for automating and simplifying polygenic risk score analyses on large-scale data. PRSice-2 handles both genotyped and imputed data, provides empirical association P-values free from inflation due to overfitting, supports different inheritance models and can evaluate multiple continuous and binary target traits simultaneously. We demonstrate that PRSice-2 is dramatically faster and more memory-efficient than PRSice and alternative polygenic score software, LDpred and lassosum, while having comparable predictive power. This combination of efficiency and power will be increasingly important as data sizes grow and as the applications of PRS become more sophisticated; for example, when incorporated into high-dimensional or gene-set based analyses. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
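
The idea of an empirical P-value that is not inflated by choosing the best of several candidate scores can be sketched with a generic permutation test. This is an illustrative assumption about how such a test can be built, not a description of the PRSice-2 implementation. The function names are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov * cov / (var_x * var_y)


def empirical_p_value(score_sets, phenotype, n_permutations=1000, seed=0):
    """Permutation P-value for the best-fitting score among several candidates.

    score_sets holds one list of scores per candidate (for example, one per
    p-value threshold). The 'pick the best' step is repeated on every
    permuted phenotype, so the selection is accounted for in the null.
    """
    rng = random.Random(seed)
    observed = max(r_squared(s, phenotype) for s in score_sets)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_best = max(r_squared(s, shuffled) for s in score_sets)
        if null_best >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator avoids reporting a P-value of zero.
    return (exceed + 1) / (n_permutations + 1)
```

The smallest P-value this procedure can return is 1 / (n_permutations + 1), so the number of permutations sets the resolution of the test.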
 
Description Reinsurance Group of America (RGA) collaboration - Prediction of major morbidity and mortality in the UK Biobank 
Organisation Reinsurance Group of America
Country United States 
Sector Private 
PI Contribution I, along with Prof Cathryn Lewis as co-PI at King's College London, have contributed to the design of this 12 month project and will supervise a research assistant who will perform all the analyses and work planned. The project involves prediction of major morbidity (cardiovascular disease, stroke, diabetes, cancer) and mortality of individuals in the UK Biobank data set, and comparing the prediction achieved from environmental risk factors with prediction from genetics (based on polygenic risk scores).
Collaborator Contribution The partners in the collaboration - RGA - have contributed to the design of the project and are providing all funding required, which corresponds to 5% time of myself and the other PI (Prof Cathryn Lewis), 100% costs of the RA employed on the project, as well as computing etc costs - totalling £123k.
Impact The 12 month project that has been created on the basis of this collaboration only began on 01/03/17 and so as yet there have been no outcomes from the collaboration (other than the provision of funding from RGA and recruitment of the Research Assistant). This is a multi-disciplinary project in that it involves Statistical Genetics, Genetic Epidemiology and Epidemiology, and also the perspective of the commercial partner's interests in the project.
Start Year 2016
 
Title GenoPred 
Description GenoPred is a pipeline that enables reference-standardised polygenic scores to be calculated for either a single individual or a research study. 
Type Of Technology Webtool/Application 
Year Produced 2022 
Impact Implemented in other universities in the UK and internationally for calculating polygenic risk scores, for example by the Intervene EU consortium and at the University of Cardiff. 
 
Title PRSet: Pathway-based polygenic risk score analyses and software 
Description We describe a method and accompanying software, PRSet, for computing and analysing pathway-based PRSs, in which polygenic scores are calculated across genomic pathways for each individual. We evaluate the potential of pathway PRSs in two distinct ways, creating two major sections. (1) In the first section, we benchmark PRSet as a pathway enrichment tool, evaluating its capacity to capture GWAS signal in pathways. We find that for target sample sizes of >10,000 individuals, pathway PRSs have similar power for evaluating pathway enrichment as leading methods MAGMA and LD score regression, with the distinct advantage of providing individual-level estimates of genetic liability for each pathway, opening up a range of pathway-based PRS applications. (2) In the second section, we evaluate the performance of pathway PRSs for disease stratification. We show that using a supervised disease stratification approach, pathway PRSs (computed by PRSet) outperform two standard genome-wide PRSs (computed by C+T and lassosum) for classifying disease subtypes in 20 of 21 scenarios tested. As the definition and functional annotation of pathways becomes increasingly refined, we expect pathway PRSs to offer key insights into the heterogeneity of complex disease and treatment response, to generate biologically tractable therapeutic targets from polygenic signal, and, ultimately, to provide a powerful path to precision medicine. 
Type Of Technology Software 
Year Produced 2023 
Impact The PRSet software will enable research teams internationally to apply the pathway-based PRS to identify biological subtypes of disease. 
URL https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010624
 
Title PRSet: Polygenic Risk Score Pathway Analyses 
Description Here we introduce a new method and software, PRSet, for performing pathway analyses based on polygenic risk scores, extending our PRSice software (www.PRSice.info). While most pathway analysis methods exploit only GWAS summary results, PRSet calculates 'pathway scores' for each individual using the PRS approach, which leverages both GWAS results and individual-level data. The phenotypic variance explained by pathway-specific PRS in a target sample acts as a natural way to compare the contribution of different pathways to disease aetiology. While PRSet is as powerful as the leading pathway methods for causal pathway identification, it can additionally exploit many of the applications offered by polygenic scores. For example, overlap in pathway aetiology among different diseases can be easily tested. We are currently testing the performance of PRSet via simulation and application to the UK Biobank data, and plan to submit a paper on the method/software within the next month. 
Type Of Technology Software 
Year Produced 2018 
Impact None so far, since the software has only been released in the last month - however, we note that many in the field are keen to use this approach and have contacted us asking for the software pre-release. 
URL https://choishingwan.github.io/PRSice/quick_start/
 
Title PRSice version 2 (Polygenic Risk Score software) 
Description This is a major update of our polygenic risk score software PRSice, written by a postdoc of the PI (funded separately) and supervised by the PI. This new version is a rewrite of the entire PRSice code in C++, reducing errors, increasing interpretability of code (so that others can extend more easily), and speeding up the program by ~50x (making it more feasible for many runs/tests on large data sets such as the soon to be released N=500k UK Biobank data). The new version has also been coded in such a way that it can be easily extended for producing the new methods planned in this project (outlined in Aim 2 of proposal). 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The new software has only recently been completed, and is currently in beta form and undergoing testing before wider release (though it is already available on the github page provided below), and so as yet has had no notable impacts. However, given the marked speed-up, we expect it to be extremely useful for running and testing on the upcoming N=500k UK Biobank data in the short term, and we plan to publish a protocol paper on its use and to highlight the new version to the field. 
URL https://github.com/choishingwan/PRSice
 
Title PRSice-2: updated Polygenic Risk Score software [completed] 
Description This updated version of PRSice-2 has now been completed (see entry from last year, when this was work-in-progress). The software has been freely available for public use since October 2017 and has had very large uptake in usage by the field (we have hundreds of PRSice google-group users, hundreds of homepage visitors every week, and PRSice is now the most cited such software - 170 citations according to google on 15/03/18 - and thus the leading polygenic risk score software). The key advance of PRSice-2 is that it is now scalable to extremely large data sets such as the UK Biobank (and compatible in terms of data format), while various technical and running issues have also been improved. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact The PRSice software is the leading PRS software in the field (see above) with the paper on PRSice receiving 170 citations over the 3yrs since its publication, most of which relate to direct use of the software (this represents a minimum bound since many use software without citing). Therefore it has assisted the fields of genetic epidemiology and medical genetics substantially, making some contribution to polygenic scores being considered by many to be one of the key recent breakthroughs in science (eg. one of the 'top 10 breakthroughs of 2018' by MIT Technology Review). Without the updated version, PRSice-2, the field would have difficulty in performing standard PRS analyses on the extremely large data sets on which research is now regularly being performed (eg. the UK Biobank data). 
URL https://choishingwan.github.io/PRSice/
 
Title Polygenic score visualiser 
Description Web-based interface to visualise polygenic scores and estimate both relative and absolute risk of disease (or continuous trait) from a polygenic z-score and predictive ability 
Type Of Technology Webtool/Application 
Year Produced 2022 
Impact Used to communicate risk in academic settings and as an exploratory tool for teaching. 
URL https://opain.github.io/GenoPred/PRS_to_Abs_tool.html
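
One common way to convert a polygenic z-score into absolute and relative risk is the liability threshold model. The sketch below assumes that model, a known population prevalence, and a liability-scale variance explained by the score. It is not necessarily the calculation used by the tool listed above, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def absolute_risk(z_score, prevalence, r2_liability):
    """Probability of disease given a polygenic z-score.

    Assumes liability = sqrt(r2) * z + residual, with total liability
    standard normal and disease occurring above a threshold set by the
    population prevalence.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be in (0, 1)")
    if not 0.0 <= r2_liability < 1.0:
        raise ValueError("r2_liability must be in [0, 1)")
    threshold = STANDARD_NORMAL.inv_cdf(1.0 - prevalence)
    conditional_mean = sqrt(r2_liability) * z_score
    conditional_sd = sqrt(1.0 - r2_liability)
    return 1.0 - STANDARD_NORMAL.cdf((threshold - conditional_mean) / conditional_sd)


def relative_risk(z_score, prevalence, r2_liability):
    """Absolute risk divided by the population prevalence."""
    return absolute_risk(z_score, prevalence, r2_liability) / prevalence
```

Under these assumptions a score that explains none of the liability variance returns the population prevalence for every individual, and risk rises monotonically with the z-score otherwise.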
 
Description Interview for TV company 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact I was contacted by an employee of the TV company 'Renegade Pictures' (https://www.renegadepictures.co.uk/whatwedo.aspx) requesting an interview so that they could find out more about Polygenic Risk Scores and their potential utility for predicting behavioural and personality traits from genetics, for a TV documentary focusing on how nature and nurture combine to produce our characteristics. The TV show is pre-commission stage, but could have a very large audience (millions) considering the success of previous TV shows from this company. We have agreed that I will act as an expert point of contact for the show and will be re-contacted when they next need my advice and may be interviewed as part of the show. The TV company were aware of my expertise in the area due to my publications and software on the topic, much of which has been the direct result of this award funding.
Year(s) Of Engagement Activity 2018
URL https://www.renegadepictures.co.uk/whatwedo.aspx
 
Description Interview on polygenic risk scores for Illumina video 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Recorded video on polygenic risk scores for Illumina describing the potential impact of these scores. Hosted on YouTube.
Year(s) Of Engagement Activity 2019
URL https://www.youtube.com/watch?v=amZeqgWfA-M
 
Description Podcast for Association of Child and Adolescent Mental Health 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Podcast episode on the potential for using polygenic risk scores in child and adolescent psychiatry. Generated substantial interest on Twitter.
Year(s) Of Engagement Activity 2021
URL https://www.acamh.org/blog/investigating-the-interplay-of-genetics-and-environment-on-development-pr...
 
Description Podcast for SanoGenetic 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Podcast for Sanogenetics, an online genetics company that facilitates genetics research by linking researchers, patients and study participants.
Year(s) Of Engagement Activity 2019
URL https://podcasts.apple.com/gb/podcast/the-genetics-of-depression/id1462418412?i=1000440112154
 
Description Polygenic Risk Score Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Team member Sam Choi and I directed a one-day summer school "Polygenic Risk Score Analyses", held in my department, that acted as an introduction to the theory and application of polygenic risk scores. The summer school was oversubscribed with applications and filled to a capacity of 45 participants weeks before the event, with participants drawn from across Europe (and South Africa), both from Academia and Industry. The feedback from the day was extremely positive, with 83% of participants responding that they found the workshop lectures "very" (highest of 5 categories) interesting and useful. Some testimonials from the official feedback: "For my level, this introduction was exactly what I needed to understand why we would want to create PRS and how we would go about doing this.". "Pitched well, good introduction". "Excellent talks, very informative and enlightening. Always a pleasure to listen to Paul's presentations". "Sam is super knowledgeable and approachable. It was a pleasure to attend the workshops that he organised". "Excellent workshop and very helpful for my work! I am very happy that I attended and I could not be happier with the content." We will be holding the summer school again in June 2019 (with some spaces filled pre-opening of registration due to requests from those that could not attend last year's summer school).
Year(s) Of Engagement Activity 2018
URL https://www.kcl.ac.uk/ioppn/depts/sgdp-centre/study/summerschool/course-2-polygenic-risk-score-analy...
 
Description Polygenic Risk Score Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Team members Paul O'Reilly and Sam Choi directed a one-day summer school "Polygenic Risk Score Analyses", held in the host institute department at King's College London, that acted as an introduction to the theory and application of polygenic risk scores. This is the second year of running this summer school (see entry for the 2018 summer school for more details). The summer school was again oversubscribed with applications and filled to a capacity of 46 participants, with participants drawn mainly from Europe but also internationally, both from Academia and Industry. The feedback from the day was extremely positive.
Year(s) Of Engagement Activity 2019
URL https://www.kcl.ac.uk/events/polygenic-risk-score-analyses
 
Description SGDP Schools Open Day 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Created a polygenic game to predict height as part of a Schools Open Day. Discussed genetics as a quantitative trait with students, teachers and parents.
Year(s) Of Engagement Activity 2018,2020
 
Description The Conversation Polygenic Score article 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Article in The Conversation, as an explainer of polygenic risk scores, highlighting genetic loading as a continuous trait
Year(s) Of Engagement Activity 2022
URL https://theconversation.com/genetics-helps-estimate-the-risk-of-disease-but-how-much-does-it-really-...