HSM: Development of methodology and computationally efficient software for analysis of PGx exome sequencing studies of complex "time-to-event" outcomes

Lead Research Organisation: University of Liverpool
Department Name: Institute of Translational Medicine

Abstract

Personalised medicine is an approach to treating patients in which individual information, such as demographic, clinical and genetic data, is used in deciding how to treat them - for example which drug or how much of a drug to give. The ultimate goal is to maximise benefit and minimise harm from treatment.
To identify genetic information that can guide treatment, pharmacogenetics (PGx) studies are used, which involve analysing the DNA of patients and looking for correlation between their genetic information and drug response. The most common study design is the genome-wide association study (GWAS), which involves testing patients at hundreds of thousands of genetic variants known as single-nucleotide polymorphisms (SNPs) and measuring their drug response. However, the statistical methods used to analyse GWAS are suitable only for investigating correlation with common SNPs, where the minor allele frequency (MAF), the frequency of the least common version of a SNP, is greater than 5%.
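As a rough illustration of the MAF definition above, the following minimal Python sketch computes the MAF of one SNP from per-patient genotype counts and checks it against the 5% threshold. The function names and the 0/1/2 genotype coding (number of copies of one allele, with None for a missing call) are assumptions made for this example and are not taken from the project.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF of one SNP.

    genotypes: list with one entry per patient, giving the number of
    copies (0, 1 or 2) of one allele, or None if the genotype is missing.
    This coding is an assumption of the sketch.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    # Each patient carries two alleles, hence the factor of 2.
    allele_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(allele_freq, 1 - allele_freq)


def is_common(genotypes, threshold=0.05):
    """True if the SNP is 'common' in the sense used above (MAF > 5%)."""
    return minor_allele_frequency(genotypes) > threshold
```

For example, four patients with genotypes 0, 1, 2 and 0 carry three copies of the counted allele out of eight, giving a MAF of 0.375, so the SNP would be treated as common.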
When common SNPs have been associated with outcomes of treatment response, their effect has typically been small, so GWAS have had only very limited success in identifying genetic variants to guide treatment. An example of this is PGx research into anti-epileptic drug response, for which GWAS have been conducted, yet little is known about its genetic predictors.
Whole-exome sequencing is an alternative approach to GWAS with many advantages and is cheaper than sequencing the whole genome. In this approach, genetic information is collected from areas of the genome (exomic regions) containing a significant proportion of genetic variants that are highly likely to have an effect on molecular function, and so are biologically plausible predictors of treatment response. The approach also allows rare variants (MAF<5%) to be investigated. These can be grouped together into sets believed to have similar molecular and biological effect, then analysed together using what are known as 'gene-based' analyses. These analyses are generally better at identifying correlations with outcome than single-variant approaches.
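One simple way to group rare variants for a gene-based analysis is a burden score, which counts the rare minor alleles each patient carries within a gene. The sketch below is illustrative only: the function name, the matrix layout (rows are patients, columns are variants in one gene) and the assumption that entries already count copies of the minor allele are all choices made for this example, not a description of the project's methods.

```python
def burden_scores(genotype_matrix, maf_threshold=0.05):
    """Collapse the rare variants of one gene into a per-patient burden score.

    genotype_matrix: list of rows, one per patient; each row holds the
    minor allele count (0, 1 or 2) at every variant in the gene.
    Returns (scores, rare_columns).
    """
    n_patients = len(genotype_matrix)
    n_variants = len(genotype_matrix[0])
    rare_columns = []
    for j in range(n_variants):
        maf = sum(row[j] for row in genotype_matrix) / (2 * n_patients)
        # Keep variants that are observed at least once but are rare.
        if 0 < maf < maf_threshold:
            rare_columns.append(j)
    # Burden score: number of rare minor alleles carried by each patient.
    scores = [sum(row[j] for j in rare_columns) for row in genotype_matrix]
    return scores, rare_columns
```

The resulting score can then be used as a single covariate in a regression model in place of many individual variants, which is the basic idea behind gene-burden tests.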
Analysing whole-exome sequencing data requires different statistical methods to analysing GWAS data, and whilst appropriate methods have been developed, these are only for studies with binary or continuous outcomes. PGx studies are often interested in 'time-to-event' outcomes, for example time to disease remission or time to drug withdrawal, so we are experiencing an analytical bottleneck for identifying genetic variants associated with PGx outcomes. In this project we aim to address this bottleneck by developing novel statistical methods and appropriate software for time to event outcomes that can cope with the scale and complexity of exome sequence data. To achieve these aims, we have set the following objectives:

1. Develop new statistical methods for PGx studies with complex time to event outcomes.
2. Develop user-friendly, computationally efficient, free software implementing the methods, to deal with the scale and complexity of exome sequence datasets.
3. Demonstrate efficiency of the new methods compared to existing analysis approaches by simulating exome-sequence data under a variety of different assumptions.
4. Apply the new methods and software to an exome-sequence dataset to identify biomarkers of anti-epileptic drug (AED) response.
5. Offer training on the new methods and software through practical workshops.

We will also apply our methods to whole exome sequence data from UK Biobank, to identify genetic variants associated with time to onset of cardiovascular disease and type 2 diabetes. Our proposed methodology and software will allow for more powerful analysis of age of onset of these diseases, pointing to genes that lead to disease occurring earlier in life, where the effect of environmental risk factors is less important, and the impact of treatment is greater. These genes could then be used in drug development.

Technical Summary

Personalised medicine promises to transform healthcare by maximising benefit whilst minimising harm from drugs. Pharmacogenetics (PGx) is of key importance to personalised medicine. Whole-exome sequencing offers compelling advantages over genome-wide association studies due to enrichment with functional variants and the ability to investigate rarer SNPs likely to explain a larger proportion of heritability of complex traits. By using annotation, rare coding variants can be aggregated into groups exerting broadly similar molecular effects, and subjected to "gene-based" analyses, offering considerable power advantages over single-variant methodologies. However, available methodologies for gene-based analyses are aimed at binary and quantitative traits, making them unsuitable for PGx studies, which often have time to event outcomes e.g. time to remission or drug failure. An analytical bottleneck therefore exists for identifying genetic factors associated with treatment response.
The proposed research aims to address this bottleneck. We aim to identify suitable models for single-variant analyses, including models to deal with more complex time to event outcomes e.g. involving competing reasons for drug withdrawal or those requiring a multi-component mixture modelling approach to fully reflect different remission status (i.e. immediate remission, delayed remission, no remission). The models will be extended to allow gene-based analyses, borrowing from gene-burden and gene-dispersion tests available for binary and quantitative traits. Of key importance, we will develop user-friendly, computationally efficient software for implementing the methodology. Whilst our methods and software will initially be applied to investigate genetic predictors of anti-epileptic drug response and age of onset of cardiovascular disease and type II diabetes, they will have valuable application across a range of complex disease and PGx traits which are currently not able to be explored optimally.
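To make the idea of a gene-based test for a time to event outcome more concrete, the following minimal Python sketch computes the score test of no association in a Cox proportional hazards model with a single covariate, such as a per-patient count of rare alleles in a gene. It is a generic textbook construction under stated assumptions, not the methodology this project proposes: it handles only a simple right-censored outcome (no competing risks or mixture components), uses Breslow-style handling of tied event times, and all names are hypothetical.

```python
import math


def cox_score_test(times, events, x):
    """Score test of beta = 0 in a one-covariate Cox model.

    times:  follow-up time for each patient
    events: 1 if the event was observed, 0 if censored
    x:      covariate value per patient (e.g. a gene burden score)
    Returns (chi-square statistic on 1 degree of freedom, p-value).
    """
    u = 0.0  # score: observed minus expected covariate at each event
    v = 0.0  # information: covariate variance within each risk set
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue
        # Risk set: everyone still under observation at time t_i.
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        mean = sum(risk) / len(risk)
        mean_sq = sum(r * r for r in risk) / len(risk)
        u += x_i - mean
        v += mean_sq - mean * mean
    if v == 0:
        raise ValueError("covariate has no variation in any risk set")
    stat = u * u / v
    # Upper tail of a chi-square(1) variable: P(|Z| > sqrt(stat)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A score test is shown because it only requires quantities evaluated under the null hypothesis, so no model needs to be fitted for each gene, which matters at exome scale. Extending it to competing risks or mixture models would need different likelihoods and is not attempted here.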

Planned Impact

Our project will lead to novel, accessible methodologies that allow a more efficient approach to analysing exome sequence studies with time to event outcomes, maximising the likelihood of identifying causal genetic biomarkers. These biomarkers will improve prediction of treatment outcome, inform a personalised approach to treatment, and inform potential targets for drug development. Therefore, it will impact patients, healthcare professionals, health care policies and costs, drug development and diagnostic companies, as well as statisticians and other genetic researchers. Specifically:

Patients and Healthcare professionals: Our novel methodology will be used to analyse an exome sequence study of anti-epileptic drug (AED) response in newly-diagnosed patients. This will maximise the likelihood of identifying genetic biomarkers associated with AED response. These biomarkers will be used by healthcare providers to determine how newly-diagnosed epilepsy patients will likely respond to treatment, informing their approach to treatment. With a more informed and personalised approach, patients will be more likely to benefit sooner and less likely to suffer adverse events, thereby improving quality of life.

Our methodology will also be used in other clinical areas to identify genetic biomarkers of treatment response for use by healthcare providers for informing treatment approach, again leading to improved outcome for the patient. Whereas genome-wide association studies have had very limited success in identifying markers of treatment response, it is anticipated that the ability to analyse exome-sequence data more efficiently, with rarer and more biologically plausible variants, will accelerate the advancement of personalised medicine.

Healthcare policy and cost: Personalised medicine promises to transform delivery of healthcare by improving patient outcome whilst reducing costs. However, its clinical implementation has been disappointingly slow, not least because of a lack of efficient methodology to identify predictors of treatment response. Providing the correct tools to analyse genetic datasets efficiently will allow biomarkers of treatment response to be identified sooner and with more accuracy, therefore making a personalised approach to treatment a reality and expediting its implementation. This will ensure that healthcare policies are improved in line with expectations, and that healthcare costs are reduced due to an improvement in drug efficiency and safety.

Drug development and diagnostic companies: Identifying genetic biomarkers influencing drug response or age of disease onset can provide clues about potential drug targets, thus informing drug development. It is hoped that, by identifying biomarkers of AED response and of age of onset of cardiovascular disease and type II diabetes, we will learn about potential drug targets that drug companies can utilise in future development of drugs. Drug targets in other clinical areas will similarly be identified by applying our methods in other studies. Further, for the implementation of a personalised approach to treatment involving genetic markers, companion diagnostics are required, and development of such diagnostics will be informed by our findings as well as findings from other pharmacogenetic (PGx) studies applying our methodology.

Statisticians and other genetic researchers: Despite the development of numerous advanced methodologies to deal with complex data, simple statistical approaches remain widespread among applied statisticians, leading to inefficient analysis and potentially missing important associations. One reason for this is that software to easily implement methods is not generally available. The development and dissemination of software will ensure our methodology is utilised, allowing efficient analysis of exome sequence data with time to event outcomes.
 
Description Epilepsy Pharmacogenomics: delivering biomarkers for clinical use 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Contribution of patient samples and phenotype data; statistical advice; co-authorship of publications.
Collaborator Contribution Contribution of patient samples and phenotype data; co-authorship of publications.
Impact Perucca, P., Anderson, A., Jazayeri, D., Hitchcock, A., Graham, J., Todaro, M., Tomson, T., Battino, D., Perucca, E., Ferri, M.M., Rochtus, A., Lagae, L., Canevini, M.P., Zambrelli, E., Campbell, E., Koeleman, B.P.C., Scheffer, I.E., Berkovic, S.F., Kwan, P., Sisodiya, S.M., Goldstein, D.B., Petrovski, S., Craig, J., Vajda, F.J.E., O'Brien, T.J. and (2020), Antiepileptic Drug Teratogenicity and De Novo Genetic Variation Load. Ann Neurol, 87: 897-906. https://doi.org/10.1002/ana.25724
Start Year 2011
 
Description Lecture and computer practical on analysis of rare variants 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact We developed teaching material and a computer practical on the topic of rare variant analysis, the contents of which were developed based on findings of a literature review conducted as part of the current project. The material and practical will be delivered to students on an MSc course at the University. They will be able to use the skills learnt when undertaking their dissertation project for the MSc, and in future research projects should they choose to continue working in the field.
Year(s) Of Engagement Activity 2022
 
Description Oral presentation titled, "Whole Exome Sequencing Analysis of 'Time-To-Event' Outcomes In Epilepsy Patients" at annual conference of Indian Society for Medical Statistics (ISMS)-2020, virtual conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact Oral presentation titled, "Whole Exome Sequencing Analysis of 'Time-To-Event' Outcomes In Epilepsy Patients" at annual conference of Indian Society for Medical Statistics (ISMS)-2020, virtual conference
Year(s) Of Engagement Activity 2020
 
Description Presented a poster titled, "Whole Exome Sequencing Analysis of Complex 'Time-To-Event' Outcomes In Epilepsy Patients" at the annual meeting of International Genetic Epidemiology Society (IGES)-2020, virtual conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Presented a poster titled, "Whole Exome Sequencing Analysis of Complex 'Time-To-Event' Outcomes In Epilepsy Patients" at the annual meeting of International Genetic Epidemiology Society (IGES)-2020, virtual conference
Year(s) Of Engagement Activity 2020