Improved methodology for understanding the genetics of complex traits, with particular application to epilepsy.

Lead Research Organisation: University College London
Department Name: UCL Genetics Institute

Abstract

The heritability of a trait determines an upper bound on how well we can understand its underlying genetics. For a human disease, this determines how successfully we can predict an individual's disease risk, and how well we can develop effective drugs and treatments. Many human diseases are known to be highly heritable, based on measurements made in twin, sibling-risk, or other family-based studies. However, at present we have not been able to make full use of this heritability. Evidence suggests that this is because diseases are more complicated than once thought. It is rarely the case that a single gene determines whether an individual develops a condition. Instead, it has been realised that more often an individual's risk is affected by a large number of genetic factors. This realisation means it is necessary to develop new methods for analysing genetic data. These methods must appreciate that many factors are likely to be important for any given trait. My project outlines new methodologies designed with this in mind.

One of these methods explains how to better predict whether an individual will develop a disease based on their DNA. For example, suppose that an individual experiences an epileptic seizure. There is a 50% chance that this individual will have further seizures and will therefore be diagnosed with epilepsy. In this case, it would be necessary to administer anti-epileptic drugs to treat the condition. However, there is also a 50% chance that the individual will never experience another seizure. To be sufficiently certain that this is the case, the individual will have to be observed for a year, and will not be allowed to drive a motor vehicle during that time. I propose a prediction method which will improve our ability to determine whether an individual who experiences a seizure will subsequently develop epilepsy. This will reduce either the time taken to administer drugs, or the time taken to receive the all-clear.

When an individual is diagnosed with epilepsy, it is necessary to decide which type of drug is most appropriate. This decision depends on what subtype of epilepsy the individual has, as different medications are more suitable for different subtypes. However, it is often difficult to determine what type of epilepsy an individual has. Therefore, I will develop a method for better classifying individuals, again based on their genetic data.

So that my methods are as useful as possible, I will make them freely available, and design them to be used by all types of scientists.

Technical Summary

Heritability analysis supposes a random effects model where the correlation structure of the random effect is specified by a kinship matrix K. When K corresponds to allelic correlations, it is convenient to view the underlying model in terms of an equivalent linear regression model, where each individual's phenotype is determined by a linear combination of its SNP genotypes (plus an environmental noise term), and each SNP's effect size is assumed normally distributed with constant variance. To realise the full potential of heritability analysis, it is necessary to appreciate that there are many varieties of this underlying model, meaning many different K can be computed. For example, I intend to improve BLUP by incorporating more than one kinship matrix, such that each corresponds to a separate variance term. With two matrices K1 and K2, say, the revised BLUP would benefit when K1 tends to correspond to stronger-effect SNPs and K2 to weaker ones.
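
The equivalence between the random effects model and the linear regression model can be checked numerically. The following is a minimal sketch rather than the MultiBLUP or LDAK implementation. It assumes standardised genotypes, a centred phenotype, and variance components that are already known (in practice they would be estimated, for example by REML). All function names are hypothetical.

```python
import numpy as np


def standardise(genotypes):
    """Centre and scale each SNP (column) to mean 0 and variance 1."""
    centred = genotypes - genotypes.mean(axis=0)
    return centred / centred.std(axis=0)


def kinship(genotypes):
    """Allelic-correlation kinship K = X X^T / m for standardised X (n x m)."""
    x = standardise(genotypes)
    return x @ x.T / x.shape[1]


def blup_random_effects(kinships, variances, noise_variance, phenotype):
    """BLUP of each genetic component: g_k = s_k K_k V^{-1} y,
    where V = sum_k s_k K_k + s_e I (one variance term per kinship matrix)."""
    n = len(phenotype)
    v = noise_variance * np.eye(n)
    for k, s in zip(kinships, variances):
        v += s * k
    weights = np.linalg.solve(v, phenotype)
    return [s * (k @ weights) for k, s in zip(kinships, variances)]


def blup_snp_effects(genotypes, variance, noise_variance, phenotype):
    """Regression view for a single K: each SNP effect ~ N(0, variance / m),
    which gives ridge regression with penalty s_e * m / variance."""
    x = standardise(genotypes)
    m = x.shape[1]
    penalty = noise_variance * m / variance
    beta = np.linalg.solve(x.T @ x + penalty * np.eye(m), x.T @ phenotype)
    return x @ beta
```

With a single kinship matrix, `blup_random_effects` and `blup_snp_effects` return the same fitted genetic values (up to rounding). Passing two matrices, for instance one built from each of two SNP subsets, gives each subset its own variance term.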

The standard kinship matrix corresponds to assigning each SNP an independent effect size. When rare variants are included, the (effective) degrees of freedom becomes prohibitively large to allow useful estimation of h^2. However, the degrees of freedom can be reduced by placing restrictions on variant effect sizes, such as hierarchical priors.

Two of my proposed methods concern enhancements of the REML algorithm. Although REML is a classical technique, it can be viewed from a Bayesian angle. For example, estimation of h^2 equates to computing the posterior mode of h^2. This task is optimised through use of an eigendecomposition. When using this trick, including a prior on h^2 requires only a slight amendment (the need to differentiate this prior), so computation time will be barely affected. Estimating the distribution of h^2 for use in heritability meta-analysis requires sampling from its posterior distribution. But again, as the bottleneck of REML is the decomposition, this will result in only a modest delay.
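
The role of the eigendecomposition can be illustrated with a simplified sketch. It makes several assumptions that are not part of the proposal: there are no fixed effects (so this is a profile likelihood rather than full REML), the total variance is profiled out rather than integrated over, and the mode is found by a grid search instead of by differentiating the prior. The prior shown, proportional to a Beta(2, 2) density, and all function names are hypothetical.

```python
import numpy as np


def rotate(kinship, phenotype):
    """One-off eigendecomposition K = U diag(d) U^T; returns d and U^T y."""
    eigenvalues, eigenvectors = np.linalg.eigh(kinship)
    return eigenvalues, eigenvectors.T @ phenotype


def profile_loglik(h2, eigenvalues, rotated):
    """Log-likelihood at h2 with the total variance profiled out
    (additive constants dropped). Costs O(n) once the rotation is done."""
    weights = h2 * eigenvalues + (1.0 - h2)
    sigma2 = np.mean(rotated ** 2 / weights)
    return -0.5 * (len(rotated) * np.log(sigma2) + np.sum(np.log(weights)))


def log_beta_prior(h2):
    """Hypothetical prior on h2, proportional to a Beta(2, 2) density."""
    return np.log(h2) + np.log(1.0 - h2)


def log_posterior(eigenvalues, rotated, log_prior, grid):
    """Unnormalised log-posterior of h2 evaluated at each grid point."""
    return np.array(
        [profile_loglik(h, eigenvalues, rotated) + log_prior(h) for h in grid]
    )


def posterior_mode(eigenvalues, rotated, log_prior, grid):
    """Grid point with the highest posterior density."""
    return grid[np.argmax(log_posterior(eigenvalues, rotated, log_prior, grid))]


def sample_posterior(eigenvalues, rotated, log_prior, grid, size, rng):
    """Draw h2 values from the posterior, discretised onto the grid."""
    values = log_posterior(eigenvalues, rotated, log_prior, grid)
    probabilities = np.exp(values - values.max())
    probabilities /= probabilities.sum()
    return rng.choice(grid, size=size, p=probabilities)
```

The decomposition in `rotate` is performed once. Every later evaluation, whether for locating the mode or for sampling, reuses the eigenvalues and rotated phenotype, which is why adding a prior or drawing samples adds little to the overall cost.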

Planned Impact

It is simply the case that everyone can potentially benefit from this type of research. The obvious example is the application of these methods to human diseases (or equally to health-related quantitative traits such as BMI or cholesterol). There is tremendous value in detecting variants, genes or pathways which are causal for a disease; each finding will improve our understanding of the condition, and bring us a step closer to developing adequate treatment. Similarly, there is clear advantage to developing effective prediction models, or better classification of phenotypic subtypes, as these move us nearer to the possibility of personalised medicine.

To give a clear example for epilepsy, through earlier work I have determined that approximately 30% of the trait's variance (on the liability scale) can be explained by common variants. This 30% corresponds to an upper bound on how effectively common variants could explain risk susceptibility. Although to get close to this bound we would require studies across many millions of individuals, experience with traits such as human height suggests that for reasonable numbers of individuals, using BLUP it should be possible to develop a prediction model explaining between 3 and 6% of underlying variation. Simulations suggest that a model explaining 10% of variation would be clinically useful for selecting which single-seizure individuals are most likely to develop epilepsy. This target is not too far away; potentially the information obtained by performing heritability meta-analysis across the 50,000 individuals of the ILAE consortium, integrated into my improved BLUP methodology, would bring this figure within reach.

Moreover, the benefits of improved prediction and detection of causal variants are not limited to human traits. There is also great benefit in better understanding complex traits for animals and plants. For example, genomic selection uses BLUP-based methods to determine which animals or plants to breed in order to efficiently produce the most desirable breeding stock (here, the phenotype might be, say, weight of bull or yield of wheat). Because heritability analysis and BLUP are inextricably linked, any methodological improvement in one will invariably benefit the other. A nice example of the power of genomic selection, presented at a recent conference, was the case where maize was bred to increase its vitamin levels, so as to benefit people living in under-nourished areas. Additionally, with organisms such as maize having short breeding cycles, and fewer safety concerns compared to dealing with human traits, this type of benefit can be brought to bear relatively quickly.

Publications

International League Against Epilepsy Consortium on Complex Epilepsies (2018) Genome-wide mega-analysis identifies 16 loci and highlights diverse biological mechanisms in the common epilepsies. in Nature Communications

International League Against Epilepsy Consortium on Complex Epilepsies (2014) Genetic determinants of common epilepsies: a meta-analysis of genome-wide association studies. in The Lancet Neurology

 
Description ILAE Taskforce
Geographic Reach Multiple continents/international 
Policy Influence Type Influenced training of practitioners or researchers
Impact The ILAE (International League Against Epilepsy) is the leading body responsible for epilepsy definitions and terminologies. Its work involves issuing guidelines on defining whether and when an individual has epilepsy and classifying recognised subtypes, and it was recently responsible for epilepsy being "upgraded" from a disorder to a disease. I am linked to this taskforce through my involvement in the ILAE Consortium.
URL http://www.ilae.org/Visitors/Centre/Definition_Class.cfm
 
Description AIAS COFUND Fellowship
Amount 3,000,000 kr. (DKK)
Funding ID 754513 
Organisation Marie Sklodowska-Curie Actions 
Sector Charity/Non Profit
Country Global
Start 10/2017 
End 03/2020
 
Description Sapere Aude
Amount 5,600,000 kr. (DKK)
Organisation Danish Council for Independent Research 
Sector Public
Country Denmark
Start 04/2018 
End 03/2022
 
Title GWAS Results 
Description It is now standard (and often required) for researchers to make available p-values whenever they publish an association study. Therefore, I have uploaded to public databases, or made available on request, the results from my association studies, which include epilepsy and tuberculosis.
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact The p-values from our analysis of epilepsy susceptibility have been requested by over a dozen other research groups, and used in at least one publication. 
 
Description Exploiting the shared genetic basis of autoimmune disease to understand aetiology and improve risk prediction 
Organisation University of Melbourne
Department Department of Chemistry
Country Australia 
Sector Academic/University 
PI Contribution I am a co-principal investigator on NHMRC Grant Application APP1139672 with title "Exploiting the shared genetic basis of autoimmune disease to understand aetiology and improve risk prediction" (submitted March 2017, lead applicant Gad Abraham). If the grant is successful, this project will develop tools to better understand the genetic overlap between complex human diseases, and use this information for better detection of causal variants and more accurate risk prediction. Dr Abraham asked me to join the project because of my expertise in understanding genetic architecture and in producing and releasing software, and because I am the developer of MultiBLUP, the world-leading method for SNP-based risk prediction (Speed, D and Balding, DJ, MultiBLUP: Improved SNP-based prediction for complex traits, Genome Research, 2014).
Collaborator Contribution I will work jointly with Dr Abraham, as well as the third collaborator, Professor David Balding on developing methods and analyzing large-scale genetic datasets.
Impact Waiting to hear if grant application successful.
Start Year 2017
 
Description International League Against Epilepsy (ILAE) 
Organisation International League Against Epilepsy (ILAE)
Country Global 
Sector Academic/University 
PI Contribution The ILAE is a global consortium, set up to allow data from multiple groups working on epilepsy to be shared and analysed jointly. I am lead analyst for the consortium, responsible for performing the main analysis (which currently involves over 45,000 individuals) as well as designing the protocols used by individual groups. The ILAE is also responsible for determining epilepsy classifications, and influences policy, such as advising how long an individual's driving licence should be suspended if they experience an epileptic seizure.
Collaborator Contribution Our personal epilepsy dataset contains 1200 cases; by being part of the consortium, we have increased the number of cases we have available to analyse to over 15,000, greatly increasing our ability to interrogate the disease.
Impact Published first analysis (a meta-analysis of all, partial and generalized epilepsy) in Lancet Neurology. Second analysis (a mega-analysis of six clinically defined epilepsy sub-phenotypes) currently underway.
Start Year 2010
 
Description The effects of genetics, mutation and selection on Evolutionary Rescue in complex environments 
Organisation University College London
Department History of Art
Country United Kingdom 
Sector Academic/University 
PI Contribution I am a co-principal investigator on BBSRC Grant Application BB/R003882/1 with title "The effects of genetics, mutation and selection on Evolutionary Rescue in complex environments" (submitted February 2017, lead applicant Max Reuter). If the grant is successful, this project will study the effects of mutation, the genetic architecture of environmental responses (ER) and selection on ER, primarily using the fission yeast S. pombe. S. pombe is a well-established eukaryotic microbial model organism, so as well as better understanding its biological processes and responses to external factors, our conclusions should carry over to other organisms. Dr Reuter asked me to join the project as I have experience working with S. pombe (I was the lead statistician on Jeffares et al, The genomic and phenotypic diversity of Schizosaccharomyces pombe, Nature Genetics, 2015) and we intend to perform statistical analysis using my software LDAK, and the tools I have developed during my MRC Award.
Collaborator Contribution As principal investigator, Dr Reuter will be in charge of performing the experiments, producing the data I will analyse.
Impact Waiting to hear if grant application successful.
Start Year 2017
 
Title LDAK 
Description LDAK is my software for analysing association study data. I first created LDAK in 2012, and have added new features each year. LDAK currently allows researchers to perform basic (single-SNP) and linear mixed model tests of association, and includes: SumHer, the world-leading tool for estimating SNP heritability from summary statistics; GBAT, the world-leading tool for gene-based association analysis; LDAK-Bolt, the world-leading tool for constructing prediction models from individual-level data; MegaPRS, the world-leading tool for constructing prediction models from summary statistics; and tools for data manipulation (converting, merging, cleaning data).
Type Of Technology Software 
Year Produced 2012 
Open Source License? Yes  
Impact Update December 2021: in the last year, I have added five new tools to LDAK. The software has now been downloaded over 2500 times, and cited over 1500 times. I receive requests for support on a weekly basis, about ten of which have resulted in collaborations (including groups in UK, Sweden, China and USA).
URL http://www.ldak.org
 
Description AIAS public seminar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact I gave a public talk, entitled Solving Genetics. The audience was about 30% postgraduate scientists, 30% postgraduate humanities students and 40% members of the public.
Year(s) Of Engagement Activity 2018
URL http://aias.au.dk/events/show/artikel/aias-fellows-seminar-doug-speed-aias-fellow/
 
Description Alan Turing Institute Working Group: Data science challenges in high-throughput biology and precision medicine 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact The Alan Turing Institute is soon to be opened in the British Library, London. It has been granted £42,000,000 seed money from the British Government, with additional funding from other collaborating institutes and sponsors. To decide the research strategy for the institute, a number of working groups were organised, tasked with deciding the most effective use of the money and new facilities. I was invited to the Edinburgh workshop, which spent three days discussing key challenges of big data genetics. The output of the workshop was a five-page report on our key opinions, submitted to the head of the Alan Turing Institute.
Year(s) Of Engagement Activity 2015
URL https://turing.ac.uk/data-science-challenges-in-high-throughput-biology-and-precision-medicine/
 
Description Armidale Genetics Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact With David Balding, I organised and taught the Armidale Animal Breeding Summer Course 2016. This is a 5-day summer workshop, based in Armidale, Australia, which is currently in its 14th year and is open to undergraduate and postdoctoral students, as well as members of industry and agriculture. Slides from the 18 lectures and practicals are made publicly available on the course website.
Year(s) Of Engagement Activity 2016
URL http://jvanderw.une.edu.au/AGSCcourse.htm
 
Description DataKind UK DataDive 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Supporters
Results and Impact DataKind is a charitable organisation, with the aim of "harnessing the power of data science in the service of humanity". One type of event it organises is the DataDive, open to all, but targeted mainly at those experienced with data analysis (whether this be part of their job, their scientific research, or just a hobby). For the November 2014 DataDive, they invited three charities (St Mungo's, Citizen's Advice Bureau and North East Child Poverty Commission), each of which had a specific problem with which they wanted help. For example, St Mungo's is a homeless charity based in London, and wanted an algorithm for better predicting which people would end up homeless. Participants divided themselves into three working groups, and at the end of the three-day event, each group presented their ideas for tackling the charity's problem.
Year(s) Of Engagement Activity 2014
URL http://www.datakind.org/blog/datakind-uk-a-datadive-full-of-firsts/
 
Description Henry Stewart Talk 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Henry Stewart Talks is a provider of specialist online audio-visual lectures, accompanied by learning and teaching material, subscribed to by schools, research institutes, industry and governments in over 60 countries. While most subscriptions are paid-for, they also have a scheme allowing free-access to institutions in developed countries. I was invited to give a talk on "Heritability and Its Uses".
Year(s) Of Engagement Activity 2016
URL https://hstalks.com/expert/3100/dr-doug-speed/
 
Description MRC Outreach 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact The MRC invited me to speak in the event "Genetics with Drug Discovery and Development" as part of their series "Cutting Edge Science; Linking research with teaching". These events are advertised primarily to school teachers as "a great way to engage students in the classroom as well as improve teacher's subject knowledge in areas of research linked to the curriculum. Each Cutting Edge Science course includes input from an established researcher as well as practical ways to embed the research into teaching these curriculum areas."
Unfortunately, after I had agreed to speak and prepared material, the event was postponed the day before it was due to take place, but there are discussions about rearranging it.
Year(s) Of Engagement Activity 2015
URL https://www.stem.org.uk/system/files/community-groups/topic-files/legacy_files_migrated/9494-RCUK%20...
 
Description Public lecture explaining statistical tools to understand genealogy (Who Do You Think You Are, Live! 2016) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact I gave a presentation (with Q&A session) to "Who Do You Think You Are, Live!", a large-scale genealogy roadshow which has arisen due to the popularity of the BBC show "Who Do You Think You Are?". This was held in the National Exhibition Centre, Birmingham, and attended by approximately 3000 people each day. In addition to corporate stalls, there was a "Scientist's Corner", where researchers such as myself explained the science behind different aspects of genealogy. I received a number of questions and subsequent emails from members of the public, mainly wishing to find relatives or better understand the mathematics behind genealogy companies such as 23andMe.
Year(s) Of Engagement Activity 2016
URL https://www.youtube.com/watch?v=zYAvPuQd0Y8
 
Description Three one-day workshops on statistical methods for analysing genetic data 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I have run 3 one-day workshops on statistical methods for analysing genetic data (two jointly with collaborator David Balding in United Kingdom, one in Aarhus). These courses teach mainly postgraduates how to use popular statistical tools. Approximately 120 people have attended, about 70% from UK, the rest from Europe.
Year(s) Of Engagement Activity 2017,2018
URL http://aias.au.dk/events/show/artikel/aias-course-statistical-genetics-short-course-methods-for-anal...