Developing methods for identifying clinical phenotypes from routinely collected health data with applications to stroke genetics.

Lead Research Organisation: University of Edinburgh
Department Name: Centre of Population Health Sciences

Abstract

Genetic and other risk factor associations with stroke are known to be largely type and subtype specific. Systematic review data and UK Biobank pilot work have shown that, among stroke cases ascertained from routinely collected coded health data, stroke type (ischaemic versus haemorrhagic) is not specified for around 40%, and further stroke subtype (TOAST, OCSP, haemorrhage location, etc.) is not specified for around 70%. With expert adjudication it is possible to assign a type and subtype for around 80% of these cases, but this method does not scale to very large studies (e.g., UK Biobank). I propose to develop scalable, automated methods that will allow further stroke typing and subtyping from routinely collected health data, by investigating the use of various algorithmic code combinations and of natural language processing methods that could be applied to free-text medical records and imaging reports. I then propose to validate these methods directly, as well as indirectly by comparing them with other phenotyping approaches in genetic studies.
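As an illustration of what an "algorithmic code combination" might look like, the following is a minimal sketch only, not the method developed in this project. It assumes diagnoses are recorded as ICD-10 codes and uses the standard ICD-10 categories I60, I61, I63 and I64; the function name, the output labels and the combination rule itself are hypothetical.

```python
from typing import Iterable

# Standard ICD-10 three-character categories for stroke.
ISCHAEMIC = {"I63"}            # cerebral infarction
HAEMORRHAGIC = {"I60", "I61"}  # subarachnoid / intracerebral haemorrhage
UNSPECIFIED = {"I64"}          # stroke, not specified as haemorrhage or infarction


def classify_stroke_type(codes: Iterable[str]) -> str:
    """Hypothetical rule: combine all of one participant's codes into one label."""
    # Normalise e.g. "i63.9" to its three-character category "I63".
    categories = {code.strip().upper().replace(".", "")[:3] for code in codes}
    has_ischaemic = bool(categories & ISCHAEMIC)
    has_haemorrhagic = bool(categories & HAEMORRHAGIC)
    if has_ischaemic and has_haemorrhagic:
        return "conflicting"   # would need adjudication or further rules
    if has_ischaemic:
        return "ischaemic"
    if has_haemorrhagic:
        return "haemorrhagic"
    if categories & UNSPECIFIED:
        return "unspecified"   # the cases this proposal aims to resolve
    return "no stroke code"
```

Under this assumed rule, a participant with an unspecified code in one data source and a specific code in another is assigned the specific type, which is one simple way that combining codes across sources could reduce the proportion of untyped cases.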

Phenome-wide association studies can be used to systematically examine the impact of one or many genetic variants across a broad range of human phenotypes. They have the potential to reveal novel insights into underlying disease mechanisms and to identify novel drug targets and drug repurposing opportunities. UK Biobank, with its vast and varied phenotypic data, is highly suitable for these studies. However, to date there is a relative lack of sophisticated phenotyping methods to select and identify outcomes of interest. I propose to apply existing phenome-wide association study methods to investigate hypothesis-based associations with stroke as a model disease, and to develop these methods further for wider use.

During the past decade, findings of genome-wide association studies have improved our knowledge and understanding of complex disease genetics. Statistical analysis typically looks for association between a phenotype and single genetic variants, taken individually, via single-variant tests. However, this approach oversimplifies the complexity of the underlying biological mechanisms. The next step is to also consider the interactions between genetic variants, or epistasis. Epistasis detection gives rise to new analytic challenges, since analysing every combination of single nucleotide polymorphisms is at present impractical at a genome-wide scale. I propose to apply existing methods and develop these further for wider use, starting with a hypothesis-driven approach to investigate epistatic associations between selected stroke genes.
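The scale of the problem can be sketched with a short calculation. The variant counts below are illustrative assumptions, not figures from this proposal: 500,000 stands in for a genome-wide variant panel, and 200 for a hypothesis-driven set of variants in selected genes.

```python
import math


def snp_combinations(n_snps: int, order: int = 2) -> int:
    """Number of distinct SNP combinations of the given order (pairs by default)."""
    return math.comb(n_snps, order)


# Assumed, illustrative panel sizes.
genome_wide_pairs = snp_combinations(500_000)  # 124,999,750,000 pairwise tests
candidate_pairs = snp_combinations(200)        # 19,900 pairwise tests

print(f"genome-wide pairs: {genome_wide_pairs:,}")
print(f"candidate pairs:   {candidate_pairs:,}")
```

Restricting the search to a candidate set reduces the number of pairwise tests by several orders of magnitude, which also eases the multiple-testing burden. Higher-order interactions grow faster still, as `snp_combinations(n, 3)` shows.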

Technical Summary

For objective (1) I will start by using data from the UK Biobank participants, building on the pilot work undertaken in the UK Biobank to date, which has generated "case vignettes" composed of relevant medical record free text. Following validation in UK Biobank, the developed algorithms and methods can then be applied to and further validated in other consented cohorts (e.g., Generation Scotland). For objective (2), I will build on an ongoing systematic review that will lead to a set of phenotypes that will form the hypothesis for a phenome-wide association study using data from the UK Biobank. For objective (3) I will use my existing network of collaborations within the International Stroke Genetics Consortium (ISGC) to access relevant datasets to test the proposed methods. I will also collaborate with other researchers from the University of Edinburgh and across the UK, as well as with other HDR UK fellows, to build the skills and capacity to undertake and further develop these proposed directions.
This project aligns with HDR UK priorities, as it will develop national leadership, partnerships, and interdisciplinary skills and capacity through the development of novel analytical methods and tools, which can in the future be applied and taken up for research of health conditions beyond stroke.

Publications

 
Description University of Edinburgh, A. Tenesa team 
Organisation University of Edinburgh
Department The Roslin Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution We are collaborating on a project investigating how different coded phenotype definitions of stroke affect the genetic association results using UK Biobank data. Our team has experience in understanding the accuracy and nuances associated with different stroke definitions.
Collaborator Contribution Our collaborators (Albert Tenesa's team) have bioinformatics skills allowing genetic analyses of complex data.
Impact Multi-disciplinary: quantitative genetics, clinical phenomics
Start Year 2018
 
Description University of Hamilton 
Organisation McMaster University
Country Canada 
Sector Academic/University 
PI Contribution We are collaborating with colleagues from the Genetic and Molecular Epidemiology Laboratory at McMaster University (PI: Guillaume Pare) on a project investigating the penetrance and variable expressivity of rare variants in monogenic stroke genes. Our group has undertaken a systematic review to identify all reported variants and their associated clinical phenotypes.
Collaborator Contribution Our collaborators are investigating the frequency of these variants in publicly available control databases to better understand their role in health and disease.
Impact Multi-disciplinary collaboration: clinical neurology and general medical knowledge and genetic epidemiology and molecular genetics.
Start Year 2018
 
Description University of Munich 
Organisation Ludwig Maximilian University of Munich (LMU Munich)
Country Germany 
Sector Academic/University 
PI Contribution We have a joint project using the UK Biobank data to investigate genetic associations with stroke and its subtypes. Our research team has developed methods for identifying stroke and other disease outcomes from routinely collected electronic health data that we have used in this project.
Collaborator Contribution Our partners have analytical skills that have allowed them to process complex, large-scale genetic data for this project.
Impact Manuscript: PubMed ID 30383316
Start Year 2018
 
Description Presentation at conference (ISGC) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I gave an oral presentation at the 24th International Stroke Genetics Consortium meeting in Washington, USA, 08.11.2018 - 09.11.2018. I presented the results of a systematic review investigating the associations between phenotypes and genetic variants in genes thought to cause familial stroke, followed by further research plans as part of my award, to researchers across the world.
Year(s) Of Engagement Activity 2017,2018
 
Description Presentation at conference (PQG) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Poster presentation at the Program in Quantitative Genomics 12th annual conference "Biobanks: Study Design and Data Analysis" 01.11.2018 - 02.11.2018 at Harvard Medical School in Boston, MA, USA. I presented the results of our research into the accuracy of routinely collected coded disease diagnoses and further research plans as part of my award to around 100 researchers from different disciplines across the world.
Outcomes: Our poster attracted a lot of interest during the conference poster session and led to useful discussions with various researchers. A couple of researchers reported an improved understanding of the nature and accuracy of routinely collected electronic health data and of how these may influence the results of research using such data.
Year(s) Of Engagement Activity 2018
 
Description Presentation at seminar (CMI) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact I gave an oral presentation at the University of Edinburgh, Usher Institute, Centre for Medical Informatics weekly seminar series on 15.10.2018. I presented the results of my research into the accuracy of routinely collected coded stroke diagnoses and further research plans as part of my award to around 30 researchers (all grades ranging from PhD students to professors) across different disciplines and from different institutions and departments, mainly from the University of Edinburgh. The aim of these seminars is to encourage and facilitate discussion, collaboration and learning across The Centre for Medical Informatics which encompasses people from many different backgrounds and disciplines.
Year(s) Of Engagement Activity 2018
 
Description Presentation at seminar (DCN) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact I gave an oral presentation at the Department of Clinical Neurosciences, Western General Hospital, academic afternoon on 28.02.2019. This is a weekly meeting of clinicians (predominantly consultant and trainee neurologists) from hospitals across the South East of Scotland. I presented the results of a systematic review investigating the associations between phenotypes and genetic variants in genes thought to cause familial stroke, followed by further research plans as part of my award. The purpose of the talk was to introduce my research to my clinical colleagues.
Year(s) Of Engagement Activity 2019
 
Description Presentation at seminar (McMaster) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I gave a talk at a collaborator's (Guillaume Pare) institution (Genetic and Molecular Epidemiology Laboratory, McMaster University, Hamilton, Canada) at their seminar about my research.
Year(s) Of Engagement Activity 2018