Developing methods for identifying clinical phenotypes from routinely collected health data with applications to stroke genetics.
Lead Research Organisation:
University of Edinburgh
Department Name: Centre of Population Health Sciences
Abstract
Associations of genetic and other risk factors with stroke are known to be, to a large extent, type- and subtype-specific. Systematic review data and UK Biobank pilot data have shown that, among stroke cases ascertained from routinely collected coded health data, stroke type (ischaemic versus haemorrhagic) is not specified for around 40%, and stroke subtype (TOAST, OCSP, haemorrhage location, etc.) is not specified for around 70%. With expert adjudication it is possible to assign a type and subtype to around 80% of these cases, but this approach does not scale to very large studies (e.g., UK Biobank). I propose to develop scalable, automated methods for further stroke typing and subtyping from routinely collected health data, investigating both algorithmic combinations of codes and natural language processing methods that could be applied to free-text medical records and imaging reports. I then propose to validate these methods directly, and indirectly by comparing them with other phenotyping approaches in genetic studies.
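To illustrate what an "algorithmic code combination" might look like, the sketch below assigns a broad stroke type from one participant's coded diagnoses. It assumes ICD-10 coding and uses the standard ICD-10 stroke categories (I60 subarachnoid haemorrhage, I61 intracerebral haemorrhage, I63 cerebral infarction, I64 stroke not specified as haemorrhage or infarction); the function name `assign_stroke_type` and its conflict-handling rules are hypothetical, and this is not the validated algorithm the project aims to develop.

```python
"""Hypothetical rule-based stroke typing from coded ICD-10 diagnoses.

Assumption: each participant has a list of ICD-10 codes from hospital records.
The rules are illustrative only, not a validated phenotyping algorithm.
"""

# Standard ICD-10 three-character stroke categories mapped to a broad type.
CODE_TO_TYPE = {
    "I60": "subarachnoid haemorrhage",
    "I61": "intracerebral haemorrhage",
    "I63": "ischaemic",
    "I64": "unspecified",
}


def assign_stroke_type(codes):
    """Return a broad stroke type for one participant, or None if no stroke code."""
    types = set()
    for code in codes:
        # Normalise e.g. "I63.4" or "I634" to the three-character category "I63".
        category = code.strip().upper().replace(".", "")[:3]
        if category in CODE_TO_TYPE:
            types.add(CODE_TO_TYPE[category])
    if not types:
        return None
    specific = types - {"unspecified"}
    if not specific:
        return "unspecified"  # only I64: a candidate for further typing (e.g. NLP)
    if len(specific) == 1:
        return specific.pop()  # one specific type; a co-occurring I64 is ignored
    return "conflicting"  # e.g. both I61 and I63: two events or a coding error


if __name__ == "__main__":
    print(assign_stroke_type(["I63.4", "I10"]))
    print(assign_stroke_type(["I64"]))
    print(assign_stroke_type(["I61.0", "I63.9"]))
```

Cases labelled "unspecified" or "conflicting" are exactly those that would otherwise need expert adjudication, so a rule set like this mainly serves to triage records for the more expensive free-text methods.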
Phenome-wide association studies can be used to systematically examine the impact of one or many genetic variants across a broad range of human phenotypes. They have the potential to reveal novel insights into underlying disease mechanisms, and hold great potential for identifying novel drug targets and drug repurposing opportunities. UK Biobank, with its vast and varied phenotypic data, is highly suitable for these studies. However, there is to date a relative lack of sophisticated phenotyping methods for selecting and identifying outcomes of interest. I propose to apply existing phenome-wide association study methods to investigate hypothesis-based associations, using stroke as a model disease, and to develop these methods further for wider use.
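The core loop of a phenome-wide association study, testing one variant against many phenotypes and correcting for the number of tests, can be sketched as below. This is a deliberately simplified illustration, not the methods the project will use: it assumes binary carrier status, a case/control flag per phenotype, and an unadjusted Pearson 2x2 chi-square test in place of the covariate-adjusted regression models typically used; `phewas` and `chi_square_2x2` are hypothetical names.

```python
"""Minimal PheWAS sketch: one variant tested against many binary phenotypes.

Simplifying assumptions: carrier status is binary, each phenotype is a
case/control flag, and an unadjusted 2x2 Pearson chi-square test stands in
for the covariate-adjusted regression models typically used.
"""
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and its p-value
    for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty row or column: nothing to test
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def phewas(carrier, phenotypes, alpha=0.05):
    """Test carrier status against each phenotype.

    Returns (name, statistic, p_value, significant) tuples sorted by p-value;
    significance uses a Bonferroni threshold over all phenotypes tested.
    """
    threshold = alpha / len(phenotypes)
    results = []
    for name, is_case in phenotypes.items():
        pairs = list(zip(carrier, is_case))
        a = sum(1 for g, y in pairs if g and y)          # carrier, case
        b = sum(1 for g, y in pairs if g and not y)      # carrier, control
        c = sum(1 for g, y in pairs if not g and y)      # non-carrier, case
        d = sum(1 for g, y in pairs if not g and not y)  # non-carrier, control
        stat, p = chi_square_2x2(a, b, c, d)
        results.append((name, stat, p, p < threshold))
    return sorted(results, key=lambda r: r[2])
```

With toy data in which 40 of 50 carriers but only 10 of 50 non-carriers are cases, the statistic is 36 and the association passes the Bonferroni threshold; a phenotype split evenly between carriers and non-carriers gives a statistic of 0 and a p-value of 1. A hypothesis-based PheWAS, as proposed here, simply restricts `phenotypes` to a pre-specified set, which also relaxes the multiple-testing threshold.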
Over the past decade, findings from genome-wide association studies have improved our knowledge and understanding of complex disease genetics. Statistical analysis typically tests for association between a phenotype and each genetic variant individually, using single-variant tests. However, this is an oversimplified approach to the complexity of the underlying biological mechanisms. A natural next step is to also consider interactions between genetic variants, known as epistasis. Epistasis detection raises new analytical challenges, because testing every combination of single nucleotide polymorphisms is currently impractical at a genome-wide scale. I propose to apply existing methods and develop them further for wider use, starting with a hypothesis-driven approach to investigate epistatic associations between selected stroke genes.
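A quick calculation shows why exhaustive epistasis testing is impractical and why a hypothesis-driven approach helps: the number of pairwise tests grows quadratically with the number of variants, n(n - 1)/2, and a Bonferroni threshold shrinks accordingly. The SNP counts used (one million genome-wide, 20 candidates in selected stroke genes) are illustrative assumptions rather than project figures, and only pairwise, not higher-order, interactions are counted.

```python
"""Why exhaustive pairwise epistasis testing is hard at genome-wide scale.

The variant counts below are illustrative assumptions, not project figures.
"""
from math import comb


def pairwise_tests(n_variants):
    """Number of distinct SNP pairs: n * (n - 1) / 2."""
    return comb(n_variants, 2)


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    genome_wide = pairwise_tests(1_000_000)  # e.g. one million genotyped SNPs
    candidate = pairwise_tests(20)           # e.g. 20 SNPs in selected stroke genes
    print(f"{genome_wide:,} pairs, threshold {bonferroni_threshold(genome_wide):.1e}")
    print(f"{candidate:,} pairs, threshold {bonferroni_threshold(candidate):.1e}")
```

Under these assumptions the genome-wide scan needs roughly 5 x 10^11 pairwise tests with a per-test threshold near 10^-13, whereas 20 candidate SNPs need only 190 tests, which is what makes starting from selected stroke genes tractable.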
Technical Summary
For objective (1), I will start by using data from UK Biobank participants, building on the pilot work undertaken in UK Biobank to date, which has generated "case vignettes" composed of relevant free text from medical records. Following validation in UK Biobank, the algorithms and methods developed can then be applied to, and further validated in, other consented cohorts (e.g., Generation Scotland). For objective (2), I will build on an ongoing systematic review that will identify a set of phenotypes to form the hypothesis for a phenome-wide association study using UK Biobank data. For objective (3), I will use my existing network of collaborations within the International Stroke Genetics Consortium (ISGC) to access relevant datasets on which to test the proposed methods. I will also collaborate with other researchers from the University of Edinburgh and across the UK, as well as with other HDR UK fellows, to build the skills and capacity to undertake and further develop these proposed directions.
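Because the pilot "case vignettes" are free text, one simple baseline for NLP-based typing is a keyword search with crude negation handling, loosely in the spirit of rule-based tools such as NegEx. The sketch below is a hypothetical illustration, not the project's method: the term lists, the negation cues and the `classify_report` function are all assumptions, and a validated pipeline would need a much richer vocabulary and proper negation scoping.

```python
"""Hypothetical keyword-and-negation classifier for free-text imaging reports.

Illustrative only: term lists and negation cues are assumptions, negation scope
is crude (any cue earlier in the same sentence), and only the first occurrence
of each term in a sentence is checked.
"""
import re

ISCHAEMIC_TERMS = ("infarct", "ischaemi", "ischemi")
HAEMORRHAGIC_TERMS = ("haemorrhag", "hemorrhag", "haematoma", "hematoma", "bleed")
NEGATION = re.compile(r"\b(no|not|without|negative for)\b")


def _mentions(sentence, terms):
    """True if any term occurs in the sentence without a preceding negation cue."""
    for term in terms:
        idx = sentence.find(term)
        if idx == -1:
            continue
        if not NEGATION.search(sentence[:idx]):
            return True
    return False


def classify_report(text):
    """Assign a broad stroke type to one free-text report."""
    ischaemic = haemorrhagic = False
    # Split into rough sentences so a negation only affects its own sentence.
    for sentence in re.split(r"[.;\n]+", text.lower()):
        ischaemic = ischaemic or _mentions(sentence, ISCHAEMIC_TERMS)
        haemorrhagic = haemorrhagic or _mentions(sentence, HAEMORRHAGIC_TERMS)
    if ischaemic and haemorrhagic:
        return "mixed / needs review"
    if ischaemic:
        return "ischaemic"
    if haemorrhagic:
        return "haemorrhagic"
    return "indeterminate"
```

For example, "CT head: acute infarct in the left MCA territory. No haemorrhage." is classified as ischaemic because the haemorrhage mention is negated, while a report of haemorrhagic transformation of an infarct is routed to review; such flagged cases are where validation against expert adjudication would matter most.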
This project aligns with HDR UK priorities, as it will develop national leadership, partnerships, and interdisciplinary skills and capacity through the development of novel analytical methods and tools, which can in the future be applied and taken up for research of health conditions beyond stroke.
Organisations
- University of Edinburgh (Fellow, Lead Research Organisation)
- McMaster University (Collaboration)
- University of Tartu (Collaboration)
- University of Edinburgh (Collaboration)
- University College London (Collaboration)
- University Medical Center Utrecht (UMC) (Collaboration)
- Ludwig Maximilian University of Munich (LMU Munich) (Collaboration)
People | ORCID iD |
Kristiina Rannikmae (Principal Investigator / Fellow) |
Publications
- Bakker MK (2020). Genome-wide association study of intracranial aneurysms identifies 17 risk loci and genetic overlap with clinical risk factors. In Nature Genetics.
- Chung J (2021). Rare Missense Functional Variants at COL4A1 and COL4A2 in Sporadic Intracerebral Hemorrhage. In Neurology.
- Ferguson A (2022). Frequency and Phenotype Associations of Rare Variants in 5 Monogenic Cerebral Small Vessel Disease Genes in 200,000 UK Biobank Participants. In Neurology Genetics.
- Georgakis MK (2019). Genetically Determined Levels of Circulating Cytokines and Risk of Stroke. In Circulation.
- Grami N (2020). Global Assessment of Mendelian Stroke Genetic Prevalence. In Stroke.
- Jaworek T (2022). Contribution of Common Genetic Variants to Risk of Early-Onset Ischemic Stroke. In Neurology.
- Malik R (2021). Midlife vascular risk factors and risk of incident dementia: longitudinal cohort and Mendelian randomization analyses in the UK Biobank. In Alzheimer's & Dementia: The Journal of the Alzheimer's Association.
Description | BHF REA3 pump priming award for project "Clinical consequences of rare variants in Cerebral Small Vessel Disease genes" |
Amount | £44,375 (GBP) |
Organisation | British Heart Foundation (BHF) |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 01/2020 |
End | 06/2021 |
Description | Carnegie Vacation Scholarship for medical student summer project |
Amount | £1,000 (GBP) |
Organisation | Carnegie Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 05/2019 |
End | 07/2019 |
Description | Cerebral phenotypes associated with monogenic cSVDs |
Amount | £1,800 (GBP) |
Organisation | The Genetics Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 05/2021 |
End | 07/2021 |
Description | Wellcome Trust University of Edinburgh ISSF funding |
Amount | £24,000 (GBP) |
Organisation | University of Edinburgh |
Sector | Academic/University |
Country | United Kingdom |
Start | 06/2021 |
End | 12/2021 |
Title | Code lists for the HDR UK Phenome Library |
Description | Disease code lists for routinely collected health data, generated and published in the HDR UK Phenome Library: http://phenotypes.healthdatagateway.org/ |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | The code lists generated for stroke and other diseases are publicly available for any researcher to use in work with routinely collected health data. |
URL | https://phenotypes.healthdatagateway.org/ |
Description | Collaboration with Dr Honghan Wu in UCL |
Organisation | University College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We have worked together on two projects: one developing automated methods for stroke subtyping based on radiology reports, and a second using network methods to identify new genetic associations with cerebral small vessel disease. I have provided the clinical perspective for both projects, while Dr Wu has provided the machine learning and informatics perspective. |
Collaborator Contribution | See above. |
Impact | 2 publications |
Start Year | 2018 |
Description | Collaboration with Estonian Biobank |
Organisation | University of Tartu |
Country | Estonia |
Sector | Academic/University |
PI Contribution | I have consulted with the Estonian Biobank team on identifying relevant stroke cases in their biobank for inclusion in the Precise4q project. |
Collaborator Contribution | Site for Precise4q project. |
Impact | n/a |
Start Year | 2020 |
Description | Collaboration with Y Ruigrok's team in UMC Utrecht |
Organisation | University Medical Center Utrecht (UMC) |
Country | Netherlands |
Sector | Academic/University |
PI Contribution | The collaborative project is studying genetic associations with subarachnoid haemorrhage and unruptured intracranial aneurysms. Dr Ruigrok leads the working group within the International Stroke Genetics Consortium. My role has been developing methods for identifying relevant phenotypes in UK Biobank, allowing Dr Ruigrok's team to integrate data from UK Biobank into the larger genetic meta-analysis. |
Collaborator Contribution | Please see above. |
Impact | Publication: • Bakker M, (18 authors), Rannikmäe K, (53 authors). Genome-wide association study of intracranial aneurysms reveals 17 risk loci, polygenic architecture, genetic overlap with clinical risk factors, and opportunities for prevention. Nature Genetics 2020;52(12):1303-1313. |
Start Year | 2020 |
Description | McMaster University |
Organisation | McMaster University |
Country | Canada |
Sector | Academic/University |
PI Contribution | We are collaborating with colleagues from the Genetic and Molecular Epidemiology Laboratory at McMaster University (PI: Guillaume Pare) on a project investigating the penetrance and variable expressivity of rare variants in monogenic stroke genes. Our group has undertaken a systematic review to identify all reported variants and their associated clinical phenotypes. |
Collaborator Contribution | Our collaborators are investigating the frequency of these variants in publicly available control databases to better understand their role in health and disease. |
Impact | Multidisciplinary collaboration combining clinical neurology and general medical knowledge with genetic epidemiology and molecular genetics. Two publications: PubMed IDs 32106772 and 32842921. |
Start Year | 2018 |
Description | University of Edinburgh, A. Tenesa team |
Organisation | University of Edinburgh |
Department | The Roslin Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We are collaborating on a project investigating how different coded phenotype definitions of stroke affect the genetic association results using UK Biobank data. Our team has experience in understanding the accuracy and nuances associated with different stroke definitions. |
Collaborator Contribution | Our collaborators (Albert Tenesa's team) have bioinformatics skills allowing genetic analyses of complex data. |
Impact | Multi-disciplinary: quantitative genetics, clinical phenomics |
Start Year | 2018 |
Description | University of Munich |
Organisation | Ludwig Maximilian University of Munich (LMU Munich) |
Country | Germany |
Sector | Academic/University |
PI Contribution | We have a joint project using the UK Biobank data to investigate genetic associations with stroke and its subtypes. Our research team has developed methods for identifying stroke and other disease outcomes from routinely collected electronic health data that we have used in this project. |
Collaborator Contribution | Our partners have analytical skills that have allowed them to process complex, large-scale genetic data for this project. |
Impact | Manuscript: PubMed ID 30383316; manuscript in press: Malik et al. Midlife vascular risk factors and risk of incident dementia: longitudinal cohort and Mendelian randomization analyses in the UK Biobank, 2021. In press in Alzheimer's & Dementia: The Journal of the Alzheimer's Association. |
Start Year | 2018 |
Description | Presentation at conference (ISGC) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | I gave an oral presentation at the 24th International Stroke Genetics Consortium meeting in Washington, USA, 08.11.2018 - 09.11.2018. I presented the results of a systematic review investigating the associations between phenotypes and genetic variants in genes thought to cause familial stroke, followed by further research plans as part of my award, to researchers across the world. |
Year(s) Of Engagement Activity | 2017,2018 |
Description | Presentation at conference (PQG) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Poster presentation at the Program in Quantitative Genomics 12th annual conference "Biobanks: Study Design and Data Analysis", 01.11.2018 - 02.11.2018, at Harvard Medical School in Boston, MA, USA. I presented the results of our research into the accuracy of routinely collected coded disease diagnoses, and further research plans as part of my award, to around 100 researchers from different disciplines across the world. Outcomes: the poster attracted considerable interest during the conference poster session and led to interesting and useful discussions with various researchers. A couple of researchers reported an improved understanding of the nature and accuracy of routinely collected electronic health data and how it may influence the results of research using such data. |
Year(s) Of Engagement Activity | 2018 |
Description | Presentation at seminar (CMI) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Other audiences |
Results and Impact | I gave an oral presentation at the University of Edinburgh, Usher Institute, Centre for Medical Informatics weekly seminar series on 15.10.2018. I presented the results of my research into the accuracy of routinely collected coded stroke diagnoses, and further research plans as part of my award, to around 30 researchers (at all grades, from PhD students to professors) across different disciplines and from different institutions and departments, mainly from the University of Edinburgh. The aim of these seminars is to encourage and facilitate discussion, collaboration and learning across the Centre for Medical Informatics, which encompasses people from many different backgrounds and disciplines. |
Year(s) Of Engagement Activity | 2018 |
Description | Presentation at seminar (DCN) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | I gave an oral presentation at the Department of Clinical Neurosciences, Western General Hospital, academic afternoon on 28.02.2019. This is a weekly meeting of clinicians (predominantly consultant and trainee neurologists) from hospitals across the South East of Scotland. I presented the results of a systematic review investigating the associations between phenotypes and genetic variants in genes thought to cause familial stroke, followed by further research plans as part of my award. The purpose of the talk was to introduce my research to my clinical colleagues. |
Year(s) Of Engagement Activity | 2019 |
Description | Presentation at seminar (McMaster) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | I gave a seminar talk about my research at the institution of my collaborator Guillaume Pare (Genetic and Molecular Epidemiology Laboratory, McMaster University, Hamilton, Canada). |
Year(s) Of Engagement Activity | 2018 |
Description | School Visit on STEM day |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Schools |
Results and Impact | Gave a presentation and an interactive question-and-answer session to S4 students at the Coatbridge High School STEM day on 19.02.2020. |
Year(s) Of Engagement Activity | 2020 |