Simple questions about neurodegenerative disease: Where? When? What?

Lead Research Organisation: Imperial College London
Department Name: Brain Sciences

Abstract

Alzheimer's disease is largely caused by genetics, rather than lifestyle factors. The probability of an identical twin developing the disease, if their co-twin already has it, is ~79%. By comparing the DNA of people who develop a disease against that of healthy people, we can identify sites in their DNA which increase the likelihood of getting the disease. In recent years it has been established that thousands of changes to the human genome can contribute to disease risk. Each DNA variant explains only a tiny part of the disease risk, but thousands of these variants together can explain much of an individual's disease risk. Biological processes implicated by these loci can be understood as being causally involved in the disease. Understanding the mechanisms of neurodegenerative diseases has thus become a statistical problem: we just need to find the patterns that link the variants together.

There are many open questions about neurodegenerative diseases. Indeed, for most brain diseases, we do not even have answers to seemingly basic questions: which part of the brain has gone wrong, and at which age did this occur? For neurodegenerative diseases, which do not show symptoms until late life, it would be easy to assume that the disease-causing changes occur in late life. Several lines of evidence suggest, though, that this may not be entirely the case, and that changes which occur early in brain development may be involved.

In recent years I used genetics to identify the cell types which cause schizophrenia and to explain why its age of onset occurs in early adulthood. This was done by showing that the variants which cause the disease preferentially affect genes which act in particular cells. Using a similar approach, I published the first paper showing that Alzheimer's genes have enriched expression in a type of cell called 'microglia', which are thought of as the immune system of the brain. This finding was a surprise to the field as Alzheimer's had traditionally been considered a neuronal disease. Contrary to this expectation, no enrichment of Alzheimer's risk genes has been found in neurons. Recently I extended the approach to Parkinson's disease: for this disease, the results confirmed the dominant theory of disease mechanism (showing an enrichment in dopaminergic neurons) but also implicated a type of cell known as oligodendrocytes, which had never previously been associated with the disease. The first part of this project will involve further investigating these two discrepancies: are neurons not involved in Alzheimer's, and are oligodendrocytes involved in Parkinson's?
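
The enrichment logic described above can be illustrated with a simple permutation test: ask whether a set of disease-associated genes is, on average, more specific to a given cell type than randomly drawn gene sets of the same size. The Python sketch below is illustrative only and is not the published method. The function name `celltype_enrichment`, the toy specificity matrix and the gene labels are all assumptions made for this example.

```python
import numpy as np


def celltype_enrichment(specificity, gene_names, disease_genes, n_perm=2000, seed=0):
    """Permutation test for enrichment of a gene set in each cell type.

    specificity: (genes x cell types) array; each row gives the share of a
    gene's expression found in each cell type (hypothetical input format).
    Returns the observed mean specificity and a one-sided p-value per cell type.
    """
    rng = np.random.default_rng(seed)
    wanted = set(disease_genes)
    idx = np.array([i for i, g in enumerate(gene_names) if g in wanted])
    observed = specificity[idx].mean(axis=0)

    # Null distribution: mean specificity of random gene sets of equal size.
    null = np.empty((n_perm, specificity.shape[1]))
    for p in range(n_perm):
        sample = rng.choice(specificity.shape[0], size=idx.size, replace=False)
        null[p] = specificity[sample].mean(axis=0)

    # Add-one correction so that a p-value can never be exactly zero.
    p_values = (1 + (null >= observed).sum(axis=0)) / (n_perm + 1)
    return observed, p_values


# Toy data (entirely synthetic): 500 genes, 3 cell types.
toy_rng = np.random.default_rng(1)
cell_types = ["microglia", "neurons", "oligodendrocytes"]
gene_names = [f"gene{i}" for i in range(500)]
specificity = toy_rng.dirichlet(np.ones(3), size=500)
specificity[:20] = [0.8, 0.1, 0.1]   # make the first 20 genes microglia-specific
disease_genes = gene_names[:20]      # pretend these are the risk genes

observed, p_values = celltype_enrichment(specificity, gene_names, disease_genes)
for name, obs, p in zip(cell_types, observed, p_values):
    print(f"{name}: mean specificity {obs:.2f}, permutation p = {p:.4f}")
```

In this toy setting the synthetic risk genes were constructed to be microglia-specific, so only that column should show a small p-value. A real analysis would also need to control for confounders such as gene length and GC content, which this sketch ignores.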

Once we have identified the cell types which cause neurodegeneration, we can address the issue of the age at which the disease acts. We know that the behaviour of cells changes across the lifespan, but we don't know how or why this occurs molecularly. First we will develop a 'map' of the changes which occur in the causal cell types, then we'll test whether the disease-associated genetic variants are more associated with development or old age.

What we really need to know is: what happens within a cell to cause the disease? Once we understand this, we can develop drugs to reverse this process. We can investigate this using statistics again (although we will again need plenty of data describing biological regulatory processes within the relevant cell types). Genetic variants which cause disease are known to mostly act by disrupting molecular interactions between DNA and proteins/RNA. Sites on DNA where other molecules bind tend to have distinct sequences: we can learn these sequences from publicly available datasets, then predict the effect of genetic variants on molecular interactions. We can then test statistically whether a particular type of molecular interaction (for instance, the binding of a particular transcription factor) has been disrupted. Once such interactions are identified, we can begin attempts to reverse disease effects on these binding sites.
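
One common way to predict the effect of a variant on a binding site is to score the reference and alternate sequences against a position weight matrix (PWM) learned for the binding molecule, and take the difference. The Python sketch below shows only that scoring step, under stated assumptions: the motif counts are invented, only the forward strand is scanned, and the function names (`log_odds_pwm`, `best_score`, `variant_effect`) are hypothetical rather than taken from any tool used in this project.

```python
import numpy as np

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts (positions x 4) into log2-odds scores."""
    counts = np.asarray(counts, dtype=float) + pseudocount
    freqs = counts / counts.sum(axis=1, keepdims=True)
    return np.log2(freqs / background)


def best_score(pwm, seq):
    """Best PWM match over all windows of seq (forward strand only)."""
    width = pwm.shape[0]
    scores = [
        sum(pwm[i, BASES.index(seq[start + i])] for i in range(width))
        for start in range(len(seq) - width + 1)
    ]
    return max(scores)


def variant_effect(pwm, ref_seq, pos, alt_base):
    """Change in best binding score when ref_seq[pos] is replaced by alt_base."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return best_score(pwm, alt_seq) - best_score(pwm, ref_seq)


# Invented counts for a 4-bp motif "TGAC" (columns are A, C, G, T).
toy_counts = [
    [0, 0, 0, 20],
    [0, 0, 20, 0],
    [20, 0, 0, 0],
    [0, 20, 0, 0],
]
pwm = log_odds_pwm(toy_counts)
ref_seq = "AATGACAA"

inside = variant_effect(pwm, ref_seq, pos=3, alt_base="C")   # hits the motif's G
outside = variant_effect(pwm, ref_seq, pos=0, alt_base="C")  # flanking base
print(f"variant inside motif:  delta score = {inside:.2f}")
print(f"variant outside motif: delta score = {outside:.2f}")
```

A negative delta suggests weakened binding. The statistical test described above would then ask whether disease-associated variants produce such disruptions for a particular motif more often than matched control variants do.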

Planned Impact

Seek commercialisation

The first four years of this project will be focused on identifying what the mechanisms of the neurodegenerative diseases are. The next phase will be to investigate how these mechanisms can be manipulated. The results found during the first phase will enable focused efforts on IP protection and commercialisation to begin. In aim 1.vii I will have identified cell types associated with rare disease genes: I will seek funding and patents for commercialising gene therapies to treat these conditions. I plan to have identified master regulators (i.e. transcription factors) involved in neurodegenerative disease during aim 2, then validated their therapeutic usage throughout aims 3-6.

Furthermore, all staff within my lab will be actively encouraged to consider founding start-ups and will be provided with training. I have prepared a welcome pack for my lab which emphasises my enthusiastic attitude towards start-ups: it encourages lab members to look into early stage biotech accelerators such as Rebel Bio, Y-Combinator, Entrepreneur First and Age1.

Act as a bridge between genetics and other scientific fields

The rapid advances in genetics of recent years should disrupt much existing disease research in biology, by putting our knowledge of disease mechanisms onto a more solid foundation. An example of how this can work comes from my work on Alzheimer's: in 2016 I published the first paper showing microglia are the primary cell type associated with the disease's genetics, and in 2019 a substantial fraction of the researchers recruited to the UK Dementia Research Institute work on this. Today, though, too many results from genetics remain trapped within papers specific to that community. By working directly with neuroscientists and writing papers intended to be read by biologists, I hope to accelerate the rate of information flow between these two disciplines. I will also work to facilitate engagement of quantitative researchers and AI specialists with biological problems.

Build on shared interests with representatives of industry and venture capital

My goal is that after this fellowship I will be able to set up a public-private initiative, similar to Open Targets, but focused on consolidating the advances made through this project and rolling its strategy out to other complex disorders. The purpose of this would be to share the cost of the data generation required to ensure that the right mechanisms are being targeted, such that private investments can be made in genetically supported target mechanisms. If this approach could work for one complex trait, it is likely it would work for most others. For this to become feasible it will be necessary to build relationships with key figures throughout industry who understand the potential of genomics for neuroscience. To enable this, I will organise a symposium once key milestones have been achieved.

Develop VR data 'expedition' to engage the public and patient groups

To build awareness of my research amongst funders, charities and patient groups involved in neurodegenerative disease, I will take part in a range of public engagement activities. A virtual reality app will be developed to showcase my work at these events. This will involve showing a 3D UMAP visualisation with clusters, marker gene expression and morphologies overlaid.

Facilitate larger studies of this kind in the future by enabling consortia and support organisations

I will become involved with relevant consortia including the Human Cell Atlas (HCA), iPDGC and iGAP. I will support the HCA's open data requirements, uploading all data before publication. I will continue to make all my software available through GitHub, nf-core and TERRA where applicable. I have previously released a number of packages via GitHub (e.g. EWCE and MAGMA_Celltyping) and these have been used independently in many publications. Tutorials and worked examples will be provided for all published code.

Publications

Description We have obtained two commercialisation grants to work on development of a protocol kit, based on single cell epigenetics work done in the lab
First Year Of Impact 2022
 
Description Amplifying genome coverage of single cell epigenetic profiling of the human brain
Amount £10,000 (GBP)
Funding ID PA0227 
Organisation UK Dementia Research Institute 
Sector Charity/Non Profit
Country United Kingdom
Start 11/2021 
 
Description UKRI IAA Healthy Society Award
Amount £80,068 (GBP)
Organisation United Kingdom Research and Innovation 
Sector Public
Country United Kingdom
Start 09/2022 
End 09/2023
 
Description Amplifying genome coverage of single cell epigenetic profiling of the human brain 
Organisation Cardiff University
Country United Kingdom 
Sector Academic/University 
PI Contribution They have access to and experience with the iCell8
Collaborator Contribution We have set up a protocol for linear amplification of DNA from small epigenetic samples (TIP-SEQ) and aim to set it up on the iCell8
Impact A £10,000 grant from the UK DRI COLLABORATIVE SINGLE CELL AND SPATIAL TRANSCRIPTOMICS STUDIES AWARD PROGRAMME
Start Year 2021
 
Description Co-supervision of Guiseppe Alessandro D'Agostino 
Organisation Nanyang Technological University
Country Singapore 
Sector Academic/University 
PI Contribution Co-supervision of Guiseppe
Collaborator Contribution Funding of a postdoctoral fellowship
Impact Guiseppe received an LKC Dean's Postdoctoral Fellowship to develop single cell RNA-seq methods under my co-supervision
Start Year 2020
 
Description Collaboration with Bart de Strooper on analysis of scRNA-seq data 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution We analysed their snRNA-seq data to evaluate whether microglial nuclei are depleted of disease-associated transcripts
Collaborator Contribution They generated the data and performed other analyses
Impact https://www.sciencedirect.com/science/article/pii/S2211124720311785
Start Year 2020
 
Title MungeSumstats 
Description The *MungeSumstats* package is designed to facilitate the standardisation of GWAS summary statistics. It reformats inputted summary statistics to include SNP, CHR and BP columns, and can look up these values if any are missing. It also removes duplicates across SNPs. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact GWAS summary statistics are the result of a huge investment in biology. However, the format that they get released in has never been standardised. MungeSumstats enables the huge variety of formats to be automatically munged into a standard format. This enables both the development of new tools for working with GWAS sumstats and larger-scale analyses across GWAS. 
URL https://bioconductor.org/packages/release/bioc/html/MungeSumstats.html
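
The kind of standardisation described in this record can be illustrated with a small Python sketch: rename differently labelled columns to SNP, CHR, BP and P, then drop repeated SNPs. This is not the MungeSumstats implementation or API (the package itself is written in R). The alias table, the function names and the example rows below are all hypothetical and far smaller than anything a real tool would need.

```python
# Hypothetical alias table mapping header variants to standard column names.
COLUMN_ALIASES = {
    "SNP": {"snp", "rsid", "markername", "variant_id"},
    "CHR": {"chr", "chrom", "chromosome"},
    "BP": {"bp", "pos", "position", "base_pair_location"},
    "P": {"p", "pval", "p_value", "pvalue"},
}


def standardise_header(header):
    """Map each column name onto its standard name; keep unknown names as-is."""
    lookup = {
        alias: standard
        for standard, aliases in COLUMN_ALIASES.items()
        for alias in aliases
    }
    return [lookup.get(name.strip().lower(), name) for name in header]


def standardise_sumstats(rows):
    """Rename columns in a list of row dicts and drop duplicated SNPs.

    The first occurrence of each SNP is kept; rows lacking an SNP value are
    passed through untouched because they cannot be de-duplicated.
    """
    seen = set()
    cleaned = []
    for row in rows:
        std_row = dict(zip(standardise_header(row.keys()), row.values()))
        snp = std_row.get("SNP")
        if snp is not None:
            if snp in seen:
                continue
            seen.add(snp)
        cleaned.append(std_row)
    return cleaned


# Invented example rows using a non-standard header.
raw_rows = [
    {"rsID": "rs1", "chrom": "1", "pos": "100", "pval": "0.01"},
    {"rsID": "rs1", "chrom": "1", "pos": "100", "pval": "0.01"},
    {"rsID": "rs2", "chrom": "2", "pos": "250", "pval": "0.20"},
]
clean_rows = standardise_sumstats(raw_rows)
print(clean_rows)
```

The sketch does not attempt the look-up of missing SNP, CHR or BP values mentioned in the description, which requires a reference genome annotation.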
 
Title OrthoGene 
Description orthogene is an R package for easy mapping of orthologous genes across hundreds of species. It pulls up-to-date interspecies gene ortholog mappings across 700+ organisms. It also provides various utility functions to map common objects (e.g. data.frames, gene expression matrices, lists) onto 1:1 gene orthologs from any other species. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact Standardising interspecies gene mapping makes development of R packages which integrate data across species easier 
URL https://bioconductor.org/packages/release/bioc/html/orthogene.html
 
Title Rworkflows 
Description GitHub Actions are a powerful way to automatically launch workflows every time you push changes to a GitHub repository. This is a form of Continuous Integration (CI), which helps ensure that your code is always working as expected (without having to manually check each time). Here, we have designed a robust, reusable, and flexible action specifically for the development of R packages. We also provide an R function to automatically generate a workflow file that calls the rworkflows composite action. Currently, the rworkflows action performs the following tasks: (1) builds a Docker container to run subsequent steps within; (2) builds and checks your R package (with CRAN and/or Bioconductor checks); (3) runs unit tests; (4) runs code coverage tests and uploads the results to Codecov; (5) (re)builds and launches a documentation website for your R package; (6) pushes an Rstudio Docker container to DockerHub with your R package and all its dependencies pre-installed. Importantly, this workflow is designed to work with any R package out-of-the-box. This means you won't have to manually edit any yaml files: just run the rworkflows::use_workflow() function and you're ready to go! 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Since its recent launch, the lead developer (Brian Schilder) has been invited to form a new Bioconductor working group on cloud computing. This indicates that it is likely to become a core part of the Bioconductor (and thus bioinformatics) infrastructure. 
URL https://github.com/marketplace/actions/rworkflows
 
Title combiz/scFlow: 0.7.1 (Pre-print pre-release) 
Description Pre-print version. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact nf-core/scflow is a bioinformatics pipeline for scalable, reproducible, best-practice analyses of single-cell/nuclei RNA-sequencing data. The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. Full case/control sc/sn-RNAseq analyses can be orchestrated with a single line of code on a local workstation, high-performance computing cluster (HPC), or on Cloud services including Google Cloud, Amazon Web Services, Microsoft Azure, and Kubernetes. It uses Docker/Singularity containers making installation trivial and results highly reproducible. Each new release of nf-core/scflow triggers automated continuous integration tests which run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website. 
URL https://zenodo.org/record/5204514
 
Description Parkinson's UK Patient Group Workshop 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Patients, carers and/or patient groups
Results and Impact We organised a patient working group together with Parkinson's UK. This was somewhat novel for Parkinson's UK as our lab's research does not directly use humans, mouse models, etc. Four patients attended from across the country and have expressed an interest in continuing to be involved in our research.
Year(s) Of Engagement Activity 2022