Leveraging the impact of diversity in neurodevelopmental disability by integrating machine learning in personalized interventions.

Lead Research Organisation: EMBL - European Bioinformatics Institute

Department Name: Open Targets

Abstract

Neurodevelopmental disability (NDD), an umbrella term for autism, attention deficit, and intellectual and learning disability, affects 13% of the population. It has major economic and quality-of-life impacts on individuals with NDD and their families, and it places a substantial economic burden on the healthcare system. So far, treatment has targeted only general symptoms, which often leads to low efficacy and frequent side effects.

The advent of novel genetic testing methods has provided ample evidence of the major impact that genes and their regulation have on clinical presentation in NDD. Nonetheless, individuals with NDD are highly diverse, even when they carry the same genetic mutation. This is not unique to NDD; it is seen widely in many other medical conditions. The complexity arising from genetic heterogeneity and clinical (neuro)diversity has proven challenging for traditional approaches to treatment.

Recent research in the UK and Canada has led to the development of large databases recording detailed information about individuals with NDD. Artificial intelligence (AI) now provides the tools to analyze the information in those datasets quickly. In particular, we will use machine learning (ML) to manage complex information, accelerating interventions and improving how they are prioritized. Our project also takes a novel view of genomic information in NDD. Instead of focusing only on data from a single individual or a small group of individuals carrying the same gene mutation, our team will apply ML to large databases to identify features (from genes and their biology) that correlate with improved clinical outcomes.

In addition, we will use ML to better understand the interdependence between different symptoms in order to develop treatments that have a globally positive impact. In other words, we aim to find treatments that improve cognitive skills without negatively affecting sleep or increasing anxiety, side effects that have been seen in previous clinical trials.

Finally, we will provide the entire scientific community with an open-access portal presenting our research findings. The portal will be integrated with the current Open Targets Platform; Open Targets is a partnership between academia and industry in the UK, and its Platform allows researchers to access linked data on diseases, genes and drugs in a single site. Researchers will be able to contribute further information, which will improve the ML model.

To ensure that we accomplish our objectives, we have assembled a team of experts: in the clinical and genetic aspects of NDD, Dr. Bolduc (Canada); in the computational analysis of genomic, molecular and pharmacological data, Dr. Dunham (UK); in bioinformatics, Dr. Droit; in machine learning, Dr. Greiner; and in social sciences, patient engagement and health economics, Dr. Zwicker. Our team has also developed strong links with NDD patient and research organizations in Canada and the UK, which will provide insight throughout the project. We are supported by collaborators involved in family and government engagement, ethics and data management in the UK and Canada. The project will also be a unique opportunity for multidisciplinary international training.

Our project will show how ML can disassemble the complexity and diversity seen in NDD to develop more successful interventions. It will allow us to develop new ML approaches that will be readily applicable to other disorders where personalized interventions have been lagging behind diagnosis. More importantly, it will bring together families, society and scientists into a shared space where more and better information is exchanged. Finally, our project will embrace responsible implementation of data privacy and confidentiality while recognizing the need for data sharing to develop better interventions.

Planned Impact

Our project will increase awareness of the positive impact of machine learning (ML) in developing treatments informed by patients. While most people associate ML with self-driving cars and facial recognition, its enormous impact in the pharmaceutical industry remains largely unknown to the public. Yet ML can help us quickly process and reliably exchange vast amounts of accurate, relevant and timely information among an array of diverse knowledge users. Moreover, the techniques applied here will be immediately transferable to research on the genetic and phenotypic influences on other rare and common disorders.

Our project will show that ML can enhance experts' ability to understand the complexity and diversity of neurodevelopmental disability (NDD). Affecting 13% of the population, NDD represents a large group of genetically heterogeneous disorders with overlapping clinical features 1,4. Because of this heterogeneity and overlap, the development of drugs for NDD is extremely slow and costly. Our project will show how ML can process large datasets and identify the treatment targets relevant to the largest number of individuals.

Our program will show that ML allows a rational and cost-effective prioritization of candidate treatments. The main scientific impact of our project will be to allow researchers to prioritize candidate treatments for NDD using human genomic data as input, and to limit the unnecessary exposure of children with NDD to drugs. In addition, it will help avoid the repeated failure of clinical trials that rely too heavily on trial and error.

Our project will illustrate how confidentiality and privacy can be respected while using ML responsibly. Following recent events involving the use of large amounts of data and AI, a negative public view of ML has developed regarding respect for privacy. Our team therefore places a major focus on ethics and individual privacy. This is a focus that we share with all investigators and knowledge users because we believe that privacy must not only be ensured, but must be seen to be ensured. We will also develop protocols, in collaboration with data privacy expert Dr. Mouratidis (Brighton, UK), for harmonization between UK (General Data Protection Regulation, GDPR) 4 and Canadian (Personal Information Protection and Electronic Documents Act, PIPEDA) 5 datasets that will serve as a model for future international data sharing.

Our project will build capacity in our understanding of the genomic basis of NDD. Dr. Dunham, as Director of Open Targets, has created a unique platform combining the functional data necessary for this project. Through his clinical practice and research lab, Dr. Bolduc has built a strong rapport with individuals with NDD and their families, and a particular understanding of their needs and conditions. His lab has also developed good working relationships with artificial intelligence (AI) experts in mining data on NDD. By connecting with international colleagues and applying machine learning to large databases, Drs. Dunham and Bolduc and other team members will substantially accelerate their capacity building. The result, in both the UK and Canada, will be the training of several highly qualified people, the development of better investigative techniques and significant advances in our understanding of NDD.

Finally, our project will build synergies between researchers/clinicians and families of those with NDD and allow for a sustained development of data stored and managed responsibly.

Funded Value:

£379,892

Funded Period:

Feb 20 - Jul 24

Funder:

FIC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

ES/T013435/1

Principal Investigator:

Ian Dunham

Ellen Mary McDonagh

Research Subject:

Info. & commun. Technol. (20%)

Omic sciences & technologies (80%)

Research Topic:

Artificial Intelligence (20%)

Functional genomics (20%)

Genomics (20%)

Metabolomics / Metabonomics (20%)

Proteomics (20%)

Organisations

People	ORCID iD
Ian Dunham (Principal Investigator)
Ellen Mary McDonagh (Principal Investigator)	http://orcid.org/0000-0001-5806-6174
Francois Bolduc (Co-Investigator)
Russell Greiner (Co-Investigator)	http://orcid.org/0000-0001-8327-934X
Arnaud Droit (Co-Investigator)
Jennifer Zwicker (Co-Investigator)	http://orcid.org/0000-0002-0722-5929

Publications


Cuppens T (2023) Sex difference contributes to phenotypic diversity in individuals with neurodevelopmental disorders. in Frontiers in Pediatrics

Cuppens T (2023) Developing a cluster-based approach for deciphering complexity in individuals with neurodevelopmental differences. in Frontiers in Pediatrics

Key Findings
Research Databases and Models
Collaboration
Engagement Activities


Description	Please note that this award has been extended until July 2024, so these key findings are not the complete final outcomes of the award.
Objective 1: Identifying genes with a modifier effect in individuals affected with global developmental delay (GDD) in the large-scale DDD cohort.
Key findings: A list of 2113 genes previously known to be associated with GDD was curated manually from disease databases (such as Open Targets and DisGeNET) and the scientific literature. Rare coding variants (MAF ≤ 1%, CADD ≥ 25; missense or protein-truncating) in these GDD-linked genes were retrieved from DDD individuals with mild (n=639) and severe (n=1038) phenotypic severity. Burden analysis of these rare variants between mild and severe individuals (collapsing by gene) identified 43 significant genes (FDR ≤ 0.05). Of these, 10 genes were not found in the DDG2P curated list (retrieved January 2024); 3 were found in the DDG2P list curated for intellectual disability (ID) and developmental delay (DD); and the remaining 30 were found in the DDG2P list curated for disorders other than GDD. Variants in the genes shared between mild and severe individuals were further prioritized for functional validation (work currently in progress as part of the collaboration).
Objective 2: Leveraging unsupervised machine learning models to detect gene interaction networks in individuals with neurodevelopmental differences.
Key findings: We developed a probabilistic model-based clustering approach that captures dependencies between phenotypes, variants/genes and individuals. It can be used in conjunction with established clustering methods such as k-means and hierarchical clustering to identify underlying gene networks.
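The gene-collapsing burden analysis in Objective 1 can be sketched as follows. The record does not state which test or software was used, so this is a minimal, dependency-free sketch under assumptions: each gene is scored by the number of individuals carrying at least one qualifying variant, severe and mild carrier counts are compared with a two-sided Fisher's exact test, and p-values are adjusted with the Benjamini-Hochberg procedure before applying the FDR ≤ 0.05 cut-off. The function names and gene symbols are hypothetical and the carrier counts are invented; only the group sizes (639 mild, 1038 severe) come from the findings above.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum over every table that is no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def gene_burden(carriers_severe: dict[str, int], carriers_mild: dict[str, int],
                n_severe: int, n_mild: int, fdr: float = 0.05):
    """Per-gene test of carrier counts (individuals with >= 1 qualifying variant)."""
    genes = sorted(set(carriers_severe) | set(carriers_mild))
    pvals = []
    for g in genes:
        a = carriers_severe.get(g, 0)
        c = carriers_mild.get(g, 0)
        pvals.append(fisher_exact_two_sided(a, n_severe - a, c, n_mild - c))
    qvals = benjamini_hochberg(pvals)
    return [(g, p, q) for g, p, q in zip(genes, pvals, qvals) if q <= fdr]


# Toy carrier counts (invented); group sizes match the mild/severe DDD subsets above.
hits = gene_burden({"GENE_A": 40, "GENE_B": 10}, {"GENE_A": 5, "GENE_B": 6},
                   n_severe=1038, n_mild=639)
print(hits)
```

Collapsing to carrier counts per gene keeps the test simple when a gene has many very rare variants; weighted burden tests or variance-component tests such as SKAT are common alternatives.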
Objective 3: Investigating the impact of rare coding variants in intellectual disability genes in healthy adults: exome-wide analysis of the UKBB cohort (n=450K).
Key findings: Monoallelic genes (n=573) linked to ID were collected from curated gene lists in the DDG2P and PanelApp resources. Rare (MAF ≤ 1%), deleterious (CADD ≥ 25) heterozygous coding variants in highly loss-of-function-intolerant ID genes (pLI ≥ 0.90) were retrieved from UKBB individuals (n=450K), separated into three mutational classes: missense, protein-truncating and synonymous. A linear regression model was formulated to explain how, in healthy individuals, these variant classes collectively impact cognitive performance, measured by a set of 9 cognitive tests (such as educational attainment, reaction time and fluid intelligence). The regression analysis showed that rare coding PTVs, missense and synonymous variants in 6 genes were associated with reduced cognitive performance (p < 10e-5). Adding the polygenic risk scores (PRS) of common variants to the rare variants in the regression model showed that common variants also significantly affected the cognitive test scores. This clearly indicates a background role of common variants as modifiers.
Please explain for a non-specialist audience what has been discovered or achieved as a result of the work funded through this award:
Global developmental delay (GDD)/intellectual disability (ID) is a complex disorder, primarily characterized by a delay in achieving developmental milestones in motor skills, speech and language, or cognitive skills. The prevalence of GDD/ID in the general population is 1-3%. Both genetic and environmental factors contribute to the manifestation of GDD/ID in an individual. The advent of next-generation sequencing (NGS) technologies has enabled the discovery of a range of genetic variants (with a monoallelic mode of inheritance) clinically linked to GDD/ID.
However, some individuals who carry the same genetic variants are asymptomatic or show a different level of GDD/ID severity (mild, moderate, severe, profound). This could be due to genetic variants in other genes that protect against the disease-associated variants or modify their effect. Hence, the core objective of this grant is to use artificial intelligence (AI) and machine learning (ML) techniques to identify such modifier genes and genetic variants that protect from GDD/ID. First, in the DDD cohort, using a simple genetic burden analysis, we identified a set of fully penetrant, rare genetic variants whose frequency differed significantly between individuals with milder and more severe GDD/ID. To explain why some individuals who carry the same genetic variants have milder GDD while others have severe GDD, we implemented discriminative ensemble AI/ML models to identify interacting partners of these rare genetic variants. This work is currently in progress as part of the grant extension. Second, in the large-scale UK Biobank cohort (n=450K individuals), an adult healthy population, we identified a set of rare protein-coding genetic variants with reduced penetrance that negatively impacted individuals' performance in a set of cognitive tasks. Through our statistical model, we found that the interaction of common variants with rare variants affects cognitive performance. This could be attributed to multiple genetic variants that modulate the function of the driver genes. In summary, with both approaches we sought to identify candidate genetic modifiers in large-scale population cohorts and their role in protecting from GDD/ID.
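The Objective 3 comparison of rare-variant models with and without a common-variant PRS term can be sketched as a nested linear regression. This is a hypothetical illustration: the record does not give the model specification, covariates or software, so the per-individual counts, the PRS values and the effect sizes below are invented (and noise-free, so the recovered coefficients can be checked). Ordinary least squares is solved in plain Python via the normal equations to keep the example self-contained.

```python
def solve(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(X: list[list[float]], y: list[float], beta: list[float]) -> float:
    pred = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Invented individuals: (rare PTV count, rare missense count, rare synonymous count, PRS).
individuals = [(0, 0, 0, 0.5), (1, 0, 1, -0.2), (0, 2, 0, 1.0), (1, 1, 2, 0.0),
               (2, 0, 1, -1.0), (0, 1, 3, 0.3), (0, 0, 1, -0.5), (1, 2, 0, 0.8)]
# Noise-free toy cognitive score with arbitrary effect sizes.
score = [100 - 3 * ptv - 1 * mis - 0.5 * syn + 2 * prs
         for ptv, mis, syn, prs in individuals]

X_rare = [[1.0, ptv, mis, syn] for ptv, mis, syn, _ in individuals]         # intercept + rare classes
X_full = [[1.0, ptv, mis, syn, prs] for ptv, mis, syn, prs in individuals]  # ... + common-variant PRS

beta_rare, beta_full = ols(X_rare, score), ols(X_full, score)
print("R^2 rare only:", round(r_squared(X_rare, score, beta_rare), 3))
print("R^2 rare + PRS:", round(r_squared(X_full, score, beta_full), 3))
```

In a real analysis, the contribution of the PRS term would be judged with a formal test between the nested models (for example an F-test or likelihood-ratio test), with covariates included, rather than by comparing R² alone.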
Exploitation Route	a) Candidate gene variants linked to GDD/ID identified in this analysis can be used for functional validation work. b) The statistical modelling approaches used in this work to integrate rare and common variants can be readily adapted to study modifiers in other complex disorders such as cardiovascular diseases.
Sectors	Construction Healthcare
URL	https://doi.org/10.3389/fped.2023.1171920


Title	Analysis of large scale exome sequencing datasets
Description	Variant annotation and filtering; filtering for a curated set of intellectual disability genes and their mode of inheritance; rare variant association analysis (RVAS); genome-wide association study (GWAS); and polygenic risk score (PRS) calculation, using exome sequencing data from the DDD (n=15,000), UKBB (n=450,000) and SFARI (n=100,000) cohorts.
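The variant filtering step can be illustrated with the thresholds quoted in the key findings (MAF ≤ 1%, CADD ≥ 25, and pLI ≥ 0.90 for the UKBB analysis). This minimal sketch assumes variants have already been annotated with gene, consequence class, population allele frequency and CADD score; the `Variant` class, the consequence labels, the gene symbols and the pLI values are hypothetical and do not represent the project's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str  # hypothetical labels: "missense", "ptv", "synonymous"
    maf: float        # population minor allele frequency
    cadd: float       # CADD PHRED-scaled deleteriousness score


def qualifying_variants(variants, curated_genes, pli, classes=("missense", "ptv"),
                        max_maf=0.01, min_cadd=25.0, min_pli=None):
    """Keep rare, deleterious variants of the requested classes in curated genes."""
    kept = []
    for v in variants:
        if v.gene not in curated_genes or v.consequence not in classes:
            continue
        if v.maf > max_maf or v.cadd < min_cadd:  # MAF <= 1%, CADD >= 25
            continue
        if min_pli is not None and pli.get(v.gene, 0.0) < min_pli:  # e.g. pLI >= 0.90
            continue
        kept.append(v)
    return kept


curated = {"GENE1", "GENE2"}          # stand-in for the curated ID/GDD gene list
pli = {"GENE1": 0.99, "GENE2": 0.50}  # invented constraint scores
variants = [
    Variant("GENE1", "missense", 0.001, 30.0),    # passes
    Variant("GENE1", "missense", 0.05, 30.0),     # too common
    Variant("GENE1", "ptv", 0.0, 24.9),           # CADD below threshold
    Variant("GENE2", "ptv", 0.0001, 35.0),        # fails only the pLI filter
    Variant("GENE3", "ptv", 0.0001, 40.0),        # not a curated gene
    Variant("GENE1", "synonymous", 0.001, 26.0),  # class not requested by default
]
default_set = qualifying_variants(variants, curated, pli)
ukbb_style = qualifying_variants(variants, curated, pli,
                                 classes=("missense", "ptv", "synonymous"), min_pli=0.90)
print(len(default_set), len(ukbb_style))
```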
Type Of Material	Data analysis technique
Year Produced	2024
Provided To Others?	No
Impact	Annotation sources that can be used to annotate future exome/genome datasets. A curated set of intellectual disability-related genes that will help in analysing CNVs from exome/genome sequencing that overlap these genes.
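One way a curated gene set can be applied to CNVs is a simple interval-overlap query. This is a minimal sketch assuming half-open, 0-based coordinates (the BED convention); the gene names and coordinates are invented, and a production pipeline would more likely use an interval index or a tool such as bedtools than a linear scan.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open, as in BED files)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def curated_genes_hit(cnv: Interval, gene_coords: dict[str, Interval]) -> list[str]:
    """Return curated genes whose coordinates overlap the CNV, sorted by name."""
    return sorted(g for g, iv in gene_coords.items() if overlaps(cnv, iv))


# Invented coordinates; real ones would come from a gene annotation release.
genes = {
    "GENE_A": Interval("chr1", 1_000, 5_000),
    "GENE_B": Interval("chr1", 6_000, 9_000),
    "GENE_C": Interval("chr2", 1_000, 5_000),
}
spanning = curated_genes_hit(Interval("chr1", 4_500, 6_500), genes)  # spans GENE_A and GENE_B
between = curated_genes_hit(Interval("chr1", 5_000, 6_000), genes)   # falls between them
print(spanning, between)
```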


Description	Leveraging AI to identify genetic modifiers pertaining to Neurodevelopmental disorders (NDD)
Organisation	University of Alberta
Country	Canada
Sector	Academic/University
PI Contribution	Data acquisition: a joint application with our collaborators for access to exome/genome datasets from large-scale cohorts, including Deciphering Developmental Disorders (DDD), the Simons Foundation Autism Research Initiative (SFARI) and UK Biobank (UKBB). Data analysis pipeline: an exome sequencing pipeline, including variant calling, annotation and filtering, was developed at our institute (EBI). All of these large-scale cohorts were processed on our in-house high-performance computing (HPC) cluster. Machine learning/AI methodology: (a) Our research team developed an unsupervised machine learning (UML) approach based on a probabilistic model to cluster genes and phenotypes. This model enabled the detection of distinct gene clusters that explained the diversity of neurodevelopmental disorders in affected patients in the DDD cohort. (b) Our research team generated annotation matrices (Gene Ontology, pathway, protein-protein interaction (PPI), etc.) for use in the ensemble-based machine learning model (implemented by the collaborators) to predict the severity of patients' phenotypes.
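The record does not give the form of the probabilistic clustering model. As a hedged stand-in, a Bernoulli mixture model (a latent class model) fitted by expectation-maximization (EM) shows the general idea: individuals are rows of a binary matrix of phenotype or gene indicators, each cluster has its own feature frequencies, and EM alternates soft cluster assignments with re-estimation of those frequencies. The toy matrix, the cluster count `k` and the initialisation scheme are assumptions made for illustration only.

```python
import math


def bernoulli_mixture_em(X, k, n_iter=50, eps=1e-6):
    """Fit a k-component Bernoulli mixture to a binary matrix X (rows = individuals)."""
    n, d = len(X), len(X[0])
    step = max(1, n // k)
    # Deterministic start: each component seeded from a different row, shrunk toward 0.5.
    theta = [[0.25 + 0.5 * X[(c * step) % n][j] for j in range(d)] for c in range(k)]
    pi = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each individual belongs to each cluster.
        resp = []
        for row in X:
            logp = [math.log(pi[c]) + sum(math.log(theta[c][j] if x else 1.0 - theta[c][j])
                                          for j, x in enumerate(row))
                    for c in range(k)]
            top = max(logp)
            w = [math.exp(lp - top) for lp in logp]  # subtract the max for numerical stability
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: re-estimate mixing weights and per-cluster feature frequencies.
        for c in range(k):
            nc = max(sum(r[c] for r in resp), eps)
            pi[c] = nc / n
            theta[c] = [min(max(sum(r[c] * row[j] for r, row in zip(resp, X)) / nc, eps),
                            1.0 - eps)
                        for j in range(d)]
        total = sum(pi)
        pi = [p / total for p in pi]
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return pi, theta, labels


# Toy matrix: two groups of individuals sharing different phenotype/gene indicators.
X = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
]
pi, theta, labels = bernoulli_mixture_em(X, k=2)
print(labels)
```

The soft responsibilities computed in the E-step allow an individual to belong partly to several clusters; hard labels are taken only at the end.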
Collaborator Contribution	Data acquisition: a joint application with our collaborators for access to exome/genome datasets from large-scale cohorts, including Deciphering Developmental Disorders (DDD), the Simons Foundation Autism Research Initiative (SFARI) and UK Biobank (UKBB). Machine learning/AI methodology: an ensemble-based ML pipeline was developed by our collaborators (Prof. Arnaud Droit's research group) and customised to learn annotation features of variants/genes found to be associated with patients affected with global developmental delay (GDD) in the DDD cohort. Functional validation: collaborators contributed to the functional validation of variants/genes identified in DDD patients with mild and severe GDD phenotypes.
Impact	Conference presentations: ASHG 2022, ESHG 2023, GeL Research Summit 2023.
Papers published so far:
Cuppens T, Kaur M, Kumar AA, Shatto J, Ng ACH, Leclercq M, Reformat MZ, Droit A, Dunham I, Bolduc FV (2023). Developing a cluster-based approach for deciphering complexity in individuals with neurodevelopmental differences. Frontiers in Pediatrics, 11.
Cuppens T, Shatto J, Mangnier L, Kumar AA, Ng ACH, Kaur M, Bui TA, Leclercq M, Dunham I, Droit A, Bolduc F (2023). Sex difference contributes to phenotypic diversity in individuals with neurodevelopmental disorders. Frontiers in Pediatrics, 11, 1172154.
Start Year	2020


Description	Genomics England Research Summit
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Patients, carers and/or patient groups
Results and Impact	Engaging with patients and patient groups at the Genomics England Research Summit to explain the research being carried out: our postdoctoral fellow Ajay Kumar presented a poster on this project's research at the Summit and explained the work to a broad audience that included patients' parents and research participants.
Year(s) Of Engagement Activity	2023
