Developing Artificial Intelligence and Deep Learning for the analysis of correlation spectroscopy data

Lead Research Organisation: University College London
Department Name: Structural Molecular Biology

Abstract

Nuclear magnetic resonance (NMR) spectroscopy is an unparalleled technique for obtaining detailed information - at atomic resolution - about macromolecular machines in an environment similar to the cell. NMR spectroscopy has therefore become an indispensable tool for the characterisation of large proteins, for the discovery of new molecular interactions, and also for the discovery of new drug leads.

Large proteins and macromolecular machines contain thousands of atoms. High-dimensional (3D, 4D, ...) NMR spectra are therefore required in order to separate the NMR signals for the individual atoms and to facilitate a characterisation of large proteins. A common hurdle with high-dimensional NMR spectra is the time required to obtain them, because essentially a 1D NMR spectrum is required for each point within a 100x100 square (3D spectra) or within a 100x100x100 cube (4D spectra), that is, 10,000 or 1,000,000 1D spectra, respectively. This makes it very time-consuming and nearly impossible to obtain high-dimensional (> 3D) spectra for large proteins.

During the proposed project we will leverage the power of Artificial Intelligence (AI) and Deep Learning to allow for fast acquisition of high-dimensional NMR spectra to characterise macromolecular machines. We will develop a new tool, where a new deep neural network will be designed and trained so that the required information about the macromolecule can be extracted orders of magnitude faster than with the traditional workflow, because only a fraction of the points are recorded. For 4D spectra only about 1% of the points within the 100x100x100 cube need to be sampled. This is possible because the theoretical background of NMR spectroscopy is so well defined that sufficient training data can easily be generated to train the neural networks.
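The sampling arithmetic above can be made concrete with a short sketch. It assumes a uniformly random sampling schedule over the grid of indirect-dimension points; the helper name `random_nus_schedule` and the random scheme are illustrative assumptions, not the sampling schedule used in the project.

```python
import numpy as np

def random_nus_schedule(shape, fraction, seed=0):
    """Pick a random subset of grid points (hypothetical helper, assumed scheme)."""
    rng = np.random.default_rng(seed)
    total = int(np.prod(shape))
    n_sampled = max(1, round(total * fraction))
    # Draw distinct flat indices, then convert them to grid coordinates.
    flat = rng.choice(total, size=n_sampled, replace=False)
    return np.stack(np.unravel_index(flat, shape), axis=1)

grid_3d = (100, 100)        # points needed for a fully sampled 3D spectrum
grid_4d = (100, 100, 100)   # points needed for a fully sampled 4D spectrum

full_3d = int(np.prod(grid_3d))
full_4d = int(np.prod(grid_4d))

# About 1% of the 4D grid, as stated in the text.
schedule = random_nus_schedule(grid_4d, fraction=0.01)

print(full_3d, full_4d, len(schedule))
```

Under these assumptions, a 1% schedule for the 4D case records roughly as many points as a fully sampled 3D spectrum.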

The new tools, anchored in deep learning and AI, will not only allow for fast and accurate characterisation of molecular interactions but will also facilitate ultra-high-dimensional NMR that will allow for completely new NMR ventures and for even larger molecular machines to be characterised.

Technical Summary

Non-uniformly sampled (NUS) NMR spectra have become important for obtaining ultra-high-dimensional NMR spectra with high resolution. Being able to accurately reconstruct NUS NMR spectra allows high-dimensional NMR spectra to be used to characterise macromolecular machines, amongst others. Various algorithms have been developed to reconstruct the full dataset from sparsely sampled data; however, these are often slow and require many more sampled points than is theoretically necessary.

The development of deep neural networks (DNNs) has seen impressive growth recently, with many and highly different applications. Currently, the biggest challenge for training complex DNNs is the availability of sufficient training data. Reconstruction and analysis of high-dimensional NUS NMR data are well suited for DNNs, because sufficient training data can easily be generated. It is therefore now time to take advantage of the immense potential of deep learning for the analysis of complex NMR spectra.
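As a minimal sketch of how such training data might be generated, the example below models a 1D time-domain NMR signal as a sum of exponentially decaying complex sinusoids and pairs the fully sampled signal with a sparsely sampled copy. The function names, parameter ranges, sampling fraction and the restriction to one dimension are all assumptions made for illustration; the text does not describe the simulation procedure actually used.

```python
import numpy as np

def synthetic_fid(n_points, n_peaks, dwell=1e-3, rng=None):
    """Sum of decaying complex sinusoids (illustrative parameter ranges)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(n_points) * dwell
    sweep_width = 1.0 / dwell
    freqs = rng.uniform(-sweep_width / 2, sweep_width / 2, n_peaks)  # Hz
    rates = rng.uniform(5.0, 50.0, n_peaks)                          # decay, 1/s
    amps = rng.uniform(0.2, 1.0, n_peaks)
    # One row per peak, then sum over peaks.
    exponent = (2j * np.pi * freqs[:, None] - rates[:, None]) * t[None, :]
    return (amps[:, None] * np.exp(exponent)).sum(axis=0)

def training_pair(n_points=256, n_peaks=5, fraction=0.25, seed=0):
    """Return (sparse input, sampling mask, fully sampled target)."""
    rng = np.random.default_rng(seed)
    target = synthetic_fid(n_points, n_peaks, rng=rng)
    mask = np.zeros(n_points, dtype=bool)
    kept = rng.choice(n_points, size=int(n_points * fraction), replace=False)
    mask[kept] = True
    # Unsampled points are set to zero in the network input.
    network_input = np.where(mask, target, 0)
    return network_input, mask, target

network_input, mask, target = training_pair()
print(network_input.shape, int(mask.sum()))
```

Because each call with a new seed yields a new pair, the amount of training data in a scheme like this is limited only by compute time.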

During the proposed research project DNNs will be designed and trained to analyse high-dimensional NMR spectra, such as 3D triple-resonance spectra and 4D methyl-methyl NOESY spectra. Whereas the initial objectives aim at 'old-style' reconstruction, subsequent objectives aim at providing a multi-way decomposition of sparsely sampled high-dimensional NMR spectra. Such decompositions contain the same information as the fully reconstructed spectrum, but the information is concentrated and kept in shapes. Because of the high flexibility of deep learning, and since a sufficient amount of training data can be produced, one can start to aim at entirely new perspectives for the analysis of NMR spectra where, for example, a trained DNN performs the entire analysis of a series of biomolecular NMR spectra in a single step. A further objective is to perform a combined analysis of triple-resonance NMR spectra to provide chemical shift assignments of proteins - quickly, robustly and in a single step.
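One way to picture a decomposition into shapes is sketched below: a 3D spectrum is written as a sum of components, each being the outer product of three 1D shapes. This is an illustrative reading of "multi-way decomposition", not the project's specific method; the Lorentzian line shapes, peak positions, grid size and function names are assumptions.

```python
import numpy as np

def lorentzian(n, centre, width):
    """1D Lorentzian line shape on an n-point axis (assumed shape)."""
    x = np.arange(n)
    return width**2 / ((x - centre) ** 2 + width**2)

def compose(shapes):
    """Rebuild a 3D spectrum from K components of three 1D shapes each."""
    f1, f2, f3 = shapes  # arrays of shape (K, n1), (K, n2), (K, n3)
    return np.einsum("ka,kb,kc->abc", f1, f2, f3)

n = 64
centres = [(10, 20, 30), (40, 15, 50)]  # two hypothetical peaks

# One (K, n) array of shapes per dimension.
shapes = [
    np.array([lorentzian(n, c[d], 2.0) for c in centres])
    for d in range(3)
]

spectrum = compose(shapes)
stored_full = spectrum.size                  # numbers in the full 3D grid
stored_shapes = sum(s.size for s in shapes)  # numbers in the decomposition

print(stored_full, stored_shapes)
```

In this toy case the shapes hold 384 numbers against 262,144 for the full grid, which illustrates what "concentrated" can mean in practice.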

Planned Impact

The outcome of our proposed research will be new Deep Neural Networks (DNNs), with associated optimised parameters, that quickly and reliably extract information from ultra-sparsely sampled high-dimensional NMR spectra.

Different mechanisms will be in place for disseminating our new tools to the identified beneficiaries. Firstly, we will publish our new deep learning analysis tools and their applications in archives such as arXiv.org and bioRxiv.org and in peer-reviewed international journals with as high impact as possible. Such publications will be available via open access to the international research community, including academia and industry, and to the general public. As we have done previously, we will aim at combining our developments with applications to challenging biological or biochemical problems, since publications of this combined nature generally will appeal to a broader audience and better showcase the strength of the developed tools. Both the PI and the PDRA will also present the results at national and international meetings; such meetings also allow for in-depth conversations that promote new collaborations.

Specific impact:
Researchers from both academia and the industrial sector in fields of structural biology, protein-ligand interactions, and NMR spectroscopy will benefit directly from our research. This group of researchers can immediately incorporate our new analysis tools into their own research programme. Specifically, our new tools will allow for substantially faster acquisition of NMR data because ultra-sparse sampling can be used, which in turn will make workflows faster and more efficient. Importantly, the automated chemical shift assignment tools, which we will develop, will allow for non-expert NMR spectroscopists to easily analyse protein-NMR spectra and obtain chemical shift assignments of proteins. These assignments can subsequently be used to determine structure, dynamics and quantify interactions, such as drug-protein interactions. It is anticipated that the proposed DNNs for automated analysis of sparsely sampled NMR spectra will be particularly beneficial in industrial settings.

The research proposed will also have a significant impact on the fields of artificial intelligence and deep learning. One of the biggest challenges in developing new neural network architectures is having sufficient data to properly train and cross-validate them. Using deep learning to analyse and decompose high-dimensional NMR spectra provides a unique example where (i) sufficient training data can easily be generated, since the theory behind NMR spectroscopy is well known, and (ii) large amounts of experimental cross-validation data can be generated using standard NMR experiments. Thus, the analysis of NMR spectra can form a robust example where new and even more elaborate deep learning architectures can be designed, improved, and cross-validated.

The proposed research will indirectly benefit the general public. For example, our previous NMR-based tools and methods were used to improve the stability of coagulation factor VIII for the treatment of haemophilia A, which is of great societal impact. It is highly likely that the tools developed during the proposed research will both make analysis of NMR spectra faster and make it accessible to a broader audience. NMR spectroscopy is used in many fields of science, including materials science, chemistry, and drug discovery. Our new tools will have the ability to impact all areas where NMR spectroscopy is used, which subsequently will benefit the general public.

Publications

 
Description We have shown that Deep Neural Networks (DNNs) can be developed and trained to transform and otherwise analyse biomolecular NMR spectra. One application has been published in J. Biomol. NMR, one in J. Am. Chem. Soc., and one on chemRxiv.

We also have several DNNs that are currently going through final training and evaluation. These networks and applications will be published in the near future.

Generally, a key strength of the neural networks developed, compared to classical manual analysis, is their robustness and ability to work effectively in a wide range of scenarios without further retraining and without user-adjustable parameters. This flexibility paves the way for these neural networks to be incorporated into automated or semi-automated processing schemes and for deep learning analyses to be used within the NMR community more generally. We are collaborating with several NMR centres worldwide (Francis Crick, University of Toronto, IISc India, ...) to implement these networks in large settings.
Exploitation Route - Semi-automated analysis of complicated NMR spectra.
- Allow for the acquisition of ultra-high dimensional NMR spectra. We have developed an efficient compression that can be decoded efficiently by a deep neural network.
- Develop Deep Neural Networks for the analysis of time-domain spectroscopic data in general.
Sectors Digital/Communication/Information Technologies (including Software),Education,Healthcare,Pharmaceuticals and Medical Biotechnology

 
Description Non-academic publication [written in Danish]: http://paper.ipapercms.dk/TechMedia/DanskKemi/2021/
First Year Of Impact 2021
Sector Digital/Communication/Information Technologies (including Software),Education
Impact Types Societal

 
Description Collaboration with Bruker Biospin 
Organisation Bruker Corporation
Department Bruker (United Kingdom)
Country United Kingdom 
Sector Private 
PI Contribution This collaboration is about implementing our recently developed Deep Learning tools for NMR spectroscopy in the standard software used by a majority of NMR spectroscopists worldwide (TopSpin). My group provides the trained deep neural networks and knowledge about how to run them.
Collaborator Contribution Still being negotiated.
Impact No output yet.
Start Year 2023
 
Description Collaboration with Bruker Biospin 
Organisation Bruker Corporation
Department Bruker BioSpin
Country Germany 
Sector Private 
PI Contribution This collaboration is about implementing our recently developed Deep Learning tools for NMR spectroscopy in the standard software used by a majority of NMR spectroscopists worldwide (TopSpin). My group provides the trained deep neural networks and knowledge about how to run them.
Collaborator Contribution Still being negotiated.
Impact No output yet.
Start Year 2023
 
Description Collaboration with Francis Crick NMR centre. 
Organisation Francis Crick Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution We have developed and trained the deep neural networks.
Collaborator Contribution The Francis Crick NMR centre has provided experimental NMR spectra, NMR allocations, and computational resources.
Impact Deep neural network, published here: https://doi.org/10.26434/chemrxiv.13295888.v2
Start Year 2020
 
Description NMRBox 
Organisation University of Connecticut
Department Health Center (Uconn Health)
Country United States 
Sector Academic/University 
PI Contribution The deep neural networks developed during the project to process biomolecular NMR spectra have been made available to NMRBox, a global resource for biomolecular NMR software.
Collaborator Contribution The partner, NMRBox (nmrbox.org), is currently hosting our software and tools on their large computational resource, so researchers can easily and freely use our tools directly.
Impact Our tools are easily available to a large group of scientists internationally.
Start Year 2020
 
Title FID-Net: A Versatile Deep Neural Network Architecture for NMR Spectral Reconstruction and Virtual Decoupling 
Description Deep Neural Network for the reconstruction and homonuclear decoupling of NMR spectra 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Not yet.