Objective extraction of protein conformational landscapes for SAR through deep learning on large-scale X-ray crystallographic ligand-protein data

Lead Research Organisation: UNIVERSITY OF OXFORD
Department Name: Sustain Approach to Biomedical Sci CDT

Abstract

Recent advances in computational structural biology rely heavily on accurate protein structures and deep geometric learning algorithms such as AlphaFold3. Conventional protein structure models offer many advantages and have produced impressive results, but the primary output of X-ray crystallography is electron density. Electron density maps contain valuable information about energetically favorable protein conformations and ligand binding, and they offer an implicit picture of the charge distribution in continuous space around the protein. By combining ensembles of electron densities from crystallographic fragment screening experiments with distributions over models from machine learning methods, we can explore new ways of evaluating structure-activity relationships (e.g., activity cliffs), sampling the conformational space of the binding pocket, identifying hidden pockets, and pinpointing crystallographic water molecules.
It has long been known that significant signal from macromolecular crystallography experiments remains unmodeled, as shown by the discrepancy between observed and predicted structure factors, the so-called R-factor gap. Studies have shown that this gap is largely due to model failures rather than limitations of the experimental data, and improved structure modeling approaches have helped unlock insights into catalysis, molecular recognition, and allostery. Developing methods that use electron density data instead of refined structures may reveal hidden information about short-range interactions, which is crucial for structure-based drug design. Accurately modeled contacts strongly affect the efficiency of virtual screening and de novo design, as demonstrated in many studies.
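For readers unfamiliar with the R-factor mentioned above, it is conventionally computed as the summed absolute difference between observed and calculated structure-factor amplitudes, divided by the summed observed amplitudes. The sketch below is only an illustration of that standard definition, not part of the project's tooling: the function name, the optional `scale` parameter, and the amplitude values are hypothetical.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional crystallographic R-factor.

    R = sum(| |F_obs| - scale * |F_calc| |) / sum(|F_obs|)

    f_obs, f_calc: equal-length sequences of structure-factor amplitudes.
    scale: factor placing calculated amplitudes on the observed scale
           (assumed to be 1.0 here for simplicity).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(
        abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc)
    )
    return numerator / denominator


# Hypothetical amplitudes for illustration only (not real data).
observed = [100.0, 50.0, 25.0, 25.0]
calculated = [90.0, 55.0, 30.0, 20.0]
print(r_factor(observed, calculated))  # 25 / 200 = 0.125
```

A value of 0 would mean the model reproduces the observed amplitudes exactly; the gap discussed above refers to the fact that refined macromolecular models typically stay well above the level expected from experimental error alone.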
During the short rotation project, we developed DiffEMA, an equivariant graph neural network diffusion model for experimental model assessment and auto-completion. The approach generates amino acid structures conditioned on electron density maps, which offer a less biased representation of the experimentally observed diffraction patterns and allow possible amino acid conformations to be sampled from the prior distribution. Though early in development, the method has already shown promise by generalizing to other proteins after being trained on a fragment screen of a single protein.
In the DPhil project, we will focus on developing methods and automated tools to analyze protein structures obtained from high-throughput crystallographic fragment screens conducted at XChem. By integrating electron density maps, which provide a less biased representation of diffraction patterns, with experimental models, we aim to map conformational diversity comprehensively by sampling the underlying distribution that results from thermal motion and binding events. Learning the prior conformational distribution will help us understand the small conformational changes that lead to the formation of cryptic pockets, which is not possible with single structures obtained by regression-based refinement. Additionally, deep geometric learning techniques will help identify crucial low-level electron density features, leading to a better understanding of structure-activity relationships and more effective drug discovery efforts.
This project falls within the EPSRC's "Computational and theoretical chemistry," "Software engineering," and "AI and Data Science for Engineering, Health and Government" research areas and ties strongly to the council's outlined strategies within these fields. This research is highly interdisciplinary and will include collaborations with beamline scientists, crystallographers, and medicinal chemists.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S024093/1                                   30/09/2019  30/03/2028
2882322            Studentship   EP/S024093/1  30/09/2023  29/09/2027  Hamlet Khachatryan