New horizons in computational structural biology with CCP4: multiple states, models and methods, use of AI and validation

Lead Research Organisation: University of Southampton
Department Name: Sch of Biological Sciences

Abstract

Macromolecules such as proteins, DNA and RNA mediate the vast majority of processes that constitute and sustain life, including photosynthesis, metabolism, exchange of information between cells, and cellular replication. These processes depend crucially on the dynamic 3D structures that macromolecules adopt. Insights from studying these 3D structures have transformed both our understanding of living systems and our ability to apply that understanding to promote health and advance biotechnology.

The overwhelming majority of experimental 3D structures have been determined by macromolecular crystallography (MX; >85% of PDB entries), with additional contributions from nuclear magnetic resonance (NMR; captures dynamics, but is typically limited to smaller structures) and rapidly growing input from electron microscopy (EM; typically membrane protein structures and macromolecular complexes, but studied under cryogenic conditions). The database of over 200,000 experimental structures is now informing deep learning approaches to predict macromolecular structure. In this exciting scientific landscape, MX plays a central discovery role while also validating and improving structure predictions and complementing the capabilities of other techniques.

The present proposal addresses current challenges in the computational aspects of MX. In MX, a crystal formed from billions of copies of the macromolecule is used to diffract X-rays or electrons; computational techniques then determine the underlying atomic structure. Knowledge of the molecule's 3D structure not only allows us to understand its function but also, critically, to design chemicals that interfere with it. Pharmaceutical research depends on the accuracy of experimental structures as the basis for designing drugs that turn the molecules on or off, or tune their function when required. Whilst the experimental pipeline today is partially automated and thus tremendously successful, key challenges remain.

Interactions of macromolecules with small molecules and/or other macromolecules change their structure in ways that help to explain their function. There is now an opportunity to improve the strategies by which we capture and describe the family of structures that a macromolecule can adopt, especially with room-temperature methods. It is timely to develop tools that allow identification of different structural states present in a single experiment or in sets of related experiments. By recasting MX as a multi-data, multi-model process, this proposal will address a weakness that has previously limited the ability of MX to define the dynamics of macromolecules, and so to infer and predict functional properties.

The work proposed here will improve structure analysis from both X-ray diffraction, the current predominant technique, and electron diffraction, a technique that can work with far smaller crystals and so extends the utility of MX. Moreover, we propose to integrate deep learning approaches into the process of structure determination and validation for proteins, carbohydrates, DNA and RNA, as well as complexes containing one or more of these molecule types. With a multi-technique, multi-data and multi-model approach, we aim to deliver a dynamic description of macromolecules that is closer to life, and therefore more descriptive of their function.

The Collaborative Computational Project Number 4 (CCP4) was established in 1979 and continues to underpin world-class macromolecular structural science in the UK. Effective use of data collected at synchrotron, XFEL and electron microscopy facilities is at the heart of the project's mission. User communities benefiting from such research include both academia and industry. At the interface of the two, CCP4 enables discoveries that underpin the development of vaccines and therapies (including those for SARS-CoV-2) and may equally be applied to tackle modern challenges in biotechnology and adaptation to climate change.

Technical Summary

This proposal integrates a number of approaches in a molecule-centric vision rather than the traditional discipline-centric approach. This reframing is made possible by the complementary expertise and skills of the collaborating research teams. Recognising current limitations, multi-data, multi-model scenarios are a major focus, and are tackled with novel and transformational approaches. These will allow better understanding of the small changes in data and in models that reflect the dynamic function of macromolecules. In this way, macromolecular crystallography (MX) will move beyond the static view of the classic one-dataset, one-structure approach, addressing the challenge of using unmerged data and joint refinement techniques. The availability of novel AI-based methods, in particular for RNA, and the fast development of electron diffraction are each exploited to complete the portfolio.

This proposal comprises four connected work packages:
WP1: A statistical framework for analysing the significance of change in data as a function of ligand, time, dose or other state, with a view to giving live feedback to influence data collection and insight to use in structure refinement in WP2.
WP2: Joint refinement of related structures: transformation of models into a common coordinate frame, refinement against unmerged intensities, modelling of post-translational modifications and ligands, and Bayesian decomposition of mixtures of states.
WP3: Methods for electron diffraction data for the refinement of macromolecular structures, using machine learning approaches to filter out dynamical scattering, and procedures for taking crystal defects and inelastic scattering terms into account.
WP4: Exploiting deep learning-based structural bioinformatics: use of covariance-based distance and contact predictions to validate protein and nucleic acid structures; development of rational editing of RNA models for molecular replacement.
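
As a rough illustration of the kind of question WP1 poses (is the change between two datasets larger than the measurement noise?), the sketch below compares matched reflection intensities from two states. It is not the proposal's framework: the function names, the example intensities and the assumption that both datasets are already on a common scale with independent Gaussian errors are all hypothetical choices made for this example.

```python
import math

def normalized_differences(i_a, sig_a, i_b, sig_b):
    """Per-reflection z-scores for the change between two matched datasets.

    Assumes (for illustration only) that both datasets are on the same
    scale and that the quoted sigmas are independent Gaussian errors.
    """
    return [
        (a - b) / math.sqrt(sa**2 + sb**2)
        for a, sa, b, sb in zip(i_a, sig_a, i_b, sig_b)
    ]

def reduced_chi_square(z_scores):
    """Mean squared z-score; close to 1 if differences are consistent with noise."""
    return sum(z * z for z in z_scores) / len(z_scores)

# Hypothetical intensities for the same four reflections in two states
i_apo = [100.0, 250.0, 80.0, 40.0]
sig_apo = [3.0, 4.0, 3.0, 4.0]
i_ligand = [100.0, 255.0, 95.0, 40.0]
sig_ligand = [4.0, 3.0, 4.0, 3.0]

z = normalized_differences(i_apo, sig_apo, i_ligand, sig_ligand)
chi2 = reduced_chi_square(z)
print(z)     # per-reflection z-scores
print(chi2)  # values well above 1 hint at change beyond the noise level
```

In this toy case the third reflection differs by three combined standard deviations, which pushes the mean squared z-score above 1. A real analysis would also need scaling, outlier handling and a proper significance model, none of which are attempted here.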
