FUNCLAN - FUNctional annotations through Conformational Landscape Analysis

Lead Research Organisation: Science and Technology Facilities Council
Department Name: Scientific Computing Department

Technical Summary

We will develop FUNCLAN, a framework for comparative analysis of conformations and their associated annotations. This will be achieved through major improvements to the superposition software GESAMT and will deliver a robust process for superposing and clustering macromolecular assemblies, protein chains and ligand-binding sites across the Protein Data Bank archive. We will perform a comprehensive analysis of the clustered molecular entities, link them to experimental validation information and map annotations of biological and biophysical contexts to them. We will use this enriched and integrated data to refine the superposition and clustering processes, provide a representative structure for each cluster and design metrics that can be used to evaluate clustered assemblies, protein chains or ligand-binding sites. The FUNCLAN framework will support the superposition of macromolecular assemblies, where the challenge arises partly from possible changes in topology accompanied by changes in the conformation of individual components. The project will also tackle challenges unique to the superposition of ligand-binding sites, such as superposing amino acid residues that interact with the same small molecule versus superposing small molecules bound in different binding sites.
The project will provide:
1. A software suite and web server for analysing assemblies, protein chains and ligand-binding sites;
2. High-quality, manually curated benchmarking datasets of conformational clusters and their biological and biophysical annotations;
3. A robust and iteratively improved pipeline for superposing macromolecular assemblies, protein chains and ligand-binding sites;
4. Data standards and evaluation metrics for superposed and clustered molecular entities;
5. Clustered molecular entities linked to their validation information and biological annotations, which will be made available programmatically via an API and displayed on the PDBe-KB entry pages.
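The summary above mentions providing a representative structure for each cluster. One common way to do this, shown here only as a minimal sketch and not as FUNCLAN's actual method, is to pick the medoid: the cluster member with the smallest total dissimilarity to the other members. The function name `cluster_medoids`, the RMSD values and the cluster labels below are all hypothetical.

```python
from typing import Dict, List, Sequence


def cluster_medoids(dist: Sequence[Sequence[float]],
                    labels: Sequence[int]) -> Dict[int, int]:
    """Map each cluster label to the index of its medoid.

    dist   -- symmetric pairwise dissimilarity matrix (e.g. RMSD after superposition)
    labels -- cluster label of each structure, in the same order as dist
    """
    # Group structure indices by cluster label.
    members: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        members.setdefault(lab, []).append(idx)

    # The medoid minimises the summed dissimilarity to its own cluster.
    return {
        lab: min(idxs, key=lambda i: sum(dist[i][j] for j in idxs))
        for lab, idxs in members.items()
    }


# Hypothetical pairwise RMSD values (in angstroms) for five structures.
rmsd = [
    [0.0, 0.5, 0.6, 4.0, 4.2],
    [0.5, 0.0, 0.4, 4.1, 4.3],
    [0.6, 0.4, 0.0, 3.9, 4.0],
    [4.0, 4.1, 3.9, 0.0, 0.3],
    [4.2, 4.3, 4.0, 0.3, 0.0],
]
labels = [0, 0, 0, 1, 1]  # hypothetical cluster assignment
representatives = cluster_medoids(rmsd, labels)
```

A medoid is always an observed structure, which suits the stated goal of a representative structure better than an averaged set of coordinates would. Ties are broken here in favour of the lower index.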

Description In this project, we developed a novel approach for identifying conformational changes in protein complexes using unsupervised machine learning, and implemented it in the FUNCLAN software. FUNCLAN quantifies structural similarities and differences in large, complex datasets related to the same target protein, enabling the clustering and classification of macromolecular structures to identify distinct conformational states. Originally designed to study concerted movements in large complexes, FUNCLAN can also be applied to single chains and protein-ligand binding sites. It detects correlations in the relative orientation of the chains of a complex (or of structural domains within chains), then scores and clusters them accordingly. Our advancement enables the extraction of dynamic information from macromolecular structures determined by X-ray crystallography and electron microscopy and held in the Protein Data Bank, as well as from structures predicted with the AI-based AlphaFold software. Such insights are crucial for understanding macromolecular interactions and ligand binding, which are key factors in structure-based drug discovery.

Currently, no other tool matches FUNCLAN's capability to analyze large protein datasets drawn from the 200,000 experimentally determined structures and 220 million AlphaFold-predicted structures now available. Our key achievement is the development of a universal, systematic, and robust approach for identifying and scoring protein conformations. We have translated this development into a scalable algorithm and software tool, rigorously calibrated, tested, and validated using manually curated data from diverse biological systems.

As a case study, we applied FUNCLAN to two critical protein complexes, the SARS-CoV-2 spike protein and Polytomella F-ATP synthase, demonstrating its capabilities and the breadth of insights it can generate. The corresponding publication is in its final stages of preparation.

FUNCLAN is already in use at the Protein Data Bank in Europe (PDBe), aiding in the curation and annotation of newly deposited structures.
Exploitation Route Our outcomes provide valuable insights for structural biologists and bioinformaticians across both academic and commercial sectors. They enable comprehensive comparative analyses of protein complexes, facilitating the exploration of a wide range of fundamental questions, including:

- Inferring biological function - Understanding the roles of proteins within cellular processes.
- Characterizing conformational states and transitions - Identifying the principal energetically favored structural conformations and the dynamic shifts between them.
- Determining structural flexibility and rigidity - Pinpointing flexible and rigid regions within protein structures to better understand their mechanical properties.
- Investigating protein binding sites - Gaining atomic-level insights into how proteins interact with ligands and other biomolecules.
- Assessing the impact of genetic variants - Explaining how mutations and polymorphisms influence protein structure and function.
- Facilitating drug discovery and design - Aiding in the development of novel therapeutic molecules by analyzing protein-drug interactions.
- Unraveling mechanisms of drug resistance - Elucidating how structural changes contribute to resistance at the molecular level.

Furthermore, our outcomes enhance the structural analysis of protein complexes by providing optimal structural alignments, allowing researchers to detect functionally relevant differences across related structures. This facilitates the clear and concise presentation of large datasets, making it easier to interpret structural variations and their biological implications.
Sectors Agriculture; Food and Drink; Digital/Communication/Information Technologies (including Software); Pharmaceuticals and Medical Biotechnology

URL https://gitlab.com/CCP4/gesamt/-/tree/main/dcg-project/FunCLAN
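The description above refers to changes in the relative orientation of chains within a complex. As an illustration only, the sketch below measures how much the orientation of one chain relative to another differs between two structures. It assumes, hypothetically, that each chain's orientation is already available as a 3x3 rotation matrix (for example from a prior superposition onto a reference chain); the function names are invented for this sketch, and it is not FUNCLAN's scoring function.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [[m[j][i] for j in range(3)] for i in range(3)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_angle_deg(r: Matrix) -> float:
    """Rotation angle of a rotation matrix, from trace(R) = 1 + 2*cos(theta)."""
    c = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding noise


def relative_rotation(frame_i: Matrix, frame_j: Matrix) -> Matrix:
    """Orientation of chain j expressed in the frame of chain i."""
    return matmul(transpose(frame_i), frame_j)


def orientation_change_deg(pair_a: Tuple[Matrix, Matrix],
                           pair_b: Tuple[Matrix, Matrix]) -> float:
    """Angle by which the i-to-j relative orientation differs between structures A and B."""
    rel_a = relative_rotation(*pair_a)
    rel_b = relative_rotation(*pair_b)
    return rotation_angle_deg(matmul(transpose(rel_a), rel_b))


def rot_z(deg: float) -> Matrix:
    """Rotation about the z axis, used here only to build example frames."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


identity = rot_z(0.0)
# Chain j is rotated by 30 degrees relative to chain i in structure B only.
change = orientation_change_deg((identity, identity), (identity, rot_z(30.0)))
# Both chains rotated together: a rigid-body move of the whole complex.
rigid = orientation_change_deg((identity, identity), (rot_z(50.0), rot_z(50.0)))
```

Because the measure is built from the relative rotation between the two chains, a rigid-body rotation of the whole complex leaves it unchanged, so it does not depend on how the two structures happen to be placed in space.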
 
Title FUNCLAN Module for alignment and superposition of macromolecular complexes 
Description The module performs alignment and superposition of protein complexes in three dimensions, under the assumption of medium to high homology between the compared structures. While this problem is solved for covalently linked macromolecular structures (polymeric chains), existing solutions cannot be applied to non-covalently bound complexes because there is no canonical ordering of chains in three dimensions. A novel algorithm has been developed that finds equivalent chains between structures and then analyses, measures, and categorises their relationships, giving a picture of how the relative positions of chains change from one structure to another. These chain-level results are then used to give a similar picture for entire complexes. The algorithm has only two free parameters, the similarity threshold and the function used for the hierarchical classification, which makes it very robust. The most expensive operations are I/O, so it is suitable for mass-screening the entire PDB archive, which contains ca. 200,000 entries. The algorithm is implemented in C++, optimised for maximum efficiency, and is currently being tested for sensitivity and selectivity of matches on a wide variety of structures. The application is used for the automatic selection of homologous macromolecular complexes from the PDB for subsequent coordinate analysis aimed at detecting and classifying conformational changes that occur upon protein and ligand binding, as well as those that arise from crystallisation in different symmetry groups. Furthermore, the underlying algorithm can be applied to secondary structures or domains instead of chains, which are avenues for future work.
Type Of Technology Webtool/Application 
Year Produced 2023 
Open Source License? Yes  
Impact The software is being put into use and is under continuous development and improvement; no reportable impact has been generated to date.
URL https://gitlab.com/CCP4/gesamt/-/tree/main/dcg-project/FunCLAN
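The module description above names two free parameters: a similarity threshold and the function used for the hierarchical classification. The sketch below shows what a generic agglomerative clustering with exactly those two knobs can look like. It is not the C++ implementation; the function name `agglomerate` is hypothetical, and it assumes the threshold is expressed as a maximum dissimilarity (for example an RMSD cut-off) rather than a minimum similarity.

```python
from typing import Callable, List, Sequence

# A linkage function reduces all cross-cluster dissimilarities to one number:
# max -> complete linkage, min -> single linkage.
Linkage = Callable[[List[float]], float]


def agglomerate(dist: Sequence[Sequence[float]],
                threshold: float,
                linkage: Linkage = max) -> List[List[int]]:
    """Merge the closest pair of clusters until none is within `threshold`."""
    clusters: List[List[int]] = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None  # (dissimilarity, index a, index b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage([dist[i][j]
                             for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are all too dissimilar to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical pairwise dissimilarities for five structures.
dist = [
    [0.0, 0.5, 0.6, 4.0, 4.2],
    [0.5, 0.0, 0.4, 4.1, 4.3],
    [0.6, 0.4, 0.0, 3.9, 4.0],
    [4.0, 4.1, 3.9, 0.0, 0.3],
    [4.2, 4.3, 4.0, 0.3, 0.0],
]
loose = agglomerate(dist, threshold=1.0, linkage=max)
strict = agglomerate(dist, threshold=0.45, linkage=max)
```

With the looser threshold the five structures fall into two groups; tightening it splits structure 0 away from structures 1 and 2. This naive version recomputes every pairwise linkage on each pass, so it is meant to show the role of the two parameters, not to scale to archive-wide screening.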