FUNCLAN - FUNctional annotations through Conformational Landscape Analysis

Lead Research Organisation: Science and Technology Facilities Council
Department Name: Scientific Computing Department

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

We will develop FUNCLAN, a framework to provide comparative analyses of conformations and associated annotations. This will be achieved through major improvements to the superposition software GESAMT and will deliver a robust process for superposing and clustering macromolecular assemblies, protein chains and ligand-binding sites across the Protein Data Bank archive. We will perform a comprehensive analysis of the clustered molecular entities, link them to experimental validation information and map annotations of biological and biophysical contexts to them. We will use this enriched and integrated data to refine the superposition and clustering processes, provide a representative structure for each cluster and design metrics that can be used to evaluate clustered assemblies, protein chains or ligand-binding sites. The FUNCLAN framework will support superposing macromolecular assemblies, where the challenge arises partly because changes in topology can be accompanied by changes in the conformation of individual components. The project will also tackle challenges unique to the superposition of ligand-binding sites, such as superposing amino acid residues interacting with the same small molecule versus superposing small molecules bound in different binding sites.
The project will provide:
1. A software suite and web server for analysing assemblies, protein chains and ligand-binding sites;
2. High-quality manually curated benchmarking datasets of conformational clusters and their biological and biophysical annotations;
3. A robust and iteratively improved pipeline for superposing macromolecular assemblies, protein chains and ligand-binding sites;
4. Data standards and evaluation metrics for superposed and clustered molecular entities;
5. Clustered molecular entities linked to their validation information and their biological annotations, which will be made available programmatically via an API and displayed on the PDBe-KB entry pages.
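
The clustering and representative-selection steps described above can be illustrated with a minimal sketch. It assumes that pairwise distances between superposed entities (for example RMSD values in Angstrom) are already available; the function names, the single-linkage rule, the medoid as representative and the numbers in the example are all assumptions made for illustration, not the method used by GESAMT or FUNCLAN.

```python
from typing import Dict, List, Sequence


def cluster_by_threshold(dist: Sequence[Sequence[float]],
                         threshold: float) -> List[List[int]]:
    """Single-linkage clustering cut at `threshold` (hypothetical helper).

    Two items share a cluster when they are connected by a chain of
    pairwise distances that do not exceed `threshold`.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest over item indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def medoid(dist: Sequence[Sequence[float]], members: Sequence[int]) -> int:
    """Member with the smallest summed distance to the rest of its cluster.

    Ties are broken by the lowest index.
    """
    return min(members, key=lambda i: (sum(dist[i][j] for j in members), i))


# Made-up pairwise distances for five entities (illustrative only).
rmsd = [
    [0.0, 0.5, 0.9, 4.0, 4.2],
    [0.5, 0.0, 0.6, 3.8, 4.1],
    [0.9, 0.6, 0.0, 3.9, 4.0],
    [4.0, 3.8, 3.9, 0.0, 0.7],
    [4.2, 4.1, 4.0, 0.7, 0.0],
]
clusters = cluster_by_threshold(rmsd, threshold=1.0)
representatives = [medoid(rmsd, c) for c in clusters]
print(clusters)         # [[0, 1, 2], [3, 4]]
print(representatives)  # [1, 3]
```

Choosing the medoid keeps the representative an actual observed structure rather than an averaged one; other linkage rules or representative criteria (for instance best validation metrics) could be substituted without changing the overall shape of the sketch.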

Publications

Title FUNCLAN Module for alignment and superposition of macromolecular complexes 
Description The module performs alignment and superposition of protein complexes in three dimensions, under the assumption of medium to high homology between the compared structures. While this problem is solved for covalently linked macromolecular structures (polymeric chains), existing solutions cannot be applied to non-covalently bound complexes because there is no canonical ordering of chains in three dimensions. A novel algorithm has been developed that finds equivalent chains between structures and then analyses, measures and categorises their relationships, giving a picture of how the relative positions of chains change when going from one structure to another. These chain-level results are then used to give a similar picture for entire complexes. The algorithm has only two free parameters, the similarity threshold and the function used for the hierarchical classification, which makes it very robust. The most expensive operations are I/O, so the algorithm is suitable for mass-screening the entire PDB archive of ca. 200,000 entries. The algorithm is implemented in C++, optimised for maximum efficiency, and is currently being tested for sensitivity and selectivity of matches on a wide variety of structures. The application is used for automatic selection of homologous macromolecular complexes from the PDB for subsequent coordinate analysis aimed at detecting and classifying conformational changes that occur on protein and ligand binding, as well as those arising from crystallisation in different symmetry groups. Furthermore, the underlying algorithm can be applied to secondary structures or domains instead of chains; these are avenues for future work. 
Type Of Technology Webtool/Application 
Year Produced 2023 
Open Source License? Yes  
Impact The software is in use and under continuous development and improvement; no reportable impact has been generated to date. 
URL https://gitlab.com/CCP4/gesamt/-/tree/main/dcg-project/FunCLAN
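
The chain-equivalence problem mentioned in the description (no canonical ordering of chains between two complexes) can be illustrated with a small sketch. It assumes a precomputed matrix of non-negative chain-to-chain similarity scores in the range 0 to 1 and searches exhaustively for the one-to-one assignment with the highest total score, leaving chains unmatched when no partner reaches the threshold. The function name, the exhaustive search and the example scores are hypothetical; this is not the FUNCLAN algorithm, whose implementation is in C++ at the URL above, and the brute-force search is only practical for complexes with a handful of chains.

```python
from itertools import permutations
from typing import Dict, Sequence


def match_chains(sim: Sequence[Sequence[float]],
                 threshold: float) -> Dict[int, int]:
    """Map chain indices of structure A to chain indices of structure B.

    sim[i][j] is an assumed similarity score between chain i of A and
    chain j of B. Only pairs scoring at least `threshold` may be matched;
    among all valid one-to-one assignments the one with the highest
    summed similarity is returned.
    """
    n_a = len(sim)
    n_b = len(sim[0]) if n_a else 0
    # Padding with None lets any chain of A remain unmatched.
    slots = list(range(n_b)) + [None] * n_a

    best_score = -1.0
    best: Dict[int, int] = {}
    for perm in permutations(slots, n_a):
        pairs = [(i, j) for i, j in enumerate(perm) if j is not None]
        if any(sim[i][j] < threshold for i, j in pairs):
            continue  # assignment uses a pair below the threshold
        score = sum(sim[i][j] for i, j in pairs)
        if score > best_score:
            best_score, best = score, dict(pairs)
    return best


# Made-up scores: chains 0 and 1 of A correspond to chains 1 and 0 of B,
# while chain 2 of A has no convincing partner in B.
scores = [
    [0.10, 0.92, 0.15],
    [0.88, 0.12, 0.20],
    [0.18, 0.22, 0.35],
]
print(match_chains(scores, threshold=0.5))  # {0: 1, 1: 0}
print(match_chains(scores, threshold=0.3))  # {0: 1, 1: 0, 2: 2}
```

The example shows how the similarity threshold, one of the two free parameters named in the description, controls whether a weakly similar chain is treated as equivalent or left out of the correspondence.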