FUNCLAN - FUNctional annotations through Conformational Landscape Analysis

Lead Research Organisation: European Bioinformatics Institute
Department Name: MSCB Macromolec, structural and chem bio

Abstract

Proteins are dynamic and adopt multiple conformational states, which underpins many biological processes, from forming macromolecular complexes with other proteins, small molecules (ligands) or nucleic acids to switching between active and inactive forms in enzymatic activity. To gain improved mechanistic insights into protein function, it is critical to characterise their three-dimensional (3D) structures and conformational states. Knowledge of the transitions between energetically favoured conformational states is fundamental to understanding the principles of protein structure and evolution, and can help in explaining the effects of genetic variants, in designing new drug molecules and in elucidating drug resistance at the molecular level.

Although the PDB has archived more than 165,000 individual structures, the number of unique proteins, counted by UniProt accession cross-references, grows at a slower pace and totals only ~50,000, with considerable variation in redundancy amongst different sequences. This is because each protein may have multiple representatives in the PDB: ligand-bound and unbound forms; structures in multiple space groups or sample conditions; structures in complex with other macromolecules (proteins or nucleic acids); and structures of smaller domains or sequence variants. Thus, the structures in the PDB provide a valuable resource for understanding the conformational flexibility of ligand binding sites, individual protein molecules and large macromolecular machines. Comparing ligand binding sites, individual protein molecules and large macromolecular complexes across the ensemble of available structures can assist in deciphering the molecular-level details of macromolecular function. The availability of data on distinct conformational states will also assist in characterising the particles in whole-cell tomograms, thus allowing molecular phenotyping of whole cells in different disease or development states.

In this project we will enhance GESAMT, the structure comparison algorithm, to derive the conformational flexibility of ligand binding sites, individual proteins or domains and macromolecular assemblies. The new framework, FUNCLAN, will include the metrics needed for meaningful clustering and a scheme for describing the structural similarities and differences between members of different clusters. Each cluster will have a representative structure and, using the structural and functional annotations from PDBe-KB, we will characterise each cluster and provide biological context. The new functionality will be validated against a dataset of known examples from the literature of macromolecules and complexes exhibiting specific conformational states. A pipeline for PDB archive-wide clustering of ligand binding sites, individual macromolecules and macromolecular complexes will be implemented. The resulting data will be made available programmatically via a REST API and an FTP site, and also via a novel web-based application.

Technical Summary

We will develop FUNCLAN, a framework to provide comparative analyses of conformations and associated annotations. This will be achieved through major improvements to the superposition software GESAMT and will deliver a robust process for superposing and clustering macromolecular assemblies, protein chains and ligand-binding sites across the Protein Data Bank archive. We will perform a comprehensive analysis of the clustered molecular entities, link them to experimental validation information and map annotations of biological and biophysical contexts to them. We will use this enriched and integrated data to refine the superposition and clustering processes, provide a representative structure for each cluster and design metrics that can be used to evaluate clustered assemblies, protein chains or ligand-binding sites. The FUNCLAN framework will support superposing macromolecular assemblies, where part of the challenge is that changes in topology can be accompanied by changes in the conformation of individual components. The project will also tackle challenges unique to the superposition of ligand-binding sites, such as superposing amino acid residues interacting with the same small molecule versus superposing small molecules bound in different binding sites.
The project will provide:
1. Software suite and web server for analysing assemblies, protein chains and ligand-binding sites;
2. High-quality manually curated benchmarking datasets of conformational clusters and their biological and biophysical annotations;
3. A robust and iteratively improved pipeline for superposing macromolecular assemblies, protein chains and ligand-binding sites;
4. Data standards and evaluation metrics for superposed and clustered molecular entities;
5. Clustered molecular entities linked to their validation information and their biological annotations, which will be made available programmatically via API and displayed on the PDBe-KB entry page.
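
As an illustration of the clustering and representative-selection idea described in the Technical Summary, the following is a minimal sketch only. It assumes that pairwise distances between conformers (here hypothetical RMSD values in Å) have already been produced by some superposition step; it does not use GESAMT, and the cutoff, labels, function names and the choice of single-linkage clustering with a medoid representative are assumptions made for the example, not the project's method or metrics.

```python
from typing import Dict, List, Sequence


def cluster_by_threshold(dist: Sequence[Sequence[float]], cutoff: float) -> List[List[int]]:
    """Single-linkage clustering: link any two items whose distance is <= cutoff."""
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def medoid(members: Sequence[int], dist: Sequence[Sequence[float]]) -> int:
    """Return the member with the smallest summed distance to its cluster mates."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


# Hypothetical labels and pairwise RMSD values (Å) for five chains of one protein.
labels = ["open_a", "open_b", "open_c", "closed_a", "closed_b"]
rmsd = [
    [0.0, 0.4, 0.6, 3.1, 3.3],
    [0.4, 0.0, 0.5, 3.0, 3.2],
    [0.6, 0.5, 0.0, 3.4, 3.5],
    [3.1, 3.0, 3.4, 0.0, 0.7],
    [3.3, 3.2, 3.5, 0.7, 0.0],
]

clusters = cluster_by_threshold(rmsd, cutoff=1.0)  # assumed cutoff, illustration only
representatives = [labels[medoid(c, rmsd)] for c in clusters]
```

With these made-up values the three "open" chains fall into one cluster and the two "closed" chains into another, and each cluster is represented by its medoid. A medoid is used here because it is always an actual observed structure rather than an averaged one.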

Publications

 
Title Conformational states benchmark dataset 
Description A manually curated dataset of monomeric proteins in distinct open and closed conformations. Where present, intermediate conformers are also noted. The dataset can be downloaded from http://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/distinct-monomer-conformers/benchmarking_monomeric_open_closed_conformers.csv 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact This is a benchmark dataset. 
URL https://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/distinct-monomer-conformers/
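
A minimal sketch of how the benchmark CSV above might be read with the Python standard library. The download URL is the one quoted in the record; the column layout of the file is not described here, so the parser keys rows by whatever header the file contains. The sample text, its column names (`accession`, `chain`, `state`) and its values are hypothetical placeholders used only to show the parsing step without network access.

```python
import csv
import io
import urllib.request
from collections import Counter
from typing import Dict, List

# URL as quoted in the dataset description above.
BENCHMARK_URL = (
    "http://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/"
    "distinct-monomer-conformers/benchmarking_monomeric_open_closed_conformers.csv"
)


def fetch_text(url: str = BENCHMARK_URL) -> str:
    """Download the CSV as text (needs network access; not called below)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into dicts keyed by the file's own header row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by(rows: List[Dict[str, str]], column: str) -> Counter:
    """Count how often each value occurs in the given column."""
    return Counter(row[column] for row in rows)


# Hypothetical stand-in for the real file: column names and values are made up.
sample = (
    "accession,chain,state\n"
    "P00000,xxxx_A,open\n"
    "P00000,yyyy_A,closed\n"
    "P11111,zzzz_B,open\n"
)
rows = parse_rows(sample)
states = count_by(rows, "state")
```

To work with the real data, `parse_rows(fetch_text())` would return the rows, after which the actual header names can be inspected with `rows[0].keys()` before choosing a column to summarise.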
 
Title PDB superposition dataset 
Description Superposed structures for the whole PDB archive. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact The superposed structures are displayed on PDBe-KB web pages. 
URL https://ftp.ebi.ac.uk/pub/databases/pdbe-kb/superposition/
 
Description FunCLAN & ELIXIR 3D-BioInfo 
Organisation ELIXIR
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We provide benchmark datasets to the ELIXIR 3D-BioInfo community, Activity III.
Collaborator Contribution They provide feedback and interesting use cases which might be relevant to the FunCLAN project.
Impact No specific outcomes yet.
Start Year 2022