FUNCLAN - FUNctional annotations through Conformational Landscape Analysis

Lead Research Organisation: European Bioinformatics Institute
Department Name: MSCB Macromolec, structural and chem bio

Abstract

The dynamic nature of proteins leading to multiple conformational states is critical in many biological processes from forming macromolecular complexes with other proteins, small molecules (ligands) or nucleic acids to switching between active and inactive forms for enzymatic activity. To gain improved mechanistic insights into the function of proteins, structural characterisation of their three-dimensional (3D) structures and their conformational states is critical. Knowledge of the transition between different energetically favoured conformational states is fundamental to the understanding of the principles of protein structure and evolution and can help in explaining the effects of genetic variants, in designing new drug molecules and in elucidating drug resistance at the molecular level.

Although the PDB has archived more than 165,000 individual structures, the number of unique proteins based on the number of UniProt accession cross-references grows at a slower pace and totals only ~50,000, with a considerable variation in the redundancy rate amongst different sequences. This is because each protein may have multiple representatives in the PDB: ligand-bound and unbound forms; structures in multiple space groups or sample conditions; structures in complex with other macromolecules (proteins or nucleic acids); or structures of smaller domains or sequence variants. Thus, the structures in the PDB provide a valuable resource for understanding the conformational flexibility of ligand binding sites, individual protein molecules and large macromolecular machines. Understanding the similarities and differences in ligand binding sites, individual protein molecules and large macromolecular complexes using the ensemble of available structures can assist in deciphering the molecular-level details of macromolecular function. The availability of data on distinct conformational states will also assist in characterising the particles in whole-cell tomograms, thus allowing molecular phenotyping of whole cells in different disease or development states.

In this project we will enhance GESAMT, the structure comparison algorithm, to derive conformational flexibility of ligand binding sites, individual proteins or domains and macromolecular assemblies. The new framework, FUNCLAN, will include the necessary metrics to realise meaningful clustering and the necessary scheme to describe the structural similarities and differences between members of different clusters. Each cluster will have a representative structure and using the structural and functional annotations from PDBe-KB, we will characterise each cluster and provide biological context. The new functionality will be validated against a dataset of known examples from the literature of macromolecules and complexes exhibiting specific conformational states. A pipeline for a PDB archive-wide clustering of ligand binding sites, individual macromolecules and macromolecular complexes will be implemented. The resulting data will be made available programmatically via a REST API, an FTP site, and also via a novel web-based application.

Technical Summary

We will develop FUNCLAN, a framework to provide comparative analyses of conformations and associated annotations. This will be achieved through major improvements to the superposition software GESAMT and will deliver a robust process for superposing and clustering of macromolecular assemblies, protein chains and ligand-binding sites across the Protein Data Bank archive. We will perform a comprehensive analysis of the clustered molecular entities, link them to experimental validation information and map annotations of biological and biophysical contexts to them. We will use this enriched and integrated data to refine the superposition and clustering processes, provide a representative structure for each cluster and design metrics that can be used to evaluate clustered assemblies, protein chains or ligand-binding sites. The FUNCLAN framework will support superposing macromolecular assemblies, where the challenge is partly due to the possibility of changes in topologies accompanied by changes in the conformation of individual components. The project will tackle challenges unique to the superposition of ligand-binding sites, such as superposing amino acid residues interacting with the same small molecule versus superposing small molecules bound in different binding sites.
The project will provide:
1. Software suite and web server for analysing assemblies, protein chains and ligand-binding sites;
2. High-quality manually curated benchmarking datasets of conformational clusters and their biological and biophysical annotations;
3. A robust and iteratively improved pipeline for superposing macromolecular assemblies, protein chains and ligand-binding sites;
4. Data standards and evaluation metrics for superposed and clustered molecular entities;
5. Clustered molecular entities linked to their validation information and their biological annotations which will be made available programmatically via API and will be displayed on the PDBe-KB entry page
 
Title Supplementary Online Material 
Description Formal description of method outlined in main article. 
Type Of Art Image 
Year Produced 2024 
URL https://aip.figshare.com/articles/figure/Supplementary_Online_Material/25772661
 
Description Understanding protein conformation is crucial for understanding the function of biological macromolecules. The insights can help in understanding variation and treating disease. Macromolecular structures in the Protein Data Bank, and models from AI-based structure prediction tools such as AlphaFold2, RoseTTAFold, and ESMFold, provide static representations that fail to fully capture macromolecular motion. We have developed methods to analyse experimental structures to explore the conformational landscapes that manifest protein function. The outcome is made freely accessible via the PDBe knowledge base.
Exploitation Route As part of the project, we also developed curated datasets for protein conformational states. These were made accessible via the FTP area (http://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/distinct-monomer-conformers/) and Hugging Face (https://huggingface.co/datasets/PDBEurope/protein_chain_conformational_states/tree/main) and downloaded multiple times to develop AI methods.
Sectors Education; Pharmaceuticals and Medical Biotechnology

URL https://github.com/PDBeurope/protein-cluster-conformers
 
Title Conformational states benchmark dataset 
Description Included is a manually curated dataset of monomeric proteins in distinct open-closed conformations. Where present, intermediary conformers have also been noted. The dataset can be downloaded from http://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/distinct-monomer-conformers/benchmarking_monomeric_open_closed_conformers.csv 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact This is a benchmark dataset. 
URL https://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/distinct-monomer-conformers/
 
Title PDB superposition dataset 
Description Superposed structures for the whole PDB archive. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact The superposed structures are displayed on PDBe-KB web pages. 
URL https://ftp.ebi.ac.uk/pub/databases/pdbe-kb/superposition/
 
Title Supplementary Online Material 
Description Dataset accompanying the algorithm, used for testing; it might be useful to the community. 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://aip.figshare.com/articles/dataset/Supplementary_Online_Material/25772655
 
Description FunCLAN & ELIXIR 3D-BioInfo 
Organisation ELIXIR
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We provide benchmark datasets to the ELIXIR 3D-BioInfo community, Activity III.
Collaborator Contribution They provide feedback and interesting use cases which might be relevant to the FunCLAN project.
Impact No specific outcomes yet.
Start Year 2022
 
Title Structural conformation clustering pipeline 
Description These scripts can be used to cluster a parsed set of monomeric protein chains via a global conformational change metric based on CA distances. Once the peptide chains destined for clustering have been specified, a pairwise CA distance matrix for each chain is produced. Distance difference matrices are then generated, again, pairwise but between CA distance matrices here. Therefore, for N unique peptide chains, N CA distance matrices and N^2 distance difference matrices are generated. 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Users can take a ready-made structure clustering pipeline to identify distinct conformations of a protein. 
URL https://github.com/PDBeurope/protein-cluster-conformers
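
The distance-matrix steps described above can be illustrated with a minimal Python sketch. It is not the code from the repository: the function names, the toy coordinates and the summed score used as a global conformational change metric are all assumptions made for illustration, and the chains are assumed to be already aligned to the same residue positions so that their matrices have equal size.

```python
import math

def ca_distance_matrix(coords):
    """Pairwise CA-CA distance matrix for one chain, given (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def distance_difference_matrix(dm_a, dm_b):
    """Element-wise absolute difference between two equally sized distance matrices."""
    return [
        [abs(x - y) for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(dm_a, dm_b)
    ]

def conformational_change_score(ddm):
    """Hypothetical global metric: sum over the upper triangle of the difference matrix."""
    n = len(ddm)
    return sum(ddm[i][j] for i in range(n) for j in range(i + 1, n))

# Toy CA coordinates (in angstroms) for three chains of the same protein.
# These values are invented; real input would come from parsed structure files.
chains = {
    "open": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],
    "closed": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)],
    "closed_copy": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)],
}

# N chains give N CA distance matrices.
distance_matrices = {name: ca_distance_matrix(c) for name, c in chains.items()}

# All ordered pairs of chains give N^2 distance difference matrices,
# each reduced here to a single score.
scores = {
    (a, b): conformational_change_score(
        distance_difference_matrix(distance_matrices[a], distance_matrices[b])
    )
    for a in chains
    for b in chains
}

for pair, score in scores.items():
    print(pair, round(score, 3))
```

Because the comparison uses internal distances only, no prior superposition of the chains is needed. The resulting score matrix could then be passed to any standard clustering method to group chains into conformational states.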
 
Description 16th Annual International Biocuration Conference - Protein Data Bank in Europe Knowledge Base (PDBe-KB) - Creating knowledge from macromolecular structures and functional annotations 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Deborah and Joseph presented a poster about PDBe-KB. They engaged with the biocuration community to discuss practices and how to better link our resources or take advantage of each other's data, procedures or expertise.
Year(s) Of Engagement Activity 2023
URL https://biocuration2023.github.io/
 
Description A presentation at National Institute of Immunology, Delhi 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact The presentation included information on the PDB archive management by the wwPDB and its associate members. It also described the work done at PDBe on enriching the macromolecular structure data with structural and functional annotations for complexes, domains and small molecule ligands. The impact of AlphaFold was discussed, and the infrastructure developments to support the use of predicted models were described.
Year(s) Of Engagement Activity 2024
URL https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwjj4Oero9-EAxXBTEEAHQGGDPgQ...
 
Description Biocuration Conference 2023, Padua 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact JE attended the conference on biocuration, co-presented poster.
Year(s) Of Engagement Activity 2023
URL https://biocuration2023.github.io
 
Description CCPBioSim workshop 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them.
Year(s) Of Engagement Activity 2023
 
Description EBI Summer School group projects session 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Course on how to use the PDBe and PDBe-KB web resources to access and interrogate structural biology data. The participants came from a variety of backgrounds. They also learned how to access these data via the PDBe(-KB)'s APIs, picking up Python programming skills in the process. Many of them found the programming section challenging due to little previous programming experience.
Year(s) Of Engagement Activity 2023
 
Description EBI Summer School group projects session 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Course on how to use the PDBe and PDBe-KB web resources to access and interrogate structural biology data. The participants came from a variety of backgrounds. They also learned how to access these data via the PDBe(-KB)'s APIs, picking up Python programming skills in the process.
Year(s) Of Engagement Activity 2023
URL https://www.ebi.ac.uk/panda/jira/browse/PDBE-7033
 
Description EBI webinar: Finding and interpreting protein structure and function data using PDBe-KB 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them.
Year(s) Of Engagement Activity 2023
 
Description ECCB 2024 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Delivered a two-part workshop on AFDB and the PDBe-KB during the pre-conference sessions, then attended the conference proper in the following days.
Year(s) Of Engagement Activity 2024
URL https://www.ebi.ac.uk/panda/jira/browse/PDBE-6319
 
Description EIPP Predocs course 2024 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Training course for pre-doctoral (PhD) students on how to use the PDBe and PDBe-KB web UI (including related APIs) for structural biology and bioinformatics research. The course presented a case-study exercise on interrogating the data available on acetylcholine from the PDB and partner resources via the PDBe-KB. Students learned what data is available via the PDBe(-KB) and how to access and parse this data programmatically.
Year(s) Of Engagement Activity 2024
URL https://www.ebi.ac.uk/training/materials/eipp-bioinformatics-predocs-handbook-2024/
 
Description Elixir All Hands 2023, Dublin 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact JE attended conference on ELIXIR tools. Presented poster.
Year(s) Of Engagement Activity 2023
URL https://elixir-europe.org/events/elixir-all-hands-2023
 
Description Elixir All Hands 2023, Dublin 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Attended conference on ELIXIR tools. Gave presentation on the PDBe-KB to attending audience.
Year(s) Of Engagement Activity 2023
URL https://elixir-europe.org/events/elixir-all-hands-2023
 
Description Hot Topics in Contemporary Crystallography (HTCC6) course - in Dubrovnik, Croatia 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Lectures & Tutorials on: (1) Structural bioinformatics touching on PDBe, PDBe-KB, 3D-Beacons, and (2) AlphaFold.
Year(s) Of Engagement Activity 2024
URL https://htcc6.org/
 
Description Hot Topics in Contemporary Crystallography (HTCC6) course - in Dubrovnik, Croatia 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Lectures & Tutorials on: (1) Structural bioinformatics touching on PDBe, PDBe-KB, 3D-Beacons, Molstar and (2) AlphaFold. The closing speech mentioned using Molstar to generate images and animations as something the speaker (Terese Bergfors) would be interested in learning more about.
Year(s) Of Engagement Activity 2024
 
Description Hot Topics in Contemporary Crystallography course - Access and interpretation of structural biology data 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them.
Year(s) Of Engagement Activity 2023
URL https://htcc5.org/
 
Description Imperial College Computational Biology Society: Summer Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Undergraduate students
Results and Impact Talk at a student-led conference organised by the Computational Biology Society at Imperial College. Joseph will give a 15-20 minute talk followed by questions. There will also be a panel session where students can discuss careers.
Year(s) Of Engagement Activity 2023
URL https://www.imperialcollegeunion.org/shop/csp/computational-biology/summer-conference-2023
 
Description Interpreting the effects of genetic variants on protein structure and function 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A joint workshop with other EMBL-EBI resources to introduce a workflow to support understanding and interpretation of variants.
Year(s) Of Engagement Activity 2023
URL https://www.ebi.ac.uk/training/events/interpreting-effects-genetic-variants-protein-structure-and-fu...
 
Description Primers for Predocs course - Introduction to the PDBe, PDBe-KB and AlphaFold DB resources 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them.
Year(s) Of Engagement Activity 2023
 
Description Structural bioinformatics course 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them.
Year(s) Of Engagement Activity 2023
 
Description UniAndes/UCR bioinformatics training course 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them.
Year(s) Of Engagement Activity 2023
 
Description Workshop at ELIXIR All Hands Meeting 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We hosted a workshop at the ELIXIR AH 2023 meeting which was attended by members of other ELIXIR communities. The workshop itself was in collaboration with the IDP and the Proteomics communities. We had discussions about future collaborations and ways of making new interactions between the 3D-BioInfo community and other communities. The outcomes of the FunCLAN project were the showcased examples of how structure data can help researchers from various fields solve scientific problems.
Year(s) Of Engagement Activity 2023