FUNCLAN - FUNctional annotations through Conformational Landscape Analysis
Lead Research Organisation:
European Bioinformatics Institute
Department Name: MSCB Macromolec, structural and chem bio
Abstract
The dynamic nature of proteins, which allows them to adopt multiple conformational states, is critical in many biological processes, from forming macromolecular complexes with other proteins, small molecules (ligands) or nucleic acids to switching between active and inactive forms for enzymatic activity. To gain improved mechanistic insights into protein function, it is critical to characterise proteins' three-dimensional (3D) structures and their conformational states. Knowledge of the transitions between energetically favoured conformational states is fundamental to understanding the principles of protein structure and evolution, and can help explain the effects of genetic variants, design new drug molecules and elucidate drug resistance at the molecular level.
Although the PDB has archived more than 165,000 individual structures, the number of unique proteins, as measured by UniProt accession cross-references, grows more slowly and totals only ~50,000, with considerable variation in redundancy among different sequences. This is because each protein may have multiple representatives in the PDB: ligand-bound and unbound forms; structures in multiple space groups or sample conditions; complexes with other macromolecules (proteins or nucleic acids); or structures of smaller domains or sequence variants. The structures in the PDB therefore provide a valuable resource for understanding the conformational flexibility of ligand-binding sites, individual protein molecules and large macromolecular machines. Comparing ligand-binding sites, individual protein molecules and large macromolecular complexes across the ensemble of available structures can help decipher the molecular details of macromolecular function. Data on distinct conformational states will also help characterise the particles in whole-cell tomograms, enabling molecular phenotyping of whole cells in different disease or developmental states.
In this project we will enhance GESAMT, the structure comparison algorithm, to characterise the conformational flexibility of ligand-binding sites, individual proteins or domains, and macromolecular assemblies. The new framework, FUNCLAN, will include the metrics needed for meaningful clustering and a scheme to describe the structural similarities and differences between members of different clusters. Each cluster will have a representative structure, and using the structural and functional annotations from PDBe-KB, we will characterise each cluster and provide biological context. The new functionality will be validated against a dataset of literature examples of macromolecules and complexes known to exhibit specific conformational states. A pipeline for PDB archive-wide clustering of ligand-binding sites, individual macromolecules and macromolecular complexes will be implemented. The resulting data will be made available programmatically via a REST API and an FTP site, and also via a novel web-based application.
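The abstract does not specify the clustering method or how a cluster's representative is chosen. As a minimal sketch, assuming a precomputed pairwise conformational distance matrix (for example, RMSD after superposition) and a user-chosen cutoff, the code below groups structures by average-linkage agglomerative clustering and takes each cluster's medoid as its representative. The function names and the cutoff are hypothetical illustrations, not FUNCLAN's actual scheme.

```python
from typing import List, Sequence

# Hypothetical input: a symmetric matrix of pairwise conformational distances.
Matrix = Sequence[Sequence[float]]


def average_linkage(dist: Matrix, a: List[int], b: List[int]) -> float:
    """Mean pairwise distance between members of clusters a and b."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def cluster_conformers(dist: Matrix, cutoff: float) -> List[List[int]]:
    """Agglomerative clustering: repeatedly merge the two closest clusters
    (average linkage) until no pair of clusters is closer than `cutoff`."""
    clusters: List[List[int]] = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        # Find the closest pair of clusters by average linkage.
        best_d, best_x, best_y = min(
            (average_linkage(dist, clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best_d > cutoff:
            break
        clusters[best_x].extend(clusters[best_y])
        del clusters[best_y]  # best_y > best_x, so best_x's index is unaffected
    return clusters


def representative(dist: Matrix, cluster: List[int]) -> int:
    """Medoid: the member with the smallest summed distance to the others."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))
```

With four structures where 0/1 and 2/3 are close pairs, `cluster_conformers(dist, 1.0)` returns two clusters. A medoid is used rather than an averaged structure because it is always a real, deposited conformer.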
Technical Summary
We will develop FUNCLAN, a framework providing comparative analyses of conformations and their associated annotations. This will be achieved through major improvements to the superposition software GESAMT, delivering a robust process for superposing and clustering macromolecular assemblies, protein chains and ligand-binding sites across the Protein Data Bank archive. We will perform a comprehensive analysis of the clustered molecular entities, link them to experimental validation information and map annotations of their biological and biophysical context onto them. We will use this enriched, integrated data to refine the superposition and clustering processes, provide a representative structure for each cluster and design metrics for evaluating clustered assemblies, protein chains or ligand-binding sites. The FUNCLAN framework will support superposing macromolecular assemblies, where part of the challenge is that changes in topology can accompany changes in the conformation of individual components. The project will also tackle challenges unique to the superposition of ligand-binding sites, such as superposing amino acid residues that interact with the same small molecule versus superposing small molecules bound in different binding sites.
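GESAMT's own superposition algorithm is not reproduced here. As a stand-in that illustrates what "superposing" two conformers means, the sketch below uses the textbook Kabsch algorithm: it finds the rotation that best maps one set of corresponding coordinates onto another and reports the resulting RMSD. It assumes the two coordinate sets already have a one-to-one residue correspondence; establishing that correspondence is part of what a tool such as GESAMT does and is not shown. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal rigid-body
    superposition (Kabsch algorithm). Rows of p and q must correspond."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # 3x3 covariance matrix and its singular value decomposition.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate p onto q and measure the remaining deviation.
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

A rotated and translated copy of the same coordinates gives an RMSD of essentially zero, while a genuine conformational change leaves a residual deviation that can feed a distance matrix for clustering.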
The project will provide:
1. Software suite and web server for analysing assemblies, protein chains and ligand-binding sites;
2. High-quality manually curated benchmarking datasets of conformational clusters and their biological and biophysical annotations;
3. A robust and iteratively improved pipeline for superposing macromolecular assemblies, protein chains and ligand-binding sites;
4. Data standards and evaluation metrics for superposed and clustered molecular entities;
5. Clustered molecular entities linked to their validation information and biological annotations, which will be made available programmatically via an API and displayed on the PDBe-KB entry page.
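Item 4 refers to evaluation metrics for clustered entities without naming them. One widely used option, shown here as an assumption rather than the project's chosen metric, is the silhouette coefficient: for each structure it compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), giving (b - a) / max(a, b). The function name is hypothetical and the input is the same kind of pairwise distance matrix used for clustering.

```python
from typing import Hashable, Sequence


def silhouette(dist: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Mean silhouette coefficient over all items (range -1 to 1);
    values near 1 indicate compact, well-separated clusters."""
    n = len(labels)
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(own) / len(own)
        # Mean distance to the closest cluster the item does not belong to.
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == g) / labels.count(g)
            for g in groups
            if g != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n
```

A labelling that matches the structure of the distances scores close to 1, while an arbitrary labelling scores near or below 0, which makes the value usable for comparing clustering cutoffs.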
Publications
Ellaway JIJ
(2024)
Identifying protein conformational states in the Protein Data Bank: Toward unlocking the potential of integrative dynamics studies.
in Structural dynamics (Melville, N.Y.)
Varadi M
(2022)
PDBe and PDBe-KB: Providing high-quality, up-to-date and integrated resources of macromolecular structures to support basic and applied research and education.
in Protein science : a publication of the Protein Society
| Title | Supplementary Online Material |
| Description | Formal description of method outlined in main article. |
| Type Of Art | Image |
| Year Produced | 2024 |
| URL | https://aip.figshare.com/articles/figure/Supplementary_Online_Material/25772661 |
| Description | Understanding protein conformation is crucial for understanding the function of biological macromolecules. The insights can help in understanding variation and in treating disease. Experimental structures in the Protein Data Bank, and models from AI-based structure prediction tools such as AlphaFold2, RoseTTAFold and ESMFold, provide static representations that do not fully capture macromolecular motion. We have developed methods that analyse experimental structures to explore the conformational landscapes underlying protein function. The outcomes are freely accessible via the PDBe knowledge base. |
| Exploitation Route | As part of the project, we also developed curated datasets of protein conformational states. These were made accessible via the FTP area (http://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/distinct-monomer-conformers/) and Hugging Face (https://huggingface.co/datasets/PDBEurope/protein_chain_conformational_states/tree/main), and have been downloaded multiple times for developing AI methods. |
| Sectors | Education, Pharmaceuticals and Medical Biotechnology |
| URL | https://github.com/PDBeurope/protein-cluster-conformers |
| Title | Conformational states benchmark dataset |
| Description | Included is a manually curated dataset of monomeric proteins in distinct open-closed conformations. Where present, intermediary conformers have also been noted. The dataset can be downloaded from http://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/distinct-monomer-conformers/benchmarking_monomeric_open_closed_conformers.csv |
| Type Of Material | Database/Collection of data |
| Year Produced | 2022 |
| Provided To Others? | Yes |
| Impact | This is a benchmark dataset. |
| URL | https://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/distinct-monomer-conformers/ |
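The benchmark dataset is a single CSV file at the URL given in the description above. A minimal sketch for loading it with the Python standard library follows; the function name is hypothetical, and because this record does not document the CSV's column names, the sketch returns each row as a dictionary keyed by whatever header the file contains rather than assuming specific fields.

```python
import csv
import io
import urllib.request
from typing import Dict, List

# CSV location as given in the dataset description.
BENCHMARK_URL = (
    "http://ftp.ebi.ac.uk/pub/databases/pdbe-kb/benchmarking/"
    "distinct-monomer-conformers/benchmarking_monomeric_open_closed_conformers.csv"
)


def load_benchmark(url: str = BENCHMARK_URL) -> List[Dict[str, str]]:
    """Download a CSV file and return its rows as header-keyed dictionaries."""
    with urllib.request.urlopen(url) as resp:
        # "utf-8-sig" also strips a byte-order mark if one is present.
        text = resp.read().decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))
```

For example, `rows = load_benchmark()` followed by `print(list(rows[0].keys()))` shows the available columns before any further filtering.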
| Title | PDB superposition dataset |
| Description | Superposed structures for the whole PDB archive. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2022 |
| Provided To Others? | Yes |
| Impact | The superposed structures are displayed on PDBe-KB web pages. |
| URL | https://ftp.ebi.ac.uk/pub/databases/pdbe-kb/superposition/ |
| Title | Supplementary Online Material |
| Description | Dataset accompanying the algorithm, used for testing; it may also be useful to the community. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| URL | https://aip.figshare.com/articles/dataset/Supplementary_Online_Material/25772655 |
| Description | FunCLAN & ELIXIR 3D-BioInfo |
| Organisation | ELIXIR |
| Country | United Kingdom |
| Sector | Charity/Non Profit |
| PI Contribution | We provide benchmark datasets to the ELIXIR 3D-BioInfo community, Activity III. |
| Collaborator Contribution | They provide feedback and interesting use cases which might be relevant to the FunCLAN project. |
| Impact | No specific outcomes yet. |
| Start Year | 2022 |
| Title | Structural conformation clustering pipeline |
| Description | These scripts cluster a given set of monomeric protein chains using a global conformational change metric based on CA (alpha-carbon) distances. Once the peptide chains to be clustered have been specified, a pairwise CA distance matrix is produced for each chain. Distance difference matrices are then generated, again pairwise, but this time between the CA distance matrices. Therefore, for N unique peptide chains, N CA distance matrices and N^2 distance difference matrices are generated. |
| Type Of Technology | Software |
| Year Produced | 2023 |
| Open Source License? | Yes |
| Impact | Users can take a ready-made structure clustering pipeline to identify distinct conformations of a protein. |
| URL | https://github.com/PDBeurope/protein-cluster-conformers |
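The pipeline description above can be sketched as follows, assuming each chain is supplied as a list of CA coordinates whose residues are already in one-to-one correspondence (same length, same order). The mean absolute distance difference used here as the per-pair score is an illustrative assumption; the published scripts in the linked repository may summarise the matrices differently, and all function names are hypothetical. Because a distance difference matrix compares the same two chains whichever order they are taken in, and a chain compared with itself gives all zeros, the sketch computes only the N(N-1)/2 unique pairs.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_distance_matrix(ca: Sequence[Coord]) -> List[List[float]]:
    """Pairwise Euclidean distances between all CA atoms of one chain."""
    return [[math.dist(a, b) for b in ca] for a in ca]


def distance_difference(d1: List[List[float]], d2: List[List[float]]) -> List[List[float]]:
    """Element-wise absolute difference between two CA distance matrices."""
    return [[abs(x - y) for x, y in zip(r1, r2)] for r1, r2 in zip(d1, d2)]


def conformational_score(ddm: List[List[float]]) -> float:
    """Hypothetical scalar summary: mean over the upper triangle (i < j)."""
    n = len(ddm)
    values = [ddm[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(values) / len(values)


def pairwise_scores(chains: Dict[str, Sequence[Coord]]) -> Dict[Tuple[str, str], float]:
    """One CA distance matrix per chain, then one score per unique chain pair."""
    matrices = {name: ca_distance_matrix(ca) for name, ca in chains.items()}
    return {
        (a, b): conformational_score(distance_difference(matrices[a], matrices[b]))
        for a, b in combinations(matrices, 2)
    }
```

Internal distances do not change under rotation or translation, so this metric needs no prior superposition: a rigidly moved copy of a chain scores 0, and only genuine changes in shape contribute. The resulting pair scores can then be fed into any distance-based clustering step.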
| Description | 16th Annual International Biocuration Conference - Protein Data Bank in Europe Knowledge Base (PDBe-KB) - Creating knowledge from macromolecular structures and functional annotations |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Deborah and Joseph presented a poster about PDBe-KB. They engaged with the biocuration community to discuss practices and how to better link our resources or take advantage of each other's data, procedures or expertise. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://biocuration2023.github.io/ |
| Description | A presentation at National Institute of Immunology, Delhi |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Postgraduate students |
| Results and Impact | The presentation included information on the PDB archive management by the wwPDB and its associate members. It also described the work done at PDBe on enriching the macromolecular structure data with structural and functional annotations for complexes, domains and small molecule ligands. The impact of AlphaFold was discussed, and the infrastructure developments to support the use of predicted models were described. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwjj4Oero9-EAxXBTEEAHQGGDPgQ... |
| Description | Biocuration Conference 2023, Padua |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | JE attended the biocuration conference and co-presented a poster. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://biocuration2023.github.io |
| Description | CCPBioSim workshop 2023 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them. |
| Year(s) Of Engagement Activity | 2023 |
| Description | EBI Summer School group projects session |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Course on how to use the PDBe and PDBe-KB web resources to access and interrogate structural biology data. The participants came from a variety of backgrounds. They also learned how to access these data via the PDBe(-KB)'s APIs, picking up Python programming skills in the process. Many of them found the programming section challenging because they had little previous programming experience. |
| Year(s) Of Engagement Activity | 2023 |
| Description | EBI Summer School group projects session |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Course on how to use the PDBe and PDBe-KB web resources to access and interrogate structural biology data. The participants came from a variety of backgrounds. They also learned how to access these data via the PDBe(-KB)'s APIs, picking up Python programming skills in the process. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.ebi.ac.uk/panda/jira/browse/PDBE-7033 |
| Description | EBI webinar: Finding and interpreting protein structure and function data using PDBe-KB |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them. |
| Year(s) Of Engagement Activity | 2023 |
| Description | ECCB 2024 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Delivered a two-part workshop on the AlphaFold Protein Structure Database (AFDB) and PDBe-KB during the pre-conference sessions, then attended the main conference over the following days. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.ebi.ac.uk/panda/jira/browse/PDBE-6319 |
| Description | EIPP Predocs course 2024 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | Training course for pre-doctoral (PhD) students on how to use the PDBe and PDBe-KB web UI (including related APIs) for structural biology and bioinformatics research. A case-study exercise was presented on interrogating the data available on acetylcholine from the PDB and partner resources via PDBe-KB. Participants learned what data are available via PDBe(-KB) and how to access and parse these data programmatically. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.ebi.ac.uk/training/materials/eipp-bioinformatics-predocs-handbook-2024/ |
| Description | Elixir All Hands 2023, Dublin |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | JE attended the conference on ELIXIR tools and presented a poster. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://elixir-europe.org/events/elixir-all-hands-2023 |
| Description | Elixir All Hands 2023, Dublin |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Attended the conference on ELIXIR tools and gave a presentation on PDBe-KB to the audience. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://elixir-europe.org/events/elixir-all-hands-2023 |
| Description | Hot Topics in Contemporary Crystallography (HTCC6) course - in Dubrovnik, Croatia |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Lectures and tutorials on (1) structural bioinformatics, covering PDBe, PDBe-KB and 3D-Beacons, and (2) AlphaFold. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://htcc6.org/ |
| Description | Hot Topics in Contemporary Crystallography (HTCC6) course - in Dubrovnik, Croatia |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | Lectures and tutorials on (1) structural bioinformatics, covering PDBe, PDBe-KB, 3D-Beacons and Molstar, and (2) AlphaFold. In the closing speech, the speaker (Terese Bergfors) expressed interest in learning more about using Molstar to generate images and animations. |
| Year(s) Of Engagement Activity | 2024 |
| Description | Hot Topics in Contemporary Crystallography course - Access and interpretation of structural biology data |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://htcc5.org/ |
| Description | Imperial College Computational Biology Society: Summer Conference |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Undergraduate students |
| Results and Impact | Talk at a student-led conference organised by the Computational Biology Society at Imperial College. Joseph gave a 15-20 minute talk followed by questions, and a panel session gave students the opportunity to discuss careers. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.imperialcollegeunion.org/shop/csp/computational-biology/summer-conference-2023 |
| Description | Interpreting the effects of genetic variants on protein structure and function |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | A joint workshop with other EMBL-EBI resources to introduce a workflow to support understanding and interpretation of variants. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.ebi.ac.uk/training/events/interpreting-effects-genetic-variants-protein-structure-and-fu... |
| Description | Primers for Predocs course - Introduction to the PDBe, PDBe-KB and AlphaFold DB resources |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Structural bioinformatics course 2023 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them. |
| Year(s) Of Engagement Activity | 2023 |
| Description | UniAndes/UCR bioinformatics training course |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Postgraduate students |
| Results and Impact | A workshop to introduce trainees to the resources we provide and help them understand how to learn more about them. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Workshop at ELIXIR All Hands Meeting 2023 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | We hosted a workshop at the ELIXIR All Hands 2023 meeting, attended by members of other ELIXIR communities. The workshop was run in collaboration with the IDP and Proteomics communities. We discussed future collaborations and ways of building new interactions between the 3D-BioInfo community and other communities. Outcomes of the FunClan project were showcased as examples of how structure data can help researchers from various fields solve scientific problems. |
| Year(s) Of Engagement Activity | 2023 |
