UKRI/BBSRC-NSF/BIO: Unifying Pfam protein sequence and ECOD structural classifications with structure models
Lead Research Organisation:
European Bioinformatics Institute
Department Name: MSCB Macromolec, structural and chem bio
Technical Summary
Evolutionary classification of proteins is essential for all aspects of protein science. Inferring the functional properties of an uncharacterized protein from a better-studied homolog is a powerful way to generate hypotheses. Protein classifications have been separated into two categories: the first relies on sequence similarity (Pfam); the second uses 3D structures (ECOD). Sequence classifications are more comprehensive and more relevant to protein function, while structure classifications reveal distant evolutionary relationships between protein families. The disparity between them arises from the lack of 3D structures for most proteins with known sequences. AlphaFold (AF), however, removes this barrier between sequence and structure classifications. We will develop cyberinfrastructure to integrate Pfam and ECOD and to classify millions of AF models, in three parts. (i) We will refactor the ECOD infrastructure to meet the need of classifying millions of AF models. The revised pipeline will classify domains by sequence, remove disordered or poorly predicted segments, and classify the remaining domains by structure comparison augmented with sequence and function evidence and expert curation. In close collaboration with Pfam, and using the newly developed infrastructure, we will 1) incorporate all currently released AF models into ECOD and 2) adapt Pfam families in ECOD. (ii) To improve Pfam, we will develop tools to compare the two classifications and introduce a number of changes to Pfam: we will 1) add new families detected by ECOD, 2) refine domain boundaries using protein structures, and 3) group families into clans by homology identified in ECOD. (iii) We will harmonise Pfam and ECOD: we will resolve inconsistencies between the two classifications, converge on a common nomenclature for domains, exchange information, and cross-reference between the two resources.
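As an illustration of what comparing the two classifications can involve, the following is a minimal sketch, not the project's actual tooling. It assumes each resource provides domain ranges on the same protein and flags ECOD domains that have no Pfam counterpart or whose boundaries disagree. The names `Domain`, `overlap`, and `compare`, and the thresholds `min_cov` and `max_shift`, are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A domain assignment on one protein (1-based, inclusive residue range)."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap(a: Domain, b: Domain) -> int:
    """Number of residues shared by two domain ranges (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def compare(ecod, pfam, min_cov=0.5, max_shift=10):
    """Pair each ECOD domain with its best-overlapping Pfam domain.

    Returns (ecod_name, pfam_name_or_None, status) tuples. The status is:
      'missing'    - no Pfam domain covers at least min_cov of the ECOD domain
      'boundary'   - matched, but a start or end differs by more than max_shift
      'consistent' - matched with similar boundaries
    Both thresholds are illustrative assumptions, not published values.
    """
    report = []
    for e in ecod:
        best = max(pfam, key=lambda p: overlap(e, p), default=None)
        if best is None or overlap(e, best) / e.length < min_cov:
            report.append((e.name, None, "missing"))
        elif (abs(e.start - best.start) > max_shift
              or abs(e.end - best.end) > max_shift):
            report.append((e.name, best.name, "boundary"))
        else:
            report.append((e.name, best.name, "consistent"))
    return report


if __name__ == "__main__":
    # Made-up domain ranges on a single hypothetical protein.
    ecod = [Domain("e1", 1, 100), Domain("e2", 120, 250), Domain("e3", 300, 380)]
    pfam = [Domain("PF_A", 5, 98), Domain("PF_B", 150, 250)]
    for row in compare(ecod, pfam):
        print(row)
```

In this toy input, `e1` is reported as consistent with `PF_A`, `e2` matches `PF_B` but is flagged for a boundary difference, and `e3` is reported as missing a Pfam counterpart. Those three outcomes correspond loosely to the summary's goals of resolving inconsistencies, refining domain boundaries, and adding new families.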
Publications
Blum M
(2025)
InterPro: the protein sequence classification resource in 2025.
in Nucleic acids research
Costa F
(2024)
Keeping it in the family: using protein family templates to rescue low confidence AlphaFold2 models
in Bioinformatics Advances
Durairaj J
(2023)
Uncovering new families and folds in the natural protein universe.
in Nature
Paysan-Lafosse T
(2025)
The Pfam protein families database: embracing AI/ML.
in Nucleic acids research
Pei J
(2024)
Bridging the Gap between Sequence and Structure Classifications of Proteins with AlphaFold Models.
in Journal of molecular biology
Schaeffer R
(2025)
ECOD: integrating classifications of protein domains from experimental and predicted structures
in Nucleic Acids Research
| Title | Create new Pfam families |
| Description | Creating new Pfam families that: (1) were missing in comparison to the ECOD classification; (2) have been identified by the ECOD Domain Parser of AlphaFold Models pipeline. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | 1029 Pfam families were created; some of them were made available in Pfam 36.0, and the rest will be available in the upcoming Pfam 37.0 release. |
| URL | https://www.ebi.ac.uk/interpro/entry/pfam/#table |
| Title | Update Pfam entries |
| Description | We have developed multiple pipelines to help identify inconsistencies between the ECOD and Pfam classifications and have updated Pfam entries as a result. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | 702 Pfam families have been updated so far; some of them were made available as part of the Pfam 36.0 release, and the others will be made available as part of the upcoming Pfam 37.0 release. |
| URL | https://www.ebi.ac.uk/interpro/entry/pfam/#table |
| Title | Pfam |
| Description | Protein Family database |
| Type Of Material | Database/Collection of data |
| Provided To Others? | Yes |
| Impact | The annotation of the millions of sequences that are generated by modern DNA sequencing technologies. |
| URL | http://pfam.xfam.org |
| Description | ECOD |
| Organisation | University of Texas |
| Country | United States |
| Sector | Academic/University |
| PI Contribution | Harmonisation of the Pfam and ECOD classifications. |
| Collaborator Contribution | Building new ECOD using AlphaFold predicted structures. |
| Impact | Refactored the ECOD pipeline to incorporate predicted structures. Developed tools to compare the Pfam and ECOD classifications to identify inconsistencies. Developed new Pfam entries to classify ECOD domains lacking a Pfam assignment. Updated the Pfam clan classification based on the ECOD classification. Created and expanded Wikipedia articles. Updated Pfam online training materials. |
| Start Year | 2023 |
| Description | Creation of 3 new Wikipedia articles describing protein superfamilies |
| Form Of Engagement Activity | Engagement focused website, blog or social media channel |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Wikipedia articles created for OB-fold, SH3 and HotDog domains |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://en.wikipedia.org/wiki/Category:Protein_superfamilies |
| Description | InterPro and Pfam resources in the context of EBI structural bioinformatics course |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | 30 professionals received an introduction to the InterPro and Pfam resources, including lecture and practical, in the context of the EBI structural bioinformatics course. |
| Year(s) Of Engagement Activity | 2022,2023,2024 |
| URL | https://www.ebi.ac.uk/training/events/structural-bioinformatics2021/ |
| Description | Pfam release blog posts |
| Form Of Engagement Activity | Engagement focused website, blog or social media channel |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | For each Pfam release, we write a blog post presenting the release's data content, the sources of new entries, and interesting cases. |
| Year(s) Of Engagement Activity | 2020,2021,2022,2023,2024 |
| URL | https://xfam.wordpress.com |
| Description | Pfam/ECOD in-person meeting |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | A Pfam/ECOD in-person meeting was held at EMBL-EBI. |
| Year(s) Of Engagement Activity | 2024 |
| Description | UCL postgraduates training about InterPro, Pfam and HMMER |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Postgraduate students |
| Results and Impact | Postgraduate and undergraduate students from UCL attended a lecture and practical session on how to use InterPro, Pfam and HMMER resources. |
| Year(s) Of Engagement Activity | 2022,2023,2024 |
| Description | Update of the annotation for 259 Wikipedia articles related to Pfam and ECOD classifications |
| Form Of Engagement Activity | Engagement focused website, blog or social media channel |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | 67: added ECOD link to protein family infobox; 133: converted Pfam box into protein family infobox; 35: fixes in "category", of which 28 added/updated to Category:Protein superfamilies; 24: Pfam clan links added/fixed. |
| Year(s) Of Engagement Activity | 2023,2024,2025 |
| URL | https://en.wikipedia.org/wiki/Category:Protein_superfamilies |
