Exploiting data driven computational approaches for understanding protein structure and function in InterPro and Pfam

Lead Research Organisation: European Bioinformatics Institute

Department Name: MSCB Macromolec, structural and chem bio

Abstract

Proteins are biological macromolecules that perform a diverse array of crucial functions, from enzymes (e.g. the entities responsible for fermentation) to transporters (e.g. hemoglobin in the blood) to mechanical structures (e.g. actin and myosin in muscle). Proteins are synthesized as linear polymers of building blocks called amino acids. They usually fold into complex three-dimensional (3D) structures, and typically interact with other proteins and molecules to perform their function. Knowledge of protein sequences can facilitate insights into hitherto undiscovered enzymes with potential applications in the biotechnology sector, or novel drugs of interest to the pharmaceutical industry. Detailed understanding of the functional architecture of proteins, including the arrangement of amino acids in a 3D structure, enables scientists to diagnose diseases as well as design more effective enzymes.

These days, our ability to generate new protein sequences based on modern high-throughput DNA sequencing (HTS) techniques far outstrips our ability to functionally characterise them. Thus, most sequences are computationally annotated, by identifying similarities between new sequences and the few experimentally characterised examples, using these to infer function (i.e. annotate). More recently, HTS has been applied directly to environmental samples to discover previously uncultured bacteria and single cell eukaryotes, and to enable the reconstruction of large and complex genomes, such as those of plants. Such approaches are correcting many of the historical biases in the protein sequence databases. However, for humankind to understand and utilise these data, sequences need to be functionally annotated, which is best accomplished using the information gleaned from sets of related sequences (known as protein families).

InterPro is a world leading protein family resource that merges information from 13 different specialist databases to present the user with comprehensive functional analysis of sequences. One of its member databases, Pfam, is a collection of protein domain families containing functional annotations. Both InterPro and Pfam are well-established primary resources in the field of protein research. In this application, we propose crucial developments to both of these resources in order to augment their utility, functionality and scalability, as well as uniquely position them to tackle imminent advances in the field. We will leverage pre-established links with other protein databases and concurrently build additional pipelines to develop and exchange the latest information between these existing and new resources.

We will improve coverage of protein sequences originating from environmental sources by building families for novel sets (or clusters) of related proteins. Considering the fundamental association between protein structure and function, we will develop a pipeline that will not only import structural models for Pfam entries and present them via the website, but will also ensure that the models remain up to date. To increase coverage and functional annotations in both resources, we will integrate new resources to provide sub-domain classifications, and improve annotations through combined literature searches and enhanced curation tools. To refine annotations, we will adopt a new algorithm called TreeGrafter in InterProScan (our software package that performs automatic annotations of protein sequences), and integrate controlled vocabularies for protein attributes from databases like PANTHER with those already in InterPro. We will evaluate the performance of an upgraded version of the HMMER software that is widely used to build protein families, including Pfam, to improve future scalability. Finally, we will focus on eight genomes of agricultural importance, including chicken, salmon, and wheat, by systematically annotating 2000 associated entries in Pfam and, by extension, InterPro.

Technical Summary

InterPro and Pfam are preeminent complementary resources in the field of protein research. InterPro draws its information from a compendium of 13 expert member databases, including Pfam, enabling classification of protein sequences into families and prediction of functional domains and sites. Pfam generates protein families, with each curated entry represented by an alignment and profile hidden Markov model (HMM).
In light of the sheer volume of novel protein sequences being constantly discovered, especially through metagenomics, this proposal devises key developments to further improve functionality and scalability of these resources. We will enhance coverage of environmentally derived sequences (MGnify database, Tara Oceans and MMETSP projects) by generating families for the largest novel sequence clusters. We will incorporate de novo structural models and produce deep sequence alignments (using metagenomics sequences) necessary for the detection of co-evolutionary residues, which in turn will be used for structural modelling. The websites will facilitate visualization of these structural models and display co-variance contact sites. We will use a combination of known structures and models to classify additional Pfam entries into clans, as well as review domain boundaries. To increase InterPro coverage and functional annotations, we will integrate new resources (CATH FunFams) to provide sub-domain classifications, improve annotations (especially domains of unknown function) and maximise member database integrations. To enable scaling and refine annotations, we will adopt a new algorithm (TreeGrafter) in InterProScan, harmonise PANTHER and FunFams-based Gene Ontology terms within InterPro, and evaluate performance of an upgraded version of the HMMER software. Finally, we will focus annotation efforts on eight genomes of agricultural importance, including chicken, salmon, and wheat, generating 1000s of Pfam and InterPro entries.

Planned Impact

The field of protein research has witnessed an explosion in novel protein sequences due to advances in sequencing technologies. However, these sequences are meaningless without functional annotation. This proposal focuses on the world leading protein databases, InterPro and Pfam, which are routinely used for protein annotation. Due to their extensive use by researchers worldwide, this application will impact most BBSRC strategic priorities - especially agriculture and food security, industrial biotechnology, and bioscience for health. To maximise the impact of these resources, we propose to exploit multiple computational approaches to (i) improve annotation of metagenomics datasets and eukaryotic marine microbes; (ii) provide co-evolutionary structural models for Pfam entries using deep alignments to build additional models and permit their visualization; (iii) integrate and improve annotations from current and new InterPro databases, such as PANTHER, CDD, and CATH FunFams; (iv) improve scaling and refine annotations by adopting new algorithms and software, like TreeGrafter and HMMER4, and reconcile Gene Ontology terms across databases; (v) systematically annotate eight genomes of agricultural importance. These developments will ensure users in the UK and the world over can derive the maximum benefit from these resources while further cementing their position as exceptional databases of immense importance to the scientific community at large.

Developing new pipelines to build new entries for proteins derived from metagenomics provides a unique exploitable opportunity for InterPro and Pfam. The fact that these resources will extend coverage of marine eukaryotic microbes will have significant, far reaching impacts on other fields and analytical disciplines. This is especially true for the UK Darwin Tree of Life project, which forms part of a global initiative to sequence all eukaryotic species, aiming to revolutionize our understanding of biology, evolution and biodiversity. However, this will only be realised through detailed and accurate functional annotation, such as that provided by InterPro and Pfam.

The agricultural sector represents another area of considerable impact. Providing comprehensive functional annotations for proteins from widely farmed animal and plant species in the UK and worldwide will facilitate insights into the molecular basis of biological features including yield characteristics, capacity to resist disease and tolerance to the vagaries of nature. This will lead to socioeconomic benefits, through maximising land utilisation for growing crops such as wheat and sugar beet (the latter providing nearly 30% of the world's annual sugar production and forming an important source for bioethanol and animal feed), or enhancing the global aquaculture market, projected to reach $20 billion by 2022, where salmon is a substantial component.

Furthermore, the project outputs will be of exceptional value to the commercial sector, eventually benefiting the public. Improved annotations of proteins originating from microbes will lead to new discoveries, such as novel antibiotics for humans and livestock, higher agricultural yields from the understanding of ecological interplay (e.g. food chain microbes), expanded discovery of novel enzymes (e.g. psychrophilic enzymes for detergents) or those with novel catalytic functionality.

We will ensure impact on all academic and industrial audiences by the publication of software, data, and peer reviewed articles. To ensure that resource developments are disseminated as widely as possible, we will deliver onsite training, webinars, participate in community workshops and produce online training materials. We will leverage our professional networks and collaborations, conference platforms and social media channels to further publicise key developments. The public sector will also be engaged, via specific events and the publication of non-specialist articles and interviews.

Funded Value:

£815,771

Funded Period:

Nov 19 - Oct 23

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/S020381/1

Principal Investigator:

Alex Bateman

Robert Finn

Research Subject:

Omic sciences & technologies (30%)

Tools, technologies & methods (60%)

Research Topic:

Bioinformatics (60%)

Genomics (12%)

Proteomics (18%)

Organisations

People	ORCID iD
Alex Bateman (Principal Investigator)
Robert Finn (Principal Investigator)

Publications

Blum M (2021) The InterPro protein families and domains database: 20 years on. in Nucleic acids research

Mistry J (2021) Pfam: The protein families database in 2021. in Nucleic acids research

Monzon V (2022) Reciprocal best structure hits: using AlphaFold models to discover distant homologues. in Bioinformatics advances

Cantelli G (2022) The European Bioinformatics Institute (EMBL-EBI) in 2021. in Nucleic acids research

Thakur M (2023) EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022. in Nucleic acids research

Paysan-Lafosse T (2023) InterPro in 2022. in Nucleic acids research

Key Findings
Research Databases and Models
Research Tools and Methods
Collaboration
Software and Technical Products
Engagement Activities


Description	A major outcome of this award was to make available accurate structure prediction models for Pfam families in collaboration with David Baker. These models were released several months in advance of the AlphaFold models that are widely used. The release of these models helped to inform DeepMind of the high level of interest in protein structure models and helped lead to our fruitful collaboration with them. We also developed and integrated the TreeGrafter algorithm into InterProScan. This likely halved our total compute and thus had benefits in reducing our carbon emissions.
Exploitation Route	The InterProScan tool and new families that we have built as part of this award are already in significant use by the community.
Sectors	Agriculture, Food and Drink; Environment; Pharmaceuticals and Medical Biotechnology


Title	Automatic pipeline to generate potential Pfam profile-HMMs for clusters from MGnify protein sequence set and UniProtKB
Description	The pipeline performs a co-clustering of the MGnify protein sequence set and UniProtKB and generates candidate profile-HMMs for potential inclusion in Pfam. It uses mmseqs to cluster the MGnify and UniProtKB sequences, which generated a set of 434,651,340 clusters. We kept clusters with at least 1 UniProt and 1 MGnify sequence and from these produced 10,000 automatically generated candidate Pfam families that were put forward for curation.
Type Of Material	Improvements to research infrastructure
Year Produced	2020
Provided To Others?	Yes
Impact	382 new families were included in Pfam following the first iteration.
URL	https://github.com/ProteinsWebTeam/mgnify-clustering
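
The cluster-selection rule described in this entry (keep only clusters that contain at least one UniProt and at least one MGnify sequence) can be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline's actual code: it assumes a two-column, tab-separated cluster table (representative, member), it assumes that MGnify protein identifiers start with `MGYP` and that every other identifier is from UniProtKB, and the function name `select_mixed_clusters` and all example identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

MGNIFY_PREFIX = "MGYP"  # assumption: MGnify protein accessions start with this prefix


def select_mixed_clusters(rows: Iterable[str]) -> Dict[str, List[str]]:
    """Group (representative, member) rows and keep clusters mixing both sources."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        row = row.strip()
        if not row:
            continue  # skip blank lines
        representative, member = row.split("\t")
        clusters[representative].append(member)

    kept: Dict[str, List[str]] = {}
    for representative, members in clusters.items():
        n_mgnify = sum(m.startswith(MGNIFY_PREFIX) for m in members)
        n_uniprot = len(members) - n_mgnify  # assumption: all other IDs are UniProtKB
        if n_mgnify >= 1 and n_uniprot >= 1:
            kept[representative] = members
    return kept


# Hypothetical example data: one mixed cluster, one MGnify-only, one UniProt-only.
example_rows = [
    "P12345\tP12345",
    "P12345\tMGYP000000000001",
    "MGYP000000000002\tMGYP000000000002",
    "MGYP000000000002\tMGYP000000000003",
    "Q99999\tQ99999",
]
mixed = select_mixed_clusters(example_rows)
print(sorted(mixed))
```

At the scale reported here (hundreds of millions of clusters) the table would not fit comfortably in memory, so a real implementation would more likely stream a cluster-sorted file; the sketch only shows the selection criterion.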


Title	Automatic pipeline to generate potential Pfam profile-HMMs for clusters from Marine Eukaryotic microbiomes protein sequence sets found in UniProtKB
Description	The pipeline clusters UniProt sequences from the marine eukaryotic microbiome of the MMETSP project. It uses mmseqs to carry out the clustering, which generated a set of 620,056 clusters. We kept clusters with at least 2 UniProt sequences and from these produced 10,000 automatically generated candidate Pfam families that were put forward for curation.
Type Of Material	Improvements to research infrastructure
Year Produced	2021
Provided To Others?	Yes
Impact	16 new families were included in Pfam following the first iteration and will be made available publicly in Pfam release 36.0.


Title	DeDuF Pfam entries
Description	Improving Pfam annotations and coverage through the identification of functions for Domains of Unknown Function.
Type Of Material	Improvements to research infrastructure
Year Produced	2020
Provided To Others?	No
Impact	1405 Pfam families with previously unknown function have been re-annotated and assigned a function. The updated annotations are available in Pfam release 35.0.


Title	DeDuF Pfam entries
Description	Improving Pfam annotations and coverage through the identification of functions for Domains of Unknown Function.
Type Of Material	Improvements to research infrastructure
Year Produced	2020
Provided To Others?	No
Impact	44 Pfam families with previously unknown function have been re-annotated and assigned a function. The updated annotations will soon be made available to the public in Pfam release 34.0.


Title	Import of co-evolutionary models for Pfam entries with no known structure in Pfam and InterPro
Description	We have provided the Baker group with deep alignments based on UniProtKB sequence alignments for Pfam families with no PDB structure (approx. 6,500 families). They have calculated models for all of these, along with lDDT scores to assess the reliability of each model. The vast majority of models give the correct fold and have an lDDT score higher than 0.6 (considered reasonable models), and some have an lDDT higher than 0.8 (considered great models). We have made the models and their contact maps available through the InterPro website for the Pfam families with no structure under the "Structure models" tab. In this tab the contact map between the residues is available for the Pfam SEED alignment. We also display the 3D structure of the model, where the contacts between residues can be highlighted by hovering over the contacts in the alignment. The method used to predict the models and a description of the information available in InterPro pages have been included in the InterPro documentation.
Type Of Material	Improvements to research infrastructure
Year Produced	2021
Provided To Others?	Yes
Impact	Providing structural models for Pfam families with no PDB structure allows a better understanding of the three-dimensional (3D) arrangement of amino acids, which can provide key insights into protein function, and allow very distant homologues to be identified.
URL	https://www.ebi.ac.uk/interpro/entry/pfam/PF01050/model/
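
The lDDT cut-offs quoted in this entry (above 0.6 for reasonable models, above 0.8 for great models) amount to a simple binning rule, sketched below. The cut-off values come from the description; the label for models at or below 0.6, the function name `classify_model`, and the example families and scores are assumptions made purely for illustration.

```python
from collections import Counter

REASONABLE_CUTOFF = 0.6  # from the description: lDDT > 0.6 is a reasonable model
GREAT_CUTOFF = 0.8       # from the description: lDDT > 0.8 is a great model


def classify_model(lddt: float) -> str:
    """Bin a predicted lDDT score (expected in [0, 1]) into a quality label."""
    if not 0.0 <= lddt <= 1.0:
        raise ValueError(f"lDDT must lie in [0, 1], got {lddt}")
    if lddt > GREAT_CUTOFF:
        return "great"
    if lddt > REASONABLE_CUTOFF:
        return "reasonable"
    return "below cutoff"  # assumption: the source gives no name for this bin


# Hypothetical scores for three placeholder families.
example_scores = {"family_A": 0.85, "family_B": 0.72, "family_C": 0.41}
labels = {name: classify_model(score) for name, score in example_scores.items()}
tally = Counter(labels.values())
print(labels)
print(tally)
```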


Title	Increased annotation of eight key agricultural genomes
Description	We have developed a pipeline to generate lists of proteins that are not covered by integrated entries in InterPro in each of the targeted agriculturally relevant genomes that have associated member database signatures.
Type Of Material	Improvements to research infrastructure
Year Produced	2020
Provided To Others?	Yes
Impact	Newly integrated proteins in InterPro for each key organism between September 2019 and September 2020: Wheat: 154, Maize: 385, Chicken: 104, Cow: 498, Salmon: 83, Pig: 18239, Sugar beet: 8, Miscanthus: 20. Newly integrated proteins in Pfam entries for each key organism between September 2019 and September 2020: Wheat: 237, Maize: 302, Chicken: 203, Cow: 247, Salmon: 519, Pig: 253.
URL	https://www.ebi.ac.uk/interpro/
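
The selection described in this entry (proteins that have member database signature matches but are not covered by any integrated InterPro entry) can be pictured as a set difference. The sketch below is only an illustration of that idea: the `Match` record, the function `uncovered_proteins`, and every identifier in the example are hypothetical and do not reflect the pipeline's real data model.

```python
from typing import Iterable, List, NamedTuple, Optional


class Match(NamedTuple):
    protein: str                   # protein accession
    signature: str                 # member database signature accession
    interpro_entry: Optional[str]  # None when the signature is not integrated


def uncovered_proteins(matches: Iterable[Match]) -> List[str]:
    """Return proteins with signature matches but no integrated InterPro entry."""
    seen = set()
    covered = set()
    for match in matches:
        seen.add(match.protein)
        if match.interpro_entry is not None:
            covered.add(match.protein)
    return sorted(seen - covered)


# Hypothetical matches for three proteins from one target genome.
example_matches = [
    Match("prot1", "SIG_A", "ENTRY_1"),
    Match("prot2", "SIG_B", None),
    Match("prot3", "SIG_B", None),
    Match("prot3", "SIG_C", "ENTRY_2"),
]
todo = uncovered_proteins(example_matches)
print(todo)
```

Here `prot3` is excluded because one of its signatures is already integrated, which leaves only `prot2` as a curation target.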


Title	Increased annotation of eight key agricultural genomes
Description	We have developed a pipeline to generate lists of proteins that are not covered by integrated entries in InterPro in each of the targeted agriculturally relevant genomes that have associated member database signatures.
Type Of Material	Improvements to research infrastructure
Year Produced	2021
Provided To Others?	Yes
Impact	Newly integrated proteins in InterPro for each key organism between September 2019 and March 2022: Wheat: 4500, Maize: 31878, Chicken: 147, Cow: 340, Salmon: 154, Pig: 216144, Sugar beet: 9, Miscanthus: 19. Newly integrated proteins in Pfam entries for each key organism between September 2019 and March 2022: Wheat: 3223, Maize: 2060, Chicken: 749, Cow: 920, Salmon: 1850, Pig: 980.
URL	https://www.ebi.ac.uk/interpro/


Title	Preliminary work to investigate using CATH-Gene3D matches to speed up the searches of CATH-FunFams
Description	To speed up the calculation of CATH-FunFams matches for their integration in InterPro, preliminary work was carried out to investigate whether CATH-Gene3D matches could be used as a pre-filter for the CATH-FunFams searches.
Type Of Material	Improvements to research infrastructure
Year Produced	2020
Provided To Others?	No
Impact	The numbers of matches for the CATH-FunFams and CATH-Gene3D profiles against the UniProtKB database were found not to be significantly different. Following these good results, the InterPro team has decided to go ahead with the integration of the CATH-FunFams profiles in its database. These will be provided as an automatic annotation of protein sequences, displayed in a similar way to the MOBI-DB database.


Title	InterPro
Description	InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. We combine protein signatures from a number of member databases into a single searchable resource, capitalising on their individual strengths to produce a powerful integrated database and diagnostic tool.
Type Of Material	Database/Collection of data
Provided To Others?	Yes
Impact	All of the annotations provided by InterPro underpin the automatic annotation pipeline within the UniProt database. InterPro provides tens of millions of sequences to UniProt through the InterPro2GO pipeline. InterPro is the most widely used web service at EMBL-EBI, performing ~15,000,000 searches per month from around the world. Since November 2019, we have released 9 updates of the InterPro data. In total, 1637 new InterPro entries have been created, representing a coverage of 97% of the proteins found in UniProtKB. The InterPro website is continually updated and a number of new features have been added, including the structural models for 6370 families from Pfam 33.1 without PDB structures. These data were generated through a collaboration with the Baker group at the University of Washington.
URL	https://www.ebi.ac.uk/interpro/


Title	Pfam
Description	The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function. Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pfam entries which are related by similarity of sequence, structure or profile-HMM.
Type Of Material	Database/Collection of data
Provided To Others?	Yes
Impact	Pfam is widely used within the research community. In the past year we have been working actively on migrating the data available in the Pfam website (pfam.xfam.org) into the InterPro website. Two data releases have been made available since November 2019. The total number of Pfam entries is 18259, included in 635 clans.
URL	http://pfam.xfam.org/


Description	Providing AlphaFold structural models through Pfam and InterPro
Organisation	Alphabet
Department	DeepMind
Country	United Kingdom
Sector	Private
PI Contribution	We updated the Pfam and InterPro websites to display the AlphaFold structural models.
Collaborator Contribution	DeepMind provided a large collection of structural models to EMBL-EBI which became part of the AlphaFold Database. They also discussed the design and implementation of their structural models in the Pfam and InterPro websites.
Impact	https://proteinswebteam.github.io/interpro-blog/2021/07/22/AlphaFold-structure-predictions-available-in-InterPro/ Dr Bateman took part in a joint webinar with DeepMind to train scientists in interpreting AlphaFold models.
Start Year	2021


Description	Structural Models for Pfam
Organisation	University of Washington
Country	United States
Sector	Academic/University
PI Contribution	We initially made trRosetta structural models from the Baker group available via the InterPro and Pfam websites. More recently these have been replaced by RoseTTAFold structural models.
Collaborator Contribution	The group of David Baker produced a collection of 6,370 trRosetta models of Pfam families with no known structure for Pfam release 33.1, which increases the fraction of Pfam families with structural data to 88%. For more recent Pfam releases the group has provided more accurate RoseTTAFold structural models.
Impact	The collection of structural models has been made available via the InterPro and Pfam websites. This work has been described in a blog post and press release.
Start Year	2020


Title	InterProScan
Description	InterProScan combines different protein signature recognition methods from the InterPro member databases. Sequences are submitted in FASTA format. Matches are then calculated against all of the required member databases' signatures and the results are output in a variety of formats.
Type Of Technology	Software
Year Produced	2023
Open Source License?	Yes
Impact	InterProScan now integrates TreeGrafter to annotate proteins with PANTHER, and CATH FunFams to provide sub-domain classifications and improve annotations, especially for domains of unknown function.
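
As an illustration of working with InterProScan results, the sketch below groups signature accessions by protein and by analysis from tab-separated output. It assumes the InterProScan 5 TSV column order (protein accession, sequence MD5, sequence length, analysis, signature accession, signature description, start, stop, score, status, date, then optional InterPro accession and description), which should be checked against the documentation for the version in use. The function name `signatures_by_analysis` and all accessions in the example are placeholders, not real entries.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

MIN_COLUMNS = 11  # assumption: the first 11 TSV columns are always present


def signatures_by_analysis(tsv_lines: Iterable[str]) -> Dict[str, Dict[str, Set[str]]]:
    """Map protein -> analysis (member database) -> set of signature accessions."""
    grouped: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < MIN_COLUMNS:
            continue  # skip blank or malformed lines
        protein, analysis, signature = fields[0], fields[3], fields[4]
        grouped[protein][analysis].add(signature)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {protein: dict(by_db) for protein, by_db in grouped.items()}


# Placeholder rows; the accessions and values are made up for illustration.
fake_md5 = "0" * 32
example_lines = [
    f"prot1\t{fake_md5}\t250\tPfam\tPF99999\tExample domain\t10\t120\t1.2e-20\tT\t01-01-2023\tIPR999999\tExample entry",
    f"prot1\t{fake_md5}\t250\tPANTHER\tPTHR99999\tExample family\t1\t250\t3.4e-50\tT\t01-01-2023\t-\t-",
    "",
]
result = signatures_by_analysis(example_lines)
print(result)
```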


Description	Development of a public engagement activity: Protein families card game
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Development of a card game, "Protein families", thanks to the support of the Wellcome Genome Campus public engagement fund. The game is available as a printed version for face-to-face events, as well as an online version. It is currently under testing.
Year(s) Of Engagement Activity	2021


Description	Finding Pfam's protein families data in the InterPro website webinar
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	After 20 years of good and faithful service, we have decided to retire the Pfam website. We are still planning to do Pfam releases and the data will still be available through the InterPro website. This webinar aims to ease the transition from Pfam to InterPro by showing different ways to search and access the Pfam annotations in the InterPro website.
Year(s) Of Engagement Activity	2022
URL	https://www.ebi.ac.uk/training/events/finding-pfam-protein-families-data-interpro-website/


Description	InterPro and Pfam resources in the context of EBI structural bioinformatics course
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	30 professionals received an introduction to the InterPro and Pfam resources, including lecture and practical, in the context of the EBI structural bioinformatics course.
Year(s) Of Engagement Activity	2021,2022
URL	https://www.ebi.ac.uk/training/events/structural-bioinformatics2021/


Description	InterPro and Pfam resources in the context of the EBI Protein course
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	An introduction to the InterPro and Pfam resources, including lecture and practicals, was given to professional scientists in the context of the EBI Protein course.
Year(s) Of Engagement Activity	2022,2023
URL	https://www.ebi.ac.uk/training/events/bioinformatics-resources-protein-biology-2022/


Description	InterPro resource in the context of EBI structural bioinformatics course
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	30 professionals received an introduction to the InterPro resource, including lecture and practical, in the context of the EBI structural bioinformatics course. A user testing session of the InterPro website was organised with 3 of the participants.
Year(s) Of Engagement Activity	2020


Description	InterPro resource in the context of the EBI Protein course
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	An introduction to the InterPro resource, including lecture and practicals, was given to professional scientists in the context of the EBI Protein course.
Year(s) Of Engagement Activity	2020


Description	InterPro workshop at the Women in Bioinformatics and Data Science Latin America conference
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	2-hour workshop with 30 participants providing an introduction to InterPro. Very positive feedback received.
Year(s) Of Engagement Activity	2022
URL	https://wbds.la/conferences/3WBDSLAC/workshops.html#


Description	InterPro/Pfam presentation at the EMBL Structural Biology Retreat
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The InterPro/Pfam team presented how structure prediction is transforming the classification of protein families to an EMBL-wide audience focused on structural biology.
Year(s) Of Engagement Activity	2022


Description	Introduction to InterPro and Pfam for the UniAndes module series
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Undergraduate students
Results and Impact	Introduction to InterPro and Pfam for the UniAndes module series for students in Bogotá, Colombia. 1.5-hour session mixing lecture and hands-on exercises.
Year(s) Of Engagement Activity	2022


Description	Keynote speaker at XII Argentinian Congress of Bioinformatics and Computational Biology
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Approximately 30 attendees at a national conference in Argentina. Presented a remote keynote on the topic "Structure Predictions Transform Protein Family Classification".
Year(s) Of Engagement Activity	2022
URL	http://2022.a2b2c.org.ar/


Description	Presentation of the AlphaFold applications for InterPro and Pfam at the EBI Industry workshop
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Presentation on how we use AlphaFold structure predictions to improve protein classification in InterPro and Pfam at the EBI Industry workshop.
Year(s) Of Engagement Activity	2022


Description	Presentation of the InterPro2GO progress at the GO consortium meeting
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	General practices, new developments, and highlights of InterPro curation and presentation of the latest InterPro2GO developments.
Year(s) Of Engagement Activity	2022


Description	Presentation of the protein families game at the COBLET conference
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Presentation of the Protein families game to the community of bioinformatics trainers through a poster and talk. Well received; a poster prize was awarded.
Year(s) Of Engagement Activity	2022


Description	Seminar at EMBO members meeting
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	I presented recent work on structure prediction and family classification at the EMBO members meeting.
Year(s) Of Engagement Activity	2022


Description	Seminar in Nordic EMBL Partnership Tools of the Trade Data Science Webinar Series
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presentation at the Nordic EMBL Partnership Tools of the Trade Data Science Webinar Series on structure prediction and protein family classification.
Year(s) Of Engagement Activity	2022
URL	https://projects.au.dk/nordic-embl-partnership/collaborations/tools-of-the-trade-nordic-embl-partner...


Description	UCL postgraduates training about InterPro and HMMER
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	Postgraduate and undergraduate students from UCL attended a lecture and practical session on how to use InterPro and HMMER resources.
Year(s) Of Engagement Activity	2020,2021


Description	UCL postgraduates training about InterPro, Pfam and HMMER
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	Postgraduate and undergraduate students from UCL attended a lecture and practical session on how to use InterPro, Pfam and HMMER resources.
Year(s) Of Engagement Activity	2022,2023


Description	UniProt/InterPro joint webinar
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	This webinar gave a brief introduction to the UniProt and InterPro websites and highlighted available resources that proteomic scientists or other users with protein datasets may find useful to analyse their data. This encompassed searching by protein sequence, identifying protein peptides, and retrieving sequence-specific features and functional information, both curated and predicted.
Year(s) Of Engagement Activity	2021
URL	https://www.ebi.ac.uk/training/events/guide-proteomics-data-analysis-using-uniprot-and-interpro/


Description	Webinar series on InterPro resources
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	A series of 4 webinars about InterPro resources was organised: (1) Understanding InterPro families, domains and functions; (2) Using the InterPro website in your research; (3) Accessing InterPro programmatically; (4) InterProScan.
Year(s) Of Engagement Activity	2020
URL	https://www.ebi.ac.uk/interpro/help/tutorial/