Keeping pace with protein sequence annotation; consolidating and enhancing Pfam and InterPro's methodologies for functional prediction

Lead Research Organisation: European Bioinformatics Institute
Department Name: Sequence Database Group

Abstract

New technologies, developed in the last few years, have greatly increased the amount of biological sequence information that it is possible for laboratories to produce. As a result, there is now a very large and ever-growing amount of sequence data entering public databases. The overwhelming majority of these sequences have not been examined by scientists, nor is there any experimental information to suggest what their function might be. The Pfam and InterPro resources help plug this gap, using probabilistic models to predict the function of proteins by examining their amino acid sequences. Pfam is arguably the best-known, and one of the largest, producers of such models. InterPro, meanwhile, does not produce models directly, but takes them from Pfam and 10 other complementary databases, integrating them and adding functional information. InterPro is regularly run against the full contents of the main public repository for protein sequences, the UniProt Knowledgebase, so that its functional predictions can be transferred to those sequences.

In order that InterPro and Pfam can continue to cover the growing number of sequences and remain accurate in their predictions, new models need to be made and integrated, existing models need to be checked and the proteins that they match evaluated. One aim of the project is to support this effort. Another aim is to look at other prediction methods, not currently used by either Pfam or InterPro, that identify the individual amino acids in a protein sequence that are responsible for the protein's functions. We will add this functionality to the resources and use it to make their predictions more accurate. This will in turn improve the quality of information associated with large numbers of proteins in the UniProt Knowledgebase. Adding to the resources in this way will require changes to some of the underlying software. At the same time, we will update the InterPro and Pfam web sites, so that users can easily see the new and improved data, and understand what it means. Finally, we will prepare and organise training materials and courses to introduce new users to the resources and educate existing users about the new and updated features.

Technical Summary

Pfam and InterPro are two widely used databases containing thousands of protein signatures. Both databases provide websites and services so that user-submitted protein sequences can be searched for identification of conserved functional modules. In this proposal, we intend to improve the accuracy of functional annotation provided by Pfam and InterPro by annotating catalytic and ligand-binding residues for sequences in Pfam, and offering on-the-fly functional residue predictions as part of the InterProScan software. We will also use iPfam to expand the protein interaction information in both resources to the residue level. The domain- and ligand-binding data will be used in combination with other signatures to improve the accuracy of GO term assignment via the InterPro2GO pipeline. We will apply new approaches to expand and improve existing Pfam families, and annotate and integrate these families into InterPro, together with signatures from other member databases, improving and extending annotative coverage.
Methods for calculating, storing and propagating this additional tier of functional residue information in Pfam and InterPro will be developed with future computational scalability key to the design. Existing web interfaces will be extended to enable discovery of this new data. Open source libraries for the graphical representation of the data will also be produced and shared. Mechanisms for producing meaningful, representative multiple sequence alignments for displaying functional residue data will be designed. We will implement a range of web services to provide large-scale programmatic access and to facilitate data exchange between the two databases and their source databases.
The strong links between InterPro, the Gene Ontology and UniProt ensure that all annotations produced as a result of this project will be propagated to a large spectrum of protein resources, thus improving researchers' capability to predict protein function.
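To make the idea of propagating residue-level annotation more concrete, here is a minimal Python sketch. It assumes, hypothetically, that a functional residue is recorded once per family as a column index in the family alignment, and shows how that column maps back to a residue number in one member sequence. The function names and the example sequence are illustrative only and are not taken from the Pfam or InterPro code base.

```python
from __future__ import annotations


def column_to_residue_map(aligned_seq: str, start: int = 1) -> dict[int, int]:
    """Map 0-based alignment columns to residue numbers in the unaligned sequence.

    Gap characters ('-' and '.') occupy a column but consume no residue;
    `start` is the residue number of the first non-gap character.
    """
    mapping: dict[int, int] = {}
    residue = start
    for column, char in enumerate(aligned_seq):
        if char in "-.":
            continue  # gap: this sequence has no residue in this column
        mapping[column] = residue
        residue += 1
    return mapping


def propagate_sites(aligned_seq: str, site_columns: dict[str, int],
                    start: int = 1) -> dict[str, tuple[int, str]]:
    """Transfer column-level site labels to (residue number, amino acid) pairs.

    Sites that fall on a gap in this sequence are skipped.
    """
    mapping = column_to_residue_map(aligned_seq, start)
    return {
        label: (mapping[column], aligned_seq[column].upper())
        for label, column in site_columns.items()
        if column in mapping
    }


# Hypothetical family-level annotation: site label -> alignment column.
sites = {"proton donor": 3, "nucleophile": 4, "unaligned site": 2}
print(propagate_sites("MK-HSD.GE", sites, start=10))
# {'proton donor': (12, 'H'), 'nucleophile': (13, 'S')}
```

Skipping sites that fall on a gap is a deliberate choice: a sequence that lacks a residue at an annotated column cannot inherit that annotation.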

Planned Impact

Pfam and InterPro are long-established bioinformatics resources that are widely used to predict the function of protein sequences. Commercial and academic scientists with a wide variety of research focuses (e.g. human, animal and plant health) use both resources. In particular, these services are regularly used in the annotation of genomes and metagenomes. Data produced by InterPro and Pfam are consumed by a number of internationally important databases, such as Ensembl, Ensembl Genomes, UniProtKB, and model organism-specific databases (including VectorBase, PomBase, FlyBase, WormBase, TAIR and MGI). These databases, in turn, serve many hundreds of thousands of users on a monthly basis. There are also a variety of widely used analysis platforms (such as DAVID, Blast2GO and CDD) that incorporate InterPro and Pfam's data and/or search software. Additionally, as evidenced by the large number of sequence searches and visits to Pfam and InterPro's websites, a significant number of users also choose to access these resources directly.

One vital impact of the project will be the continued provision of annotation for sequences entering public protein databases, in the face of ever-increasing data volume. Improving protein annotation by adding new, residue-level function prediction methodologies to Pfam and InterPro is also critical, since it will allow very fine-grained analysis of proteins (for example, distinguishing catalytically inactive enzymes from their active counterparts). This will improve the accuracy of annotation and help to remove misleading information from the public databases. Augmenting users' capabilities to visualise relevant functional traits on sequences and multiple sequence alignments is highly important, since it will allow them to scrutinise the conservation of such traits across families, which will help power annotation transfer based on homology.
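As a rough illustration of how residue-level data could separate catalytically inactive homologues from active family members, the Python sketch below checks whether each aligned sequence retains the expected residues at a set of catalytic columns. The column positions, the Ser/His/Asp-or-Glu triad and the labels are hypothetical, and a simple presence check like this is far cruder than the prediction methods the project describes.

```python
from __future__ import annotations

# Hypothetical catalytic-site definition for one family alignment:
# alignment column (0-based) -> residues accepted at that column,
# e.g. a Ser/His/Asp-or-Glu catalytic triad.
CATALYTIC_COLUMNS: dict[int, set[str]] = {3: {"S"}, 7: {"H"}, 11: {"D", "E"}}


def missing_catalytic_sites(aligned_row: str,
                            catalytic: dict[int, set[str]]) -> list[int]:
    """Return catalytic columns where this row lacks an accepted residue.

    A gap ('-' or '.') at a catalytic column counts as missing.
    """
    return [column for column, allowed in sorted(catalytic.items())
            if aligned_row[column].upper() not in allowed]


def predict_activity(aligned_row: str,
                     catalytic: dict[int, set[str]] = CATALYTIC_COLUMNS) -> str:
    """Label a family member by whether all catalytic residues are retained."""
    missing = missing_catalytic_sites(aligned_row, catalytic)
    if not missing:
        return "putatively active"
    return f"possibly inactive (columns {missing})"


for row in ["GLGSAGGHLVAD", "GLGAAGGHLVAD", "GLGSAGG-LVAE"]:
    print(row, predict_activity(row))
# GLGSAGGHLVAD putatively active
# GLGAAGGHLVAD possibly inactive (columns [3])
# GLGSAGG-LVAE possibly inactive (columns [7])
```

Accepting a set of residues per column (here Asp or Glu at the last position) allows for conservative substitutions that are assumed, for this example, not to abolish activity.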

The benefits of this project will be felt almost immediately. While InterPro has a two-month release cycle, it provides annotation for UniProtKB on a monthly basis. This project would feed into that annotation pipeline. In the medium term, further benefits will come from the inclusion of improved and/or new families into Pfam and from their subsequent integration into InterPro. This will be supplemented by the addition of the novel residue-level annotations and interaction data. Users will be able to explore these data more effectively as the modifications to the user interfaces and viewers are implemented. There will also be long-term benefits, in that the new functionalities added to the resources will continue to be offered following the project's completion. Erroneous annotation already in the public databases will be corrected by the inclusion of more accurate data produced by this project.

It is anticipated that we would employ two types of scientist to work on this project. Firstly, a scientific data curator would be required in order to add to and improve the content of both resources. Data curation is a highly specialised career, but the skills learned as a curator can be transferred to other sectors. For example, curators gain exceptional scientific writing skills and typically attain the ability to précis complex scientific information into a format easily understandable by others, without losing accuracy. These skills are particularly useful in positions requiring scientific (or other) communication. Curators also gain data management and mining expertise, which can be useful in a range of jobs, not limited to scientific fields. Secondly, a software engineer would be employed to implement necessary changes to the infrastructure. This may necessitate training in programming languages and software frameworks. Both staff members would be expected to learn how to present their work to others, regardless of their audience's background knowledge or expertise.

Publications

Finn R (2014) Pfam: the protein families database. in Nucleic Acids Research

Finn RD (2017) InterPro in 2017-beyond protein family and domain annotations. in Nucleic Acids Research

Gene Ontology Consortium (2015) Gene Ontology Consortium: going forward. in Nucleic Acids Research

Gene Ontology Consortium (2021) The Gene Ontology resource: enriching a GOld mine. in Nucleic Acids Research

Sangrador-Vegas A (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. in Database: The Journal of Biological Databases and Curation

The Gene Ontology Consortium (2019) The Gene Ontology Resource: 20 years and still GOing strong. in Nucleic Acids Research

 
Description InterPro has now been expanded to include per-residue information. Within a protein, individual amino acid residues play critical roles in its function; for example, active site residues are brought together in three-dimensional space to perform a specific chemical reaction. These fine-grained annotations allow a deeper understanding of a protein's function than the presence of domains on a sequence alone. In addition, MobiDB-Lite has been added to InterProScan. This resource combines eight different predictors to generate a consensus focusing on long intrinsically disordered regions, which are important mediators of protein binding and interactions, and are medically important, being associated with diseases such as neurodegeneration and cancer. The provision of intrinsic disorder prediction allows fine-grained annotation and facilitates greater understanding of protein sequence data.
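The consensus idea described above can be sketched as a per-residue vote across predictors followed by a minimum-length filter. In the Python sketch below, the agreement threshold (5 of 8 predictors, i.e. 0.625) and the 20-residue minimum are assumptions chosen for illustration, the individual predictors are not modelled, and any further steps in the real MobiDB-Lite software (such as smoothing) are omitted.

```python
from __future__ import annotations


def consensus_disorder(predictions: list[list[int]], agreement: float = 0.625,
                       min_length: int = 20) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) disordered regions.

    predictions: one 0/1 list per predictor, each as long as the sequence.
    A residue is called disordered when the fraction of predictors voting 1
    is at least `agreement` (assumed value); only runs of at least
    `min_length` consecutive residues (assumed value) are reported.
    """
    n_predictors = len(predictions)
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("every predictor must cover the whole sequence")

    calls = [sum(p[i] for p in predictions) / n_predictors >= agreement
             for i in range(length)]

    regions: list[tuple[int, int]] = []
    run_start = None
    # A trailing False sentinel closes a run that reaches the C-terminus.
    for i, disordered in enumerate(calls + [False]):
        if disordered and run_start is None:
            run_start = i
        elif not disordered and run_start is not None:
            if i - run_start >= min_length:
                regions.append((run_start + 1, i))  # convert to 1-based, inclusive
            run_start = None
    return regions
```

Requiring agreement between several predictors trades some sensitivity for robustness, and the length filter reflects the focus on long disordered regions rather than short, isolated calls.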

We have also met the targets for InterPro and Pfam curation, increasing the coverage of both resources.
Exploitation Route These developments touch on virtually all areas of modern molecular biology, increasing the breadth and depth of annotations in InterPro and Pfam. Examples of areas where this information is essential include understanding host-pathogen interactions, protein engineering and enzyme discovery. The annotations provided by Pfam and InterPro are widely used in other informatics resources and are central to the automatic annotation procedures performed in UniProtKB. As new genomes are generated, most will be annotated with one or both of these resources. This supports the generation of biological knowledge and hypotheses, and allows the transfer of annotation from a few experimentally characterised sequences to many new sequences as they are produced. Having key residue information will allow the design of more efficient enzymes, or of proteins that can be used as part of molecular machines.
Sectors Agriculture

Food and Drink

Digital/Communication/Information Technologies (including Software)

Environment

Healthcare

Manufacturing, including Industrial Biotechnology

Pharmaceuticals and Medical Biotechnology

 
Description We now have intrinsic disorder (ID) prediction, as well as a large number of per-residue site annotations, which the scientific community can use to understand the role of ID regions and specific residues within a protein. An analysis of open patent data (https://www.surechembl.org) shows that over 7,000 patents mention Pfam, with 30 patents specifically referring to Pfam database entries. Similarly, over 1,000 patents refer to InterPro.
Sector Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Environment, Healthcare, Manufacturing, including Industrial Biotechnology, Culture, Heritage, Museums and Collections, Pharmaceuticals and Medical Biotechnology
Impact Types Societal

Economic

 
Description Biomedical Resources
Amount £1,154,000 (GBP)
Funding ID 108433/Z/15/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 08/2015 
End 08/2020
 
Title InterPro 
Description InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. We combine protein signatures from a number of member databases into a single searchable resource, capitalising on their individual strengths to produce a powerful integrated database and diagnostic tool 
Type Of Material Database/Collection of data 
Provided To Others? Yes  
Impact All of the annotations provided by InterPro underpin the automatic annotation pipeline within the UniProt database. Through the InterPro2GO pipeline, InterPro provides annotation for tens of millions of UniProt sequences. InterPro is the most widely used web service at EMBL-EBI, performing ~15,000,000 searches per month for users around the world.
URL http://www.ebi.ac.uk/interpro/
 
Title Pfam 
Description The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function. Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pfam entries which are related by similarity of sequence, structure or profile-HMM. 
Type Of Material Database/Collection of data 
Provided To Others? Yes  
Impact Pfam is widely used within the research community. 
URL http://pfam.xfam.org
 
Description CATH-Gene3D database 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Integration of CATH-Gene3D profile hidden Markov models into InterPro.
Collaborator Contribution Provision of CATH-Gene3D profile hidden Markov models to InterPro.
Impact Integration of CATH-Gene3D profile hidden Markov models into InterPro, helping provide structural classification of protein sequences.
 
Description Conserved domain database (CDD) 
Organisation National Center for Biotechnology Information (NCBI)
Country United States 
Sector Public 
PI Contribution CDD profiles and active site information have been integrated into InterPro. Within CDD, each domain is modelled as a multiple sequence alignment, which is converted into a position-specific scoring matrix (PSSM) that allows fast identification of conserved domains in protein sequences via RPS-BLAST. Within InterProScan, RPS-BLAST is supplemented by a piece of software called 'rpsbproc', which ensures that InterPro can faithfully reproduce the results from CDD.
Collaborator Contribution CDD profiles and active site information have been provided to InterPro.
Impact Integration of CDD's profiles and residue information helps InterPro provide functional and structural information about protein sequences. It also allows the markup of important residues within sequences, such as those contributing to active sites or binding sites, enabling the finest-grained annotation of sequences.
Start Year 2016
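To illustrate what scoring a sequence against a PSSM means, the Python sketch below slides a small position-specific scoring matrix along a sequence without gaps and reports the best-scoring window. This is a deliberate simplification: RPS-BLAST performs gapped alignments with statistical evaluation, and the toy matrix, default penalty and function name here are hypothetical.

```python
from __future__ import annotations


def best_ungapped_hit(sequence: str, pssm: list[dict[str, int]],
                      default: int = -4) -> tuple[int, int]:
    """Slide a PSSM along `sequence` without gaps.

    Each PSSM column is a dict of residue -> score; residues absent from a
    column receive `default` (an assumed penalty). Returns
    (best_score, 1-based start of the best window); ties keep the first window.
    """
    width = len(pssm)
    if width > len(sequence):
        raise ValueError("sequence is shorter than the PSSM")
    best: tuple[int, int] | None = None
    for offset in range(len(sequence) - width + 1):
        score = sum(column.get(sequence[offset + j], default)
                    for j, column in enumerate(pssm))
        if best is None or score > best[0]:
            best = (score, offset + 1)
    return best


# Hypothetical three-column profile favouring a G-[DE]-S motif.
toy_pssm = [{"G": 6, "A": 2}, {"D": 5, "E": 3}, {"S": 7}]
print(best_ungapped_hit("MKGDSLAES", toy_pssm))
# (18, 3)
```

Each column of a PSSM scores residues according to how well they fit that position of the domain model, which is why the weaker "AES" window (score 12) still outscores unrelated windows.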
 
Description HAMAP database 
Organisation Swiss Institute of Bioinformatics (SIB)
Country Switzerland 
Sector Charity/Non Profit 
PI Contribution Integration of HAMAP profiles into InterPro.
Collaborator Contribution Provision of HAMAP profiles to InterPro.
Impact HAMAP profiles have been integrated into InterPro, helping provide functional classification of protein sequences.
Start Year 2009
 
Description MobiDB-lite 
Organisation University of Padova
Country Italy 
Sector Academic/University 
PI Contribution Implementation of the MobiDB-lite software for intrinsic disorder prediction in the InterProScan software.
Collaborator Contribution Provision of the MobiDB-lite software and underlying signatures to provide prediction of intrinsically disordered regions in protein sequences.
Impact Intrinsically disordered (ID) protein regions do not fold into a defined tertiary structure. They mediate numerous functions, including flexible linkers, linear motifs that mediate interactions, and coupled folding and binding. These regions are highly important from a medical perspective, as they are associated with neurodegeneration and are enriched in cell signaling proteins and cancer-associated proteins. However, they display little evolutionary conservation and are therefore hard to model. Implementation of the MobiDB-lite software in InterProScan enables the prediction of ID regions on protein sequences in InterPro for the first time, allowing fine-grained annotations and improving the coverage of residues by the resource.
Start Year 2016
 
Description PANTHER Database 
Organisation University of Southern California
Department Keck School of Medicine
Country United States 
Sector Academic/University 
PI Contribution Integration of PANTHER HMMs into the InterPro resource. Inclusion of PANTHER software within InterProScan to permit monthly calculation of protein matches to UniProt. This in turn allows the automatic annotation of protein sequences, which is an integral component of UniProt.
Collaborator Contribution Supply of protein family HMMs and post-processing software for InterPro integration. Provision of reference trees for use in comparison of protein classifications in InterPro between PANTHER, SFLD and TIGRFAM. Supplier of
Impact Harmonization of protein family definitions. Use of PANTHER reference trees as a scaffold for comparing classifications from disparate databases.
 
Description PIRSF database 
Organisation Georgetown University
Country United States 
Sector Academic/University 
PI Contribution Integration of PIRSF profile hidden Markov models into InterPro
Collaborator Contribution Provision of PIRSF profile hidden Markov models to InterPro
Impact PIRSF HMMs have been integrated into InterPro, helping provide functional characterisation of protein sequences.
 
Description PRINTS database 
Organisation University of Manchester
Country United Kingdom 
Sector Academic/University 
PI Contribution Integration of PRINTS signatures into the InterPro database
Collaborator Contribution Provision of PRINTS signatures to InterPro
Impact PRINTS signatures have been integrated into InterPro, providing functional classification of proteins.
 
Description PROSITE database 
Organisation Swiss Institute of Bioinformatics (SIB)
Country Switzerland 
Sector Charity/Non Profit 
PI Contribution Integration of PROSITE profiles and patterns into InterPro.
Collaborator Contribution Provision of PROSITE profiles and patterns to InterPro.
Impact PROSITE patterns and profiles have been integrated into InterPro and are included in the effort to harmonize protein names and functions between disparate resources.
 
Description Pfam database 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution Pfam provides a large number of profile hidden Markov models that model protein families and domains. These have been integrated into InterPro.
Collaborator Contribution Provision of Pfam profile hidden Markov models to InterPro
Impact Integration of Pfam HMMs into InterPro, helping provide structural and functional classification of proteins.
 
Description SFLD added to InterPro Consortium 
Organisation University of California, San Francisco
Department Department of Bioengineering and Therapeutic Sciences
Country United States 
Sector Academic/University 
PI Contribution We have helped SFLD move to a more formal database design, and provided them with software tools and advice to enable the systematic transfer of annotations. We have begun to integrate SFLD into InterPro. This process has functioned as a QC on the SFLD data, and we have fed back any issues we identified. We have focused on the integration of the SFLD subset of gold-standard entries for comparison with TIGRFAM and PANTHER.
Collaborator Contribution SFLD provides the underlying knowledge and data to InterPro, in the form of multiple sequence alignments, functional annotations and structured ontologies. SFLD provided a list of a subset of gold-standard families for use in the comparison between PANTHER and TIGRFAM, towards a harmonization of protein family names and functional annotations in InterPro.
Impact The SFLD resource is in the process of being added to InterPro, where it will provide fine-grained protein annotations linking enzymes to the chemical reactions they perform.
Start Year 2015
 
Description SMART database 
Organisation European Molecular Biology Laboratory
Department European Molecular Biology Laboratory Heidelberg
Country Germany 
Sector Academic/University 
PI Contribution Integration of SMART profile hidden Markov models into InterPro
Collaborator Contribution Provision of SMART profile hidden Markov models to InterPro
Impact SMART HMMs have been integrated into InterPro, helping provide functional classification of protein sequences.
 
Description SUPERFAMILY database 
Organisation Medical Research Council (MRC)
Department MRC Laboratory of Molecular Biology (LMB)
Country United Kingdom 
Sector Academic/University 
PI Contribution Integration of SUPERFAMILY profile hidden Markov models into the InterPro database.
Collaborator Contribution Provision of SUPERFAMILY profile hidden Markov models to InterPro.
Impact Integration of SUPERFAMILY HMMs into InterPro, helping provide structural classification of protein sequences.
 
Description TIGRFAM database 
Organisation J Craig Venter Institute
Country United States 
Sector Charity/Non Profit 
PI Contribution Integration of TIGRFAM HMMs into InterPro. Generation of editable DESCfile format files for Genome Properties, and subsequent curation of the DESCfiles. Production of a visualisation system for Genome Properties.
Collaborator Contribution Provision of TIGRFAM HMMs to InterPro. Provision of Genome Properties flat file data for inclusion in InterPro.
Impact TIGRFAM HMMs have been integrated into InterPro and are included in the effort to harmonize protein names and functions between disparate resources. A database of Genome Properties has been established at InterPro. The properties are stored in an editable DESCfile format (generated from the flat file data provided) and are currently being curated for presentation within InterPro.
 
Title InterProScan5 
Description Allows the user to compare a DNA or protein sequence against the collection of InterPro member databases, and to assign InterPro annotations and associated GO terms.
Type Of Technology Software 
Open Source License? Yes  
Impact This software is widely downloaded and used (as assessed through citations, distributed annotations and helpdesk interactions). The tool is also widely used in other analysis pipelines, such as genomics and metagenomics analyses. It is updated with every (bi-monthly) release of InterPro to include both data updates and software updates. Some of these software updates are scientific developments required by changes in member databases' post-processing; the others are general software maintenance.
URL https://www.ebi.ac.uk/interpro/interproscan.html
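Downstream pipelines often need the GO terms that InterProScan attaches to each protein. The Python sketch below collects them from tab-separated output, assuming the InterProScan 5 TSV layout in which the first column is the protein accession and the 14th column (present when the tool is run with the --goterms option) holds '|'-separated GO identifiers, or '-' when there are none. Column positions and annotation formats may differ between versions, so treat this layout as an assumption to check against the documentation for the release in use.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches bare GO identifiers and ignores any suffix such as a source label.
GO_ID = re.compile(r"GO:\d{7}")


def go_terms_by_protein(tsv_text: str) -> dict[str, set[str]]:
    """Collect GO identifiers per protein from InterProScan TSV output.

    Assumes column 1 is the protein accession and column 14 (index 13)
    holds the GO annotations for that match.
    """
    terms: defaultdict[str, set[str]] = defaultdict(set)
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 14 or fields[13] in ("", "-"):
            continue  # no GO column, or no GO terms for this match
        terms[fields[0]].update(GO_ID.findall(fields[13]))
    return dict(terms)
```

Because several matches on the same protein can carry the same GO term, the sketch accumulates identifiers in a set, so each term is reported once per protein. A typical call would pass the contents of a results file (for example a hypothetical `results.tsv`) read as text.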
 
Title PfamScan 
Description Analyses protein or DNA sequences against the Pfam HMM library, performs post-processing to assign clans (the family hierarchy) and allows the identification of active site residues.
Type Of Technology Software 
Open Source License? Yes  
Impact Used widely to assign Pfam families to sequences, and to reliably reproduce Pfam results.
URL ftp://ftp.ebi.ac.uk/pub/databases/Pfam/Tools
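The clan post-processing step can be illustrated with a simplified Python sketch: when two matches from families in the same clan overlap on a sequence, only the more significant one (lower E-value) is kept, while matches from unclanned families or from different clans are left alone. This is an assumption-laden approximation rather than PfamScan's actual logic, which has further rules (for example for nested domains); the `Hit` class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    """One hypothetical family match on a sequence (1-based, inclusive)."""
    family: str
    clan: Optional[str]  # None for families that belong to no clan
    start: int
    end: int
    evalue: float


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two matches share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_clan_competition(hits: list[Hit]) -> list[Hit]:
    """Keep the best match among overlapping hits from the same clan."""
    kept: list[Hit] = []
    # Visit hits from most to least significant so the best one wins.
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.clan is None or not any(
            k.clan == hit.clan and overlaps(k, hit) for k in kept
        ):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)
```

Processing hits in order of significance means each decision only needs to compare against matches that have already been accepted.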
 
Description Bioinformatics resources for protein biology - InterPro/HMMER 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Hands-on workshop with open application describing the use of the InterPro and Pfam tools for protein annotation.
Year(s) Of Engagement Activity 2016
URL http://www.ebi.ac.uk/training/events/2016/bioinformatics-resources-protein-biology
 
Description Blog post - sweetness 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Blog post describing the proteins used in artificial sweeteners and highlighting how InterPro can be used to classify them and discover more information.
Year(s) Of Engagement Activity 2015
URL http://interprodb.blogspot.co.uk/2015/03/the-sweetest-thing.html
 
Description EMBL-EBI workshop at National Veterinary Research Institute, Poland 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact National Veterinary Research Institute, Poland requested a training workshop covering a selected set of EBI resources including InterPro. The delegates were interested in large-scale protein and metagenomics analysis, so the topics covered (in the form of presentations and hands-on training exercises) included InterPro as well as Genome Properties. Delegates reported an enthusiasm to utilise the resources covered.
Year(s) Of Engagement Activity 2017
URL https://www.ebi.ac.uk/training/events/2017/embl-ebi-resources-and-tools-genomics-and-proteomics
 
Description Exploring biological sequence data 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Presentation and interactive hands-on at 'exploring biological sequence data' workshop, with open participation.
Year(s) Of Engagement Activity 2016
URL http://www.ebi.ac.uk/training/events/2016/exploring-biological-sequence-data
 
Description Functional analysis using InterPro 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Lecture and interactive hands-on at UCL as guest lecturer as part of the Genes and Disease MSc module. Demonstrating how bioinformatics tools such as InterPro can be applied to real world questions and datasets. Lots of questions and debate.
Year(s) Of Engagement Activity 2016
 
Description GO annotation in InterPro 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation at the International Society for Biocuration meeting, describing the efforts made to keep annotations up to date.
Year(s) Of Engagement Activity 2016
 
Description InterPro and Pfam session in Protein Structure Analysis at University of Cambridge
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Presentation and hands-on training session covering InterPro and Pfam as part of a University of Cambridge 2-day Protein Structure Analysis course.
Year(s) Of Engagement Activity 2017
URL https://www.training.cam.ac.uk/event/2052063
 
Description InterPro session at Bioinformatics Resources for Protein Biology course at EBI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation and hands-on training session on InterPro and Genome Properties within an EBI organised course covering Bioinformatics Resources for Protein Biology.
Year(s) Of Engagement Activity 2018
URL https://www.ebi.ac.uk/training/events/2018/bioinformatics-resources-protein-biology-2
 
Description InterPro session at Bioinformatics Resources for Protein Biology course at EBI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation and hands-on training session as part of a 3-day EBI course on Bioinformatics Resources for Protein Biology.
Year(s) Of Engagement Activity 2017
URL https://www.ebi.ac.uk/training/events/2017/bioinformatics-resources-protein-biology-0
 
Description InterPro session at Structural Bioinformatics course at EBI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation and hands-on training session covering InterPro as part of a week-long EBI course on structural bioinformatics.
Year(s) Of Engagement Activity 2017
URL https://www.ebi.ac.uk/training/events/2017/structural-bioinformatics-1
 
Description InterPro session in Exploring Biological Sequences course at EBI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presentation and hands-on training covering InterPro as part of the EBI Exploring Biological Sequences course.
Year(s) Of Engagement Activity 2017
URL https://www.ebi.ac.uk/training/events/2017/exploring-biological-sequences
 
Description Introduction to InterPro at University of Cambridge 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Presentation and hands-on training of a half-day module covering InterPro and Genome Properties as part of the University of Cambridge training provision.
Year(s) Of Engagement Activity 2018
URL https://www.training.cam.ac.uk/event/2239008
 
Description Presentation to EBI Industry Programme quarterly meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Presentation of work on Genome Properties resource.
Year(s) Of Engagement Activity 2018
 
Description Protein focus - what's ape 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Blog post discussing the proteins that are associated with brain size and how they have changed during human evolution, with links to InterPro explaining how information on these proteins can be obtained.
Year(s) Of Engagement Activity 2017
URL https://proteinswebteam.github.io/interpro-blog/2017/09/22/What's-ape/
 
Description Protein focus - Zika virus
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Blog post aimed at the general public highlighting the proteins involved with the Zika virus and how InterPro can be used to find out more information about them.
Year(s) Of Engagement Activity 2016
 
Description Protein focus article - don't blame the cat
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Blog post highlighting the relationship between cats and toxoplasmosis, and showing how InterPro can be used to find information about the protein pathways involved in this behaviour.
Year(s) Of Engagement Activity 2014
URL http://interprodb.blogspot.co.uk/2014/11/protein-focus-dont-blame-cat.html
 
Description Structural Bioinformatics 2016 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Presentation and hands-on at structural bioinformatics workshop, demonstrating how Pfam and InterPro can be used to help classify and annotate proteins.
Year(s) Of Engagement Activity 2016
URL http://www.ebi.ac.uk/training/events/2016/structural-bioinformatics-2016
 
Description Understanding protein families, domains and function using InterPro and Pfam 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Workshop at University of Cambridge aimed at using InterPro and Pfam to annotate proteins and to solve real world biological data questions. Consisted of lectures and hands-on sessions using the tools with lots of interaction.
Year(s) Of Engagement Activity 2016
URL https://www.training.cam.ac.uk/bioinformatics/event/1879212