Keeping pace with protein sequence annotation; consolidating and enhancing Pfam and InterPro's methodologies for functional prediction

Lead Research Organisation: European Bioinformatics Institute

Department Name: Sequence Database Group

Abstract

New technologies, developed in the last few years, have greatly increased the amount of biological sequence information that it is possible for laboratories to produce. As a result, there is now a very large and ever-growing amount of sequence data entering public databases. The overwhelming majority of these sequences have not been examined by scientists, nor is there any experimental information to suggest what their function might be. The Pfam and InterPro resources help plug this gap, using probabilistic models to predict the function of proteins by examining their amino acid sequences. Pfam is arguably the most well-known and one of the largest producers of such models. InterPro, meanwhile, does not produce models directly, but takes them from Pfam and 10 other complementary databases, integrating them together and adding functional information. InterPro is regularly run against the full contents of the main public repository for protein sequences, the UniProt Knowledgebase, so that its functional predictions can be transferred.

In order that InterPro and Pfam can continue to cover the growing number of sequences and remain accurate in their predictions, new models need to be made and integrated, existing models need to be checked and the proteins that they match evaluated. One aim of the project is to support this effort. Another aim is to look at other prediction methods, not currently used by either Pfam or InterPro, that identify the individual amino acids in a protein sequence that are responsible for the protein's functions. We will add this functionality to the resources and use it to make their predictions more accurate. This will in turn improve the quality of information associated with large numbers of proteins in the UniProt Knowledgebase. Adding to the resources in this way will require changes to some of the underlying software. At the same time, we will update the InterPro and Pfam web sites, so that users can easily see the new and improved data, and understand what it means. Finally, we will prepare and organise training materials and courses to introduce new users to the resources and educate existing users about the new and updated features.

Technical Summary

Pfam and InterPro are two widely used databases containing thousands of protein signatures. Both databases provide websites and services so that user-submitted protein sequences can be searched for identification of conserved functional modules. In this proposal, we intend to improve the accuracy of functional annotation provided by Pfam and InterPro by annotating catalytic and ligand-binding residues for sequences in Pfam, and offering on-the-fly functional residue predictions as part of the InterProScan software. We will also use iPfam to expand the protein interaction information in both resources to the residue level. The domain- and ligand-binding data will be used in combination with other signatures to improve the accuracy of GO term assignment via the InterPro2GO pipeline. We will apply new approaches to expand and improve existing Pfam families, and annotate and integrate these families into InterPro, together with signatures from other member databases, improving and extending annotative coverage.
Methods for calculating, storing and propagating this additional tier of functional residue information in Pfam and InterPro will be developed, with future computational scalability key to the design. Existing web interfaces will be extended to enable discovery of these new data. Open source libraries for the graphical representation of the data will also be produced and shared. Mechanisms for producing meaningful, representative multiple sequence alignments for displaying functional residue data will be designed. We will implement a range of web services both to provide large-scale, programmatic access and to facilitate data exchange between the two databases and source databases.
The strong links between InterPro, the Gene Ontology and UniProt ensure that all annotations produced as a result of this project will be propagated to a large spectrum of protein resources, thus improving researchers' capability to predict protein function.

Planned Impact

Pfam and InterPro are long-established bioinformatics resources that are widely used to predict the function of protein sequences. Commercial and academic scientists with a wide variety of research focuses (e.g. human, animal and plant health) use both resources. In particular, these services are regularly used in the annotation of genomes and metagenomes. Data produced by InterPro and Pfam are consumed by a number of internationally-important databases, such as Ensembl, Ensembl Genomes, UniProtKB, and model organism-specific databases (including Vectorbase, Pombase, Flybase, Wormbase, TAIR and MGI). These databases, in turn, serve many hundreds of thousands of users on a monthly basis. There are also a variety of widely-used analysis platforms (such as DAVID, Blast2GO and CDD) that incorporate InterPro and Pfam's data and/or search software. Additionally, as evidenced by the large number of sequence searches and visits to Pfam and InterPro's websites, a significant number of users also choose to access these resources directly.

One vital impact of the project will be the continued provision of annotation for sequences entering public protein databases, in the face of ever increasing data volume. Improved annotation of proteins by adding new, residue-level function prediction methodologies to Pfam and InterPro is also critical, since it will allow very fine-grained analysis of proteins (for example, distinguishing catalytically inactive enzymes from their active counterparts). This will improve accuracy of annotation, and help to remove misleading information from the public databases. Augmenting users' capabilities to visualise relevant functional traits on sequences and multiple sequence alignments is highly important, since it will allow them to scrutinise conservation of such traits across families, which will help power annotation transfer based on homology.
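
As an illustration of the residue-level check mentioned above (distinguishing catalytically inactive enzymes from active ones), the following is a minimal sketch only. It assumes a pairwise alignment between a reference protein with known active-site positions and a query protein, and it treats a site as conserved only when the residues are identical. The function name check_sites, the SiteCheck record and the example sequences are hypothetical; this is not the method used by Pfam or InterPro.

```python
from typing import Iterable, List, NamedTuple, Optional

GAP_CHARS = "-."  # gap symbols commonly used in alignment rows


class SiteCheck(NamedTuple):
    ref_pos: int                  # 1-based position in the ungapped reference
    query_pos: Optional[int]      # 1-based position in the ungapped query, None if gapped
    ref_residue: str
    query_residue: Optional[str]
    conserved: bool               # True only if the query residue is identical


def check_sites(ref_aligned: str, query_aligned: str,
                ref_sites: Iterable[int]) -> List[SiteCheck]:
    """Map reference active-site positions onto the query via the alignment."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned rows must have the same length")
    wanted = set(ref_sites)
    checks: List[SiteCheck] = []
    ref_pos = 0
    query_pos = 0
    for ref_char, query_char in zip(ref_aligned.upper(), query_aligned.upper()):
        ref_is_residue = ref_char not in GAP_CHARS
        query_is_residue = query_char not in GAP_CHARS
        if ref_is_residue:
            ref_pos += 1
        if query_is_residue:
            query_pos += 1
        if ref_is_residue and ref_pos in wanted:
            if query_is_residue:
                checks.append(SiteCheck(ref_pos, query_pos, ref_char,
                                        query_char, ref_char == query_char))
            else:
                # The query has a gap where the reference has an active-site residue.
                checks.append(SiteCheck(ref_pos, None, ref_char, None, False))
    return checks


def looks_active(checks: List[SiteCheck]) -> bool:
    """Assumption: a query is flagged as potentially active only if every site is conserved."""
    return bool(checks) and all(c.conserved for c in checks)


if __name__ == "__main__":
    # Hypothetical reference with a catalytic serine at position 3.
    for query in ("GDAGAGPL", "G-SGAGPL"):
        result = check_sites("GDSG-GPL", query, [3])
        print(query, result, looks_active(result))
```

A real pipeline would work from family-wide alignments and curated evidence rather than a single reference, and might tolerate chemically equivalent substitutions; the strict identity test here is a simplification.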

The benefits of this project will be felt almost immediately. While InterPro has a two-month release cycle, it provides annotation for UniProtKB on a monthly basis. This project would feed into that annotation pipeline. In the medium term, further benefits will come from the inclusion of improved and/or new families into Pfam and from their subsequent integration into InterPro. This will be supplemented by the addition of the novel residue-level annotations and interaction data. Users will be able to explore these data more effectively as the modifications to the user interfaces and viewers are implemented. There will also be long-term benefits, in that the new functionalities added to the resources will continue to be offered following the project's completion. Erroneous annotation already in the public databases will be corrected by the inclusion of more accurate data produced by this project.

It is anticipated that we would employ 2 different types of scientist to work on this project. Firstly, a scientific data curator would be required in order to add to and improve the content of both resources. Data curation is a highly specialised career, but the skills learned as a curator can be transferred to other sectors. For example, curators gain exceptional scientific writing skills and typically attain the ability to precis complex scientific information into a format easily understandable by others, without losing accuracy. These skills are particularly useful in positions requiring scientific (or other) communication. Curators also gain data management and mining expertise, which can be useful in a range of jobs, not limited to scientific fields. A software engineer would also be employed to implement necessary changes to the infrastructure. This may necessitate training in programming languages and software frameworks. Both staff members would be expected to learn how to present their work to others, regardless of their audience's background knowledge or expertise.

Funded Value:

£545,328

Funded Period:

Jul 14 - Jul 17

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/L024136/1

Principal Investigator:

Alex Bateman

Robert Finn

Research Subject:

Biomolecules & biochemistry (60%)

Omic sciences & technologies (24%)

Tools, technologies & methods (12%)

Research Topic:

Bioinformatics (12%)

Catalysis & enzymology (36%)

Protein expression (24%)

Proteomics (24%)

Organisations

People	ORCID iD
Alex Bateman (Principal Investigator)	http://orcid.org/0000-0002-6982-4660
Robert Finn (Principal Investigator)

Publications


Chiang Z (2015) The complexity, challenges and benefits of comparing two transporter classification systems in TCDB and Pfam in Briefings in Bioinformatics

El-Gebali S (2019) The Pfam protein families database in 2019. in Nucleic acids research

Finn R (2016) The Pfam protein families database: towards a more sustainable future in Nucleic Acids Research

Finn R (2014) Pfam: the protein families database in Nucleic Acids Research

Finn RD (2017) InterPro in 2017-beyond protein family and domain annotations. in Nucleic acids research

Gene Ontology Consortium (2021) The Gene Ontology resource: enriching a GOld mine. in Nucleic acids research

Gene Ontology Consortium (2015) Gene Ontology Consortium: going forward. in Nucleic acids research

Mitchell A (2015) The InterPro protein families database: the classification resource after 15 years. in Nucleic acids research

Mitchell AL (2019) InterPro in 2019: improving coverage, classification and access to protein sequence annotations. in Nucleic acids research

Sangrador-Vegas A (2016) GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations. in Database : the journal of biological databases and curation

Key Findings
Impact Summary
Further Funding
Research Databases and Models
Collaboration
Software and Technical Products
Engagement Activities


Description	InterPro has now been expanded to include per-residue information. Within a protein, individual amino acid residues perform critical roles in the function of the sequence; for example, active site residues are brought together in three-dimensional space to perform a specific chemical reaction. Having these fine-grained annotations allows for a deeper understanding of a protein's function, over and above understanding the presence of domains on a sequence. In addition, MobiDB-Lite has been added to InterProScan. This resource combines eight different predictors to generate a consensus focusing on long-range intrinsically disordered regions, which are important mediators of protein binding and interactions, and medically important, being associated with diseases such as neurodegeneration and cancer. The provision of intrinsic disorder prediction allows fine-grained annotation and facilitates greater understanding of protein sequence data. We have also met the targets for InterPro and Pfam curation, increasing the coverage of both resources.
Exploitation Route	These developments touch on virtually all areas of modern molecular biology by increasing the breadth and depth of annotations in InterPro and Pfam. Examples of sectors where this information is essential include understanding host-pathogen interactions, protein engineering and enzyme discovery. The annotations provided by Pfam and InterPro are widely used in other informatics resources and are central to the automatic annotation procedures performed in UniProtKB. As new genomes are generated, most will be annotated with one or both of these resources. This allows the generation of biological knowledge and hypotheses, and allows the transfer of annotation from a few experimentally characterised sequences to many new sequences as they are produced. Having key residue information will allow the design of more efficient enzymes or proteins that can be used as part of molecular machines.
Sectors	Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Environment; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
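
The consensus idea described for MobiDB-Lite (several predictors combined, with a focus on long disordered regions) can be illustrated with a minimal sketch. This is not the MobiDB-Lite implementation: the per-residue 'D'/'S' prediction strings, the function name consensus_disorder, and the vote and length thresholds are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def consensus_disorder(predictions: List[str], min_votes: int,
                       min_length: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) regions where at least `min_votes`
    predictors call disorder ('D') over `min_length` or more consecutive residues."""
    if not predictions:
        return []
    if min_votes < 1:
        raise ValueError("min_votes must be at least 1")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all prediction strings must have the same length")

    # Count, for each residue, how many predictors call it disordered.
    votes = [sum(p[i] == "D" for p in predictions) for i in range(length)]

    regions: List[Tuple[int, int]] = []
    start = None
    # A trailing 0 acts as a sentinel so that a region reaching the
    # end of the sequence is still closed and evaluated.
    for i, v in enumerate(votes + [0]):
        if v >= min_votes and start is None:
            start = i
        elif v < min_votes and start is not None:
            if i - start >= min_length:  # keep only sufficiently long regions
                regions.append((start + 1, i))
            start = None
    return regions


if __name__ == "__main__":
    # Three hypothetical predictors over a 10-residue toy sequence.
    toy = ["SSDDDDDSSD", "SDDDDDSSSD", "SSSDDDDSSS"]
    print(consensus_disorder(toy, min_votes=2, min_length=3))
```

The length filter is what biases the result towards long regions: short stretches of agreement, such as the final residue in the toy input, are discarded.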


Description	We now have intrinsic disorder (ID) prediction, as well as a large number of per-residue site annotations, which the scientific community can use to understand the role of ID regions and specific residues within a protein. An analysis of open patent data (https://www.surechembl.org) shows that over 7,000 patents mention Pfam, with 30 patents specifically referring to Pfam database entries. Similarly, over 1,000 patents refer to InterPro.
Sector	Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Environment; Healthcare; Manufacturing, including Industrial Biotechnology; Culture, Heritage, Museums and Collections; Pharmaceuticals and Medical Biotechnology
Impact Types	Societal, Economic


Description	Biomedical Resources
Amount	£1,154,000 (GBP)
Funding ID	108433/Z/15/Z
Organisation	Wellcome Trust
Sector	Charity/Non Profit
Country	United Kingdom
Start	08/2015
End	08/2020


Title	InterPro
Description	InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. We combine protein signatures from a number of member databases into a single searchable resource, capitalising on their individual strengths to produce a powerful integrated database and diagnostic tool
Type Of Material	Database/Collection of data
Provided To Others?	Yes
Impact	All of the annotations provided by InterPro underpin the automatic annotation pipeline within the UniProt database. InterPro provides annotations for tens of millions of sequences to UniProt through the InterPro2GO pipeline. InterPro is the most widely used web service at EMBL-EBI, performing ~15,000,000 searches per month, from around the world.
URL	http://www.ebi.ac.uk/interpro/


Title	Pfam
Description	The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function. Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pfam entries which are related by similarity of sequence, structure or profile-HMM.
Type Of Material	Database/Collection of data
Provided To Others?	Yes
Impact	Pfam is widely used within the research community.
URL	http://pfam.xfam.org


Description	CATH-Gene3D database
Organisation	University College London
Country	United Kingdom
Sector	Academic/University
PI Contribution	Integration of CATH-Gene3D profile hidden Markov models into InterPro.
Collaborator Contribution	Provision of CATH-Gene3D profile hidden Markov models to InterPro.
Impact	Integration of CATH-Gene3D profile hidden Markov models into InterPro, helping provide structural classification of protein sequences.


Description	Conserved domain database (CDD)
Organisation	National Center for Biotechnology Information (NCBI)
Country	United States
Sector	Public
PI Contribution	CDD profiles and active site information have been integrated into InterPro. Within CDD, each domain is modelled as a multiple sequence alignment, which is converted into a position-specific scoring matrix (PSSM) that allows fast identification of conserved domains in protein sequences via RPS-BLAST. In InterProScan, RPS-BLAST has been substituted by a piece of software called 'rpsbproc', which ensures that InterPro can faithfully reproduce the results from CDD.
Collaborator Contribution	CDD profiles and active site information have been provided to InterPro.
Impact	Integration of CDD profiles and residue information helps InterPro provide functional and structural information about protein sequences. It also allows the mark-up of important residues within sequences, such as those contributing to active sites or binding sites, enabling the finest-grained annotation of sequences.
Start Year	2016
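
To make the alignment-to-PSSM step concrete, here is a minimal sketch of a log-odds scoring matrix built from a gap-free toy alignment and slid along a query sequence. It is a simplified illustration, not the scheme used by CDD or RPS-BLAST: the uniform background frequencies, the pseudocount of 1, and the names build_pssm, score_window and best_match are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a uniform background; real tools use observed residue frequencies.
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def build_pssm(alignment: List[str], background: Dict[str, float],
               pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build one {residue: log2-odds score} column per alignment position."""
    alphabet = sorted(background)
    pssm = []
    for column in zip(*alignment):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(c for c in column if c in background)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in alphabet
        })
    return pssm


def score_window(pssm: List[Dict[str, float]], window: str) -> float:
    """Sum the column scores for a window of the same width as the matrix.
    The window must contain only residues present in the background."""
    if len(window) != len(pssm):
        raise ValueError("window length must equal the PSSM width")
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_match(pssm: List[Dict[str, float]], sequence: str) -> Tuple[int, float]:
    """Return the 0-based start and score of the best-scoring ungapped window."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PSSM")
    scored = [(score_window(pssm, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    score, start = max(scored)
    return start, score


if __name__ == "__main__":
    toy_alignment = ["HGD", "HGD", "HAD", "HGE"]  # hypothetical motif
    matrix = build_pssm(toy_alignment, UNIFORM_BACKGROUND)
    print(best_match(matrix, "MKHGDLL"))
```

The pseudocount keeps unseen residues from receiving a score of minus infinity, so a single mismatch lowers a window's score rather than ruling it out.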


Description	HAMAP database
Organisation	Swiss Institute of Bioinformatics (SIB)
Country	Switzerland
Sector	Charity/Non Profit
PI Contribution	Integration of HAMAP profiles into InterPro.
Collaborator Contribution	Provision of HAMAP profiles to InterPro.
Impact	HAMAP profiles have been integrated into InterPro, helping provide functional classification of protein sequences.
Start Year	2009


Description	MobiDB-lite
Organisation	University of Padova
Country	Italy
Sector	Academic/University
PI Contribution	Implementation of the MobiDB-lite software for intrinsic disorder prediction in the InterProScan software.
Collaborator Contribution	Provision of the MobiDB-lite software and underlying signatures to provide prediction of intrinsically disordered regions in protein sequences.
Impact	Intrinsically disordered (ID) protein regions do not fold into defined tertiary structure. They mediate numerous functions, including flexible linkers, linear motifs that mediate interactions, and coupled folding and binding. These regions are highly important from a medical perspective, as they are associated with neurodegeneration and enriched in genes that participate in cell signaling and cancer-associated proteins. However, they display little evolutionary conservation and are therefore hard to model. Implementation of the MobiDB-lite software into InterProScan enables the prediction of ID regions on protein sequences for the first time, allowing fine-grained annotations and improving the coverage of residues by the resource.
Start Year	2016


Description	PANTHER Database
Organisation	University of Southern California
Department	Keck School of Medicine
Country	United States
Sector	Academic/University
PI Contribution	Integration of PANTHER HMMs into the InterPro resource. Inclusion of PANTHER software within InterProScan to permit monthly calculation of protein matches to UniProt. This in turn allows the automatic annotation of protein sequences, which is an integral component of UniProt.
Collaborator Contribution	Supply of protein family HMMs and post-processing software for InterPro integration. Provision of reference trees for use in comparison of protein classifications in InterPro between PANTHER, SFLD and TIGRFAM. Supplier of
Impact	Harmonization of protein family definitions. Use of PANTHER reference trees as a scaffold for comparing classifications from disparate databases.


Description	PIRSF database
Organisation	Georgetown University
Country	United States
Sector	Academic/University
PI Contribution	Integration of PIRSF profile hidden Markov models into InterPro
Collaborator Contribution	Provision of PIRSF profile hidden Markov models to InterPro
Impact	PIRSF HMMs have been integrated into InterPro, helping provide functional characterisation of protein sequences.


Description	PRINTS database
Organisation	University of Manchester
Country	United Kingdom
Sector	Academic/University
PI Contribution	Integration of PRINTS signatures into the InterPro database
Collaborator Contribution	Provision of PRINTS signatures to InterPro
Impact	PRINTS signatures have been integrated into InterPro, providing functional classification of proteins.


Description	PROSITE database
Organisation	Swiss Institute of Bioinformatics (SIB)
Country	Switzerland
Sector	Charity/Non Profit
PI Contribution	Integration of PROSITE profiles and patterns into InterPro.
Collaborator Contribution	Provision of PROSITE profiles and patterns to InterPro.
Impact	PROSITE patterns and profiles have been integrated into InterPro and are included in the effort to harmonize protein names and functions between disparate resources.


Description	Pfam database
Organisation	EMBL European Bioinformatics Institute (EMBL - EBI)
Country	United Kingdom
Sector	Academic/University
PI Contribution	Pfam provides a large number of profile hidden Markov models aiming to model protein families and domains. These have been integrated into InterPro.
Collaborator Contribution	Provision of Pfam profile hidden Markov models to InterPro
Impact	Integration of Pfam HMMs into InterPro, helping provide structural and functional classification of proteins.


Description	SMART database
Organisation	European Molecular Biology Laboratory
Department	European Molecular Biology Laboratory Heidelberg
Country	Germany
Sector	Academic/University
PI Contribution	Integration of SMART profile hidden Markov models into InterPro
Collaborator Contribution	Provision of SMART profile hidden Markov models to InterPro
Impact	SMART HMMs have been integrated into InterPro, helping provide functional classification of protein sequences.


Description	SUPERFAMILY database
Organisation	Medical Research Council (MRC)
Department	MRC Laboratory of Molecular Biology (LMB)
Country	United Kingdom
Sector	Academic/University
PI Contribution	Integration of SUPERFAMILY profile hidden Markov models into the InterPro database.
Collaborator Contribution	Provision of SUPERFAMILY profile hidden Markov models to InterPro.
Impact	Integration of SUPERFAMILY HMMs into InterPro, helping provide structural classification of protein sequences.


Description	TIGRFAM database
Organisation	J Craig Venter Institute
Country	United States
Sector	Charity/Non Profit
PI Contribution	Integration of TIGRFAM HMMs into InterPro. Generation of editable DESCfile format files for Genome Properties, and subsequent curation of the DESCfiles. Production of a visualisation system for Genome Properties.
Collaborator Contribution	Provision of TIGRFAM HMMs to InterPro. Provision of Genome Properties flat file data for inclusion in InterPro.
Impact	TIGRFAM HMMs have been integrated into InterPro and are included in the effort to harmonize protein names and functions between disparate resources. A database of Genome Properties has been established at InterPro. The properties are stored in an editable DESCfile format (generated from the flat file data provided) and are currently being curated for presentation within InterPro.


Title	InterProScan5
Description	Allows the user to compare either a DNA or protein sequence against the collection of InterPro member databases, and to assign InterPro annotations and associated GO terms.
Type Of Technology	Software
Open Source License?	Yes
Impact	This software is widely downloaded and used (assessed through citations, distributed annotations and helpdesk interactions). This tool is widely used in other analysis pipelines, such as genomics and metagenomics analysis. It is updated with every release (bi-monthly) of InterPro to include both data updates and software updates. The software updates take two forms: scientific developments imposed by changes in member database post-processing, and general software maintenance.
URL	https://www.ebi.ac.uk/interpro/interproscan.html


Title	PfamScan
Description	Analyses protein or DNA sequences against the Pfam HMM library, performs post-processing to assign clans (family hierarchy) and allows the identification of active site residues.
Type Of Technology	Software
Open Source License?	Yes
Impact	Used widely to assign Pfam families, and to reliably reproduce Pfam results.
URL	ftp://ftp.ebi.ac.uk/pub/databases/Pfam/Tools


Description	Bioinformatics resources for protein biology - InterPro/HMMER
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	Hands-on workshop with open application describing the use of the InterPro and Pfam tools for protein annotation.
Year(s) Of Engagement Activity	2016
URL	http://www.ebi.ac.uk/training/events/2016/bioinformatics-resources-protein-biology


Description	Blog post - sweetness
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Blog post describing the proteins used in artificial sweeteners and highlighting how InterPro can be used to classify them and discover more information.
Year(s) Of Engagement Activity	2015
URL	http://interprodb.blogspot.co.uk/2015/03/the-sweetest-thing.html


Description	EMBL-EBI workshop at National Veterinary Research Institute, Poland
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The National Veterinary Research Institute, Poland requested a training workshop covering a selected set of EBI resources including InterPro. The delegates were interested in large-scale protein and metagenomics analysis, and so the topics covered (in the form of presentations and hands-on training exercises) included InterPro as well as Genome Properties. Delegates reported an enthusiasm to utilise the resources covered.
Year(s) Of Engagement Activity	2017
URL	https://www.ebi.ac.uk/training/events/2017/embl-ebi-resources-and-tools-genomics-and-proteomics


Description	Exploring biological sequence data
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	Presentation and interactive hands-on at 'exploring biological sequence data' workshop, with open participation.
Year(s) Of Engagement Activity	2016
URL	http://www.ebi.ac.uk/training/events/2016/exploring-biological-sequence-data


Description	Functional analysis using InterPro
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Postgraduate students
Results and Impact	Lecture and interactive hands-on at UCL as guest lecturer as part of the Genes and Disease MSc module. Demonstrating how bioinformatics tools such as InterPro can be applied to real world questions and datasets. Lots of questions and debate.
Year(s) Of Engagement Activity	2016


Description	GO annotation in InterPro
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presentation at the International Society for Biocuration meeting, describing the efforts made to keep annotations up to date.
Year(s) Of Engagement Activity	2016


Description	InterPro and Pfam session in Protein Structure Analysis at University of Cambridge
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Postgraduate students
Results and Impact	Presentation and hands-on training session covering InterPro and Pfam as part of a University of Cambridge 2-day Protein Structure Analysis course.
Year(s) Of Engagement Activity	2017
URL	https://www.training.cam.ac.uk/event/2052063


Description	InterPro session at Bioinformatics Resources for Protein Biology course at EBI
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presentation and hands-on training session as part of a 3-day EBI course on Bioinformatics Resources for Protein Biology.
Year(s) Of Engagement Activity	2017
URL	https://www.ebi.ac.uk/training/events/2017/bioinformatics-resources-protein-biology-0


Description	InterPro session at Bioinformatics Resources for Protein Biology course at EBI
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presentation and hands-on training session on InterPro and Genome Properties within an EBI organised course covering Bioinformatics Resources for Protein Biology.
Year(s) Of Engagement Activity	2018
URL	https://www.ebi.ac.uk/training/events/2018/bioinformatics-resources-protein-biology-2


Description	InterPro session at Structural Bioinformatics course at EBI
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presentation and hands-on training session covering InterPro as part of a week-long EBI course on structural bioinformatics.
Year(s) Of Engagement Activity	2017
URL	https://www.ebi.ac.uk/training/events/2017/structural-bioinformatics-1


Description	InterPro session in Exploring Biological Sequences course at EBI
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Presentation and hands-on training covering InterPro as part of the EBI Exploring Biological Sequences course.
Year(s) Of Engagement Activity	2017
URL	https://www.ebi.ac.uk/training/events/2017/exploring-biological-sequences


Description	Introduction to InterPro at University of Cambridge
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Postgraduate students
Results and Impact	Presentation and hands-on training of a half-day module covering InterPro and Genome Properties as part of the University of Cambridge training provision.
Year(s) Of Engagement Activity	2018
URL	https://www.training.cam.ac.uk/event/2239008


Description	Presentation to EBI Industry Programme quarterly meeting
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Presentation of work on Genome Properties resource.
Year(s) Of Engagement Activity	2018


Description	Protein focus - what's ape
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Blog post discussing the proteins that are associated with brain size and how they have changed during human evolution, with links to InterPro explaining how information on these proteins can be obtained.
Year(s) Of Engagement Activity	2017
URL	https://proteinswebteam.github.io/interpro-blog/2017/09/22/What's-ape/


Description	Protein focus - zika virus
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Blog post aimed at the general public highlighting the proteins involved with the zika virus and how InterPro can be used to find out more information about them.
Year(s) Of Engagement Activity	2016


Description	Protein focus article - don't blame the cat
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Blog post highlighting the relationship between cats and toxoplasmosis, and showing how InterPro can be used to find information about the protein pathways involved in this behaviour.
Year(s) Of Engagement Activity	2014
URL	http://interprodb.blogspot.co.uk/2014/11/protein-focus-dont-blame-cat.html


Description	Structural Bioinformatics 2016
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	Presentation and hands-on at structural bioinformatics workshop, demonstrating how Pfam and InterPro can be used to help classify and annotate proteins.
Year(s) Of Engagement Activity	2016
URL	http://www.ebi.ac.uk/training/events/2016/structural-bioinformatics-2016


Description	Understanding protein families, domains and function using InterPro and Pfam
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Postgraduate students
Results and Impact	Workshop at University of Cambridge aimed at using InterPro and Pfam to annotate proteins and to solve real world biological data questions. Consisted of lectures and hands-on sessions using the tools with lots of interaction.
Year(s) Of Engagement Activity	2016
URL	https://www.training.cam.ac.uk/bioinformatics/event/1879212