Keeping pace with protein sequence annotation; consolidating and enhancing Pfam and InterPro's methodologies for functional prediction
Lead Research Organisation:
European Bioinformatics Institute
Department Name: Sequence Database Group
Abstract
New technologies, developed in the last few years, have greatly increased the amount of biological sequence information that it is possible for laboratories to produce. As a result, there is now a very large and ever-growing amount of sequence data entering public databases. The overwhelming majority of these sequences have not been examined by scientists, nor is there any experimental information to suggest what their function might be. The Pfam and InterPro resources help plug this gap, using probabilistic models to predict the function of proteins by examining their amino acid sequences. Pfam is arguably the most well-known and one of the largest producers of such models. InterPro, meanwhile, does not produce models directly, but takes them from Pfam and 10 other complementary databases, integrating them together and adding functional information. InterPro is regularly run against the full contents of the main public repository for protein sequences, the UniProt Knowledgebase, so that its functional predictions can be transferred.
In order that InterPro and Pfam can continue to cover the growing number of sequences and remain accurate in their predictions, new models need to be made and integrated, existing models need to be checked and the proteins that they match evaluated. One aim of the project is to support this effort. Another aim is to look at other prediction methods, not currently used by either Pfam or InterPro, that identify the individual amino acids in a protein sequence that are responsible for the protein's functions. We will add this functionality to the resources and use it to make their predictions more accurate. This will in turn improve the quality of information associated with large numbers of proteins in the UniProt Knowledgebase. Adding to the resources in this way will require changes to some of the underlying software. At the same time, we will update the InterPro and Pfam web sites, so that users can easily see the new and improved data, and understand what it means. Finally, we will prepare and organise training materials and courses to introduce new users to the resources and educate existing users about the new and updated features.
Technical Summary
Pfam and InterPro are two widely used databases containing thousands of protein signatures. Both databases provide websites and services so that user-submitted protein sequences can be searched for identification of conserved functional modules. In this proposal, we intend to improve the accuracy of functional annotation provided by Pfam and InterPro by annotating catalytic and ligand-binding residues for sequences in Pfam, and offering on-the-fly functional residue predictions as part of the InterProScan software. We will also use iPfam to expand the protein interaction information in both resources to the residue level. The domain- and ligand-binding data will be used in combination with other signatures to improve the accuracy of GO term assignment via the InterPro2GO pipeline. We will apply new approaches to expand and improve existing Pfam families, and annotate and integrate these families into InterPro, together with signatures from other member databases, improving and extending annotative coverage.
Methods for calculating, storing and propagating this additional tier of functional residue information in Pfam and InterPro will be developed with future computational scalability key to the design. Existing web interfaces will be extended to enable discovery of this new data. Open source libraries for the graphical representation of the data will also be produced and shared. Mechanisms for producing meaningful, representative multiple sequence alignments for displaying functional residue data will be designed. We will implement a range of web services to provide both large-scale, programmatic access and facilitate data exchange between the two databases and source databases.
The strong links between InterPro, the Gene Ontology and UniProt ensure that all annotations produced as a result of this project will be propagated to a large spectrum of protein resources, thus improving researchers' capability to predict protein function.
Planned Impact
Pfam and InterPro are long-established bioinformatics resources that are widely used to predict the function of protein sequences. Commercial and academic scientists with a wide variety of research focuses (e.g. human, animal and plant health) use both resources. In particular, these services are regularly used in the annotation of genomes and metagenomes. Data produced by InterPro and Pfam are consumed by a number of internationally-important databases, such as Ensembl, Ensembl Genomes, UniProtKB, and model organism-specific databases (including Vectorbase, Pombase, Flybase, Wormbase, TAIR and MGI). These databases, in turn, serve many hundreds of thousands of users on a monthly basis. There are also a variety of widely-used analysis platforms (such as DAVID, Blast2GO and CDD) that incorporate InterPro and Pfam's data and/or search software. Additionally, as evidenced by the large number of sequence searches and visits to Pfam and InterPro's websites, a significant number of users also choose to access these resources directly.
One vital impact of the project will be the continued provision of annotation for sequences entering public protein databases, in the face of ever increasing data volume. Improved annotation of proteins by adding new, residue-level function prediction methodologies to Pfam and InterPro is also critical, since it will allow very fine-grained analysis of proteins (for example, distinguishing catalytically inactive enzymes from their active counterparts). This will improve accuracy of annotation, and help to remove misleading information from the public databases. Augmenting users' capabilities to visualise relevant functional traits on sequences and multiple sequence alignments is highly important, since it will allow them to scrutinise conservation of such traits across families, which will help power annotation transfer based on homology.
The benefits of this project will be felt almost immediately. While InterPro has a two month release cycle, it provides annotation for UniProtKB on a monthly basis. This project would feed into that annotation pipeline. In the medium term, further benefits will come from the inclusion of improved and/or new families into Pfam and from their subsequent integration into InterPro. This will be supplemented by the addition of the novel residue-level annotations and interaction data. Users will be able to explore these data more effectively as the modifications to the user interfaces and viewers are implemented. There will also be long term benefits, in that the new functionalities added to the resources will continue to be offered following the project's completion. Erroneous annotation already in the public databases will be corrected by the inclusion of more accurate data, produced by this project.
It is anticipated that we would employ 2 different types of scientist to work on this project. Firstly, a scientific data curator would be required in order to add to and improve the content of both resources. Data curation is a highly specialised career, but the skills learned as a curator can be transferred to other sectors. For example, curators gain exceptional scientific writing skills and typically attain the ability to precis complex scientific information into a format easily understandable by others, without losing accuracy. These skills are particularly useful in positions requiring scientific (or other) communication. Curators also gain data management and mining expertise, which can be useful in a range of jobs, not limited to scientific fields. A software engineer would also be employed to implement necessary changes to the infrastructure. This may necessitate training in programming languages and software frameworks. Both staff members would be expected to learn how to present their work to others, regardless of their audience's background knowledge or expertise.
Organisations
- European Bioinformatics Institute (Lead Research Organisation)
- University College London (Collaboration)
- University of Manchester (Collaboration)
- University of California, San Francisco (Collaboration)
- J. Craig Venter Institute (Collaboration)
- University of Padova (Collaboration)
- National Center for Biotechnology Information (NCBI) (Collaboration)
- Georgetown University (Collaboration)
- EMBL European Bioinformatics Institute (EMBL - EBI) (Collaboration)
- University of Southern California (Collaboration)
- Swiss Institute of Bioinformatics (SIB) (Collaboration)
- European Molecular Biology Laboratory (Collaboration)
- Medical Research Council (MRC) (Collaboration)
Publications
Chiang Z
(2015)
The complexity, challenges and benefits of comparing two transporter classification systems in TCDB and Pfam.
in Briefings in bioinformatics
Finn R
(2014)
Pfam: the protein families database
in Nucleic Acids Research
Finn RD
(2017)
InterPro in 2017-beyond protein family and domain annotations.
in Nucleic acids research
Finn RD
(2016)
The Pfam protein families database: towards a more sustainable future.
in Nucleic acids research
Gene Ontology Consortium
(2015)
Gene Ontology Consortium: going forward.
in Nucleic acids research
Gene Ontology Consortium
(2021)
The Gene Ontology resource: enriching a GOld mine.
in Nucleic acids research
Mitchell A
(2015)
The InterPro protein families database: the classification resource after 15 years.
in Nucleic acids research
Mitchell AL
(2019)
InterPro in 2019: improving coverage, classification and access to protein sequence annotations.
in Nucleic acids research
Sangrador-Vegas A
(2016)
GO annotation in InterPro: why stability does not indicate accuracy in a sea of changing annotations.
in Database : the journal of biological databases and curation
The Gene Ontology Consortium
(2019)
The Gene Ontology Resource: 20 years and still GOing strong.
in Nucleic acids research
Description | InterPro has now been expanded to include per-residue information. Within a protein, individual amino acid residues perform critical roles in the function of the sequence; for example, active site residues are brought together in three-dimensional space to perform a specific chemical reaction. Having these fine-grained annotations allows for a deeper understanding of a protein's function, over and above knowing which domains are present on a sequence. In addition, MobiDB-Lite has been added to InterProScan. This resource combines eight different predictors to generate a consensus focusing on long-range intrinsically disordered regions, which are important mediators of protein binding and interactions, and are medically important, being associated with diseases such as neurodegeneration and cancer. The provision of intrinsic disorder prediction allows fine-grained annotation and facilitates greater understanding of protein sequence data. We have also met the targets for InterPro and Pfam curation, increasing the coverage of both resources. |
Exploitation Route | These developments touch on virtually all areas of modern molecular biology by increasing the breadth and depth of annotations in InterPro and Pfam. Examples of sectors where this information is essential include understanding host-pathogen interactions, protein engineering and enzyme discovery. The annotations provided by Pfam and InterPro are widely used in other informatics resources and are central to the automatic annotation procedures performed in UniProtKB. As new genomes are generated, most will be annotated with one or both of these resources. This allows the generation of biological knowledge and hypotheses, and allows the transfer of annotation from a few experimentally characterised sequences to many new sequences as they are produced. Having key residue information will allow the design of more efficient enzymes or proteins that can be used as part of molecular machines. |
Sectors | Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Environment; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
Description | We now have intrinsic disorder (ID) prediction, as well as a very large number of per-residue site annotations, which the scientific community can use to understand the role of ID regions and specific residues within a protein. An analysis of open patent data (https://www.surechembl.org) shows that over 7,000 patents mention Pfam, with 30 patents specifically referring to Pfam database entries. Similarly, over 1,000 patents refer to InterPro. |
Sector | Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Environment; Healthcare; Manufacturing, including Industrial Biotechnology; Culture, Heritage, Museums and Collections; Pharmaceuticals and Medical Biotechnology |
Impact Types | Societal, Economic |
Description | Biomedical Resources |
Amount | £1,154,000 (GBP) |
Funding ID | 108433/Z/15/Z |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 08/2015 |
End | 08/2020 |
Title | InterPro |
Description | InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. We combine protein signatures from a number of member databases into a single searchable resource, capitalising on their individual strengths to produce a powerful integrated database and diagnostic tool |
Type Of Material | Database/Collection of data |
Provided To Others? | Yes |
Impact | All of the annotations provided by InterPro underpin the automatic annotation pipeline within the UniProt database. InterPro provides annotations for tens of millions of sequences in UniProt through the InterPro2GO pipeline. InterPro is the most widely used web service at EMBL-EBI, performing ~15,000,000 searches per month from around the world. |
URL | http://www.ebi.ac.uk/interpro/ |
Title | Pfam |
Description | The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function. Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pfam entries which are related by similarity of sequence, structure or profile-HMM. |
Type Of Material | Database/Collection of data |
Provided To Others? | Yes |
Impact | Pfam is widely used within the research community. |
URL | http://pfam.xfam.org |
Description | CATH-Gene3D database |
Organisation | University College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Integration of CATH-Gene3D profile hidden Markov models into InterPro. |
Collaborator Contribution | Provision of CATH-Gene3D profile hidden Markov models to InterPro. |
Impact | Integration of CATH-Gene3D profile hidden Markov models into InterPro, helping provide structural classification of protein sequences. |
Description | Conserved domain database (CDD) |
Organisation | National Center for Biotechnology Information (NCBI) |
Country | United States |
Sector | Public |
PI Contribution | CDD profiles and active site information have been integrated into InterPro. Within CDD, each domain is modelled as a multiple sequence alignment, which is converted into a position-specific scoring matrix (PSSM) that allows fast identification of conserved domains in protein sequences via RPS-BLAST. In InterProScan, RPS-BLAST has been substituted by a piece of software called 'rpsbproc', which ensures that InterPro can faithfully reproduce the results from CDD. |
Collaborator Contribution | CDD profiles and active site information have been provided to InterPro. |
Impact | Integration of CDD profiles and residue information helps InterPro provide functional and structural information about protein sequences. It also allows the mark-up of important residues within sequences, such as those contributing to active sites or binding sites, enabling the finest-grained annotation of sequences. |
Start Year | 2016 |
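The CDD entry above describes converting a multiple sequence alignment into a position-specific scoring matrix (PSSM). As an illustration only, the following minimal Python sketch shows one common way such a matrix can be derived: per-column log-odds scores with pseudocounts. The function names, the uniform background frequency and the pseudocount value are assumptions made for this example; they are not taken from CDD, RPS-BLAST or InterProScan.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a toy log-odds PSSM from equal-length aligned sequences.

    Hypothetical helper for illustration; real tools use weighted
    sequences and empirical background frequencies.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # Assumption: uniform background frequency for every amino acid.
    background = 1.0 / len(AMINO_ACIDS)

    pssm = []
    for col in range(length):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(
            seq[col] for seq in alignment if seq[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_segment(pssm, segment):
    """Sum the per-position scores of a segment; unknown symbols score 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, segment))
```

A segment resembling the alignment scores higher than an unrelated one, for example `score_segment(pssm, "ACD")` against `score_segment(pssm, "WWW")` for a matrix built from `["ACD", "ACE", "ACD"]`.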
Description | HAMAP database |
Organisation | Swiss Institute of Bioinformatics (SIB) |
Country | Switzerland |
Sector | Charity/Non Profit |
PI Contribution | Integration of HAMAP profiles into InterPro. |
Collaborator Contribution | Provision of HAMAP profiles to InterPro. |
Impact | HAMAP profiles have been integrated into InterPro, helping provide functional classification of protein sequences. |
Start Year | 2009 |
Description | MobiDB-lite |
Organisation | University of Padova |
Country | Italy |
Sector | Academic/University |
PI Contribution | Implementation of the MobiDB-lite software for intrinsic disorder prediction in the InterProScan software. |
Collaborator Contribution | Provision of the MobiDB-lite software and underlying signatures to provide prediction of intrinsically disordered regions in protein sequences. |
Impact | Intrinsically disordered (ID) protein regions do not fold into a defined tertiary structure. They mediate numerous functions, including flexible linkers, linear motifs that mediate interactions, and coupled folding and binding. These regions are highly important from a medical perspective, as they are associated with neurodegeneration and are enriched in genes that participate in cell signaling and in cancer-associated proteins. However, they display little evolutionary conservation and are therefore hard to model. Implementation of the MobiDB-lite software in InterProScan enables the prediction of ID regions on protein sequences for the first time, allowing fine-grained annotations and improving the coverage of residues by the resource. |
Start Year | 2016 |
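The MobiDB-lite entry above refers to combining several disorder predictors into a consensus that favours long regions. The following Python sketch illustrates the general idea of a per-residue vote followed by a minimum-length filter. It is a simplified illustration under stated assumptions: the function name, the vote threshold and the minimum region length are hypothetical parameters chosen by the caller, and the sketch does not reproduce MobiDB-lite's actual algorithm or settings.

```python
def consensus_disorder(predictions, min_votes, min_length):
    """Return 1-based inclusive (start, end) consensus disordered regions.

    predictions: list of per-residue 0/1 lists, one per predictor.
    min_votes:   predictors that must agree for a residue to be flagged.
    min_length:  shortest run of flagged residues that is reported.
    All names and parameters are illustrative assumptions.
    """
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all predictions must cover the same sequence length")

    # Per-residue vote count across predictors.
    votes = [sum(p[i] for p in predictions) for i in range(n)]
    flagged = [v >= min_votes for v in votes]

    regions = []
    start = None
    # A trailing False closes any run that reaches the end of the sequence.
    for i, is_disordered in enumerate(flagged + [False]):
        if is_disordered and start is None:
            start = i
        elif not is_disordered and start is not None:
            if i - start >= min_length:
                regions.append((start + 1, i))
            start = None
    return regions
```

Short runs of agreement are discarded by the length filter, which is what biases the output towards longer disordered regions.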
Description | PANTHER Database |
Organisation | University of Southern California |
Department | Keck School of Medicine |
Country | United States |
Sector | Academic/University |
PI Contribution | Integration of PANTHER HMMs into the InterPro resource. Inclusion of PANTHER software within InterProScan to permit monthly calculation of protein matches to UniProt. This in turn allows the automatic annotation of protein sequences, which is an integral component of UniProt. |
Collaborator Contribution | Supply of protein family HMMs and post-processing software for InterPro integration. Provision of reference trees for use in comparison of protein classifications in InterPro between PANTHER, SFLD and TIGRFAM. Supplier of |
Impact | Harmonization of protein family definitions. Use of PANTHER reference trees as a scaffold for comparing classifications from disparate databases. |
Description | PIRSF database |
Organisation | Georgetown University |
Country | United States |
Sector | Academic/University |
PI Contribution | Integration of PIRSF profile hidden Markov models into InterPro |
Collaborator Contribution | Provision of PIRSF profile hidden Markov models to InterPro |
Impact | PIRSF HMMs have been integrated into InterPro, helping provide functional characterisation of protein sequences. |
Description | PRINTS database |
Organisation | University of Manchester |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Integration of PRINTS signatures into the InterPro database |
Collaborator Contribution | Provision of PRINTS signatures to InterPro |
Impact | PRINTS signatures have been integrated into InterPro, providing functional classification of proteins. |
Description | PROSITE database |
Organisation | Swiss Institute of Bioinformatics (SIB) |
Country | Switzerland |
Sector | Charity/Non Profit |
PI Contribution | Integration of PROSITE profiles and patterns into InterPro. |
Collaborator Contribution | Provision of PROSITE profiles and patterns to InterPro. |
Impact | PROSITE patterns and profiles have been integrated into InterPro and are included in the effort to harmonize protein names and functions between disparate resources. |
Description | Pfam database |
Organisation | EMBL European Bioinformatics Institute (EMBL - EBI) |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Pfam provides a large number of profile hidden Markov models aiming to model protein families and domains. These have been integrated into InterPro. |
Collaborator Contribution | Provision of Pfam profile hidden Markov models to InterPro |
Impact | Integration of Pfam HMMs into InterPro, helping provide structure and functional classification of proteins. |
Description | SFLD added to InterPro Consortium |
Organisation | University of California, San Francisco |
Department | Department of Bioengineering and Therapeutic Sciences |
Country | United States |
Sector | Academic/University |
PI Contribution | We have helped SFLD move to a more formal database design, and provided them with software tools and advice to enable the systematic transfer of annotations. We have begun to integrate SFLD into InterPro. This process has functioned as a QC on the SFLD data, and we have fed back any issues we identified. We have focused on the integration of the SFLD subset of gold-standard entries for comparison with TIGRFAM and PANTHER. |
Collaborator Contribution | SFLD provide the underlying knowledge and data to InterPro, which take the form of multiple sequence alignments, functional annotations and structured ontologies. SFLD provided a subset list of gold-standard families for use in the comparison between PANTHER and TIGRFAM, towards a harmonization of protein family names and functional annotations in InterPro. |
Impact | The SFLD resource is in the process of being added to InterPro, where it will provide fine-grained protein annotations associating enzymes with chemical reactions. |
Start Year | 2015 |
Description | SMART database |
Organisation | European Molecular Biology Laboratory |
Department | European Molecular Biology Laboratory Heidelberg |
Country | Germany |
Sector | Academic/University |
PI Contribution | Integration of SMART profile hidden Markov models into InterPro |
Collaborator Contribution | Provision of SMART profile hidden Markov models to InterPro |
Impact | SMART HMMs have been integrated into InterPro, helping provide functional classification of protein sequences. |
Description | SUPERFAMILY database |
Organisation | Medical Research Council (MRC) |
Department | MRC Laboratory of Molecular Biology (LMB) |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Integration of SUPERFAMILY profile hidden Markov models into the InterPro database. |
Collaborator Contribution | Provision of SUPERFAMILY profile hidden Markov models to InterPro. |
Impact | Integration of SUPERFAMILY HMMs into InterPro, helping provide structural classification of protein sequences. |
Description | TIGRFAM database |
Organisation | J Craig Venter Institute |
Country | United States |
Sector | Charity/Non Profit |
PI Contribution | Integration of TIGRFAM HMMs to InterPro. Generation of editable DESCfile format files for Genome Properties, and subsequent curation of the DESCfiles. Production of visualisation system for Genome Properties. |
Collaborator Contribution | Provision of TIGRFAM HMMs to InterPro. Provision of Genome Properties flat file data for inclusion in InterPro. |
Impact | TIGRFAM HMMs have been integrated into InterPro and are included in the effort to harmonize protein names and functions between disparate resources. A database of Genome Properties has been established at InterPro. The properties are stored in an editable DESCfile format (generated from the flat file data provided) and are currently being curated for presentation within InterPro. |
Title | InterProScan5 |
Description | Allows the user to compare either a DNA or protein sequence against the collection of InterPro member databases, and to assign InterPro annotations and associated GO terms. |
Type Of Technology | Software |
Open Source License? | Yes |
Impact | This software is widely downloaded and used (assessed through citations, distributed annotations and helpdesk interactions). The tool is widely used in other analysis pipelines, such as genomics and metagenomics analyses. It is updated with every (bi-monthly) release of InterPro to include both data updates and software updates. The software updates comprise scientific developments imposed by changes in member database post-processing, and general software maintenance. |
URL | https://www.ebi.ac.uk/interpro/interproscan.html |
Title | PfamScan |
Description | Analyses a protein or DNA sequence against the Pfam HMM library, performs post-processing to assign clans (family hierarchy) and allows the identification of active site residues. |
Type Of Technology | Software |
Open Source License? | Yes |
Impact | Used widely to assign Pfam families, and to reliably reproduce Pfam results. |
URL | ftp://ftp.ebi.ac.uk/pub/databases/Pfam/Tools |
Description | Bioinformatics resources for protein biology - InterPro/HMMER |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Hands-on workshop with open application describing the use of the InterPro and Pfam tools for protein annotation. |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.ebi.ac.uk/training/events/2016/bioinformatics-resources-protein-biology |
Description | Blog post - sweetness |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Blog post describing the proteins used in artificial sweeteners and highlighting how InterPro can be used to classify them and discover more information. |
Year(s) Of Engagement Activity | 2015 |
URL | http://interprodb.blogspot.co.uk/2015/03/the-sweetest-thing.html |
Description | EMBL-EBI workshop at National Veterinary Research Institute, Poland |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | The National Veterinary Research Institute, Poland requested a training workshop covering a selected set of EBI resources including InterPro. The delegates were interested in large-scale protein and metagenomics analysis, so the topics covered (in the form of presentations and hands-on training exercises) included InterPro as well as Genome Properties. Delegates reported an enthusiasm to utilise the resources covered. |
Year(s) Of Engagement Activity | 2017 |
URL | https://www.ebi.ac.uk/training/events/2017/embl-ebi-resources-and-tools-genomics-and-proteomics |
Description | Exploring biological sequence data |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Presentation and interactive hands-on session at the 'Exploring biological sequence data' workshop, with open participation. |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.ebi.ac.uk/training/events/2016/exploring-biological-sequence-data |
Description | Functional analysis using InterPro |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | Guest lecture and interactive hands-on session at UCL as part of the Genes and Disease MSc module, demonstrating how bioinformatics tools such as InterPro can be applied to real-world questions and datasets. The session prompted many questions and much debate. |
Year(s) Of Engagement Activity | 2016 |
Description | GO annotation in InterPro |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation at the International Society for Biocuration meeting, describing the efforts made to keep annotations up to date. |
Year(s) Of Engagement Activity | 2016 |
Description | InterPro and Pfam session in Protein Structure Analysis at University of Cambridge |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | Presentation and hands-on training session covering InterPro and Pfam as part of a University of Cambridge 2-day Protein Structure Analysis course. |
Year(s) Of Engagement Activity | 2017 |
URL | https://www.training.cam.ac.uk/event/2052063 |
Description | InterPro session at Bioinformatics Resources for Protein Biology course at EBI |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation and hands-on training session on InterPro and Genome Properties within an EBI organised course covering Bioinformatics Resources for Protein Biology. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.ebi.ac.uk/training/events/2018/bioinformatics-resources-protein-biology-2 |
Description | InterPro session at Bioinformatics Resources for Protein Biology course at EBI |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation and hands-on training session as part of a 3-day EBI course on Bioinformatics Resources for Protein Biology. |
Year(s) Of Engagement Activity | 2017 |
URL | https://www.ebi.ac.uk/training/events/2017/bioinformatics-resources-protein-biology-0 |
Description | InterPro session at Structural Bioinformatics course at EBI |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation and hands-on training session covering InterPro as part of a week-long EBI course on structural bioinformatics. |
Year(s) Of Engagement Activity | 2017 |
URL | https://www.ebi.ac.uk/training/events/2017/structural-bioinformatics-1 |
Description | InterPro session in Exploring Biological Sequences course at EBI |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Presentation and hands-on training covering InterPro as part of the EBI Exploring Biological Sequences course. |
Year(s) Of Engagement Activity | 2017 |
URL | https://www.ebi.ac.uk/training/events/2017/exploring-biological-sequences |
Description | Introduction to InterPro at University of Cambridge |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | Presentation and hands-on training for a half-day module covering InterPro and Genome Properties as part of the University of Cambridge training provision. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.training.cam.ac.uk/event/2239008 |
Description | Presentation to EBI Industry Programme quarterly meeting |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Presentation of work on Genome Properties resource. |
Year(s) Of Engagement Activity | 2018 |
Description | Protein focus - what's ape |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Blog post discussing the proteins that are associated with brain size and how they have changed during human evolution, with links to InterPro explaining how information on these proteins can be obtained. |
Year(s) Of Engagement Activity | 2017 |
URL | https://proteinswebteam.github.io/interpro-blog/2017/09/22/What's-ape/ |
Description | Protein focus - zika virus |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Blog post aimed at the general public highlighting the proteins involved with the Zika virus and how InterPro can be used to find out more information about them. |
Year(s) Of Engagement Activity | 2016 |
Description | Protein focus article - dont blame the cat |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Blog post highlighting the relationship between cats and toxoplasmosis, and showing how InterPro can be used to find information about the protein pathways involved in this behaviour. |
Year(s) Of Engagement Activity | 2014 |
URL | http://interprodb.blogspot.co.uk/2014/11/protein-focus-dont-blame-cat.html |
Description | Structural Bioinformatics 2016 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Presentation and hands-on session at a structural bioinformatics workshop, demonstrating how Pfam and InterPro can be used to help classify and annotate proteins. |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.ebi.ac.uk/training/events/2016/structural-bioinformatics-2016 |
Description | Understanding protein families, domains and function using InterPro and Pfam |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | Workshop at the University of Cambridge on using InterPro and Pfam to annotate proteins and to answer real-world biological data questions. Consisted of lectures and hands-on sessions using the tools, with plenty of interaction. |
Year(s) Of Engagement Activity | 2016 |
URL | https://www.training.cam.ac.uk/bioinformatics/event/1879212 |