2021BBSRC-NSF/BIO UniPlex - Genome-Wide Protein Complex Prediction and Validation

Lead Research Organisation: European Bioinformatics Institute
Department Name: Molecular Networks

Abstract

Proteins are essential components that both build cellular structures and work as the tools that make the cell function. However, proteins do not operate in isolation and often form molecular machines in which several proteins bind together and with other biomolecules to act as a single entity called a molecular complex. This provides tremendous versatility and regulatory capacity, since changing a single component of the complex can dramatically alter its function. Protein complexes often also form more stable structures than isolated proteins, and their formation creates new active sites as protein chains from different molecules assemble in close proximity. It is therefore of crucial importance to know the composition of complexes and study them as discrete functional entities in order to truly understand how cellular processes work. The Complex Portal (www.ebi.ac.uk/complexportal) is an encyclopaedic database that collates and summarizes information on stable, macromolecular complexes of known function from the scientific literature through manual curation. Complex Portal (CP) curators have now completed a first draft of all the stable molecular complexes from baker's yeast (Saccharomyces cerevisiae) and the gut bacterium Escherichia coli, both model organisms widely used for the study of basic biological processes. The next big goal for the project is the complete annotation of all human complexes (the human complexome). The CP has had multiple requests from the research community to significantly speed up the annotation of human data, but manual curation is laborious and can only partially meet demand.

There are multiple types of data available in the literature that can indicate that different proteins form part of the same complex: co-immunoprecipitation studies, where proteins that bind together are purified out via a selected protein bait; proximity data sets, which use a bacterial enzyme to tag proteins that are very close together in a cell; and co-fractionation experiments, where cells are broken apart and proteins that co-purify together are identified. There are public databases that compile data about how individual proteins bind each other (IntAct), describe the processes in which such proteins take part, called pathways (Reactome), or capture the 3D structure of two or more proteins bound together (wwPDB). We propose to extend the scope and relevance of the Complex Portal by using machine learning algorithms that can identify, from large datasets generated using the techniques described above, groups of proteins that are most likely to represent functional complexes which exist in the cell. These predictions of complexes will be validated against other experimental data and, where possible, also against literature evidence. We will also use large scale studies of protein expression in different cell types, tissues, and conditions to validate the predicted complexes and to differentiate between variants of complexes formed in different conditions.
Complexes predicted to exist at high confidence will be made available through the Complex Portal website, properly identified as computationally inferred data, where they will both guide the work of Complex Portal curators and dramatically increase the number of complexes available to researchers as reference entities. We will add further information from other resources such as Reactome and PDB to these entries, and we will map amino acid changes from the IntAct database that are known to affect protein interaction strength and stability onto complex binding interfaces. This work will help accelerate our understanding of complexes as the molecular machines essential to biological processes and support basic and applied research.

Technical Summary

The Complex Portal (CP) is a manually curated reference resource of molecular complexes. Identification and annotation of all molecular complexes is the CP's biggest challenge, especially for the much-demanded human complexome. We propose to rapidly increase the coverage of the CP through computational inference of high confidence complexes, based on large-scale experimental and computational data. We will extend hu.MAP, the most comprehensive complex map available, by adding thousands of newly published large-scale mass spectrometry experiments. Further, we will improve upon the machine learning framework using an automated model selection algorithm selecting among deep learning as well as classical models to best discriminate between true and false protein interactions. Protein complexes will be identified by clustering of the highest-scoring pairwise interactions, then validated and refined by protein (co-)expression analysis. This will distinguish between core and conditional subunits and map tissue-specific expression and subunit composition, providing information-rich annotations for each individual complex. We will infer high confidence complexes for species spanning three kingdoms of life: S. cerevisiae, H. sapiens, and A. thaliana. The resulting set of high confidence inferred complexes will be enriched with structural and functional data from IntAct, wwPDB, and Reactome, including amino acid mutations known to disrupt protein interactions mapped to complex binding interfaces. The entire prediction pipeline will be developed as a highly automated, adaptable and repeatable workflow which will ensure a continuously updated and expanded set of inferred complexes that can rapidly evolve with additional data becoming available. Presentation and impact of the CP will be improved through website updates and a comprehensive outreach and training program providing a powerful tool for biological discovery for the research community.
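The clustering and co-expression steps described above can be illustrated with a small sketch. It is not the project's pipeline: the summary does not name a clustering algorithm, so connected components over thresholded pairs stand in for it here, and the function names, the 0.8 threshold, the protein labels and the expression values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def cluster_pairs(scored_pairs, threshold):
    """Group proteins into candidate complexes.

    Assumption: a candidate complex is a connected component of the graph
    whose edges are the pairs scoring at or above `threshold`.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for protein in list(parent):
        groups.setdefault(find(protein), set()).add(protein)
    # Keep only groups with at least two members, in a stable order.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_coexpression(members, expression):
    """Average pairwise correlation of expression profiles within a cluster."""
    pairs = list(combinations(members, 2))
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


# Hypothetical pairwise interaction scores (protein, protein, score).
scored_pairs = [
    ("A", "B", 0.95),
    ("B", "C", 0.90),
    ("C", "D", 0.30),  # below threshold, so it does not join the two groups
    ("D", "E", 0.85),
]
# Hypothetical expression profiles across four conditions.
expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [1.0, 3.0, 2.0, 5.0],
    "D": [4.0, 3.0, 2.0, 1.0],
    "E": [1.0, 2.0, 3.0, 4.0],
}

complexes = cluster_pairs(scored_pairs, threshold=0.8)
for members in complexes:
    print(members, round(mean_coexpression(members, expression), 3))
```

In this toy data the D/E cluster passes the interaction threshold but its members are anti-correlated in expression, which is the kind of disagreement a co-expression check would flag for refinement or for treatment as a condition-specific variant.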
 
Description The Complex Portal is an encyclopaedic resource of macromolecular complexes from a number of key model organisms. In addition to the expert manually curated complexes, the portal now holds high-confidence machine-learning predicted human complexes from hu.MAP3.0 and MuSIC. All data is freely available for search and download. As of 01/2025, the portal holds ca. 5,000 manually curated and ca. 15,000 computationally predicted molecular complexes. An innovative visualisation tool, the Complex Navigator, allows user-friendly comparison of related complexes, as well as grouping of complexes by orthology.
Exploitation Route The Complex Portal is a stable reference resource for molecular complexes, providing unique complex identifiers, allowing other resources like pathway databases to refer to an external resource for molecular complexes, instead of just representing them as "bags of protein identifiers". The high quality manually curated subset of the Complex Portal can also be used as a training and/or validation dataset for machine learning approaches to complex prediction. This has already been implemented for the hu.MAP 3.0 dataset [ https://doi.org/10.1101/2024.10.11.617930 ].
Sectors Agriculture, Food and Drink; Pharmaceuticals and Medical Biotechnology

URL https://www.ebi.ac.uk/complexportal/
 
Title Complex Portal 
Description The Complex Portal is a manually curated, encyclopaedic resource of macromolecular complexes from a number of key model organisms, entered into the IntAct molecular interaction database (https://www.ebi.ac.uk/intact/). Data includes protein-only complexes as well as protein-small molecule and protein-nucleic acid complexes. All complexes are derived from physical molecular interaction evidence extracted from the literature and cross-referenced in the entry, by curator inference from information on homologs in closely related species, or by inference from scientific background. All complexes are tagged with Evidence and Conclusion Ontology codes to indicate the type of evidence available for each entry.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact The Complex Portal is a unique reference resource for manually curated biomolecular complexes. 
URL https://www.re3data.org/repository/r3d100013295
 
Title The Complex Portal - beyond binaries or how to tame the spaghetti monster? 
Description The EMBL-EBI Complex Portal (www.ebi.ac.uk/intact/complex) is a central service that provides manually curated information on stable macromolecular complexes from model organisms. The database currently holds approximately 2000 complexes, with the majority from Saccharomyces cerevisiae, human and mouse. It provides unique identifiers, names and synonyms, lists of complex members with their unique identifiers (UniProt, ChEBI, RNAcentral), function, binding and stoichiometry annotations, descriptions of their topology, assembly structure, ligands and associated diseases, as well as cross-references to the same complex in other databases (e.g. ChEMBL, GO, PDB, Reactome). Our stable identifiers are used as annotation objects in IntAct and Protein2GO and as cross-references in ChEMBL, Intermine, MatrixDB and QuickGO. PDBe and Reactome are working towards integrating complex identifiers. Having established the basic data structure and content, we are now focusing on providing a better user experience. We have completely redeveloped our website, developing and incorporating many more visualization tools, such as the ComplexViewer, PDBe's LiteMol Viewer, Reactome's DiagramJS, the Atlas widget of expression data and the MI-Circle viewer, a bespoke Chord diagram developed to give an alternative representation of complex topology, binding regions, mutations and links to InterPro domains. Future plans include building a tool that can a) explore evolutionary relationships between complexes across the database and b) infer quaternary structure of complexes for which no structure exists, using the Periodic Table of Complexes developed by the Teichmann group. This is a collaborative project, to which groups such as UniProtKB, the Saccharomyces Genome Database, the UCL Gene Annotation Team and the MINT database have already contributed. We welcome groups who are willing to contribute their expertise and will make editorial access and training available to you.
Individual complexes will also be added to the dataset on request. Contact us at intact-help@ebi.ac.uk for further information.
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact The Complex Portal is a unique reference resource for biomolecular complexes. As of 01/2025, it covers ca. 5,000 manually curated and 15,000 computationally predicted complexes. 
URL https://f1000research.com/slides/6-336
 
Description From interactions to quantitative models: FAIR resources for systems biology 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Talk given at the Eötvös Loránd University, Budapest on FAIR resources for systems biology.
Year(s) Of Engagement Activity 2024
 
Description HUPO 2024: Complex Portal: a resource for functionally annotated macromolecular complexes 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Poster on Complex Portal presented at the 23rd Human Proteome Organization World Congress, October 20-24, Dresden (Germany)
Year(s) Of Engagement Activity 2024
URL https://2024.hupo.org/
 
Description HUPO 2024: Dynamic organisation of the Human protein-protein interactions 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Poster on the Dynamic organisation of the Human protein-protein interactions presented at the 23rd Human Proteome Organization World Congress, October 20-24, Dresden (Germany).
Year(s) Of Engagement Activity 2024
URL https://2024.hupo.org/
 
Description HUPO 2024: Molecular Complex Navigator: Comparative Visualization of Biomolecular Complexes
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Poster on the Molecular Complex Navigator and Comparative Visualization of Biomolecular Complexes was presented at the 23rd Human Proteome Organization World Congress, October 20-24, Dresden (Germany)
Year(s) Of Engagement Activity 2024
URL https://2024.hupo.org/
 
Description IntAct Demo with Olaitan AWE 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Training workshop IntAct Demo with Olaitan AWE during the African Omics Workshop 2024 at the African Society for Bioinformatics and Computational Biology, Cape Town, 2024.
Year(s) Of Engagement Activity 2024
 
Description IntAct: Protein-protein interactions database 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Training on IntAct and protein-protein interaction databases was provided at the University of Andes for undergraduate and postgraduate students in October 2024.
Year(s) Of Engagement Activity 2024
 
Description Molecular interactions in the context of Rare diseases: Annotation rich dataset from the IMEx consortium. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Talk by Senior Scientific Database Curator, Kalpana Panneerselvam, on "Molecular interactions in the context of Rare diseases: Annotation rich dataset from the IMEx consortium" at the 17th Annual International Biocuration Conference, in India.
Year(s) Of Engagement Activity 2024
URL https://ibdc.rcb.res.in/biocuration2024/
 
Description Network Analysis with Cytoscape 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact An introduction to the basic theory and concepts of network analysis. Attendees learned how to construct protein-protein interaction networks and subsequently use these to overlay large-scale data such as that obtained through RNA-Seq or mass-spec proteomics. The course focused on giving attendees hands-on experience in the use of one of the most commonly used open source Network Visualisation Platforms, Cytoscape.
Year(s) Of Engagement Activity 2024
URL https://sites.google.com/cam.ac.uk/lpsjrtfh68dh3kcvfbzgfes4/home
 
Description Network Biology theory and practicals with IntAct Cytoscape app
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact Training provided to 30 undergraduate students at the University of Cambridge during the U. of Cambridge Genomic Medicine Advanced Bioinformatics module GMO4.
Year(s) Of Engagement Activity 2024
 
Description Network and IntAct basics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact Training session on Network and IntAct basics within the framework of the Mathematics of Life course hosted at EBI. This course provided participants with an introduction to, and hands-on training in, the modelling approaches, tools, and resources used in systems biology, and also touched on network analysis.
Year(s) Of Engagement Activity 2024
URL https://www.ebi.ac.uk/training/events/mathematics-life-modelling-molecular-mechanisms/
 
Description Outreach Activity: Project development: Validating predicted molecular complexes through Xlinking MS 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Talk on Validating predicted molecular complexes through Xlinking MS given at the National Synthesis Center for Emergence in the Molecular and Cellular Sciences during the first annual summit meeting in October 2024 at the University of Chicago.
Year(s) Of Engagement Activity 2024
URL https://bpb-us-e1.wpmucdn.com/sites.psu.edu/dist/2/180585/files/2024/10/NCEMS-Summit-2024-program_v1...
 
Description Poster on Molecular Complex Navigator 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Poster on Molecular Complex Navigator at the 14th international meeting on 'Visualizing Biological Data' (VIZBI 2024) at the University of Southern California
Year(s) Of Engagement Activity 2024
URL https://calendar.usc.edu/event/14th_international_meeting_on_visualizing_biological_data_vizbi_2024
 
Description Poster: Context-specific protein-protein interaction networks, IntAct database - Host ontology mapping 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Poster presented at the 10th year Open Targets anniversary event held at EBI in October 2024.
Year(s) Of Engagement Activity 2024
 
Description Proteomics Bioinformatics Course 2024: IntAct and IMEx 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact This course provided hands-on training in the basics of mass spectrometry (MS) and proteomics bioinformatics. 30 undergraduate and postgraduate students received training on how to use search engines and post-processing software, quantitative approaches, MS data repositories, the use of public databases for protein analysis, annotation of subsequent protein lists, and incorporation of information from molecular interaction and pathway databases.
Year(s) Of Engagement Activity 2024
URL https://www.ebi.ac.uk/training/events/proteomics-bioinformatics-1/
 
Description Proteomics Bioinformatics Course 2024: Network Analysis with Cytoscape 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact Training course which provided hands-on training in the basics of mass spectrometry (MS) and proteomics bioinformatics. 30 undergraduate and postgraduate students were provided training on how to use search engines and post-processing software, quantitative approaches, MS data repositories, the use of public databases for protein analysis, annotation of subsequent protein lists, and incorporation of information from molecular interaction and pathway databases.
Year(s) Of Engagement Activity 2024
URL https://www.ebi.ac.uk/training/events/proteomics-bioinformatics-1/
 
Description Seminar: IntAct: Curation strategy and data visualisation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Seminar as part of the GAA Seminar series that took place in November 2024.
Year(s) Of Engagement Activity 2024
 
Description Suffolk Family Carers Young Carers visit 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Suffolk Family Carers and Young Carers visit at EBI.
Year(s) Of Engagement Activity 2024