MIDAS - Molecular Interaction Data Availability Standards

Lead Research Organisation: European Bioinformatics Institute
Department Name: Proteomics Services Team

Abstract

An understanding of the molecular interactions a cell makes is critical to understanding the biology of that cell, and the mechanisms by which it reacts to a change in its surrounding environment. Biologists have studied these interactions, mainly protein-protein but now extending to other molecule types, for many years, and databases were created to store such information. These databases have been united through the work of the Proteomics Standards Initiative (PSI), which produced common interchange standards that were adopted by all the major players in the field. This made it possible for researchers to combine datasets from disparate resources. Additionally, many tools were written to visualize and analyze such data, and these tools work with data from multiple different sources.
The data formats have been stable since 2007, but experimental methodologies, the data types studied and the complexity of the resulting data have moved on. It is now necessary to advance the standards to meet new challenges and to write novel tools, or update successful existing applications, to work with these upgraded formats. The developers will work as part of the PSI, in consultation with data producers, data users and tool developers, to ensure the updated formats meet their needs. Once developed, the new formats and the accompanying tool suite and visualization resources will be incorporated into two high-profile, UK-based resources: the IntAct molecular interaction database and the InterMine project, which provides an open-source data integration platform for several important model organisms. A training program will be delivered to ensure the outputs of this grant are understood and used by the Systems Biology community. The work of the IMEx Consortium, responsible for supplying a high-quality, non-redundant set of interaction data for network biologists, will also be supported.

Technical Summary

Molecular interaction information is a key resource in modern biomedical research. The PSI-MI XML2.5 and MITAB2.5 formats were developed by interaction data producers and providers from both the academic and commercial sectors to enable the description of interactions between a wider range of molecular types. Both formats have been widely adopted and have resulted in a raft of accompanying tools and webservices being developed. However, new use cases have arisen that the formats cannot properly accommodate. PSI-MI XML3.0 will be written to capture both more challenging experimental data, such as dynamic data and causal interactions, and knowledge abstracted from such data, for example a description of protein complexes. A Java framework will be designed to parse and write all versions of MITAB and PSI-MI XML and load the objects into a common framework. This framework will ease subsequent tool development, both within this grant and by external groups. A tool suite to enable the handling of large datasets will be built on the Java interface, improved graphical representation modules will be developed, and the PSICQUIC webservice will be upgraded and improved. These tools will be incorporated into two well-used UK resources, the IntAct molecular interaction database and the InterMine data integration platform. Tutorial and training materials will be produced and a series of workshops organized to disseminate information about these new tools and resources. Finally, support will be provided for the work of the IMEx Consortium, a group of databases which cooperate to produce a high-quality, non-redundant set of interaction data, the raw materials required for large-scale data analysis and the building blocks of modern Systems Biology.

Planned Impact

1. As already described under 'Academic Beneficiaries', one of the major groups who will profit from this work is large-scale data producers performing network analysis on large datasets. These include pharmaceutical companies and SMEs who map protein networks to disease and look for drugs which disrupt these networks. These companies will not only benefit from the improved tools and formats and the new API but are also interested in PSI-MI developing the ability to describe causal interactions, as described in WP2. Overlaying pathway data with molecular interaction networks, using the much improved PSICQUIC XML webservice, will additionally enable target identification beyond the currently understood linear pathways.
2. Indirect beneficiaries will be any researcher in the fields of basic biology or biomedicine, as network biology continues to contribute to our understanding of the processes within a living cell. They will gain improved access to data and to tools to utilise this data.
3. Funders, such as the Research Councils, will benefit from the increased impact of the projects they support, as the Editorial tool will make it easier to deposit interaction data into public domain repositories, making it available for reuse.
4. The IMEx Consortium is actively encouraging the direct deposition of interaction data into databases such as IntAct in the UK, thus ensuring that publicly funded data is not lost, but rather adds to the corpus of information available to the biological community. The accession numbers issued by this consortium enable granting bodies to monitor data sharing compliance of applicants. The UK currently plays a leadership role in this consortium, which plays a major role in the global interaction field and is of critical importance to both our industry and research scientists. The two-way interchange of information with these groups at meetings, forums and workshops is of value to us all.
5. Staff employed will benefit from exposure to numerous international collaborations through the PSI, and from new collaborations with both research groups and industry, particularly in relation to the shared development of software and training in software development and implementation.

Publications

Bastian FB (2015) The Confidence Information Ontology: a step towards a standard for asserting confidence in annotations. in Database : the journal of biological databases and curation

Combe CW (2017) ComplexViewer: visualization of curated macromolecular complexes. in Bioinformatics (Oxford, England)

Deutsch EW (2017) Proteomics Standards Initiative: Fifteen Years of Progress and Future Work. in Journal of proteome research

Meldal BH (2015) The complex portal--an encyclopaedia of macromolecular complexes. in Nucleic acids research

Orchard S (2015) Shared resources, shared costs--leveraging biocuration resources. in Database : the journal of biological databases and curation

Orchard S (2017) The MINTAct Archive for Mutations Influencing Molecular Interactions. in Genomics and Computational Biology

Panni S (2017) The yeast noncoding RNA interaction network. in RNA (New York, N.Y.)

Sivade Dumousseau M (2018) JAMI: a Java library for molecular interactions and data interoperability. in BMC bioinformatics

 
Description The MIDAS grant has allowed us to continue the highly successful work of the Molecular Interactions work group of the HUPO Proteomics Standards Initiative (PSI) over three annual meetings and on-going collaborations (PMID:28849660, PMID:28701522, PMID:27270715). We have released level 3.0 of the HUPO PSI-MI standard format for molecular interactions (PMID:29642841) and implemented support for this updated format in the "JAMI" software library (PMID:29642846). The new format allowed us to develop a new resource, the ComplexPortal, as a reference resource for biomolecular complexes (PMID:25313161), and to develop an attractive visualisation system for it (PMID:29036573). We also published a "Methods" manuscript, providing detailed instructions on how to use this new resource (PMID:29605928).
Exploitation Route The PSI-MI 3.0 format developed with support from this grant is already in use by the community, both by the IntAct database and its 10 international collaborators in the IMEx international exchange consortium, and by collaborators in the EU-funded GREEKC consortium.
Sectors Agriculture, Food and Drink; Education; Pharmaceuticals and Medical Biotechnology

URL https://www.ebi.ac.uk/complexportal/
 
Title Complex Portal 
Description The Complex Portal is a manually curated, encyclopaedic resource of macromolecular complexes from a number of key model organisms, entered into the IntAct molecular interaction database (https://www.ebi.ac.uk/intact/). Data includes protein-only complexes as well as protein-small molecule and protein-nucleic acid complexes. All complexes are derived from physical molecular interaction evidence extracted from the literature and cross-referenced in the entry, or by curator inference from information on homologs in closely related species, or by inference from scientific background. All complexes are tagged with Evidence and Conclusion Ontology codes to indicate the type of evidence available for each entry.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact The Complex Portal is a unique reference resource for manually curated biomolecular complexes. 
URL https://www.re3data.org/repository/r3d100013295
 
Title The Complex Portal - beyond binaries or how to tame the spaghetti monster? 
Description The EMBL-EBI Complex Portal (www.ebi.ac.uk/intact/complex) is a central service that provides manually curated information on stable macromolecular complexes from model organisms. The database currently holds approximately 2000 complexes, with the majority from Saccharomyces cerevisiae, human and mouse. It provides unique identifiers, names and synonyms, lists of complex members with their unique identifiers (UniProt, ChEBI, RNAcentral), function, binding and stoichiometry annotations, descriptions of their topology, assembly structure, ligands and associated diseases, as well as cross-references to the same complex in other databases (e.g. ChEMBL, GO, PDB, Reactome). Our stable identifiers are used as annotation objects in IntAct and Protein2GO and as cross-references in ChEMBL, InterMine, MatrixDB and QuickGO. PDBe and Reactome are working towards integrating complex identifiers.
Having established the basic data structure and content, we are now focusing on providing a better user experience. We have completely redeveloped our website, developing and incorporating many more visualization tools, such as the ComplexViewer, PDBe's LiteMol Viewer, Reactome's DiagramJS, the Atlas widget of expression data and the MI-Circle viewer, a bespoke Chord diagram developed to give an alternative representation of complex topology, binding regions, mutations and links to InterPro domains. Future plans include building a tool that can a) explore evolutionary relationships between complexes across the database and b) infer quaternary structure of complexes for which no structure exists, using the Periodic Table of Complexes developed by the Teichmann group.
This is a collaborative project, to which groups such as UniProtKB, Saccharomyces Genome Database, the UCL Gene Annotation Team and the MINT database have already contributed. We welcome groups who are willing to contribute their expertise and will make editorial access and training available to them.
Individual complexes will also be added to the dataset on request. Contact us at intact-help@ebi.ac.uk for further information.
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact The Complex Portal is a unique reference resource for manually curated biomolecular complexes. 
URL https://f1000research.com/slides/6-336
 
Description Bioinformatics resources for Protein Biology 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact ~25 participants attended this course, which introduces participants to the data resources and tools developed by EMBL-EBI to help with protein studies. The participants leave with a greater understanding of the range of protein resources and how to access them, how the resources can be used to retrieve relevant protein information, and how that can be applied to their research. This results in increased usage and awareness within the scientific community of the group's resources, in particular IntAct, Complex Portal and Reactome.
Year(s) Of Engagement Activity 2020
URL https://www.ebi.ac.uk/training/events/bioinformatics-resources-protein-biology/
 
Description Career Q&A 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact This career Q&A with year 10 students was carried out virtually for the local college, and it is hoped that it will encourage more students to think about entering not only science but also the field of bioinformatics.
Year(s) Of Engagement Activity 2020