SysMO-DB: Supporting Data Access and Integration

Lead Research Organisation: University of Manchester
Department Name: Computer Science

Abstract

SysMO is a European trans-national funding and research initiative on 'Systems Biology of Microorganisms'. The goal pursued by SysMO is to record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way and to present these processes in the form of computerized mathematical models. The aim is to pool research capacities and know-how from eleven projects. To facilitate this process, the Data Management Group (DMG) has been created to support data access and integration. Each of the individual projects in SysMO is working towards different research outcomes, and together they represent a cross-section of microorganisms, including bacteria, archaea and yeast. The environmental conditions for each organism also vary widely, with organisms growing in culture, soil, water and animal hosts. As a consequence of this diversity, there is no one model for experimentation or for the types of data collected and the types of models produced. In order to pool the research outcomes for SysMO, our job is to support and manage this diversity and promote a shared understanding across the community by using the same technologies. The underlying premises are (i) the systematic management of data, models and processes; (ii) the gathering of minimal information to support reproducible science; and (iii) shared best practice. The main objectives of a data management solution for SysMO are thus to facilitate the web-based exchange of data between research groups within and between consortia, and to provide an integrated platform for the dissemination of the results of the SysMO projects to the scientific community.
The aim of this proposal is to present a progressive and scalable solution to the data management needs of the SysMO initiative that: facilitates and maximises the potential for data exchange between SysMO research groups; maximises the 'shelf life' and utility of data generated by SysMO; provides an integrated platform for the dissemination of the results of the SysMO projects to the scientific community; and facilitates standardization of practices in Systems Biology for the interfacing of modelling and experimentation. We will adopt a progressive solution to the data exchange, comparison and dissemination needs of the SysMO initiative, combining elements to support data storage and annotation, model storage and annotation, and the definition of (common) processes using workflows that combine access to SysMO and external resources and that build models. We will provide unified access through a portal - the SysMO-HUB. We will create a catalogue of data assets - SysMO-SEEK; of models using JWS Online; and of workflows using myExperiment, with integrated search across all of these through the HUB. We will help the partners to: develop their data management systems and adopt community solutions; annotate models and adopt community standards; and develop and share workflows using a de facto community system, Taverna. Metadata for discovery and exchange will be identified and developed on an on-demand basis, sufficient without being overbearing. We will create an environment where the SysMO groups will be able to exchange their results as and when they want, disseminate their results outside the consortium and effectively use the resources of the wider Systems Biology community.

Technical Summary

We will adopt a progressive solution, with regular staged delivery of capability, to the data exchange, comparison and dissemination needs of the SysMO initiative. The solution combines elements to support data storage and annotation, model storage and annotation, and the definition of (common) processes using workflows. There will be an emphasis on training, dissemination of best practice and 'help yourself' throughout the consortium in order to ensure sustainability and scalability. We will provide unified access through a secure portal - the SysMO-HUB - implemented using the Liferay Portal Framework, with fully customised access control. We will create a new, annotated catalogue of data assets - SysMO-SEEK; of models using the JWS Online repository and simulator; and of workflows using the myExperiment workflow repository and social network system. Integrated search across all three repositories will be possible from the portal. Plugins through portlets will enable query and updates to datasets, model simulation and workflow execution. We will help the partners to: develop their data management systems and adopt, where relevant, community solutions such as SABIO-RK; accurately annotate models and adopt community standards such as SBML and MIRIAM; and develop and share workflows for the automated build of SBML models and the integration of resources within and without SysMO, using a de facto bio-community system, Taverna. Metadata for discovery and exchange of data, models and workflows will be identified and developed on an on-demand basis, sufficient without being overbearing. We will create an environment where the SysMO groups will be able to exchange their results as and when they want, with whom they want, disseminate their results outside the consortium and effectively reuse the resources of the wider Systems Biology community.
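The integrated search across the three repositories described above can be illustrated with a minimal sketch. It assumes each catalogue can be reduced to a list of titles held in memory; the `Hit` class, the `CATALOGUES` dictionary, the sample titles and the `integrated_search` function are all hypothetical and do not reflect the actual interfaces of SysMO-SEEK, JWS Online or myExperiment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    source: str  # which catalogue returned the record
    title: str   # human-readable title of the asset


# Hypothetical in-memory stand-ins for the data, model and workflow catalogues.
CATALOGUES: Dict[str, List[str]] = {
    "data": ["Yeast glycolysis time series", "Bacillus subtilis proteomics"],
    "models": ["Yeast glycolysis kinetic model"],
    "workflows": ["Build SBML model from kinetics data"],
}


def integrated_search(query: str, catalogues: Dict[str, List[str]]) -> List[Hit]:
    """Run one case-insensitive substring query over every catalogue."""
    needle = query.lower()
    return [
        Hit(source, title)
        for source, titles in catalogues.items()
        for title in titles
        if needle in title.lower()
    ]


hits = integrated_search("glycolysis", CATALOGUES)
for hit in hits:
    print(f"{hit.source}: {hit.title}")
```

Tagging every hit with its source catalogue lets a single result list mix data, models and workflows while still showing where each record lives.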

Publications

Sansone SA (2012) Toward interoperable bioscience data. in Nature genetics

Wolstencroft K (2011) The SEEK: a platform for sharing data and models in systems biology. in Methods in enzymology

Bechhofer S (2013) Why linked data is not enough for scientists. in Future Generation Computer Systems

Wolstencroft K (2011) RightField: embedding ontology annotation in spreadsheets. in Bioinformatics (Oxford, England)

 
Description The SysMO-DB project (http://www.sysmo-db.org) established a data, model and SOP management platform to support the long-term retention, exchange, sharing and publishing of the outcomes of the multi-partner projects of the first round of the ERANet SysMO (Systems Biology for MicroOrganisms, http://sysmo.net/) programme.

In partnership with HITS, Heidelberg, the project:
- developed the SEEK platform for the lightweight management of data, models and SOPs in Systems Biology (http://www.seek4science.org). This has functionalities for project management (yellow pages), organising outcomes using the ISA framework, catalogues and repositories for digital assets arising from Sys Bio research, seamless integration with Sys Bio simulators and gateways to third party data archives. It is standards driven.

- established the SysMO-SEEK resource for the projects to register, interlink, retain and share their results (https://seek.sysmo-db.org/).
This has now been subsumed into the FAIRDOMHub Commons (http://www.fair-dom.org). 50% of the projects still actively use the platform many years after the programme ended.

- developed a methodology for lightweight data gathering using Just Enough Results Model (JERM)

- developed tools (RightField, http://www.rightfield.org.uk) for making JERM templates and spreadsheet tools to help biologists with data curation

- developed the JWS Online Sys Bio model database and simulation system.

- ran summer schools, workshops, tutorials and training programmes

- founded and operated a PALs knowledge network of young researchers in the SysMO projects who became skilled data managers, champions and scouts for the platform.

- assisted projects to curate their data and models, with numerous site visits

- participated in Sys Bio Standards activities, notably the COMBINE and ISA standards.

- worked with funding councils to set data sharing policy and data management strategy.

Wruck et al., Data management strategies for multinational large-scale systems biology projects, Briefings in Bioinformatics 2012, stated that, out of the box, it provides the most useful features for large scale biology projects.
Exploitation Route The SEEK platform is open source and freely available, as is the RightField metadata collection tool suite. See separate entries for up-to-date details.

The SEEK was adopted by all but 1 of the SysMO projects and has gone on to be widely adopted in other programmes, notably the Virtual Liver Network, ERANet ERASysBio+ projects, and the ERANet ERASysAPP projects.

The platform was adopted by 10+ independent projects (e.g. Unicellsys, JenAge, RosAge, SBCancer, Sybacol) and local group instances have been established at the VU Amsterdam for Yeast Glycolysis, Systems Science for Health in Birmingham and Magdeburg Centre for Systems Biology. Commercially, SEEK was a component of Eagle Genomics Ltd's ElasticAP platform.
Since the SysMO Programme the platform has been further adopted (see the DMMCore Entry). Currently 96 projects use the FAIRDOMHub; 30+ projects/groups use their own instances of the platform.

Two of the BBSRC SynBio Research Centres (SynthSys, SYNBIOCHEM) use the SEEK as their data/model/SOP Platform.

The SysMO-SEEK resource continues to be used by over half the SysMO projects, even after the funds, the programmes and the projects ended. Many groups are including the platform in their new grant proposals.

The JWS Online Sys Bio model database and simulation system continues to offer data and simulation facilities.

The PALs knowledge network and the workshop/tutorial programme have been adopted by other projects, notably FAIRDOM (See DMMCore Entry), and the German Virtual Liver Network.

Our work in COMBINE and ISA standards is adopted by the wider community for data and model management.

Work on the SysMO-DB project directly led to participation in the ESFRI Research Infrastructure ISBE - Infrastructure for Systems Biology Europe, and we have led the Data and model management work package setting out Europe's plans for this area.

The SysMO-DB project directly led to the DMMCore award, renamed the FAIRDOM project (http://www.fair-dom.org), a consortium of 4 EU funding councils to: establish a sustainable European Infrastructure to extend the network services to the wider European systems biology community; develop the necessary toolset and set up a data and model management platform for systems biology projects, building on SEEK and openBIS (SystemsX); and document and disseminate the outcomes and activities to funding agencies, projects and centres with the goal of establishing a sustainable business model for this infrastructure. See the DMMCore entry for up-to-date details.
Sectors Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Education,Healthcare,Pharmaceuticals and Medical Biotechnology

URL http://www.sysmo-db.org/
 
Description The platform was adopted by all but 1 of the SysMO I and II projects and has gone on to be widely adopted in other programmes, notably the Virtual Liver Network, ERANet ERASysBio+ projects, and the forthcoming ERANet ERASysAPP. The SysMO-SEEK resource continues to be used by over half the SysMO projects, even after the funds, the programmes and the projects ended. Many groups are including the platform in their new grant proposals. We have successfully retained the outcomes of the SysMO programme. Wruck et al., Data management strategies for multinational large-scale systems biology projects, Briefings in Bioinformatics 2012, stated that, out of the box, it provides the most useful features for large scale biology projects. The platform had been adopted by 10+ independent projects (e.g. Unicellsys, JenAge, RosAge, SBCancer, Sybacol projects) and local group instances have been established at the VU Amsterdam for Yeast Glycolysis, Systems Science for Health in Birmingham and Magdeburg Centre for Systems Biology. Commercially, SEEK is a component of Eagle Genomics Ltd's eaglecore platform. Since the establishment of the FAIRDOM project (partially funded by the DMMCore BBSRC award), 58 projects now use the FAIRDOMHub (a public Commons which subsumes the SysMO and ERASysAPP SEEKs) and 30 projects use the software platform in their own instances. The SysMO-DB project has also directly led to the DMMCore award that created the FAIRDOM project, a consortium of 4 EU funding councils to: establish a sustainable European Infrastructure to extend the network services to the wider European systems biology community; develop the necessary toolset and set up a data and model management platform for systems biology projects, building on SEEK and openBIS (SystemsX); and document and disseminate the outcomes and activities to funding agencies, projects and centres with the goal of establishing a sustainable business model for this infrastructure. See the DMMCore entry.
We have since established the FAIRDOM Association e.V. to promote and sustain the products and services first developed in SysMO-DB. The JWS Online Sys Bio model database and simulation system continues to offer data and simulation facilities. The PALs knowledge network and the workshop/tutorial programme have been adopted by other projects. Our work on COMBINE (MIRIAM, SED-ML) and ISA standards is adopted by the wider community for data and model management. Work on the SysMO-DB project directly led to participation in the ESFRI Research Infrastructure ISBE - Infrastructure for Systems Biology Europe, and we have led the Data and model management work package setting out Europe's plans for this area. The SEEK platform is core to ISBE's Data Stewardship pillar and the FAIRDOM follow-on project funded by DMMCore.
First Year Of Impact 2009
Sector Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Pharmaceuticals and Medical Biotechnology
Impact Types Societal,Economic

 
Description BB/M013189/1 DMMCore: Data and Model Management Core for ERASysAPP & Europe
Amount £1,015,804 (GBP)
Funding ID BB/M013189/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 11/2014 
End 10/2019
 
Description BBSRC BB/H024921/1, Omics Data Sharing: the investigation/Study/Assay Infrastructur
Amount £15,669 (GBP)
Funding ID BB/H024921/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 10/2010 
End 09/2013
 
Description BBSRC BB/I004637/1 SysMO-DB2 Supporting Data Access and Integration
Amount £950,271 (GBP)
Funding ID BB/I004637/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 11/2010 
End 10/2014
 
Description EU FP7 ESFRI ISBE Infrastructure for Systems Biology Europe
Amount € 350,000 (EUR)
Funding ID 312455 
Organisation European Commission 
Department Seventh Framework Programme (FP7)
Sector Public
Country European Union (EU)
Start 08/2012 
End 09/2015
 
Description PREP-IBISBA Industrial Biotechnology Innovation and Synthetic Biology Accelerator Preparatory Phase
Amount € 3,995,065 (EUR)
Funding ID 871118 
Organisation European Commission H2020 
Sector Public
Country Belgium
Start 01/2020 
End 12/2023
 
Description Heidelberg Institute for Theoretical Studies 
Organisation Heidelberg Institute for Theoretical Studies
Country Germany 
Sector Charity/Non Profit 
PI Contribution HITS are our partners in the SysMO-DB and SysMO-DB2 projects. We were both funded under the ERANet SysMO; them by BMBF, us by BBSRC. We are also partners in the DMMCore project, now called FAIRDOM (http://www.fair-dom.org). Within ERANET ERASysAPP, the data and model management efforts started during ERANets SysMO1 and ERASysBIO, and further developed during SysMO2, were applied in the research projects. Additionally, a combined EU RI ISBE / ERASysAPP Data and Model Management Project was funded by BMBF, SystemsX and BBSRC. This project, FAIRDOM (www.fair-dom.org), was funded to support the ERASysAPP research groups and to establish a European one stop infrastructure which bundles data and model management expertise and offers support in this field, as well as to train future data managers and coordinate further tool developments in data management systems.
Collaborator Contribution See the contributions arising from the SysMO-DB1, SysMO-DB2 and DMMCore BBSRC awards. HITS are our co-development partners of the SEEK4Science Data and Model Management platform, associated software and curation, and co-partners in the delivery of the FAIRDOM data and model stewardship programme. HITS co-founded the FAIRDOM Association e.V., the not-for-profit set up to run FAIRDOM's products and services.
Impact See the outputs and outcomes arising from the SysMO-DB1 and SysMO-DB2 BBSRC awards. See outputs and outcomes from BB/M013189/1 DMMCore: Data and Model Management Core for ERASysAPP and Europe
Start Year 2008
 
Description U Stellenbosch, SA 
Organisation University of Stellenbosch
Country South Africa 
Sector Academic/University 
PI Contribution Long term partners in the SysMO-DB1, SysMO-DB2 and DMMCore BBSRC funded projects. The programme is now called FAIRDOM. Within ERANET ERASysAPP, the data and model management efforts started during ERANets SysMO1 and ERASysBIO and further developed during SysMO2, were applied in the research projects. Additionally, a combined EU RI ISBE / ERASysAPP Data and Model Management Project was funded by BMBF, SystemsX and BBSRC. This project "FAIRDOM" (www.fair-dom.org) was funded to support the ERASysAPP research groups and to establish a European one stop infrastructure which bundles data and model management expertise and offers support in this field as well as to train future data managers and coordinate further tool developments in data management systems.
Collaborator Contribution Stellenbosch provide the model curation and the JWS Online model simulation platform fully integrated into the FAIRDOM Software Platform SEEK4Science (http://www.seek4science.org), and the FAIRDOM Web-based Community Commons FAIRDOMHub (http://www.fairdomhub.org)
Impact Numerous. See BBSRC SysMO-DB1, SysMO-DB2 and DMMCore Grant awards
Start Year 2008
 
Title Just Enough Results Model 
Description The JERM describes the relationships between data, models, SOPs, samples, specimens and publications for Systems Biology. It is used as part of the SEEK4Science Platform, which in turn is the basis of the metadata for the FAIRDOM platform, which in turn is the basis of the FAIRDOMHub Commons for Systems Biology projects and for over 30 other projects using the FAIRDOM platform in their own installations. The JERM adheres to the ISA de facto standard for organising investigations, studies and assays. It was started in the SysMO-DB1 award, developed further in the SysMO-DB2 award and continues to be developed in the DMMCore award. JERM 2.0 was released in Sept 2017.
Type Of Technology Software 
Year Produced 2008 
Open Source License? Yes  
Impact It is used as part of the SEEK4Science Platform metadata infrastructure and is feeding into the European Open Science Cloud Pilot (EOSCpilot) Data Catalogue interoperability EDMI minimum information model. 
URL http://jermontology.org/
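
The ISA-style organisation that the JERM adheres to (investigations containing studies, studies containing assays, and assays linking to assets such as data, models and SOPs) can be sketched as a small data structure. This is an illustrative simplification only: the class names, fields, sample titles and the `assets_of_kind` helper are hypothetical and are not taken from the JERM ontology or from SEEK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    kind: str   # e.g. "data", "model" or "sop" (illustrative labels)
    title: str


@dataclass
class Assay:
    title: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def assets_of_kind(self, kind: str) -> List[Asset]:
        """Collect every asset of one kind across all studies and assays."""
        return [
            asset
            for study in self.studies
            for assay in study.assays
            for asset in assay.assets
            if asset.kind == kind
        ]


investigation = Investigation(
    "Yeast glycolysis",
    studies=[
        Study(
            "Glucose pulse",
            assays=[
                Assay(
                    "Metabolite time course",
                    assets=[
                        Asset("data", "Metabolite concentrations"),
                        Asset("sop", "Sampling protocol"),
                    ],
                ),
                Assay(
                    "Kinetic simulation",
                    assets=[Asset("model", "Glycolysis kinetic model")],
                ),
            ],
        )
    ],
)

print([asset.title for asset in investigation.assets_of_kind("data")])
```

Because every asset hangs off an assay, the experimental context of a dataset or model is preserved when it is listed or shared.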
 
Title RightField 
Description RightField is an open-source tool for adding ontology term selection to Excel spreadsheets. RightField is used by a 'Template Creator' to create semantically aware Excel spreadsheet templates. The Excel templates are then reused by scientists to collect and annotate their data, without any need to understand, or even be aware of, RightField or the ontologies used. RightField was started in the SysMO-DB1 award, developed further in the SysMO-DB2 award and continues to be developed in the DMMCore award.
Type Of Technology Software 
Year Produced 2010 
Open Source License? Yes  
Impact RightField is used as part of the Extract-Transform-Load pipeline for the FAIRDOM SEEK platform used by the SysMO-DB BBSRC projects. It has been downloaded over 620 times. The RightField platform has been adopted in archaeology, cultural studies and environmental sciences. It forms the basis of the Populous Ontology development tool, developed by Manchester and the EBI.
URL http://www.rightfield.org.uk
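
The idea of restricting spreadsheet cells to terms from a controlled vocabulary can be illustrated with a minimal sketch. It is not RightField's implementation: it checks a CSV export rather than an Excel workbook, and the column name, the allowed terms and the `invalid_cells` function are hypothetical.

```python
import csv
import io
from typing import Dict, List, Set, Tuple

# Hypothetical controlled vocabulary: column name -> permitted terms.
ALLOWED_TERMS: Dict[str, Set[str]] = {
    "organism": {"Saccharomyces cerevisiae", "Bacillus subtilis"},
}


def invalid_cells(
    csv_text: str, allowed: Dict[str, Set[str]]
) -> List[Tuple[int, str, str]]:
    """Return (row number, column, value) for cells outside the vocabulary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    # Row 1 of the sheet is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for column, terms in allowed.items():
            value = row.get(column) or ""
            if value not in terms:
                problems.append((row_number, column, value))
    return problems


sheet = "sample,organism\nS1,Saccharomyces cerevisiae\nS2,yeast\n"
problems = invalid_cells(sheet, ALLOWED_TERMS)
print(problems)
```

A template that offers only the permitted terms at data-entry time avoids this after-the-fact check, which is the convenience the description above attributes to RightField templates.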
 
Title SEEK4Science 
Description The SEEK platform is a web-based resource for sharing heterogeneous scientific research datasets, models or simulations, processes and research outcomes. It preserves associations between them, along with information about the people and organisations involved. Underpinning SEEK is the ISA infrastructure, a standard format for describing how individual experiments are aggregated into wider studies and investigations. Within SEEK, ISA has been extended and is configurable to allow the structure to be used outside of Biology. SEEK is incorporating semantic technology that allows sophisticated queries over the data without getting in the way of users.
Type Of Technology Software 
Year Produced 2009 
Open Source License? Yes  
Impact The SEEK4Science platform was adopted by all of the ERANet SysMO I and II projects it was designed for and has gone on to be widely adopted in other programmes, notably the German Virtual Liver Network and its follow-on Liver Systems Medicine project, ERANet's ERASysBio+ projects, and ERASysAPP. The Platform is now developed under the FAIRDOM Initiative (funded by the DMMCore project partners, including the BBSRC) http://www.fair-dom.org, where it has been rebadged as FAIRDOM-SEEK. The software platform has been independently adopted by 50+ groups in Europe, Russia, South Africa, USA, and the UK. EU Projects that adopt the platform include EmPowerPutida and Mycosynvac; national projects include the German Systems Medicine for Liver project, the de.NBI German Bioinformatics Network, and Norway's Digital Life programme. The platform has been adopted by the Environmental Molecular Sciences Laboratory, a large national scientific user facility at Pacific Northwest National Laboratory (PNNL) in Washington State, USA. Combined with the openBIS system, it is the data management platform for two UK Synthetic Biology Centres (SynBioChem and SynthSys). 90+ projects are currently registered on the FAIRDOMHub.org Commons, a centralised public community instance of the SEEK4Science Platform. Work on the SysMO-DB project and the SEEK directly led to participation in the ESFRI Research Infrastructure ISBE Light - Infrastructure for Systems Biology Europe, and we have led the Data and model management work package setting out Europe's plans for this area. The SEEK4Science software and associated software, FAIRDOMHub Commons and its support services form a core pillar of the ISBE Light Interim phase. Commercially, SEEK was the prototype component of Eagle Genomics Ltd's eaglecore platform, has been adopted by GeneXplain, and is currently being reviewed by 3 commercial organisations.
The report "Practical evaluation of SEEK and openBIS for biological data management in SynthSys; first report" (https://www.era.lib.ed.ac.uk/handle/1842/12236) recommended the platform. The SysMO-DB project also directly led to the DMMCore award (renamed the FAIRDOM project), a consortium of 4 EU funding councils to: establish a sustainable European Infrastructure to extend the network services to the wider European systems biology community; develop the necessary toolset and set up a data and model management platform for systems biology projects, building on SEEK and openBIS (SystemsX); and document and disseminate the outcomes and activities to funding agencies, projects and centres with the goal of establishing a sustainable business model for this infrastructure. FAIRDOM is funded by the UK through the BBSRC BB/M013189/1 DMMCore: Data and Model Management Core for ERASysAPP and Europe project. Wruck et al., Data management strategies for multinational large-scale systems biology projects, Briefings in Bioinformatics 2012, stated that, out of the box, it provides the most useful features for large scale biology projects.
URL http://www.seek4science.org