Building a global metagenomics portal ('MGportal') to handle next-generation sequencing data and associated metadata
Lead Research Organisation:
University of Oxford
Department Name: Oxford e-Research Centre
Abstract
While genomes represent the full genetic (DNA) complement of a single organism, metagenomes represent the DNA of an entire community of organisms. These organisms might be free-living in the environment, or be found on the skin or in the gut of a human being or other species. Microbial organisms play a major role in our everyday health and well-being, which is not surprising when you consider that the number of microbial cells in or on an average human body actually exceeds the number of human cells! Microbes play a similarly important role in the environment; different types of organisms live under different conditions (including extreme habitats, such as the run-off from acid mines or the depths of the oceans). Understanding how these organisms have adapted to their various living conditions will lead to a better understanding of how changes in the environment will affect biodiversity in the future. It may also lead to the discovery of entirely new species or novel proteins which could have utility as antibiotics or other drugs. Combined with other types of 'omic data, metagenomes hold the promise of unparalleled insights into fundamental questions across a range of fields including evolution, ecology, environmental biology, health and medicine. To fully exploit the promise of these data we need both scientific innovation and community agreement on how to provide appropriate stewardship of these resources for the benefit of all. Significant numbers of metagenomics projects have been awarded grants by international funding bodies. Whilst all of these projects have specific, scientifically-interesting aims, they mostly exist in isolation, with little or no cross-referencing to other metagenomic or genomic datasets. Our intention is to leverage existing infrastructure to deliver a world-class metagenomics resource with unique utility for UK-based metagenomics researchers.
This resource, MGportal, will utilise user-friendly interfaces, state-of-the-art algorithms and the EBI's unique position as a hub of biological information to measurably enhance the value of these researchers' data. It will be built in close collaboration with the Genomic Standards Consortium (GSC). MGportal will consist of software tools to enable metagenomics researchers to upload their data to the raw nucleotide sequence archives, data analysis pipelines to predict which potential genes are present in the data and what their functions are, plus a web interface which will display these data and results in a way that is easy to browse and query. We will hold training courses and a workshop to gain input from the scientific community about the portal. It is hoped that MGportal will eventually allow researchers to understand the results of their metagenomics experiments and to see how those results compare with the outcomes of other studies.
Technical Summary
While genomes represent the full genetic (DNA) complement of a single organism, metagenomes represent the DNA of an entire community of organisms. Interest in improved sampling of diverse environments (e.g. hosts/gut, plants, soil, etc.) combined with advances in the development and application of ultra-high-throughput sequencing methodologies is set to vastly accelerate the pace at which new metagenomes are generated. Combined with other types of 'omic data, metagenomes hold the promise of unparalleled insights into fundamental questions across a range of fields including evolution, ecology, environmental biology, health and medicine. To fully exploit the promise of these data we need both scientific innovation and community agreement on how to provide appropriate stewardship of these resources for the benefit of all. In this three-year collaborative project we aim to build an international data resource and portal for metagenomic data at the European Bioinformatics Institute. This portal will manage the submission, storage, dissemination and mining of metagenomic data from data providers across the world. The portal will focus on the capture of rich contextual information (metadata), working in close collaboration with the Genomic Standards Consortium (GSC), an international working body creating and implementing standards to describe genomes, metagenomes and marker gene sequences. Further, the collaborative use of the ISA Infrastructure software suite for metadata capture will enable capture and sharing of standards-compliant data and integration with a range of other data types. The resulting MGportal will be a major new resource at the EBI. The combined MGportal Team will engage in a range of community-building activities, including hosting workshops and training activities that educate data submitters and users and ensure the portal develops in line with community needs.
Planned Impact
The full impact of this work is described in the impact statement of the lead institute, the EBI. Here we elaborate on the specific impact of the work to be completed in this project under the auspices of the Genomic Standards Consortium and the ISA Infrastructure project. The primary impact of the proposed tight collaboration between these groups and the EBI is the increased level of community involvement in the creation of resources that serve community needs. This is a pioneering aspect of this proposal.
Community-level consensus. This project will help to continue funding these key grass-roots activities, thus strengthening them and their ability to give a voice to the wider scientific community on issues of data stewardship, standardization and sharing. Specifically, this project will directly fund core activities with the GSC (i.e. through Peter Sterk's role as Secretary of the GSC) and, most importantly, provide funds to implement GSC-recommended standards at the international level. This is a key step on the path towards international adoption of standards that will underpin future data sharing. It will also ensure the usage of a premier example of standards-compliant tools in the creation of this portal. The ISA Infrastructure, already funded by the BBSRC in the past BBR round, is a complete suite of tools for capturing and disseminating standards-compliant metadata. Its use in this project paves the way for universal sharing of metadata about samples and data types, as this work will increase the chances that other projects will adopt this shared approach.
Data sharing. The adoption of these community-defined approaches is also in direct support of the strong BBSRC data sharing policy. Putting this standards-compliant infrastructure into place will ensure compliance with the policy of making data freely available in re-usable form.
Policy makers. The production of more richly annotated bioinvestigations will improve the evidence base for policy makers by providing greater interpretability of experimental context, simplifying the job of data integration and study comparison. More detail for those forming policy on biological and biomedical issues should produce better decisions.
Journals. The current trend shows that, like funders, journals increasingly require that, firstly, researchers make more of their data public, for example by submitting it to public repositories, and that, secondly, they begin to comply with community-defined standards. However, 'non-compliance' may be difficult to overcome: experimental metadata are still normally sparse in publications and the supplementary data that sometimes accompany them, limiting data accessibility and utility. This is because of the lack of (i) reviewer time and expertise (reviewers are not trained to check compliance), (ii) awareness of the existence of appropriate reporting standards, (iii) access to freely available tools implementing standards, and (iv) adequate data management resources at the local and community levels. Greater automation of the reporting processes is required. The only feasible solution is better annotation and education at source (i.e., by providing data producers with a straightforward way in which to use community annotation standards), assisted by some form of automated content validation. Through this collaboration we will disseminate this best practice by building compliance with standards into the MGportal.
Outreach. The high-profile nature of this project (a major new database/portal at the EBI) will help to spread the word about the importance of standards in the community. Finally, the planned workshops and interactions with the existing GSC and ISA communities will succeed in engaging a larger proportion of bench scientists in efforts to provide the best possible stewardship of our collective data assets.
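The "automated content validation" mentioned above can be illustrated with a minimal sketch. It assumes a metadata record held as a simple dictionary and a hypothetical list of required fields; the field names are illustrative only and are not taken from any published GSC checklist or from the portal's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical checklist: these field names are illustrative only and are
# not taken from any published GSC checklist.
REQUIRED_FIELDS = ("collection_date", "geographic_location", "environment")


@dataclass
class ValidationReport:
    missing: list  # required fields absent from the record
    empty: list    # required fields present but blank

    @property
    def compliant(self) -> bool:
        # A record is compliant only if nothing is missing or blank.
        return not self.missing and not self.empty


def validate_metadata(record: dict, required=REQUIRED_FIELDS) -> ValidationReport:
    """Report required fields that are absent or present but blank."""
    missing = [f for f in required if f not in record]
    empty = [f for f in required if f in record and not str(record[f]).strip()]
    return ValidationReport(missing=missing, empty=empty)


if __name__ == "__main__":
    sample = {"collection_date": "2012-05-14", "environment": " "}
    report = validate_metadata(sample)
    print(report.compliant, report.missing, report.empty)
```

A check of this kind, run at submission time, would give data producers immediate feedback on incomplete annotation rather than relying on reviewers to spot gaps later.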
Organisations
- University of Oxford (Lead Research Organisation)
- Natural Environment Research Council (Co-funder)
- University of Manchester (Collaboration)
- University College London (Collaboration)
- Newcastle University (Collaboration)
- University of Cambridge (Collaboration)
- Imperial College London (Collaboration)
- ISA Commons (Collaboration)
- University of Edinburgh (Collaboration)
- University of Oxford (Collaboration)
- ELIXIR (Collaboration)
- Rothamsted Research (Collaboration)
- Heriot-Watt University (Collaboration)
- University of Birmingham (Collaboration)
- University of Liverpool (Collaboration)
- University of Dundee (Collaboration)
- Earlham Institute (Collaboration)
Publications
Ho Sui SJ
(2012)
The Stem Cell Discovery Engine: an integrated repository and analysis system for cancer stem cell comparisons.
in Nucleic acids research
Hunter S
(2014)
EBI metagenomics--a new resource for the analysis and archiving of metagenomic data.
in Nucleic acids research
Johnson D
(2021)
ISA API: An open platform for interoperable life science experimental metadata
in GigaScience
Liolios K
(2012)
The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness.
in Standards in genomic sciences
Maguire E
(2013)
Visual Compression of Workflow Visualizations with Automated Detection of Macro Motifs
in IEEE Transactions on Visualization and Computer Graphics
Maguire E
(2012)
Taxonomy-Based Glyph Design—with a Case Study on Visualizing Workflows of Biological Experiments.
in IEEE transactions on visualization and computer graphics
Maguire E
(2013)
OntoMaton: a bioportal powered ontology widget for Google Spreadsheets.
in Bioinformatics (Oxford, England)
McQuilton P
(2016)
BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences.
in Database : the journal of biological databases and curation
Musen MA
(2015)
The center for expanded data annotation and retrieval.
in Journal of the American Medical Informatics Association : JAMIA
Description | We have contributed to the development of a public repository for metagenomics data at the EBI; specifically, we have refined a set of tools to help researchers to collect, annotate and submit their datasets to this repository. |
Exploitation Route | This is a public data deposition service and the tools are freely available to researchers for their continued use in managing and sharing their metagenomics datasets. |
Sectors | Agriculture Food and Drink Digital/Communication/Information Technologies (including Software) Education Pharmaceuticals and Medical Biotechnology |
URL | https://www.ebi.ac.uk/metagenomics |
Description | The portal is maturing and currently serves as a key community portal for this data type. Its use will continue to increase the effectiveness of data sharing and reuse. |
First Year Of Impact | 2013 |
Sector | Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Pharmaceuticals and Medical Biotechnology |
Impact Types | Cultural |
Description | COpenPlantOmics (COPO): a Collaborative Bioinformatics Plant Science Platform |
Amount | £1,000,000 (GBP) |
Funding ID | BB/L024101/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2015 |
End | 12/2018 |
Description | EC H2020 - INFRADEV-3-2015 - ELIXIR EXCELERATE |
Amount | € 240,000 (EUR) |
Organisation | European Commission |
Department | Horizon 2020 |
Sector | Public |
Country | European Union (EU) |
Start | 08/2015 |
End | 08/2019 |
Description | ISA-InterMine: accelerating and rewarding data sharing |
Amount | £1,174,660 (GBP) |
Funding ID | 208381/A/17/Z |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 07/2018 |
End | 07/2021 |
Title | BioSharing |
Description | Registry of standards and databases linked to data policies by funders and journals. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2011 |
Provided To Others? | Yes |
Impact | Launched in 2011, the BioSharing portal (https://biosharing.org) of interrelated standards, databases, and policies has 53,741 users and is a resource of the ELIXIR UK Node and the ELIXIR Interoperability Platform. It is endorsed by a community of 68 organizations, including publishers (it is embedded in the data policies of 600 Springer Nature journals, as well as PLOS, EMBO Press, BMJ, F1000Research, BioMed Central, Oxford University Press and Wellcome Trust Open Research), standardization groups, and research data management support initiatives and libraries (such as those at JISC, Stanford, Cambridge and Oxford Universities). |
URL | http://biosharing.org/ |
Title | Continued improvements to the ISA toolkit |
Description | Started in 2003 and first released in 2007, the ISA tools have been developed over time by the Oxford team and collaborators, or contributed directly by partners, via the ISA Commons collaborative community. Developments and achievements of the resource over the last year: • Awarded Wellcome Trust funds (2018-2021) for a collaborative project with the University of Cambridge's InterMine team to link the two resources and reward researchers for annotating and publishing FAIR data; ISA is also embedded in two ELIXIR Implementation Studies, one on plant-focused data validation and one on metabolomics. • With the uptake of the ISA-Galaxy tools (https://github.com/ISA-tools/isatools-galaxy) and integration with the Galaxy framework, ISA has reached a major milestone by showcasing how prospective data management can be done, demonstrating a full deposition workflow to MetaboLights and creating training material (10.7490/f1000research.1115757.1). • Jupyter notebooks (https://github.com/ISA-tools/dtp-isa-exercises) have been developed as teaching material to showcase the use of the ISA-API in various contexts in undergraduate and postgraduate courses on data readiness. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | Community use and impact is tracked via the ISA Commons, which currently has over 40 international groups, projects, and organizations that use and contribute to the development of components of the ISA metadata tracking framework. The ISA user base therefore ranges from hundreds to thousands of researchers from increasingly diverse domains (including -omics, cell-based research, biomedical nanotechnology, plant phenotyping, toxicology, biodiversity, metagenomics, stem cell research, systems biology, neuroscience, microbial science and immunology), and extends beyond researchers, curators, other resource developers and service providers to include journals. For example, ISA is used by Oxford University Press's GigaScience and underpins Springer Nature's Scientific Data data journal, supporting intelligent data sharing and credit; ISA is used to describe the experiments and to provide browse and search functionality for Scientific Data's content (http://scientificdata.isa-explorer.org). The ISA framework is currently embedded in a number of UK-, EC-, NIH- and pharma-funded infrastructure and research projects; here are exemplars from the ELIXIR UK Node and other Nodes: (i) EMBL-EBI MetaboLights' new web-based submission relies on the ISA-JSON format to build web components and on the ISA-API to validate and convert experiments represented as ISA objects. (ii) The BBSRC-funded COPO infrastructure relies on the ISA API, the ISA-JSON serialization and the ISA configurations to support plant-based molecular profiling experiments; it also used the ISAconverter to deposit to the ENA database. (iii) ELIXIR-UK Node partners, the University of Birmingham and Imperial College London, use the ISA Galaxy Tools, ISA-API and ISA validator, as part of their work in the UK Phenome Centre, to collect data prospectively and to organise public deposition to repositories. (iv) The ELIXIR Plant Community's MIAPPE standards and BrAPI rely on the availability of ISA parsers and validation tools in the context of data validation programs. |
URL | http://isa-tools.org |
Title | Continued improvements to the ISA toolkit and the new Datascriptor component |
Description | Started in 2003 and first released in 2007, the ISA tools (http://isa-tools.org) have been developed over time by the Oxford team and collaborators, or contributed directly by partners, via the ISA Commons collaborative community (https://www.isacommons.org). Key work over the last year is the development of a new component, the Datascriptor (https://datascriptor.org), as part of the Wellcome Trust award (2018-2021), a collaborative project with the University of Cambridge's InterMine team. Leveraging our experience and links with the communities, we are designing an open-source web-based tool, part of an ecosystem of existing annotation and authoring systems, to help researchers to use community standards to describe their (meta)data at the source, and capitalize on their effort to accelerate the creation of a data article. In addition, major advances have been made to the ISA API, working with the ELIXIR Plant and Metabolomics communities. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | Community use and impact is tracked via the ISA Commons, which currently has over 40 international groups, projects, and organizations that use and contribute to the development of components of the ISA metadata tracking framework. The ISA user base therefore ranges from hundreds to thousands of researchers from increasingly diverse domains (including -omics, cell-based research, biomedical nanotechnology, plant phenotyping, toxicology, biodiversity, metagenomics, stem cell research, systems biology, neuroscience, microbial science and immunology), and extends beyond researchers, curators, other resource developers and service providers to include journals. For example, ISA is used by Oxford University Press's GigaScience and underpins Springer Nature's Scientific Data data journal, supporting intelligent data sharing and credit; ISA is used to describe the experiments and to provide browse and search functionality for Scientific Data's content (http://scientificdata.isa-explorer.org). The ISA framework is currently embedded in a number of UK-, EC-, NIH- and pharma-funded infrastructure and research projects; here are exemplars from the ELIXIR UK Node and other Nodes: (i) EMBL-EBI MetaboLights' new web-based submission relies on the ISA-JSON format to build web components and on the ISA-API to validate and convert experiments represented as ISA objects. (ii) The BBSRC-funded COPO infrastructure relies on the ISA API, the ISA-JSON serialization and the ISA configurations to support plant-based molecular profiling experiments; it also used the ISAconverter to deposit to the ENA database. (iii) ELIXIR-UK Node partners, the University of Birmingham and Imperial College London, use the ISA Galaxy Tools, ISA-API and ISA validator, as part of their work in the UK Phenome Centre, to collect data prospectively and to organise public deposition to repositories. (iv) The ELIXIR Plant Community's MIAPPE standards and BrAPI rely on the availability of ISA parsers and validation tools in the context of data validation programs. |
URL | https://datascriptor.org |
Title | ISA tools |
Description | Tools to collect, annotate, store, share and publish datasets |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2010 |
Provided To Others? | Yes |
Impact | Running since 2007, the open source metadata reporting ISA software suite has a user base ranging from hundreds to thousands of users from diverse domains (http://isa-tools.org), and is a resource of the ELIXIR UK Node. Currently it is embedded in 27 public resources (institute-based, project/consortium-based or global repositories, including some based at the EBI and in the USA, Japan, China and Australia), supports two data-driven journals (Springer Nature Scientific Data, Oxford University Press GigaScience), and complements 9 internal data platforms (including at the FDA National Center for Toxicological Research and Janssen R&D); see http://www.isacommons.org. The extension of the ISA metadata representation format for nanotechnology applications became a formal ASTM standard in 2013. |
URL | http://www.isa-tools.org |
Description | ELIXIR Interoperability Platform and ISA |
Organisation | ELIXIR |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | ISA is part of the ELIXIR Recommended Interoperability Resources (RIRs) to facilitate interoperability and reusability of life science data and support the principles of FAIR data management. |
Collaborator Contribution | The ELIXIR Recommended Interoperability Resources have been selected by external panel of reviewers, based on the selection criteria published in the Call for RIR application, which measure how they facilitate scientific research and how they improve FAIRness of life science data. |
Impact | ISA is and will continue to be used by and further developed with ELIXIR communities, especially with Plant and Metabolomics use cases. |
Start Year | 2018 |
Description | ELIXIR UK Node |
Organisation | Earlham Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | Heriot-Watt University |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | Imperial College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | Newcastle University |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | Rothamsted Research |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | University College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | University of Birmingham |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | University of Cambridge |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | University of Dundee |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | University of Edinburgh |
Department | Edinburgh Genomics |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | University of Edinburgh |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | University of Liverpool |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | University of Manchester |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ELIXIR UK Node |
Organisation | University of Oxford |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Help create the ELIXIR UK Node |
Collaborator Contribution | Contribute to the creation of the ELIXIR UK Node |
Impact | Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized. |
Start Year | 2012 |
Description | ISA Commons |
Organisation | ISA Commons |
Sector | Charity/Non Profit |
PI Contribution | We have helped many users, service providers and other developers to implement one or more components of the ISA software suite at their site to fit their data needs. |
Collaborator Contribution | They have helped us to refine the ISA software suite, filling gaps and tuning it for certain data types. |
Impact | The ISA Commons is a growing ecosystem of institute-based (e.g. USA NASA GeneLab Data Repository) and global repositories (e.g. EMBL-EBI MetaboLights), as well as data-driven journals (e.g. Springer Nature Scientific Data), that use the ISA formats and/or are powered by one or more components of the ISA software suite. It also includes grass-roots standards groups that build on the ISA data model and formats. The sustainability and maintenance of the ISA data model, formats, and tools are guided by the ISA Working Group. |
Start Year | 2010 |
Title | Datascriptor |
Description | From structured dataset to data article. Leveraging our experience and links with the communities, we are now designing an open-source web-based tool - part of an ecosystem of existing annotation and authoring systems - to help researchers to use community standards to describe their (meta)data at the source, and capitalize on their effort to accelerate the creation of a data article. The user will be guided to provide (semi)structured descriptions of the experimental design, and of the post-processed data, to generate, respectively, the Methods and a set of statements to populate the Results section of a manuscript. Datascriptor will work: (i) as a stand-alone tool - for anyone to use - implementing generic metadata models, such as W3C Data Catalog vocabulary; and (ii) as a component of the ISA Tools - for its user communities - implementing the ISA metadata model. To output short sentences from the (semi)structured input, we will evaluate a mixed data-to-text approach using template-based and neural-based (i.e. machine learning) methods. To further enrich the content of the manuscript, Datascriptor will connect to existing authoring systems, including Substance, Texture, Stenci.la and Manuscripts, and export the result in JATS format. Our plans also include an export as a DAR file and in LaTeX format. |
Type Of Technology | Webtool/Application |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | Work has just started, but to ensure continued impact in the stakeholder community, the Datascriptor User Advisory Board includes a core group of existing collaborators: Thomas Lemberger (EMBO Press), Scott Edmunds (GigaScience), Holly Murray (F1000), Varsha Khodiyar (Springer Nature). |
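The template-based half of the data-to-text approach described for Datascriptor can be sketched roughly as follows. This is only an illustration of the general technique using Python's standard `string.Template`; the template wording and the field names (`n_samples`, `material`, `n_sites`, `technology`) are hypothetical and are not taken from Datascriptor itself.

```python
from string import Template

# Hypothetical sentence template and field names, for illustration only.
METHODS_TEMPLATE = Template(
    "$n_samples $material samples were collected from $n_sites sites "
    "and profiled using $technology."
)


def describe_design(design: dict) -> str:
    """Fill the sentence template from a structured design description.

    Raises KeyError if a field required by the template is missing.
    """
    return METHODS_TEMPLATE.substitute(design)


if __name__ == "__main__":
    design = {
        "n_samples": 24,
        "material": "soil",
        "n_sites": 3,
        "technology": "shotgun metagenomic sequencing",
    }
    print(describe_design(design))
```

Templates of this kind are predictable and easy to audit, which is one reason they are often paired with learned methods rather than replaced by them.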
Title | ISA Model and Serialization |
Description | The original ISA-Tab specification was published as a Release Candidate document in 2008, documenting the initial work that forms the ISA framework, with a further update in 2009. Since then, we have done work on a new serialization in JSON, ISA-JSON, and abstracted out the data model from both the tabular and JSON formats. |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | Serialisations implemented by several ISA components; the documentation also helps other users to implement ISA formats. |
URL | http://isa-tools.org/2016/10/release-of-the-isa-specs/ |
Title | ISA software suite (built iteratively, component by component) |
Description | The open source ISA framework and tools help to manage an increasingly diverse set of life science, environmental and biomedical experiments that employ one or a combination of technologies. Built around the 'Investigation' (the project context), 'Study' (a unit of research) and 'Assay' (analytical measurement) data model and serializations (tabular, JSON and RDF), the ISA framework helps you to provide rich descriptions of the experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable. |
Type Of Technology | Software |
Year Produced | 2010 |
Open Source License? | Yes |
Impact | A growing number of users, as listed at http://isacommons.org, and also of co-developers who have contributed and are contributing to the collaborative enhancements. |
URL | http://isa-tools.org/ |
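The nesting of 'Investigation', 'Study' and 'Assay' described for the ISA software suite can be sketched with plain Python dataclasses. This is an illustrative assumption only: the class fields and the JSON layout below are hypothetical and do not reproduce the ISA-JSON schema or the ISA tools' own classes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Assay:
    """An analytical measurement (field names are hypothetical)."""
    measurement_type: str
    technology_type: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """A unit of research that groups one or more assays."""
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """The project context that groups one or more studies."""
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


assay = Assay("metabolite profiling", "mass spectrometry", ["run1.mzML"])
study = Study("STU-1", "Example study", assays=[assay])
investigation = Investigation("INV-1", "Example investigation", studies=[study])

# asdict() walks the nested dataclasses, so the JSON mirrors the hierarchy.
doc = json.dumps(asdict(investigation), indent=2)
print(doc)
```

Keeping the hierarchy in ordinary objects and serializing it in a separate step mirrors the idea of abstracting the data model away from any one format, so the same objects could in principle be written out as tabular, JSON or RDF.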
Title | ISA tooling for the metabolomics community |
Description | A new set of ISA software tools has been developed out of the EU H2020 PhenoMeNal: Large-Scale Computing for Medical Metabolomics project (http://phenomenal-h2020.eu/home). The ISA team has been contributing to the project since 2015, collaborating on the development of its user-facing, cloud-based data management and processing infrastructure. The PhenoMeNal software includes a new set of ISA-related Galaxy workflow tools, as well as native support for the ISA-Tab format in Galaxy. |
Type Of Technology | Software |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | The tools work with the EBI MetaboLights database as well as with ISA-Tab studies uploaded directly into the Galaxy platform, and build on the Python ISA-API. MetaboLights itself uses the ISA-API in a Python-based REST service: https://github.com/EBI-Metabolights/MtblsWS-Py |
URL | http://isa-tools.org/2018/03/isa-galaxy-developed-for-metabolomics/ |
Title | ISA-API Python library |
Description | Project name: ISA-API; Project home page: http://github.com/ISA-tools/isa-api; Operating system(s): Platform independent; Programming language: Python 3; Other requirements: None; License: CPAL-1.0. ISA-API is a Python library that supports the creation, editing, parsing, and validation of both ISA-Tab and ISA-JSON formats, using a common data model implemented as native Python objects. |
Type Of Technology | Software |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | This provides users with a common interface and interoperable medium between the two ISA formats, as well as conversion to a set of other formats required for depositing data in public databases. |
Description | Biohackathon; ELIXIR, Paris |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | The team participated in several tracks, working especially on ISA for the plant and metabolomics communities, on its use in Galaxy, and on the Bioschemas work. The work carried out continues to embed ISA and FAIRsharing into ELIXIR-driven infrastructure and activities. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.elixir-europe.org/events/biohackathon-2018-paris |
Description | Datascriptor hackathon - eLife Innovation Sprint |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Hackathon on the Datascriptor prototype, part of the ISA toolkit. Datascriptor aims to take the pain out of beginning to write papers, making it easy to automatically generate the parts of a paper that can be easily scaffolded, and incentivising reproducible papers by ensuring the scaffolds include well-structured data and metadata. During the online event the prototype was fleshed out through user testing with hands-on use cases. |
Year(s) Of Engagement Activity | 2020 |
URL | https://sprint.elifesciences.org/data-paper-skeleton-tools-for-life-sciences/ |
Description | Poster presentation: ISAcreate and Galaxy; Galaxy conference, Portland |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | The ISA-Tab format is now used by Galaxy tools; the discussion helped to ensure that uptake continues. |
Year(s) Of Engagement Activity | 2018 |
URL | https://gccbosc2018.sched.com/event/FEWs/g26-isacreate-a-galaxy-tool-for-prospective-data-management... |