Expanded Metadata Support in the Open Microscopy Environment's Bio-Formats & OMERO Data Applications

Lead Research Organisation: University of Dundee
Department Name: School of Life Sciences

Abstract

Biological microscopy has always involved "imaging": images were initially hand drawn and, with the advent of light-sensitive film, recorded and then reproduced on paper. These methods distorted the relationships between the signals they recorded (formally, they are "non-linear media"), making it difficult to use them for scientific measurements. The application of digital detectors to microscopy, however, delivered "linear" measurements suitable for scientific use. This, combined with automation, spawned massive growth in the number and diversity of uses for digital imaging in basic and clinical research. Each imaging platform produces many GBytes of data, usually in a closed, proprietary file format. These are powerful systems, but their full utility is limited by closed data and the difficulty of viewing and sharing large datasets on standard desktop computers.

The Open Microscopy Environment (OME) has built open software tools that enable access, analysis, viewing and sharing of this data. These tools were initially built for light microscopy, and we have since extended them to electron microscopy, high content screening (used for drug discovery in pharmaceutical research) and digital pathology. This proposal seeks to extend the types of data that OME covers, specifically to support the output of analyses and measurements made on digital image data. All of OME's software and resources are open source, available on-line to anyone, and supported by a dedicated team that manages documentation and community outreach.

Technical Summary

OME's OMERO platform (http://openmicroscopy.org/site/products/omero) includes server and client applications that combine an image metadata database, a binary image data repository and high performance visualization and analysis. A permissions system controls access to data within OMERO and enables sharing of data with users in a specific group or even publishing of image data to the worldwide community. Using these facilities, OMERO and Bio-Formats provide data access and management to hundreds of laboratories worldwide and to several on-line scientific image publication systems (e.g., http://emdatabank.org/ and http://jcb-dataviewer.rupress.org).

In this project, we aim to extend OME's metadata capabilities:

1. OME-TIFF and Bio-Formats are currently used by thousands of labs and many commercial imaging companies for transporting image acquisition metadata between software tools. In this project we will extend a completed draft "region-of-interest" (ROI) specification to include a comprehensive set of multi-dimensional shapes. This ROI specification will be incorporated into OME-TIFF and Bio-Formats and used specifically to transport calculated regions between software tools.

2. OMERO users are increasingly adding extra, non-image data to images stored in OMERO that capture associated analytic results, annotations and other experimental outputs. OMERO currently supports these data (e.g., PDF, XML, .xls, etc.) as "Structured Annotations", allowing indexing of any text, and a unique namespace for access and recovery. Currently, these files are stored on a filesystem and accessed for download through the OMERO API. As these data grow in size and complexity, better storage, retrieval and mining are required. In this project, we will extend OMERO's NoSQL support to these image-associated documents, to ensure that all associated data can be properly stored, indexed, mined, shared, and retrieved.
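
As an illustration of what a "multi-dimensional" ROI in item 1 might look like in code, the following is a minimal Python sketch. It assumes a region is modelled as the union of simple shapes, each of which may be pinned to a single Z section or timepoint. The class names (`Rectangle`, `ROI`), their fields and the example coordinates are hypothetical and are not taken from the OME specification or from any OME library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rectangle:
    """Axis-aligned rectangle, optionally pinned to one plane (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float
    the_z: Optional[int] = None  # None means "applies to every Z section"
    the_t: Optional[int] = None  # None means "applies to every timepoint"

    def contains(self, px: float, py: float) -> bool:
        """True if the 2D point lies inside the rectangle (edges included)."""
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)

    def on_plane(self, z: int, t: int) -> bool:
        """True if this shape is defined on the given Z section and timepoint."""
        return self.the_z in (None, z) and self.the_t in (None, t)


@dataclass
class ROI:
    """A region of interest modelled as the union of its shapes (assumption)."""
    name: str
    shapes: List[Rectangle] = field(default_factory=list)

    def contains(self, px: float, py: float, z: int, t: int) -> bool:
        """True if any shape on plane (z, t) contains the point."""
        return any(s.on_plane(z, t) and s.contains(px, py) for s in self.shapes)


# A region that moves slightly between two timepoints on Z section 0.
nucleus = ROI("nucleus", [
    Rectangle(10, 10, 20, 20, the_z=0, the_t=0),
    Rectangle(12, 11, 20, 20, the_z=0, the_t=1),
])
```

A tool receiving such a region could then ask, for example, `nucleus.contains(15, 15, z=0, t=0)` to decide whether a pixel on a given plane belongs to the region.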

Planned Impact

The rise of quantitative biology has driven the generation of ever increasing stores of experimental data that are the foundation for biological research and discovery. Unfortunately, full exploitation of these data remains unrealised. Data generated on commercial platforms are not stored in easily accessible formats, and the size and complexity of these data make routine analysis and sharing difficult. Collaborations depend on data sharing, but the transfer of complex, large datasets (>100 Gbytes is routine) between scientists, labs and/or software tools limits what can be achieved and is ultimately a barrier to scientific discovery.

OME's goal is to provide interfaces that enable data exchange, both between different software tools and between geographically remote scientists. Currently, OME's Bio-Formats file translation library and OMERO data management platform enable:

-- access to >125 scientific image file formats;
-- management, analysis, and sharing of image data relevant to a diverse range of biological research topics;
-- the foundation for the first on-line image publication facilities.

OME's tools are used worldwide, in thousands of laboratories, across many different domains of biological research. OME's commitment to an open development process, where all planning, roadmapping, user support, and developed code are openly available, has built an active community of users in academic, biotech and pharmaceutical research. Some simply use the software as is, but many see it as a platform upon which their own applications, defined by their research needs, can be built. OMERO is the foundation for PerkinElmer's Columbus data management system, which now runs HCS data in most major pharmaceutical companies in the world. OMERO and Bio-Formats also power several on-line scientific image repositories, the largest of which is the JCB DataViewer (http://jcb-dataviewer.rupress.org). The data published at the JCB DataViewer and other public OMERO instances are being incorporated into two new data resources, the Cell Phenotype Database and BioStudy (see LoS from Alvis Brazma and Gabriella Rustici), which are being developed at the EMBL-EBI as part of their ongoing efforts to deliver important datasets to the life sciences community. Thus OME, and its future funding and activities, enhance research and productivity in laboratories in the UK and around the world.

Publications

Burel JM (2015) Publishing and sharing multi-dimensional image data with OMERO. in Mammalian genome : official journal of the International Mammalian Genome Society

Li S (2016) Metadata management for high content screening in OMERO. in Methods (San Diego, Calif.)

 
Description We have developed new metadata specifications for heterogeneous metadata in experimental biology. We have updated the OME Data Model, Bio-Formats and OMERO to support these extensions, and under separate funding implemented them in the Image Data Resource, a public resource for imaging metadata. Documentation for the new technology is public (https://docs.openmicroscopy.org/omero/5.4.10/developers/Model/KeyValuePairs.html).
Exploitation Route Our tools provide a flexible metadata structure anyone can use to record their biological experiments. The tools will be especially useful for integrating heterogeneous data from biological studies.
Sectors Digital/Communication/Information Technologies (including Software),Healthcare,Pharmaceuticals and Medical Biotechnology

URL http://idr.openmicroscopy.org
 
Description The project has substantially developed and extended the range of metadata supported in OME's Data Model and software. Specific changes, such as the addition of scientific units to all quantities, are an important step. We have developed a flexible key-value representation for metadata, especially for experimental metadata. OME's Region of Interest (ROI) model is now much more complete and is supported across all OME software. A specification for object trajectories has been added and has been proposed as a standard by the Cell Migration Standardisation Organisation. As an example, all these metadata extensions have been incorporated into the Image Data Resource (IDR), an image data publishing resource.
First Year Of Impact 2015
Sector Other
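
To make the two metadata extensions described above more concrete, the following minimal Python sketch shows one way a quantity that always carries its unit, and an ordered list of key-value pairs grouped under a namespace, could be represented. The names `Length`, `KeyValueAnnotation`, the unit strings, the example namespace and the example keys are all hypothetical illustrations, not the OME Data Model's own types or the OMERO API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Conversion factors to metres for the few units this sketch knows about.
_TO_METRES = {"metre": 1.0, "millimetre": 1e-3, "micrometre": 1e-6, "nanometre": 1e-9}


@dataclass(frozen=True)
class Length:
    """A length that is never stored without its unit (hypothetical type)."""
    value: float
    unit: str

    def to(self, unit: str) -> "Length":
        """Return the same length expressed in another known unit."""
        metres = self.value * _TO_METRES[self.unit]
        return Length(metres / _TO_METRES[unit], unit)


@dataclass
class KeyValueAnnotation:
    """Ordered key-value pairs under a namespace; repeated keys are allowed."""
    namespace: str
    pairs: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, key: str, value: str) -> None:
        self.pairs.append((key, value))

    def values(self, key: str) -> List[str]:
        """All values recorded for a key, in insertion order."""
        return [v for k, v in self.pairs if k == key]


# Hypothetical experimental metadata attached to one image.
pixel_size = Length(0.5, "micrometre")
annotation = KeyValueAnnotation("example.org/experiment")
annotation.add("Gene Symbol", "CDC20")
annotation.add("Antibody", "anti-tubulin")
annotation.add("Antibody", "anti-GFP")
```

Keeping the pairs as an ordered list rather than a dictionary is a design choice of this sketch: it lets one key (here "Antibody") carry several values, which is common in heterogeneous experimental metadata.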
 
Description BBSRC BBR
Amount £1,790,000 (GBP)
Funding ID BB/M018423/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 01/2015 
End 06/2016
 
Description The Image Data Resource: Making Biological Imaging Data FAIR
Amount £1,323,597 (GBP)
Funding ID 212962/Z/18/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 10/2018 
End 10/2021
 
Title Bio-Formats and OMERO Software 
Description The extensions of OME's metadata model were cast in software and distributed under an open source license. 
IP Reference  
Protection Copyrighted (e.g. software)
Year Protection Granted 2014
Licensed Yes
Impact Glencoe Software has an exclusive license to deliver commercial licenses to Bio-Formats and OMERO software to its academic, biotech and pharma customers.
 
Title OMERO 5 Updates 
Description Re-architecture of OMERO to use a repository-based image import strategy, removing the requirement for data duplication, and substantially improving performance of the software 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact This development makes OMERO much more useful for large imaging studies, where data duplication is not an option. 
URL http://www.openmicroscopy.org/site/products/ome5
 
Title The OMERO Platform 
Description OME Remote Objects (OMERO) is a modern client-server software platform for visualizing, managing, and annotating scientific image data. OMERO lets you import and archive your images, annotate and tag them, record your experimental protocols, and export images in a number of formats. It also allows you to collaborate with colleagues anywhere in the world by creating user groups with different permission levels. OMERO consists of a Java server, several Java client applications, as well as Python and C++ bindings and a Django-based web application. 
Type Of Technology Software 
Year Produced 2009 
Open Source License? Yes  
Impact OMERO powers a large number of public image data repositories, and is installed in >4,000 sites worldwide. Please note that the year of output realisation is incorrect, as it has been a development project running since 2005.
URL https://www.openmicroscopy.org/site/support/omero5/users/index.html