New Open File Formats for the Biological Imaging Community

Lead Research Organisation: University of Dundee
Department Name: School of Life Sciences

Abstract

Biological microscopy has always involved "imaging": images were initially hand drawn and, with the advent of light-sensitive film, recorded and then reproduced on paper. These methods distorted the relationships between the signals they recorded (formally, they are "non-linear media"), making it difficult to use them for scientific measurements. The application of digital detectors to microscopy, however, delivered "linear" measurements suitable for scientific use. This, combined with automation, spawned massive growth in the number and diversity of uses for digital imaging in basic and clinical research. Each imaging platform produces many gigabytes of data, usually in a closed, proprietary file format. These are powerful systems, but their full utility is limited by closed data formats and by the difficulty of viewing and sharing large datasets on standard desktop computers.

The Open Microscopy Environment (OME) has built open software tools that enable access, analysis, viewing and sharing of these data. Initially built for light microscopy, these tools have been successfully extended to electron microscopy, high-content screening (used for drug discovery in pharmaceutical research) and digital pathology. This proposal seeks to extend the set of file formats that OME supports and to ensure that specifications and software exist to meet the demands of the most advanced biological imaging modalities. All of OME's software and resources are open source, available online to anyone, and supported by a dedicated team that manages documentation and community outreach.

Technical Summary

OME's OME-TIFF format is well adopted in the biological imaging community. In this proposal, we aim to build on OME's strong standing in the community as a developer and supplier of specifications and software for biological imaging metadata, and to apply our expertise and tools to several new binary image formats that have appeared. The proposed formats comprise a range of different data storage strategies; they will thus be useful in their own right and will also provide templates for extending other formats.

We will deliver software that supports storage of OME metadata in:

1. the HDF5-based BigDataViewer (BDV) format, making OME metadata available to users of this format (to date, primarily for light sheet fluorescence microscopy (LSFM) and block-face scanning electron microscopy). In a related project, we will add OME metadata, using OME's ScreenPlateWell specification, to the HDF5-based CellH5 format, which is used in a small number of high-content screening (HCS) applications. Besides delivering updates to the specified formats, the formats and software will also serve as templates for anyone wanting to store metadata in their own HDF5-based formats.

2. the KLB format, an open source binary format for LSFM that uses fast lossless compression. As image volumes grow, more applications will need to incorporate compression. The KLB format provides one of the more advanced methods of storing and compressing large numbers of 2D planes.

3. DICOM-Sup145, a derivative of DICOM developed to support pyramidal, multi-resolution whole slide imaging (WSI) data. To date, no released software implements this specification. Moreover, the current specification does not cover annotations, regions of interest, etc.
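Pyramidal formats such as DICOM-Sup145 and the updated OME-TIFF store a whole slide image as a series of progressively downsampled copies of the full-resolution plane. The level arithmetic can be sketched in a few lines of Java. This is a minimal illustration, not OME or DICOM code: the class and method names are hypothetical, and it assumes a constant downsampling factor between levels, a resolution count that includes the full-resolution image, and level sizes rounded up (real writers may round differently).

```java
import java.util.ArrayList;
import java.util.List;

public class PyramidLevels {
    /**
     * Hypothetical helper: returns {width, height} for each resolution level,
     * starting with the full-resolution image. Each level is the previous one
     * divided by `scale`, rounded up so that edge pixels are kept.
     */
    static List<long[]> levels(long width, long height, int scale, int resolutions) {
        if (scale < 2) throw new IllegalArgumentException("scale must be at least 2");
        if (resolutions < 1) throw new IllegalArgumentException("need at least one resolution");
        List<long[]> out = new ArrayList<>();
        long w = width;
        long h = height;
        for (int r = 0; r < resolutions; r++) {
            out.add(new long[] {w, h});
            w = Math.max(1, (w + scale - 1) / scale); // ceiling division
            h = Math.max(1, (h + scale - 1) / scale);
        }
        return out;
    }

    public static void main(String[] args) {
        // Example: a 100,000 x 80,000 pixel slide, four levels, factor of 2.
        for (long[] wh : levels(100_000, 80_000, 2, 4)) {
            System.out.println(wh[0] + " x " + wh[1]);
        }
    }
}
```

Running main prints 100000 x 80000, 50000 x 40000, 25000 x 20000 and 12500 x 10000. Because each level holds a quarter of the pixels of the one above, the extra levels together add less than a third to the storage needed for the full-resolution image.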

Our implementations will be delivered to the community in Java and C++ to ensure they have the maximal exposure and utility.
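Item 2 above describes KLB as a way of storing and compressing large numbers of 2D planes. The general idea behind block-based formats such as KLB (Keller Lab Block) is to cut a volume into fixed-size blocks and compress each block independently, so that a reader can decompress only the blocks it needs and blocks can be compressed in parallel. The Java sketch below illustrates that idea only. It is not the KLB file layout: the names are hypothetical, it handles 8-bit data only, and it uses zlib from java.util.zip as a stand-in codec (KLB's own codec and header format may differ).

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlockCompression {
    /** Copies one w*h*d block with origin (x0, y0, z0) out of an 8-bit volume stored x-fastest. */
    static byte[] copyBlock(byte[] vol, int sx, int sy, int x0, int y0, int z0, int w, int h, int d) {
        byte[] block = new byte[w * h * d];
        int i = 0;
        for (int z = z0; z < z0 + d; z++) {
            for (int y = y0; y < y0 + h; y++) {
                System.arraycopy(vol, (z * sy + y) * sx + x0, block, i, w); // one row of the block
                i += w;
            }
        }
        return block;
    }

    /** Compresses one block with zlib (a stand-in codec). */
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses a single block whose uncompressed size is known. */
    static byte[] decompress(byte[] packed, int rawLength) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        byte[] raw = new byte[rawLength];
        int off = 0;
        try {
            while (off < rawLength && !inflater.finished()) {
                int n = inflater.inflate(raw, off, rawLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported block");
                }
                off += n;
            }
        } finally {
            inflater.end();
        }
        return raw;
    }

    /** Splits the volume into bx*by*bz blocks (edge blocks may be smaller) and compresses each one. */
    static List<byte[]> compressVolume(byte[] vol, int sx, int sy, int sz, int bx, int by, int bz) {
        List<byte[]> blocks = new ArrayList<>();
        for (int z0 = 0; z0 < sz; z0 += bz) {
            for (int y0 = 0; y0 < sy; y0 += by) {
                for (int x0 = 0; x0 < sx; x0 += bx) {
                    int w = Math.min(bx, sx - x0);
                    int h = Math.min(by, sy - y0);
                    int d = Math.min(bz, sz - z0);
                    blocks.add(compress(copyBlock(vol, sx, sy, x0, y0, z0, w, h, d)));
                }
            }
        }
        return blocks;
    }

    public static void main(String[] args) throws DataFormatException {
        int sx = 64, sy = 64, sz = 16;
        byte[] vol = new byte[sx * sy * sz];
        for (int z = 0; z < sz; z++)
            for (int y = 0; y < sy; y++)
                for (int x = 0; x < sx; x++)
                    vol[(z * sy + y) * sx + x] = (byte) (x + y + z);
        List<byte[]> blocks = compressVolume(vol, sx, sy, sz, 32, 32, 8);
        // Random access: only the first block is decompressed.
        byte[] first = decompress(blocks.get(0), 32 * 32 * 8);
        boolean ok = Arrays.equals(first, copyBlock(vol, sx, sy, 0, 0, 0, 32, 32, 8));
        System.out.println(blocks.size() + " blocks; first block round-trips: " + ok);
    }
}
```

main splits a 64 × 64 × 16 test volume into 32 × 32 × 8 blocks and prints `8 blocks; first block round-trips: true`. The block size is the main design choice: larger blocks usually compress better, while smaller blocks give finer-grained random access.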

Planned Impact

The rise of quantitative biology has driven the generation of ever-increasing stores of experimental data that are the foundation for biological research and discovery. Unfortunately, the full potential of these data remains unrealised. Data generated on commercial platforms are not stored in easily accessible formats, and the size and complexity of these data make routine analysis and sharing difficult. Collaborations depend on data sharing, but the transfer of large, complex datasets (>100 GB is routine) between scientists, labs and/or software tools limits what can be achieved and is ultimately a barrier to scientific discovery.

OME's goal is to provide interfaces that enable data exchange between different software tools and between geographically remote scientists. Currently, OME's Bio-Formats file translation library and OMERO data management platform provide:

-- access to >145 scientific image file formats;
-- management, analysis, and sharing of image data relevant to a diverse range of biological research topics;
-- the foundation for the first online image publication facilities;
-- and, most relevant to this proposal, open formats for storing image metadata and binary data.

The major impact of this project will be the appearance of alternative open formats for biological imaging. This recognises that no single file format can cover the full range of imaging modalities and experimental regimes across biological imaging applications. We aim to initiate the development of several formats that support multi-resolution and TB-scale imaging.

More generally, OME's tools are used worldwide, in thousands of laboratories, across many different domains of biological research. OME's commitment to an open development process, in which all planning, roadmapping, user support, and developed code are openly available, has built an active community of users in academic, biotech and pharmaceutical research. Some simply use the software as is, but many see it as a platform upon which their own applications, defined by their research needs, can be built. OMERO is the foundation for PerkinElmer's Columbus data management system, which now manages HCS data at most of the world's major pharmaceutical companies. OMERO and Bio-Formats also power several online scientific image repositories, the largest of which is the JCB DataViewer (http://jcb-dataviewer.rupress.org). Thus OME, together with its future funding and activities, enhances research and productivity in laboratories in the UK and around the world.

Publications

Besson S (2019) Bringing Open Data to Whole Slide Imaging. in Digital pathology : 15th European Congress, ECDP 2019, Warwick, UK, April 10-13, 2019, Proceedings. European Congress on Digital Pathology (15th : 2019 : Warwick, United Kingdom)

 
Description This funding has resulted in a new version of Bio-Formats, released in Feb 2019. This version of Bio-Formats contains three major types of improvements:
1. Full reader/writer/specification support for an updated OME-TIFF that supports multi-dimensional, multi-resolution tiled ("pyramidal") image files as used in imaging of large blocks of tissue in research and in clinical applications. This is the first open source, fully open, fully implemented file format for whole slide imaging and other tissue imaging applications.
2. Support for the Keller Lab Block (KLB) image file format, a format used by several labs performing light sheet microscopy of large biological specimens.
3. Support for the BigDataViewer format, another commonly used light sheet microscopy format.
Exploitation Route Bio-Formats is an open source software library used by >40,000 institutions worldwide and started >100,000 times per day. The updates in this version of Bio-Formats make it much more useful for scientists using light sheet microscopy and also add support for reading and writing an open source WSI image data format. We aim to promote the use of this format in the WSI research and clinical communities.
Sectors Digital/Communication/Information Technologies (including Software),Healthcare,Pharmaceuticals and Medical Biotechnology

URL https://www.openmicroscopy.org/bio-formats/
 
Description With the release of Bio-Formats 6.0, there is now a released, supported reader and writer for pyramidal whole slide images (WSIs), which are used heavily in Digital Pathology. The new format is an update of the successful, open OME-TIFF file format (https://docs.openmicroscopy.org/ome-model/5.6.4/ome-tiff/). In the recent InnovateUK Digital Pathology Network call (https://apply-for-innovation-funding.service.gov.uk/competition/177/overview), awards were made to two consortia, iCAIRD (http://www.sinapse.ac.uk/news/icaird-scottish-centre-of-excellence-for-ai-in-digital-diagnostics-to-open-in-glasgow) and PathLAKE (https://warwick.ac.uk/newsandevents/pressreleases/warwick_awarded_23/), both of which include OME's commercial partner, Glencoe Software (https://glencoesoftware.com). Both projects plan to build research data resources for anonymised clinical WSI data that will use the updated OME-TIFF file format.
First Year Of Impact 2019
Sector Digital/Communication/Information Technologies (including Software),Healthcare,Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Title McDole et al Dataset in IDR 
Description Addition of the KLB reader to Bio-Formats made it possible to publish the definitive fate map of the mouse embryo (Publication: https://doi.org/10.1016/j.cell.2018.09.031)
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact These are the original data that underlie the publication by McDole et al. and demonstrate the definitive fate map of the mouse embryo.
URL http://idr.openmicroscopy.org/webclient/?show=project-502
 
Description Euro-BioImaging 
Organisation Euro-BioImaging
Country European Union (EU) 
Sector Public 
PI Contribution BioImagingUK connects with Euro-BioImaging to provide feedback and updates on the status and priorities of the UK imaging community.
Collaborator Contribution Euro-BioImaging
Impact Ongoing work during Euro-BioImaging Interim Phase
Start Year 2009
 
Description Fiji/ImageJ-- Support for BigDataViewer Format in Bio-Formats 
Organisation Max Planck Society
Department Max Planck Institute for Molecular Cell Biology and Genetics
Country Germany 
Sector Academic/University 
PI Contribution Our Bio-Formats library is used by tens of thousands of research organisations worldwide to read imaging data in >150 different formats. Our contribution involves the software (https://www.openmicroscopy.org/bio-formats/) and the expertise in developing new readers.
Collaborator Contribution The Dresden-based Fiji development group (led by Dr Pavel Tomancak) has built the BigDataViewer format and reader/writer software. The goal of the collaboration is to incorporate the readers and writers in Bio-Formats, making them versioned, fully integrated and available to all Bio-Formats users beyond the standard Fiji distribution.
Impact Software development is ongoing and the software will be released in 2019. Current software repositories are at https://github.com/openmicroscopy/bioformats and https://github.com/bigdataviewer
Start Year 2016
 
Description Support for KLB Image File Format in Bio-Formats 
Organisation Howard Hughes Medical Institute
Department Janelia Research Campus
Country United States 
Sector Academic/University 
PI Contribution Our Bio-Formats library is used by tens of thousands of research organisations worldwide to read imaging data in >150 different formats. Our contribution involves the software (https://www.openmicroscopy.org/bio-formats/) and the expertise in developing new readers.
Collaborator Contribution Philipp Keller's Lab at Janelia Farm Research Center/Howard Hughes Medical Institute has developed the KLB format (https://bitbucket.org/fernandoamat/keller-lab-block-filetype) for use with light sheet microscopy imaging data. The Keller Lab's contribution was to provide guidance on the software they have developed and sample files we could use for testing and validating the Bio-Formats KLB reader.
Impact The announcement of support for the KLB format in Bio-Formats is at https://forum.image.sc/t/release-of-bio-formats-6-0-0/23099 and the software is described and available at https://www.openmicroscopy.org/bio-formats/
Start Year 2017
 
Title Bio-Formats 6.0 
Description Bio-Formats 6.0.0 is a major update that includes support for the updated OME-TIFF file format, which now supports multi-resolution tiled images (the so-called pyramidal file format). For more information, see http://blog.openmicroscopy.org/file-formats/community/2018/11/29/ometiffpyramid/ This new version of Bio-Formats also includes support for the KLB format for light sheet microscopy.

Bio-Formats API changes:
-- Java 8 is now the minimum supported version
-- Sub-resolution reading: added MetadataList and CoreMetadataList classes; added a new SubResolutionFormatReader abstract class for handling pyramidal format readers; updated all pyramid format readers to use SubResolutionFormatReader; deprecated getCoreMetadataList, seriesToCoreIndex, coreIndexToSeries, getCoreIndex and setCoreIndex in IFormatWriter; added a new IPyramidHandler interface with the resolution getter methods
-- Sub-resolution writing: IFormatWriter now extends IPyramidHandler (breaking); added setResolutions and getResolutions methods to IFormatWriter (breaking); added examples of using the sub-resolution writing API
-- Tiled writing: updated IFormatWriter to use setTileSizeX(0) and setTileSizeY(0) as a way to disable tiling (breaking); updated FormatWriter to set 0 as the default values of getTileSizeX() and getTileSizeY() (breaking); IFormatWriter.getCompressionTypes now returns the types for the selected writer only
-- Metadata handling: added getter methods to MetadataTools for retrieving OME enumerations by value; deprecated OME enumeration getter methods in FormatReader
-- Refactored FilePatternReader logic into a new WrappedReader abstract class

New file formats:
-- KLB: added a new reader for Keller Lab Block (KLB) files
-- CV7000: added a new reader for Yokogawa CV7000 datasets
-- GE MicroCT: added a new reader for GE MicroCT datasets

File format fixes and improvements:
-- Aperio SVS/AFI: removed pyramidal resolutions of mismatching pixel types; fixed exposure times and improved image naming of AFI datasets; displayed original metadata keys for each channel of AFI datasets; added support for multiple Z sections
-- DICOM: improved file grouping and file-to-series mapping for multi-file datasets
-- Fake: added support for multi-resolution test images; now populating WellSample positions when present using Plane data
-- Gatan Digital Micrograph: adjusted endianness and record byte count for long values; allowed ROIs to be stored in DocumentObjectList groups; no longer creating an empty ROI when an unsupported shape type is encountered
-- Image Pro: added support for Image Pro Plus .ips sets
-- GE InCell: added support for parsing minimum and maximum pixel values
-- Lambert Instruments FLIM: fixed an integer overflow error with large files (thanks to Rolf Harkes)
-- Leica LIF: unified metadata parsing to use DataTools.parseDouble
-- Leica SCN: improved support for Versa datasets
-- Micro-Manager: improved handling of very large metadata.txt files; prevented NumberFormatException for invalid double values; added support for parsing ChannelColor from metadata.txt files
-- Metamorph: added support for multi-dimensional .scan datasets created from Scan Slide (thanks to Jeremy Muhlich)
-- MRC (Medical Research Council): fixed endian detection for old-style headers
-- Nikon ND2: prevented integer overflow when reading chunkmaps from files larger than 2GB; fixed handling of duplicate and incomplete exposure time lists; fixed chunk map handling when CustomData blocks are between ImageDataSeqs
-- OME-TIFF: added support for reading OME-TIFF with pyramidal resolutions stored as SubIFDs; added support for writing OME-TIFF with pyramidal resolutions; added support for companion OME-TIFF filesets where the TIFF does not link back to the metadata file; improved handling of missing planes in TiffData
-- PerkinElmer Operetta: improved support for datasets generated by the Harmony software
-- TIFF: split IFDs into separate series if the dimensions or pixel type mismatch; restricted use case for legacy TIFF JAI reader; fixed a bug with FillOrder which resulted in 0 pixel values
-- Zeiss CZI: reduced duplicate original metadata when reading a pyramid file
-- Zeiss TIFF: added support for AVI files acquired with Keyence software
-- Zeiss ZVI: reuse stream for sequential calls to openBytes on the same plane
-- updated all pyramidal format readers to consume SubResolutionReader
-- updated all readers to consume MetadataTools getter to retrieve enumerations
-- reviewed all readers and plugins to close open instances of RandomAccessInputStream
-- fixed some deprecation warnings in a number of readers
-- for RGB images using ChannelSeparator, all channel metadata is now copied instead of just names

ImageJ plugin improvements:
-- updated the updater message in the Fiji plugin (thanks to Jan Eglinger)
-- disabled LUT writing for any plane that has a default grayscale lookup table
-- added macro option to always skip LUT writing

MATLAB toolbox improvements:
-- improved performance of bfGetPlane by removing an unnecessary data copy (thanks to Cris Luengo)

Command-line tools improvements:
-- bfconvert: added -no-flat option to the command-line tools to convert files with sub-resolutions; added -pyramid-scale and -pyramid-resolutions options to generate sub-resolutions during conversion; removed Plate elements when -series is passed as an option; extended usage to describe available formats, extensions and compressions
-- xmlvalid: added new validate methods to loci.formats.tools.XMLValidate returning the validation status; added a return code to xmlvalid

Component changes:
-- ome-common was upgraded to 6.0.0
-- ome-codecs was upgraded to 0.2.3
-- ome-model was upgraded to 6.0.0

Automated test changes:
-- added testng.allow-missing property allowing unconfigured filesets to be skipped
-- added testUnflattenedSaneOMEXML to compare series count to OME-XML images count when resolution flattening is disabled
-- added test-equivalent target to compare pixel data between two files
-- added support for storing resolution index and resolution count in the configuration files used for automated testing
-- tests now fail when a configured file throws UnknownFormatException

Documentation improvements:
-- fixed the xmlvalid documentation page (thanks to Kouichi C. Nakamura)
-- improved the memory section of the MATLAB documentation page (thanks to Kouichi C. Nakamura)
-- extended IFormatReader Javadocs to reflect the reader guide
-- added reference to current Adobe TIFF specification
-- switched to image.sc as the reference location for public feedback

Full details can be found at https://docs.openmicroscopy.org/bio-formats/6.0.0/about/whats-new.html
The software is available at https://www.openmicroscopy.org/bio-formats/downloads/
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This version of Bio-Formats contains three major types of improvements: 1. Full reader/writer/specification support for an updated OME-TIFF that supports multi-dimensional, multi-resolution tiled ("pyramidal") image files as used in imaging of large blocks of tissue in research and in clinical applications. This is the first open source, fully open, fully implemented file format for whole slide imaging and other tissue imaging applications. 2. Support for the Keller Lab Block (KLB) image file format, a format used by several labs performing light sheet microscopy of large biological specimens. 3. Support for the BigDataViewer format, another commonly used light sheet microscopy format; this is not yet released but will be soon.
URL https://docs.openmicroscopy.org/bio-formats/6.0.0/about/whats-new.html
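Among the tiled writing API changes listed for Bio-Formats 6.0 is the convention that a tile size of 0 disables tiling. The Java sketch below shows how a plane can be cut into tile regions under that convention. It is a hypothetical helper, not part of Bio-Formats, and it assumes that a tile size of 0 disables tiling independently along each axis and that edge tiles are clipped to the plane boundary.

```java
import java.util.ArrayList;
import java.util.List;

public class TileGrid {
    /**
     * Hypothetical helper: returns tile regions {x, y, w, h} covering a
     * sizeX x sizeY plane, row by row. Assumption: a tile size of 0 means
     * "no tiling" along that axis, i.e. the tile spans the whole axis.
     */
    static List<int[]> tiles(int sizeX, int sizeY, int tileX, int tileY) {
        if (tileX < 0 || tileY < 0) throw new IllegalArgumentException("tile sizes must be >= 0");
        int tx = (tileX == 0) ? sizeX : tileX;
        int ty = (tileY == 0) ? sizeY : tileY;
        List<int[]> out = new ArrayList<>();
        for (int y = 0; y < sizeY; y += ty) {
            for (int x = 0; x < sizeX; x += tx) {
                // Edge tiles are clipped to the plane boundary.
                out.add(new int[] {x, y, Math.min(tx, sizeX - x), Math.min(ty, sizeY - y)});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tiles(1000, 600, 512, 512).size()); // 2 columns x 2 rows: prints 4
        System.out.println(tiles(1000, 600, 0, 0).size());     // tiling disabled: prints 1
    }
}
```

For a 1000 × 600 plane with 512 × 512 tiles, the last region is a clipped 488 × 88 pixel tile; with both tile sizes set to 0, a single region covers the whole plane. In Bio-Formats, each such region would typically be handed to the tile-aware IFormatWriter.saveBytes(no, buf, x, y, w, h) overload.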