14 NSFBIO: Mining of imaging flow cytometry data for label free, single cell analysis

Lead Research Organisation: Swansea University
Department Name: College of Engineering


The project is a collaboration between researchers at Swansea University, UK, and scientists at the Broad Institute of Harvard and MIT, Cambridge, US. The project will develop and demonstrate software to mine data from imaging flow cytometers. These instruments can capture thousands of images of cells per second. In principle, the images can be analyzed to precisely measure hundreds of features related to cellular morphology; this project will develop advanced machine-learning software to do so, unlocking information that would otherwise remain hidden within the images. The software will be developed, improved, and validated in several demonstration experiments involving the cell cycle, the component cells of primary blood, immune cell activation, and stem cell identity. The goal will be to use as few fluorescent biomarkers as possible, or indeed none, eliminating the need to perturb cells. The resulting open-source software will be freely available to scientists worldwide for both applied and clinical research, and will be accompanied by user-friendly training materials and in-person workshops. The project is collaborative and interdisciplinary and includes training early-career scientists in computational biology via the existing Scientists without Borders program. The project involves close collaboration with a host of researchers in both the UK and US who use imaging flow cytometers, and builds on a previous successful interdisciplinary collaboration in biological data mining between the teams at the Broad Institute and Swansea University.

To devise novel software and methodology for mining the large datasets acquired by imaging flow cytometry, the team will develop algorithms to seamlessly import data from an imaging cytometer, robustly segment cells, quality-filter them (e.g., to exclude debris and blurred images), and quantify morphological parameters for each cell, including various measures of size, shape, and texture; a typical dataset yields hundreds of parameters for each of thousands of cells. Using these features, trained machine-learning algorithms will identify cell phenotypes of interest or otherwise characterize the state of cells in driving biological projects contributed by project partners, who use imaging flow cytometry in a wide range of biological research studies. The project will give the scientific community a validated, open-source software toolbox of image processing and machine learning algorithms readily usable by biologists.
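The steps described above (quality-filter, segment, quantify features, classify) can be illustrated with a deliberately small sketch. It is not the project's method or CellProfiler code. It assumes one cell per image that is brighter than its background (closer to a darkfield or fluorescence image than a brightfield one), and it swaps in simple stand-ins: variance of the Laplacian as a blur score, a global Otsu threshold for segmentation, four basic features, and a nearest-centroid classifier in place of the advanced machine-learning models the project describes. All names (`blur_score`, `otsu_threshold`, `cell_features`, `NearestCentroid`, `disk`) are hypothetical, and only NumPy is required.

```python
import numpy as np


def blur_score(img: np.ndarray) -> float:
    """Variance of a 4-neighbour discrete Laplacian; low values suggest blur."""
    img = img.astype(float)
    lap = (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
           - 4.0 * img[1:-1, 1:-1])
    return float(lap.var())


def otsu_threshold(img: np.ndarray, bins: int = 256) -> float:
    """Global Otsu threshold: maximise between-class variance of the histogram."""
    hist, edges = np.histogram(img.ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                      # pixel count at or below each bin
    w1 = w0[-1] - w0                          # pixel count above each bin
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1)               # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)    # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    return float(edges[1:][np.argmax(between)])


def cell_features(img: np.ndarray) -> np.ndarray:
    """Segment one bright cell; return [area, perimeter, mean, std] of its pixels."""
    mask = img > otsu_threshold(img)
    area = int(mask.sum())
    if area == 0:
        return np.zeros(4)
    padded = np.pad(mask, 1)
    # Interior pixels have all four neighbours inside the mask.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())  # boundary-pixel count
    pixels = img[mask].astype(float)
    # Intensity std inside the mask serves as a crude texture measure.
    return np.array([area, perimeter, pixels.mean(), pixels.std()], dtype=float)


class NearestCentroid:
    """Z-score the features, then assign each cell to the closest class mean."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.mu_, self.sd_ = X.mean(0), X.std(0) + 1e-9
        Z = (X - self.mu_) / self.sd_
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([Z[y == c].mean(0) for c in self.classes_])
        return self

    def predict(self, X):
        Z = (np.asarray(X, dtype=float) - self.mu_) / self.sd_
        d = ((Z[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(-1)
        return self.classes_[d.argmin(1)]


def disk(radius: float, size: int = 32) -> np.ndarray:
    """Synthetic, noise-free 'cell': a bright disk on a dark background."""
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    return ((yy - c) ** 2 + (xx - c) ** 2 <= radius ** 2).astype(float)


# Usage: compare a sharp and a blurred image, then train on small vs large cells.
sharp = disk(8)
blurred = sum(np.roll(np.roll(sharp, dy, 0), dx, 1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9
print(blur_score(sharp), blur_score(blurred))

train = [disk(r) for r in (4, 5, 9, 10)]
labels = ["small", "small", "large", "large"]
clf = NearestCentroid().fit([cell_features(im) for im in train], labels)
print(clf.predict([cell_features(disk(6)), cell_features(disk(8))]))
```

A nearest-centroid rule is used here only because it fits in a few lines; with hundreds of features per cell, a regularised or ensemble classifier would usually be more appropriate. A fuller pipeline would also discard images whose blur score falls below a cut-off tuned on annotated examples.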

Technical Summary

In this project, we will leverage our expertise and existing microscopy-oriented software to develop a data mining methodology for the image data generated by an imaging cytometer. With experimental data from our project partners, we will optimize and validate these methodologies in biological applications in areas as diverse as stem cell differentiation, cell cycle analysis, and immune system activation. The novelty of this work includes:
1. Developing label-free assays. In addition to measuring morphological parameters that are standard for fluorescence microscopy images, we will emphasize the extraction of information present in two channels that do not require exogenous labels: brightfield and darkfield (the latter records light scattered by the cell at right angles to the excitation light). Both are captured by imaging flow cytometers but have so far been largely, or completely, ignored. They have also generally been ignored in conventional microscopy, because image-processing algorithms struggle to delineate cell borders accurately in both types of image. Imaging flow cytometry avoids this problem because the cells are physically separate as they flow through the instrument. We expect that including the rich data from these channels, together with advanced machine-learning algorithms, will enable phenotypes and cellular states to be scored using fewer labels; in some key cases, our proposed advances will let biologists make relevant measurements of their cells with no staining at all, expanding the types of cells and biological questions that can be addressed with the technology.
2. Quantifying heterogeneity at multiple levels. We will quantify the variation of the parameters derived from these cell images, both within the cell population from an individual and among cell populations from a cohort of people.
3. Creating a new, open-source workflow for advanced analysis of imaging flow cytometry data.
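Item 2 does not specify which statistics will be used. One standard way to separate the two levels of heterogeneity is the law of total variance: for a single feature, the variance over all cells splits exactly into a within-individual part (cell-to-cell spread around each person's mean) and a between-individual part (spread of the per-person means across the cohort). The sketch below assumes each individual contributes a list of per-cell values for one feature; the function name `variance_components` and the toy numbers are illustrative only, not project data.

```python
import numpy as np


def variance_components(values_by_donor: dict) -> dict:
    """Split the variance of one per-cell feature into within- and between-donor parts.

    Uses population (ddof=0) variances, so within + between equals the variance
    of all cells pooled together (law of total variance).
    """
    groups = [np.asarray(v, dtype=float) for v in values_by_donor.values()]
    n = np.array([g.size for g in groups], dtype=float)
    means = np.array([g.mean() for g in groups])
    grand_mean = np.concatenate(groups).mean()
    # Cell-to-cell spread around each donor's own mean, pooled over donors.
    within = float(sum(((g - g.mean()) ** 2).sum() for g in groups) / n.sum())
    # Spread of donor means around the cohort mean, weighted by cell count.
    between = float((n * (means - grand_mean) ** 2).sum() / n.sum())
    # Coefficient of variation within each donor's cell population.
    cv = {k: float(np.std(v) / np.mean(v)) for k, v in values_by_donor.items()}
    return {"within": within, "between": between,
            "total": within + between, "cv_per_donor": cv}


# Toy values of one feature for two hypothetical donors.
toy = {"donor_A": [1.0, 2.0, 3.0], "donor_B": [4.0, 5.0, 6.0]}
parts = variance_components(toy)
print(parts["within"], parts["between"], parts["total"])
```

Reporting both components, rather than a single pooled variance, shows whether a feature mainly distinguishes cells within one person or distinguishes people within a cohort.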

Planned Impact

To ensure broad impact, dissemination, and sustainability of the proposed research, the algorithms we develop for imaging flow cytometry will be packaged into user-friendly, open-source software to allow any researcher to apply these advanced image analysis techniques in their own laboratories. Dr. Carpenter's group at the Broad Institute has an excellent track record of serving the needs of end users and ensuring software is disseminated and adopted by the community. For example, their CellProfiler software has already been cited more than 1,000 times. For this project, close collaboration with the laboratories leading the driving biological projects will ensure that end users contribute to the design and evaluation of the tools so that the end result serves the biological community's needs.

Specifically, we will disseminate algorithms as modules in the CellProfiler software package following our customary software engineering best practices. CellProfiler is already widely known and mature, is open-source and user-friendly, reads >100 image file formats, enables reproducible research, is cross-platform (Windows, Mac, and Unix), has a visualization/analysis companion called CellProfiler Analyst, and is being interfaced with other open-source projects (e.g., ImageJ/NIH Image, MicroManager). Contributing to existing software eliminates the waste of building and maintaining a separate software package solely for imaging flow cytometry data, and facilitates sustainability. Online, we will distribute example images, CellProfiler analysis pipelines, and written tutorials so the algorithms can be used easily by biologists. For computer scientists developing and testing new algorithms, we will contribute the validation images and ground truth to our Broad Bioimage Benchmark Collection (BBBC, www.broadinstitute.org/bbbc).


Caicedo J (2017) Data-analysis strategies for image-based cell profiling. in Nature Methods

Rees P (2019) The origin of heterogeneous nanoparticle uptake by cells. in Nature communications

Summers HD (2016) Multiscale benchmarking of drug delivery vectors. in Nanomedicine : nanotechnology, biology, and medicine

McConnell KI (2016) Reduced Cationic Nanoparticle Cytotoxicity Based on Serum Masking of Surface Potential. in Journal of biomedical nanotechnology

Nassar M (2019) Label-Free Identification of White Blood Cells Using Machine Learning. in Cytometry. Part A : the journal of the International Society for Analytical Cytology

Doan M (2020) Label-Free Leukemia Monitoring by Computer Vision. in Cytometry. Part A : the journal of the International Society for Analytical Cytology

Rees P (2016) An Analysis of the Practicalities of Multi-Color Nanoparticle Cellular Bar-Coding. in Combinatorial chemistry & high throughput screening

Description A major finding of this grant was that we could use machine learning to remove the requirement for biomarkers to characterise a cell type in an image. Removing the need for biomarkers makes assays cheaper and also frees fluorescence channels for markers of cell function.
Exploitation Route Label-free cell imaging will have significant applications for researchers across biomedical imaging, and our latest work has demonstrated this in a clinical setting, namely acute lymphoblastic leukaemia.
Sectors Healthcare

Description Open access deep learning solutions for imaging flow cytometry
Amount £150,562 (GBP)
Funding ID BB/P026818/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 01/2018 
End 09/2019

Description Broad Institute Sabbatical
Organisation Broad Institute
Country United States 
Sector Charity/Non Profit 
PI Contribution 7-month sabbatical at the Broad Institute
Collaborator Contribution Continued joint research projects
Impact Nature Methods, ACS Nano and Cytometry papers
Start Year 2013

Description Crick Institute, London
Organisation Francis Crick Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution Collaboration on a Nature Communications paper
Collaborator Contribution Jurkat and yeast experiments
Impact Nature Communications paper
Start Year 2013

Description Helmholtz Centre Munich
Organisation Helmholtz Zentrum München
Country Germany 
Sector Academic/University 
PI Contribution Collaboration on machine learning for label-free cell phenotype identification
Collaborator Contribution Expertise in machine learning and visualisation
Impact Nature Communications paper
Start Year 2014