Open access deep learning solutions for imaging flow cytometry

Lead Research Organisation: Swansea University
Department Name: School of Engineering

Abstract

Imaging flow cytometry (IFC), in which an image of each individual cell is acquired as it flows through a cytometer, can measure hundreds of thousands of individual cells in minutes, combining the high-throughput capabilities of conventional flow cytometry with single-cell imaging. IFC measures not only total fluorescence intensities but also the spatial image of the fluorescence, plus bright-field and dark-field images of each cell in a population. The researcher is then able to measure hundreds of different parameters from these images, for example the cell size, shape and granularity, and the position and intensity of any fluorescence biomarker in the cell. The rich information captured through IFC makes it an ideal candidate for high-content analysis with specialist multivariate tools, which can cluster similar cells together, identify rare cells in large populations and look for relationships between cells, such as stem cell differentiation pathways.

Previously we have applied traditional machine learning techniques to the data output from an imaging flow cytometer to perform these types of tasks, and developed software solutions that allow a non-expert user to apply these methods to their own datasets. While the methods have proven very successful, the user must still measure the features using image analysis tools, which requires knowledge of image analysis. Recently, however, there has been a revolution in the field of artificial intelligence with the introduction of deep neural networks (DNNs). DNNs take inspiration from the workings of the human brain, with many interconnected layers that each apply some basic processing rules. Recent advances in the use of multi-core graphics processing units have provided the computational power required to train these complex networks in a reasonable amount of time, and DNNs have shown significant improvements over traditional machine learning methods for image recognition problems. One major advantage of these algorithms is that the user is not required to measure cell features or parameters, as is required for traditional machine learning algorithms. DNNs therefore remove the need for an image analysis step before machine learning tools are applied. In addition, DNNs require very large numbers of single-cell images for training, which makes them ideally suited to the analysis of IFC data.
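To make the feature-measurement step concrete, the following is a minimal sketch of the kind of hand-chosen measurements that traditional machine learning needs before it can be applied. It is illustrative only: the synthetic disk image, the threshold and the function names (synthetic_cell, measure_features) are assumptions made for this example and are not taken from the project's software.

```python
import numpy as np

def synthetic_cell(size=32, radius=8, intensity=1.0):
    """Hypothetical stand-in for one IFC image: a bright disk on a dark background."""
    yy, xx = np.mgrid[:size, :size]
    centre = (size - 1) / 2.0
    mask = (yy - centre) ** 2 + (xx - centre) ** 2 <= radius ** 2
    return np.where(mask, intensity, 0.0)

def measure_features(image, threshold=0.5):
    """Measure a few hand-chosen features from a thresholded image."""
    mask = image > threshold          # the analyst must choose this threshold
    area = int(mask.sum())            # cell size in pixels
    if area == 0:
        return {"area": 0, "mean_intensity": 0.0, "aspect_ratio": 0.0}
    rows, cols = np.nonzero(mask)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    return {
        "area": area,
        "mean_intensity": float(image[mask].mean()),
        "aspect_ratio": float(width / height),  # crude shape descriptor
    }

small = measure_features(synthetic_cell(radius=5))
large = measure_features(synthetic_cell(radius=10))
print(small)
print(large)
```

Every line of measure_features encodes a decision (threshold, which features, how to define shape) that the user would otherwise have to make. A DNN takes the pixel array itself as input, so these decisions are learned from the training images instead.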

In this project we will develop open access software tools that allow a non-expert to input the output file from an IFC directly into a neural network. We will take a variety of existing DNNs which have been optimised for established image recognition problems (e.g. character recognition) and apply them to a set of IFC datasets. These datasets have been chosen not only to represent a set of specific biological problems but also to pose specific challenges to the machine learning algorithms. We will also develop and train our own DNNs, optimised for the specific datasets, and make these available. One disadvantage of DNNs is that the way in which the network learns to classify images is hidden from the user. To address this problem we will study the last layers of the network, which contain the most specific patterns that the network uses for classifying images. We will correlate these patterns with the cells they identify in order to improve the design and performance of future networks. Finally, we will use these features to visualise the relationships between cells in order to understand the evolution of the cells in a population.

Technical Summary

The large number of single-cell images that an imaging flow cytometer (IFC) can measure from an individual sample makes this an ideal platform for the use of multivariate analysis to perform tasks such as phenotype identification, cell differentiation or activation, and cell cycle analysis. However, the application of these techniques has been slow and the true potential of the measurement system has not yet been exploited. This is in part due to the expert knowledge required to apply multivariate techniques to this type of data. In recent years, the advent of graphics processing units in desktop personal computers has brought a revolution in the use of neural networks for a host of image recognition applications, making this one of the most exciting fields in artificial intelligence. These neural networks are simple to apply, as they do not require the user to measure any features from the images, and are ideally suited to a large number of single-object images, as generated by the IFC. In this project we will investigate the application of the latest deep neural networks to IFC data from a variety of applications from different collaborators. These applications cover a wide range of biological questions such as rare cell identification, cell activation heterogeneity etc. We will use existing pre-trained deep neural networks, implemented using some of the most popular frameworks such as TensorFlow and Caffe, to determine their performance on our test datasets. These results will be used to develop our own networks, which will incorporate visualisation tools, e.g. t-SNE, to determine the relationship between cells. The culmination of the project will be the development of an open source pipeline which allows the user to input the proprietary-format output file of the IFC into a typical neural network. The pipeline will be made freely available and we will organise workshops to expose interested researchers to the techniques.
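The idea of reading out a network's last hidden layer and embedding those features in two dimensions can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the images are random arrays, the two-layer network uses random untrained weights as placeholders for a trained model, and principal component analysis is swapped in for t-SNE so that the example depends only on NumPy. The names penultimate_features and pca_2d are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def penultimate_features(images, w1, w2):
    """Forward pass through two ReLU layers; return the last hidden layer's activations."""
    x = images.reshape(len(images), -1)   # flatten each image to a vector
    h1 = np.maximum(x @ w1, 0.0)          # first hidden layer
    return np.maximum(h1 @ w2, 0.0)       # last hidden layer (the "features")

def pca_2d(features):
    """Project features onto their first two principal components (stand-in for t-SNE)."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T

# Placeholder data: 200 random 16x16 "cell images" and random, untrained weights.
images = rng.random((200, 16, 16))
w1 = rng.normal(size=(256, 64))
w2 = rng.normal(size=(64, 32))

embedding = pca_2d(penultimate_features(images, w1, w2))
print(embedding.shape)  # (200, 2): one 2-D point per cell image
```

With a trained network and real IFC images, each row of the embedding would be one cell, and cells with similar learned features would be expected to fall close together in the plot. A non-linear method such as t-SNE would replace pca_2d when local neighbourhood structure matters more than global variance.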

Planned Impact

The results of the project will be published in journals appropriate to the research described in the case for support, including Bioinformatics, Methods, Nature Methods, Nature Communications and Scientific Reports. We have requested funding for the postdoctoral researcher to attend CYTO 2018 to disseminate the results from the project, and we will endeavour to attend other relevant conferences and present this work using our own funds. We will also disseminate the work through the Imaging Flow Cytometry networks in both the UK and Europe. We are also members of the DeepBio consortium, a network which promotes the application of deep learning to biological problems, and we will present our work and disseminate best practice at its meetings.

If the grant is awarded, the College of Engineering is committing to fund a PhD studentship to work on this project. The student will be exposed to the field of deep learning, currently one of the most exciting fields in artificial intelligence, with applications in areas as diverse as time series prediction of financial markets, speech recognition and object recognition.

While the purpose of the project is to provide open source solutions for the analysis of IFC data, we are aware that there are applications of these tools in a commercial setting, and indeed we have just obtained a US patent related to the use of machine learning to develop a cell sorting solution using imaging cytometry (H. Hennig, T. Blasi, P. Rees, A.E. Carpenter, Method for Label-Free Image Cytometry, U.S. patent, publication no. WO2015168026 A3 (2016)). Also, through Dr George Johnson, we will develop deep learning tools for the genotoxicity micronucleus assay, which is currently used by companies such as GSK, AstraZeneca and Roche.

The Welsh Government has also committed to fund a 2 day workshop/meeting through its Sêr Cymru National Research Network (NRN) scheme, with support for up to 100 delegates, to which we will invite all users of the ImageStream and FlowSight instruments in the UK (£10,000 costs, see letter of support). The NRN will help with the general organisation of this workshop, while the PDRA funded on the project will be tasked with organising the scientific content of the meeting, including the DNN implementation tutorial. Previously we have run workshops designed specifically to take biologists, medics and clinicians with little computer knowledge and train them to run traditional machine learning algorithms using CellProfiler and CellProfiler Analyst on their IFC data. The workshop will demonstrate the use of DNNs in the analysis of IFC data for the purposes of cell classification, cell cycle analysis etc., showing how DNNs significantly reduce the complexity of the analysis pipeline by removing the need for feature extraction from the individual cell images.

IFC is increasingly used in a clinical setting, and our project aim of simplifying the application of very powerful DNN algorithms to the datasets acquired from these instruments will be extremely attractive to these users. One of the prime developments in healthcare over the next decade will be the introduction of personalised medicine through stratified treatment regimes. Our planned work on developing deep learning algorithms for use on human primary blood samples, to identify all the cell types and their activation/disease states, will clearly integrate well with this vision. Within our collaborative network we have engagement with the Newcastle Children's Hospital, the ABMU NHS trust in South West Wales and the Massachusetts General Hospital, Boston. There will therefore be close and frequent dialogue between our research team and clinical groups who are using cytometry for diagnosis and treatment of disease.

Publications

Doan M (2020) Objective assessment of stored blood quality by deep learning. in Proceedings of the National Academy of Sciences

Doan M (2018) Diagnostic Potential of Imaging Flow Cytometry. in Trends in biotechnology

Doan M (2020) Label-Free Leukemia Monitoring by Computer Vision. in Cytometry Part A

Nassar M (2019) Label-Free Identification of White Blood Cells Using Machine Learning. in Cytometry. Part A : the journal of the International Society for Analytical Cytology

Piasecka J (2020) Diffusion Mapping of Eosinophil-Activation State. in Cytometry. Part A : the journal of the International Society for Analytical Cytology

Rees P (2019) The origin of heterogeneous nanoparticle uptake by cells. in Nature communications

 
Description: Cambridge University Vet Medicine
Organisation: University of Cambridge
Department: Cambridge Neuroscience
Country: United Kingdom
Sector: Academic/University
PI Contribution: Analysis of single cell tissue images
Collaborator Contribution: Providing tissue images for analysis
Impact: Developing bovine mammary terminal duct lobular units have a dynamic mucosal and stromal immune microenvironment (see publications)
Start Year: 2019