ICF: NIRG: HistoMaps: Stain agnostic feature representations to identify clinically relevant traits in the tumour microenvironment

Lead Research Organisation: University of Warwick
Department Name: Computer Science

Abstract

In the early years of computational pathology (CPath), algorithms focussed mainly on segmenting and identifying objects such as nuclei, glands, ducts, vessels, and other patterns that are of interest to pathologists in everyday clinical practice. The aim was to assist pathologists in identifying patterns that are difficult to spot by eye across the huge landscape of cancer tissue in a whole slide image (WSI). Modern CPath algorithms based on deep learning (DL) have shown that there are hidden features which humans usually overlook due to inattentional blindness. CPath has therefore moved beyond identification and classification of individual patterns within a WSI towards WSI-level or case-level diagnosis and prediction of mutations and therapeutic response, discovering new morphological patterns and tissue phenotypes and, in some cases, even surpassing pathologist performance. On the other hand, DL algorithms are usually regarded as black boxes because the learnt features lack interpretability, which makes it difficult to understand the biology of different diseases. One reason is our inability to analyse the huge landscapes of the tumour microenvironment (TME) as a whole: WSIs are divided into small patches before analysis because of hardware limitations and the complex DL architectures required to analyse images from different stains and modalities. The gigapixel size of a WSI, which contains the full landscape of the cancer, invites exploration but at the same time poses technological challenges. Because of tumour heterogeneity, these small patches are usually not representative of the WSI. We therefore need techniques that can analyse WSIs without dividing them into smaller patches, keeping the spatial information intact.
Such techniques would not only overcome the limitations imposed by tumour heterogeneity but also help to identify heterogeneous regions and embedded spatial relationships linked to patient outcome and other clinical variables. They should overcome the practical limitations of the hardware, be invariant to the input stain, and support interpretability and biological understanding of the TME. The algorithmic limitations are currently tackled using WSI-level weakly supervised labels or compressed representations. These approaches have major drawbacks: during compression they discard the spatial information needed to capture cell-to-cell interactions in clinically significant regions, and they focus mostly on identifying disease or classifying it into sub-categories, with the DL model treated as a black box. Analysis of the TME at the cellular level is important for understanding mechanisms in cancer in which tumour heterogeneity plays a significant role. Multiplexed immunofluorescence (MxIF) images provide additional data for subtyping individual cells on the same tissue section, which is not currently possible with existing brightfield approaches. Recent advances in whole slide fluorescence imaging allow WSIs to be scanned with multiple markers. We therefore need stain and modality agnostic approaches that can analyse WSIs without losing spatial information at the cellular level, so that this rich data can be mined for a better understanding of cancer. We propose to build on existing technology and use the extracted information to understand TME interactions at the whole slide image level.
In this project, we will develop stain agnostic techniques to analyse and identify patterns in whole slide images (WSIs) by creating HistoMaps, which can be directly related to biologically meaningful and clinically relevant parameters, i.e., mutations, survival and response to therapy. Linking histology landscapes to clinical variables in this way will improve our understanding of cancer, help oncologists to make informed decisions on therapeutic interventions, and assist pharma in developing new targets.

Technical Summary

The main aims of the project are to design and develop stain agnostic feature representations for the analysis of whole slide images (WSIs) that overcome hardware limitations by reducing redundant information while keeping the essential data, including spatial information, intact. This will involve development of 1) image registration methods, 2) deep learning algorithms to extract features from Haematoxylin and Eosin (HE), Immunohistochemistry (IHC) and multiplexed immunofluorescence (MxIF) WSIs and 3) methods to link clinical variables and genetic variations to the new feature representations. We propose to design algorithms that remove the redundant information to create representations of biological structures embedded in HistoMaps, as was done in a preliminary study that predicted ER/PR status in breast histology using cell locations to create CellMaps. The new representation is much smaller in size (h x w) than the original WSI because the redundant information is removed, which helps with the hardware limitations that prevent a WSI from being analysed in one pass through a neural network. We will develop approaches that use our previous work on classification and segmentation of cells, glands, and other objects to create feature representations in the form of HistoMaps while keeping the information on cell/region locations, their morphology, tumour heterogeneity and spatial arrangement intact. The HistoMaps will help to explore the role of various components in the TME, to study the impact of their interactions on disease outcome and behaviour and on response to therapy, and to investigate the link between these features and genetic alterations. The HistoMaps will also allow cellular features to be masked with regional structures such as stroma, glands and vessels, to investigate whether the presence or absence of various cell types and their morphological features in these regions can be linked to clinical and biological data with the help of feature ranking.
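
As a rough illustration of how cell locations might be condensed into a much smaller map that still preserves spatial arrangement, the sketch below bins detected cells into a coarse grid with one channel per cell type and then masks the result with a region map. It is not the method of the preliminary CellMaps study, whose details are not given here. The function names, the (x, y, class) input format and the bin size of 64 pixels are all assumptions made for the example.

```python
import numpy as np


def build_cell_map(cells, slide_shape, n_classes, bin_size=64):
    """Bin cell detections into a coarse per-class count map (hypothetical sketch).

    cells       : sequence of (x, y, class_id) in full-resolution pixel coordinates
    slide_shape : (height, width) of the full-resolution slide in pixels
    n_classes   : number of cell types; class_id must lie in [0, n_classes)
    bin_size    : side length in pixels of one map cell (an assumed value)

    Returns an array of shape (n_classes, ceil(h / bin_size), ceil(w / bin_size)).
    """
    height, width = slide_shape
    map_h = -(-height // bin_size)  # ceiling division
    map_w = -(-width // bin_size)
    cell_map = np.zeros((n_classes, map_h, map_w), dtype=np.float32)

    cells = np.asarray(cells, dtype=np.int64).reshape(-1, 3)
    cols = cells[:, 0] // bin_size
    rows = cells[:, 1] // bin_size
    # np.add.at accumulates correctly when several cells fall in the same bin.
    np.add.at(cell_map, (cells[:, 2], rows, cols), 1.0)
    return cell_map


def mask_by_region(cell_map, region_mask):
    """Keep counts only inside a region (e.g. stroma) given as a boolean map
    with the same height and width as the cell map."""
    return cell_map * region_mask[np.newaxis].astype(cell_map.dtype)


# Toy usage: four cells of two types on a 200 x 256 pixel "slide".
toy_cells = [(10, 20, 0), (70, 20, 1), (75, 30, 1), (130, 130, 0)]
toy_map = build_cell_map(toy_cells, slide_shape=(200, 256), n_classes=2)

toy_region = np.zeros(toy_map.shape[1:], dtype=bool)
toy_region[0, 1] = True  # pretend this bin is stroma
toy_masked = mask_by_region(toy_map, toy_region)
```

With these assumed settings the map is smaller than the slide by a factor of 64 along each axis, while each cell's approximate position and type are retained. Morphological features could be carried in additional channels in the same way.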

Publications
