Novel Deep Learning for Detecting Cancer cells with Raman Spectroscopy

Lead Research Organisation: University College London

Department Name: Statistical Science

Abstract

1. A brief description of the topic
Raman spectroscopy is widely used in chemistry to provide a structural fingerprint by which molecules can be identified. The Raman spectra of cells, and especially their characteristic peaks, carry essential biochemical information about those cells. This motivates applying Raman spectroscopy to the problem of distinguishing cancer cells from other types of cells. With Raman mapping, spectra are measured at evenly spaced positions within a pre-specified rectangular area of a cell, so each cell yields its own dataset of multiple spectra taken at different positions. Each individual spectrum records intensity over a wide range of Raman shifts (e.g. 1000 wavenumbers).
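The data layout described above can be sketched as a three-dimensional array. This is only an illustration: the grid size, the variable names and the random values are assumptions standing in for real measurements, and only the figure of 1000 wavenumbers comes from the text.

```python
import numpy as np

# Hypothetical sizes: a 5 x 4 grid of positions inside the mapped rectangle,
# with 1000 wavenumbers recorded per spectrum.
n_rows, n_cols, n_wavenumbers = 5, 4, 1000

rng = np.random.default_rng(0)
# Synthetic stand-in for the Raman map of one cell (not real data).
raman_map = rng.random((n_rows, n_cols, n_wavenumbers))

# Flatten the spatial grid so that each row is the spectrum at one position.
spectra = raman_map.reshape(n_rows * n_cols, n_wavenumbers)

# Mean spectrum of the cell, averaged over all measured positions.
mean_spectrum = spectra.mean(axis=0)
```

Keeping the unflattened `raman_map` preserves the spatial arrangement of the positions, while the flattened `spectra` matrix is the form most classifiers expect.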

For a project at PhD level, I aim to classify cancer cells by stage of cancer development, i.e. to study how aggressive the cancer cells are. The project has two overall aims: developing scalable machine learning algorithms, and classifying cancer cells from different cancer stages. Problems that arise in the course of the project can be taken up as additional topics, e.g. variational auto-encoders for medical image analysis.

2. Potential prospects
The methods developed in the project can be transferred to other areas of research, for instance image analysis, time series analysis, adaptive identification of human faces, and brain MRI analysis. The novel contribution of the project is the development of scalable machine learning algorithms for medical imaging, built on modern deep learning methods.

3. Outline of the study
The overall framework of the project comprises data pre-processing, dimension reduction, learning on the Raman spectra of cells, and learning on images of cells.

Data pre-processing procedures, smoothing methods and dimension reduction techniques need to be considered alongside classification techniques. Pre-processing may incorporate biochemical knowledge: for example, the effects of fluorescence, water, glass and the environment should be carefully removed before analysis. One example is modified polynomial fitting, which is widely used to subtract auto-fluorescence from Raman spectra; it modifies least-squares polynomial fitting by reassigning the fitted values. For signal processing and dimension reduction, wavelet analysis and low-pass filters can be employed. The choice of dimension reduction method, whether one of the widely used linear methods or a non-linear method (e.g. manifold learning with a variational auto-encoder or Laplacian Eigenmaps), can have a great impact on the results. Other machine learning methods, e.g. boosting (as a meta-learning algorithm) and Gaussian processes, can serve as benchmarks for the later analyses with deep learning techniques.
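As an illustration of the modified polynomial fitting idea mentioned above, the following is a minimal sketch of one common formulation: fit a polynomial by least squares, replace every data value that lies above the fit with the fitted value, and repeat until the change is small. The function name `modpoly_baseline`, the polynomial degree, the tolerance and the synthetic spectrum are all assumptions made for this example; the project's actual procedure may differ.

```python
import numpy as np
from numpy.polynomial import Polynomial


def modpoly_baseline(intensity, degree=5, max_iter=100, tol=1e-3):
    """Estimate a smooth background by iterative polynomial fitting.

    Hypothetical helper: degree, max_iter and tol are illustrative defaults.
    """
    t = np.linspace(-1.0, 1.0, intensity.size)  # scaled axis for a stable fit
    work = np.asarray(intensity, dtype=float).copy()
    fit = work
    for _ in range(max_iter):
        # Ordinary least-squares polynomial fit to the current working signal.
        fit = Polynomial.fit(t, work, degree)(t)
        # Reassignment step: values above the fit are replaced by the fit,
        # so peaks are progressively clipped away.
        updated = np.minimum(work, fit)
        change = np.linalg.norm(updated - work) / max(np.linalg.norm(work), 1e-12)
        work = updated
        if change < tol:
            break
    return fit


# Synthetic example (not real data): smooth background plus one narrow peak.
x = np.linspace(-1.0, 1.0, 500)
background = 10.0 + 3.0 * x + 2.0 * x**2
peak = 5.0 * np.exp(-((x - 0.2) / 0.02) ** 2)
spectrum = background + peak

baseline_est = modpoly_baseline(spectrum)
corrected = spectrum - baseline_est  # background-subtracted spectrum
```

In this toy case the narrow peak survives the subtraction while the slowly varying background is largely removed. On real spectra the polynomial degree would need tuning, and noise handling is not covered by this sketch.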

I will start the research with deep feedforward networks (multilayer perceptrons, or MLPs). Once I have an initial idea of how well MLPs perform, I will begin developing a deep learning framework for cancer cell detection. For instance, variational auto-encoders will be developed to learn manifolds from the training data and thereby produce lower-dimensional representations, which can improve classification performance. For Raman spectroscopy, convolutional neural networks (CNNs) and generative adversarial networks (GANs) can be used, since intensities within a cell are measured at different positions across a large range of wavenumbers. This structure makes CNNs suitable candidates for analysing Raman data from cells. Potential topics within this part of the project include the design of network architectures, loss functions and attention mechanisms.
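To make concrete how a convolutional layer can slide along the wavenumber axis of a spectrum, here is a minimal NumPy sketch of a forward pass: one 1D convolutional layer with ReLU, global max pooling over wavenumbers, and a logistic output. The function names, layer sizes and random, untrained weights are assumptions chosen for illustration; this is not the architecture developed in the project.

```python
import numpy as np


def conv1d_relu(spectra, kernels, bias):
    """Valid 1D cross-correlation along the wavenumber axis, then ReLU.

    spectra: (n_spectra, n_wavenumbers)
    kernels: (n_filters, kernel_width)
    bias:    (n_filters,)
    returns: (n_spectra, n_filters, n_wavenumbers - kernel_width + 1)
    """
    width = kernels.shape[1]
    # windows[i, j, :] is the slice spectra[i, j:j + width].
    windows = np.lib.stride_tricks.sliding_window_view(spectra, width, axis=1)
    out = np.einsum("swk,fk->sfw", windows, kernels) + bias[None, :, None]
    return np.maximum(out, 0.0)


def forward(spectra, kernels, bias, w_out, b_out):
    """Convolution, global max pooling, then a logistic output per spectrum."""
    features = conv1d_relu(spectra, kernels, bias).max(axis=2)  # (n_spectra, n_filters)
    logits = features @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
spectra = rng.random((20, 1000))            # synthetic spectra, not real data
kernels = 0.1 * rng.standard_normal((8, 15))  # 8 filters of width 15 (untrained)
bias = np.zeros(8)
w_out = 0.1 * rng.standard_normal(8)
b_out = 0.0

probs = forward(spectra, kernels, bias, w_out, b_out)  # one score per spectrum
```

Because each filter is applied at every position along the spectrum, a peak-shaped pattern is detected wherever it occurs, which is one reason convolutional layers are a natural fit for spectral data. Training the weights would require a deep learning framework and labelled spectra, neither of which is shown here.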

Student:

ZHUO SUN

Period of Study:

Oct 19 - Sep 23

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2327885

Research Topic:

Unclassified

Organisations

University College London (Lead Research Organisation)

People	ORCID iD
Jinghao Xue (Primary Supervisor)	http://orcid.org/0000-0003-1174-610X
ZHUO SUN (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/R513143/1			01/10/2018	30/09/2023
2327885	Studentship	EP/R513143/1	01/10/2019	22/09/2023	ZHUO SUN
