
Novel Deep Learning for Detecting Cancer cells with Raman Spectroscopy

Lead Research Organisation: UNIVERSITY COLLEGE LONDON
Department Name: Statistical Science

Abstract

1. A brief description of the topic
Raman spectroscopy is widely used in chemistry to provide a structural fingerprint by which molecules can be identified. The Raman spectra of cells, and especially their characteristic peaks, contain essential biochemical information about those cells. This motivates applying Raman spectroscopy to the problem of distinguishing cancer cells from other types of cells. Using Raman mapping, we can measure Raman spectra at evenly spaced positions within a pre-specified rectangular area of a cell, so each cell yields its own dataset of multiple Raman spectra taken at different positions. In a single Raman spectrum, intensity is measured over a large range of Raman shift wavenumbers (e.g. 1000 wavenumbers).
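As a rough illustration of the data layout described above, the sketch below stores one cell's Raman map as a three-dimensional array (grid rows, grid columns, Raman shifts) and flattens it into one spectrum per position. The grid size, the shift range, the random intensities, and the names `raman_map` and `map_to_spectra` are assumptions made for illustration only; they are not taken from the project.

```python
import numpy as np

# Hypothetical sizes: a 5 x 4 grid of positions, 1000 Raman shifts per spectrum.
n_rows, n_cols, n_shifts = 5, 4, 1000

rng = np.random.default_rng(0)
# Synthetic stand-in for one cell's Raman map (real data would come from the instrument).
raman_map = rng.random((n_rows, n_cols, n_shifts))

# Raman shift axis in cm^-1; the 600-1800 range is an assumption for illustration.
shifts = np.linspace(600.0, 1800.0, n_shifts)


def map_to_spectra(raman_map):
    """Flatten a (rows, cols, shifts) map into a (positions, shifts) matrix."""
    rows, cols, n = raman_map.shape
    return raman_map.reshape(rows * cols, n)


spectra = map_to_spectra(raman_map)    # one row per measured position
mean_spectrum = spectra.mean(axis=0)   # average spectrum for this cell
```

Keeping the grid axes separate preserves the spatial arrangement for image-style models, while the flattened matrix is the more convenient input for per-spectrum methods.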

For a PhD-level project, I aim to classify cancer cells from different stages of cancer development, i.e. to study how aggressive the cancer cells are. The overall aims of this topic are to develop scalable machine learning algorithms and to classify cancer cells from different cancer stages. Problems that arise as the project progresses can be taken up as additional topics, e.g. variational auto-encoders for medical image analysis.

2. Potential prospects
The methods developed for the project can be transferred to other areas of research, for instance image analysis, time series analysis, adaptive identification of human faces, and brain MRI analysis. The novel contribution of the project is the development of scalable machine learning algorithms for medical imaging, based on modern deep learning algorithms.

3. Outline of the study
The overall framework of the project comprises data pre-processing, dimension reduction, learning on the Raman spectra of cells, and learning on images of cells.

Data pre-processing procedures, smoothing methods and dimension reduction techniques should be considered alongside classification techniques. Pre-processing may incorporate biochemical knowledge: for example, the effects of fluorescence, water, glass and the environment should be carefully removed before analysis. One example is modified polynomial fitting, which is widely used to subtract auto-fluorescence from Raman spectra. It modifies least-squares polynomial fitting by iteratively reassigning data points that lie above the fitted curve to the fitted values. For signal processing and dimension reduction, wavelet analysis and low-pass filters can be employed. The choice of dimension reduction method, whether one of the widely used linear methods or a non-linear method (e.g. manifold learning with variational auto-encoders or Laplacian Eigenmaps), can have a great impact on the results. Other machine learning methods, e.g. boosting (as a meta-learning algorithm) and Gaussian processes, can serve as benchmarks for further analyses with deep learning techniques.
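The modified polynomial fitting step mentioned above can be sketched roughly as follows: fit a polynomial by least squares, replace every point that lies above the fit with the fitted value, and repeat until the working spectrum stops changing. The polynomial degree, tolerance, iteration cap, the function name `modpoly_baseline`, and the synthetic spectrum are all assumptions chosen for illustration; this is not the project's own implementation.

```python
import numpy as np


def modpoly_baseline(shifts, intensity, degree=5, max_iter=100, tol=1e-3):
    """Estimate a smooth baseline by iterative polynomial fitting with reassignment."""
    # Rescale the shift axis to [-1, 1] so the least-squares problem stays well conditioned.
    x = 2.0 * (shifts - shifts.min()) / (shifts.max() - shifts.min()) - 1.0
    work = np.asarray(intensity, dtype=float).copy()
    for _ in range(max_iter):
        coeffs = np.polynomial.polynomial.polyfit(x, work, degree)
        fit = np.polynomial.polynomial.polyval(x, coeffs)
        # Reassignment: points above the fit (likely Raman peaks) take the fitted value.
        updated = np.minimum(work, fit)
        change = np.linalg.norm(updated - work) / max(np.linalg.norm(work), 1e-12)
        work = updated
        if change < tol:
            break
    return fit


# Synthetic example (assumed shapes): a slowly varying background plus one Raman peak.
shifts = np.linspace(600.0, 1800.0, 500)
true_baseline = 50.0 + 0.02 * (shifts - 600.0) + 1e-5 * (shifts - 600.0) ** 2
peak = 100.0 * np.exp(-0.5 * ((shifts - 1000.0) / 10.0) ** 2)
intensity = true_baseline + peak

baseline = modpoly_baseline(shifts, intensity)
corrected = intensity - baseline   # background-subtracted spectrum
```

Because points are only ever lowered towards the fit, the estimate tends to settle under the peaks; on noisy spectra the degree and tolerance would need tuning.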

I will start the research with deep feedforward networks (multilayer perceptrons, or MLPs). Once I have an initial idea of how well MLPs perform, I will start developing a deep learning framework for cancer cell detection. For instance, variational auto-encoders will be developed to learn manifolds from the training data and thus generate lower-dimensional representations, which can improve performance in classification tasks. For Raman spectroscopy, convolutional neural networks (CNNs) and generative adversarial networks (GANs) can be used, since intensities within a cell are measured at different positions across a large range of wavenumbers. This spatial and spectral structure makes CNNs suitable candidates for analysing Raman data of cells. Potential topics within this part of the project include the design of network architectures, loss functions and attention mechanisms.
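As a minimal sketch of the MLP starting point described above, the following computes class probabilities and a cross-entropy loss for a batch of spectra. The layer sizes, the two-class setup, the random inputs and labels, and the function names are assumptions for illustration; the weights are untrained, so the output shows only the shape of the computation, not a working classifier.

```python
import numpy as np


def init_mlp(layer_sizes, rng):
    """He-initialised weights and zero biases for each pair of consecutive layers."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((W, np.zeros(n_out)))
    return params


def mlp_forward(params, x):
    """Return class probabilities for spectra x of shape (batch, n_shifts)."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)                    # hidden layer with ReLU
    W, b = params[-1]
    logits = h @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)           # softmax over classes


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of integer class labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))


rng = np.random.default_rng(0)
params = init_mlp([1000, 64, 32, 2], rng)   # 1000 shifts in, 2 hypothetical classes out
x = rng.random((8, 1000))                   # synthetic batch of 8 spectra
labels = rng.integers(0, 2, size=8)         # synthetic labels
probs = mlp_forward(params, x)
loss = cross_entropy(probs, labels)
```

An MLP treats each wavenumber as an independent input feature, which is one reason a convolutional model that exploits neighbouring wavenumbers and positions is a natural next step.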

People


ZHUO SUN (Student)

Publications

Li, X. (2021) BSNet: Bi-Similarity Network for Few-shot Fine-grained Image Classification in IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society

Li, K. (2023) Multilevel Control Functional in arXiv

Sun, Z. (2023) Vector-Valued Control Variates in Proceedings of Machine Learning Research

Sun, Z. (2023) Meta-learning Control Variates: Variance Reduction with Limited Data in Proceedings of Machine Learning Research

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/R513143/1 | | | 30/09/2018 | 29/09/2023 |
2327885 | Studentship | EP/R513143/1 | 30/09/2019 | 21/09/2023 | ZHUO SUN
 
Description Significant new knowledge generated;
New and improved research methods or skills developed;
Important new research resources identified;
Important new research questions opened up.
Exploitation Route We are increasingly reliant on artificial intelligence algorithms for ecological, environmental, and social problems. However, running these algorithms or collecting sufficient data can be very expensive and often requires large computer clusters or substantial human resources. This has significant environmental implications, which is a major problem given that the UK and most modern societies are aiming for net zero. To tackle these challenges, my work allows people to reuse previous computation, significantly reducing the overall environmental impact of these artificial intelligence models. It also allows us to tackle much more complicated artificial intelligence problems. In particular, my research builds efficient transfer-learning strategies for statistics and machine learning.
Sectors Digital/Communication/Information Technologies (including Software); Energy; Environment; Financial Services, and Management Consultancy; Healthcare; Manufacturing, including Industrial Biotechnology

 
Description More and more artificial intelligence (AI) algorithms are built for healthcare, ecological, environmental, and social problems. Using these AI algorithms can be very expensive, since it often involves collecting sufficient data and requires large computer clusters and substantial human resources. This can be a major problem given that the UK and most modern societies are aiming for net zero. The efficient transfer learning strategies for machine learning that I have designed allow scientists and other users to reuse previous computation, data and (optimised) algorithms, significantly reducing the overall environmental impact of these artificial intelligence models. They also allow us to tackle much more complicated artificial intelligence problems.
Sector Digital/Communication/Information Technologies (including Software); Energy; Environment; Financial Services, and Management Consultancy; Healthcare
 
Description The Alan Turing Institute Enrichment Placement 
Organisation Alan Turing Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution During the placement, I collaborated with several scholars from The Alan Turing Institute and published papers at top-tier artificial intelligence conferences. I also co-organised the Statistics in Data Centric Engineering seminar series at The Alan Turing Institute from 2021 to 2022.
Collaborator Contribution The Alan Turing Institute provided extra funding and created an environment that encourages academic collaboration.
Impact Meta-learning Control Variates: Variance Reduction with Limited Data. In Proceedings of The 39th Conference on Uncertainty in Artificial Intelligence (UAI), 2023. [awarded oral presentation].
Start Year 2021