Robust Multimodal Fusion For Low-Level Tasks

Lead Research Organisation: Heriot-Watt University
Department Name: Sch of Engineering and Physical Science

Abstract

There is a silent but steady revolution happening in all sectors of the economy, from agriculture through manufacturing to services. In virtually all activities in these sectors, processes are being constantly monitored and improved via data collection and analysis. While there has been tremendous progress in data collection through a panoply of new sensor technologies, data analysis has proven to be a much more challenging task. Indeed, in many situations, sensors generate data in quantities so large that most of it ends up being discarded. Moreover, sensors often collect different types of data about the same phenomenon, so-called multimodal data. However, it is hard to determine how the different types of data relate to each other or, in particular, what one sensing modality tells us about another.

In this project, we address the challenge of making sense of multimodal data, that is, data that refers to the same phenomenon but reveals different aspects of it and is usually presented in different formats. For example, several modalities can be used to diagnose cancer, including blood tests, imaging technologies like magnetic resonance (MR) and computed tomography (CT), genetic data, and family history information. Each of these modalities is typically insufficient on its own to produce an accurate diagnosis but, when considered together, they usually lead to a conclusive one.

Our starting point is the realization that different sensing modalities have different costs, where "cost" can be financial, related to safety or societal concerns, or both. For instance, in the above example of cancer diagnosis, CT imaging involves exposing patients to X-ray radiation which, ironically, can itself provoke cancer. MR imaging, on the other hand, exposes patients to strong magnetic fields, a procedure that is generally safe. A pertinent question is then whether we can perform both MR and CT imaging, but use a lower dose of radiation in CT (obtaining a poor-resolution CT image) and afterwards improve the resolution of the CT image by leveraging information from MR. This, of course, requires learning what type of information can be transferred between different modalities. Another example scenario is autonomous driving, in which sensors like radar, LiDAR, or infrared cameras, although much more expensive than conventional cameras, collect information that is critical for driving safely. In this case, is it possible to use cheaper, lower-resolution sensors and enhance their output with information from conventional cameras? These examples also show that many of the scenarios in which we collect multimodal data come with robustness requirements, namely, diagnostic accuracy in cancer detection and safety in autonomous driving.

Our goal is then to develop data processing algorithms that effectively capture common information across multimodal data, leverage that shared structure to improve reconstruction, prediction, or classification of the costlier (or all) modalities, and are verifiable and robust. We do this by combining learning-based approaches with model-based approaches. Over recent years, learning-based approaches, namely deep learning methods, have reached unprecedented performance by extracting information from large datasets. Unfortunately, they are vulnerable to so-called generalization errors, which occur when the data to which they are applied differs significantly from the data used during learning. Model-based methods, on the other hand, tend to be more robust, but generally have poorer performance. The approaches we propose to explore use learning-based techniques to determine correspondences across modalities and extract relevant common information, and then integrate that common information into model-based schemes. Their ultimate goal is to compensate for cost and quality imbalances across the modalities while, at the same time, providing robustness and verifiability.
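To make the general recipe concrete, here is a minimal sketch, not the project's actual algorithm: a learned component supplies cross-modal guidance, and a model-based scheme enforces consistency with the measurements. In this toy 1-D example, a hypothetical `guidance` signal (standing in for common information extracted from a higher-quality modality) regularizes a least-squares reconstruction of a cheap, low-resolution measurement; the function and parameter names are illustrative.

```python
import numpy as np

def downsample(x, factor):
    """Forward model A: average-pool by `factor` (a toy low-resolution sensor)."""
    return x.reshape(-1, factor).mean(axis=1)

def upsample(y, factor):
    """Adjoint A^T of the averaging operator: replicate each sample, scaled by 1/factor."""
    return np.repeat(y, factor) / factor

def fuse_reconstruct(y, guidance, factor, lam=0.5, step=1.0, iters=200):
    """Solve min_x ||A x - y||^2 + lam * ||x - guidance||^2 by gradient descent.
    `guidance` stands in for learned common information from another modality."""
    x = np.repeat(y, factor)  # initialize from the cheap, low-res measurement
    for _ in range(iters):
        grad_data = upsample(downsample(x, factor) - y, factor)  # data-fidelity term
        grad_reg = lam * (x - guidance)                          # cross-modal prior
        x = x - step * (grad_data + grad_reg)
    return x

# Toy usage: a sharp signal from an "expensive" modality guides a blurry cheap one.
rng = np.random.default_rng(0)
truth = np.sign(np.sin(np.linspace(0, 6 * np.pi, 240)))    # piecewise-constant scene
y = downsample(truth, 8) + 0.05 * rng.standard_normal(30)  # low-res, noisy sensor
guidance = truth + 0.1 * rng.standard_normal(240)          # proxy for learned guidance
x_hat = fuse_reconstruct(y, guidance, factor=8)
print(f"reconstruction error: {np.linalg.norm(x_hat - truth):.3f}")
```

The division of labour mirrors the paragraph above: the data-fidelity gradient is the model-based part (it never lets the estimate drift from what was actually measured), while the guidance term is where a learned, possibly imperfect, cross-modal prior enters with a tunable weight `lam`.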

Planned Impact

Several groups will benefit from this project's research, from academic communities and companies working in the healthcare and robotics sectors, to national defence and, ultimately, the general public.

Participation in conferences and workshops such as ICASSP, ICIP, ICML, BMVC, and AIP will ensure dissemination of the research outputs among the different academic communities in the areas of signal and image processing, machine learning and computer vision, and multimodal data processing. A more focused avenue for dissemination will be the organization of a small workshop at HWU. The theme of the workshop will be multimodal signal processing, and we will invite two or three top researchers (national and international) in the area. To maximize its impact, we aim to make professional recordings of the talks freely available.

Although this project will develop fundamental signal processing tools, it has potential applications in specific domains, for example, healthcare and autonomous robotic navigation. Two of our partners, Canon Medical Research Europe and SeeByte, are world-leading companies in translating research ideas into commercial products. Through joint supervision of students and regular meetings, these companies will help us identify the most promising research directions and applications of our algorithms and theory. The impact of these interactions, especially if they result in commercial products, can be significant. As mentioned in the case for support and pathways to impact, the PI is involved in the UDRC Phase III project, which is funded by both Dstl (MoD) and EPSRC. The meetings associated with the UDRC often take place at Dstl, and representatives of several defence companies (Thales, MBDA, Leonardo, BAE Systems, etc.) are regular participants. This presents an ideal platform to advertise the research outputs of this proposal within the defence community and, thus, to maximize its national impact.

Finally, the wider public will also benefit from the research, not only indirectly via safer technology for automated healthcare, autonomous driving, and improved security systems, but also directly via outreach activities. With the help of undergraduate students at HWU, we will create demos and didactic videos related to the research, to be presented at events like the Edinburgh Science Festival, in secondary schools, and at HWU open days. Through these activities, we expect not only to raise the general public's awareness of signal processing and its applications, but also to attract more students into the area.
 
Description Although the original proposal focused on processing multimodal data, we obtained results so promising for single-modality data that we decided to pursue that direction instead. One of the key findings of the project was a new way to make neural networks more robust. Currently, end-to-end deep neural networks (DNNs) achieve the best results for reconstructing images from partial data, for example, magnetic resonance images (MRI), which are essential for detecting various diseases. Despite achieving impressive results, they often fail to reconstruct small but important details or, conversely, introduce artefacts, a phenomenon known as hallucination. The techniques we developed make it possible to modify any end-to-end DNN to overcome these limitations. This was demonstrated in a series of applications, from hyperspectral imaging to MRI reconstruction and super-resolution imaging.
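As an illustration of the kind of safeguard this finding concerns, the sketch below shows one standard, generic mechanism for constraining an end-to-end MRI reconstruction network, hard data consistency, which is not necessarily the project's specific technique. Wherever k-space was actually measured, the network output's Fourier coefficients are overwritten with the measurements, so the final image cannot contradict the acquired data; `network_output` is a stand-in for any DNN reconstruction, and all names are illustrative.

```python
import numpy as np

def data_consistency(network_output, measured_kspace, mask):
    """Replace the network's k-space values with measurements where sampled.
    mask: boolean array marking the k-space locations that were acquired."""
    k = np.fft.fft2(network_output)
    k[mask] = measured_kspace[mask]  # trust the data wherever it exists
    return np.fft.ifft2(k).real     # back to the image domain

# Toy usage: undersampled k-space of a synthetic phantom, plus a rough "DNN" guess.
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0                                  # ground-truth phantom
mask = np.random.default_rng(1).random((64, 64)) < 0.3   # 30% random sampling
y = np.fft.fft2(img) * mask                              # measured k-space
guess = np.clip(img + 0.2 * np.random.default_rng(2).standard_normal(img.shape), 0, 1)
recon = data_consistency(guess, y, mask)
print(f"error before: {np.linalg.norm(guess - img):.2f}, after: {np.linalg.norm(recon - img):.2f}")
```

By Parseval's identity, the projection can only reduce the reconstruction error on the measured frequencies, which is why such constraints limit hallucination: the network is free to fill in missing information but can no longer alter what was observed.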
Exploitation Route The techniques we developed are generic and can be explored in many other areas, for example, for embedding physical laws into DNNs.
Sectors Digital/Communication/Information Technologies (including Software), Energy, Healthcare, Transport

 
Description The industrial partners of the project, SeeByte and Canon Medical Research Europe, have benefited from the findings of the project. For example, we identified a direct application of the developed methods to improve an important product of SeeByte.
First Year Of Impact 2023
Sector Aerospace, Defence and Marine, Healthcare
Impact Types Economic