Explainable Deep Learning for Situational Understanding from Multimodal Data

Lead Research Organisation: Cardiff University
Department Name: Computer Science

Abstract

The PhD topic is aligned to the Distributed Analytics and Information Sciences International Technology Alliance (DAIS ITA), a joint programme between the UK Ministry of Defence and the US Army Research Laboratory, led by IBM (see https://dais-ita.org/). The focus of DAIS ITA is on supporting future coalition operations that depend on artificial intelligence (AI) and machine learning capabilities, and on humans and machines working together effectively. This PhD topic examines how AI systems built using deep learning techniques can be made more explainable, and thus more trustworthy to humans. As such, DAIS ITA and this PhD topic directly address the artificial intelligence element of the Industrial Strategy.
Huge advances have been made in machine learning in recent years due to breakthroughs in deep neural networks, collectively known as deep learning (DL). However, a key problem with DL approaches is that they are generally seen as "black boxes": while they may work well in particular applications, it is usually unclear how they work, which makes it hard to improve their performance when they fail and raises issues of user trust. There is consequently great interest in techniques that improve the interpretability of DL approaches, allowing a DL system to generate an explanation of how it reached a decision. To be useful, such explanations need to be expressed in human-understandable terms, for example by identifying the image features that were significant in a classification decision, or by providing a brief textual description. The goal of this PhD is to make progress in this challenging area of DL, with a particular focus on situational understanding problems in which the DL system assists a human decision maker in domains such as emergency response, security, policing and medical applications.
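As an illustration of explanation at the level of input features, the sketch below computes a simple gradient-based saliency map, assuming PyTorch and torchvision are available; the untrained ResNet-18 and the random input tensor are placeholders, and in practice a trained model and a real, preprocessed image would be used.

    # Minimal sketch of gradient-based saliency (vanilla gradients) for an image classifier.
    # Assumes PyTorch/torchvision; the untrained ResNet-18 and random input are placeholders.
    import torch
    import torchvision.models as models

    model = models.resnet18()  # in practice, a model trained for the task at hand
    model.eval()

    # Stand-in for a preprocessed image batch of shape (1, 3, 224, 224).
    image = torch.rand(1, 3, 224, 224, requires_grad=True)

    # Forward pass: take the score of the predicted class.
    logits = model(image)
    pred = logits.argmax(dim=1).item()

    # Backward pass: gradient of that class score with respect to the input pixels.
    logits[0, pred].backward()

    # Per-pixel importance: maximum absolute gradient across colour channels.
    saliency = image.grad.abs().max(dim=1).values  # shape (1, 224, 224)
    print(saliency.shape)

Pixels with large saliency values are those whose small changes would most affect the predicted class score, which is one simple way of identifying the image features that were significant in a decision.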
Situational understanding requires three key elements in terms of machine learning, all of which need to be explainable: (1) learning of temporal relationships, including prediction of likely future states (i.e., given the current situation, what is likely to happen next); (2) learning at multiple hierarchical scales, from detection of low-level objects to identification of high-level relationships; and (3) learning from the fusion of multiple data streams of different modalities (e.g., imagery, text, GPS). As an example, consider the problem of managing a large-scale event in a city centre, where streams of CCTV imagery, social media text and real-time location data may be used to predict potential overcrowding and consequent disruption. This PhD will focus in particular on (3), explainability in the context of multimodal data, with an initial focus on audio/visual or audio/text modalities.
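As a hedged illustration of point (3), the sketch below shows a simple late-fusion classifier over two modalities (here audio and text features), assuming PyTorch; the feature dimensions, hidden size and class count are illustrative placeholders rather than the architecture the project will actually use.

    # Minimal sketch of late fusion over two modalities (e.g. pooled audio and text features).
    # Assumes PyTorch; feature dimensions, hidden size and class count are placeholders.
    import torch
    import torch.nn as nn

    class LateFusionClassifier(nn.Module):
        def __init__(self, audio_dim=128, text_dim=300, hidden_dim=64, n_classes=5):
            super().__init__()
            # Per-modality encoders project each input into a shared hidden space.
            self.audio_encoder = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
            self.text_encoder = nn.Sequential(nn.Linear(text_dim, hidden_dim), nn.ReLU())
            # Fusion head: concatenate the per-modality embeddings and classify.
            self.classifier = nn.Linear(2 * hidden_dim, n_classes)

        def forward(self, audio_features, text_features):
            a = self.audio_encoder(audio_features)
            t = self.text_encoder(text_features)
            fused = torch.cat([a, t], dim=-1)
            return self.classifier(fused)

    model = LateFusionClassifier()
    audio = torch.rand(4, 128)  # stand-in for pooled audio features (batch of 4)
    text = torch.rand(4, 300)   # stand-in for pooled text embeddings (batch of 4)
    print(model(audio, text).shape)  # torch.Size([4, 5])

Late fusion is only one possible starting point; explaining such a model means attributing a decision both to features within a modality and to the relative contribution of each modality, which is where the multimodal setting makes explanation harder than in the single-modality case.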

Publications


Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/S515000/1                                      01/10/2018   30/09/2022
2112215             Studentship    EP/S515000/1   01/10/2018   31/12/2022   Harrison Lloyd Taylor