Explainability of Machine Learning models for Adversarial Robustness

Lead Research Organisation: King's College London

Department Name: Informatics

Abstract

Adversarial attacks no longer target only the classifier; they can now also bypass explanation methods. Once thought to be an effective defense, explanation methods now give a false sense of security for many high-performing Deep Learning models in the computer vision domain. This has made security practitioners wary that these flaws could carry over to the security domain. However, we believe that explanation methods play a major role in the robustness of classifiers against adversarial attacks. With a deeper understanding of these explainers, we can better utilize them as a defense and forensics tool, not only for security but for all machine learning problems. Our central question is "How can explainability of machine learning models improve adversarial robustness in the security domain?". This is broken down into four smaller research questions: "Is the terminology in the literature for the explanation's robustness appropriate?", "Can explanations for computer vision be transferred to the security domain?", "Can explanation methods help us better understand and tackle practical challenges in security?", and "Do adversarial attacks reveal a similar explainer behavior to drift?". We aim to answer these with four main projects: Dataset bias, Explanation Affinity triangle, Drift forensics, and Adversarial attacks.

Student:

Hoifung Chow

Period of Study:

Sep 2021 - Mar 2025

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2608271

Research Topic:

Unclassified

Organisations

King's College London (Lead Research Organisation)

People	ORCID iD
Hoifung Chow (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/T517963/1			30/09/2020	29/09/2025
2608271	Studentship	EP/T517963/1	30/09/2021	30/03/2025	Hoifung Chow
