Explainability of Machine Learning models for Adversarial Robustness
Lead Research Organisation:
King's College London
Department Name: Informatics
Abstract
Adversarial attacks no longer target only the classifier; they can now also bypass explanation methods. Once thought to be an effective defense, explanation methods now give a false sense of security for many high-performing deep learning models in the computer vision domain. This has made security practitioners wary that these flaws could carry over to the security domain. However, we believe that explanation methods play a major role in the robustness of classifiers against adversarial attacks. By gaining a deeper understanding of these explainers, we can better utilize them as a defense and forensics tool, not only for security but for all machine learning problems. Our central question is "How can explainability of machine learning models improve adversarial robustness in the security domain?". This is broken down into four smaller research questions: "Is the terminology in the literature for the explanation's robustness appropriate?", "Can explanations for computer vision be transferred to the security domain?", "Can explanation methods help us better understand and tackle practical challenges in security?", and "Do adversarial attacks reveal a similar explainer behavior to drift?". We aim to answer these with four main projects: dataset bias, the explanation affinity triangle, drift forensics, and adversarial attacks.
Organisations
People
| Name | ORCID iD |
|---|---|
| Hoifung Chow (Student) | |
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/T517963/1 | | | 30/09/2020 | 29/09/2025 | |
| 2608271 | Studentship | EP/T517963/1 | 30/09/2021 | 30/03/2025 | Hoifung Chow |