A framework for evaluating and explaining the robustness of NLP models
Lead Research Organisation: King's College London
Department Name: Informatics
Abstract
The standard practice for evaluating the generalisation of supervised machine learning models in NLP tasks is to use previously unseen (i.e. held-out) data and report performance on it using metrics such as accuracy. Although metrics reported on held-out data summarise a model's performance, these results are aggregate statistics on benchmarks and do not reflect the nuances of model behaviour and robustness when the model is applied in real-world systems.
We propose a robustness evaluation framework for NLP models concerned with arguments and facts, which encompasses explanations for robustness failures to support systematic and efficient evaluation. We will develop novel methods for simulating real-world texts derived from existing datasets, to help evaluate the stability and consistency of models when deployed in the wild. The simulation methods will be used to challenge NLP models through text-based transformations and distribution shifts on datasets as well as on data subsets that capture linguistic patterns, providing systematic coverage of real-world linguistic phenomena. Furthermore, our framework will provide insights into a model's robustness by generating explanations for robustness failures along the lexical, morphological, and syntactic dimensions, extracted from the various dataset simulations and data subsets, thus departing from current approaches that solely provide a metric to quantify robustness. We will focus on two NLP research areas, argument mining and fact verification; however, several of the simulation methods and the robustness explanations are also applicable to other NLP tasks.
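A minimal sketch of the kind of text-based transformation described above: applying a word-level perturbation (here, a hypothetical synonym lexicon) to held-out examples and measuring how often a classifier's prediction stays the same. The lexicon, toy classifier, and sample data are illustrative assumptions, not the project's actual simulation methods.

```python
# Illustrative sketch: word-level perturbation as a prediction-consistency check.
# The lexicon, classifier, and data below are hypothetical placeholders.
import random

# Hypothetical synonym lexicon used to simulate lexical variation.
SYNONYMS = {
    "good": ["great", "decent"],
    "bad": ["poor", "terrible"],
    "said": ["stated", "claimed"],
}

def perturb(text: str, rate: float = 0.3, seed: int = 0) -> str:
    """Replace a fraction of known words with a synonym (word-level perturbation)."""
    rng = random.Random(seed)
    out = []
    for tok in text.split():
        key = tok.lower()
        if key in SYNONYMS and rng.random() < rate:
            out.append(rng.choice(SYNONYMS[key]))
        else:
            out.append(tok)
    return " ".join(out)

def consistency(predict, texts):
    """Fraction of examples whose predicted label is unchanged under perturbation."""
    unchanged = sum(predict(t) == predict(perturb(t)) for t in texts)
    return unchanged / len(texts)

if __name__ == "__main__":
    # Toy stand-in for an NLP classifier: a real study would plug in a trained model.
    predict = lambda t: "argument" if "because" in t.lower() else "other"
    sample = [
        "The policy is good because it said costs will fall",
        "The weather was bad yesterday",
    ]
    print(f"Prediction consistency under perturbation: {consistency(predict, sample):.2f}")
```

A consistency score well below 1.0 on perturbations that preserve meaning is one signal of a robustness failure; the framework additionally aims to explain such failures along lexical, morphological, and syntactic dimensions rather than report the score alone.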
People
Oana Cocarascu (Principal Investigator)
Publications
Mamta (2024) BiasWipe: Mitigating Unintended Bias in Text Classifiers through Model Interpretability. In EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference.
| Title | FactEval |
| Description | FactEval is a large-scale benchmark for extensive evaluation of large language models in the fact verification domain, covering 16 realistic word-level and character-level perturbations and 4 types of subpopulations. |
| Type Of Material | Computer model/algorithm |
| Year Produced | 2025 |
| Provided To Others? | Yes |
| Impact | FactEval can be used to evaluate the robustness of large language models in the fact verification domain. |
| URL | https://github.com/TRAI-group/FactEval |
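A minimal illustrative sketch of the style of evaluation such a benchmark supports: applying a character-level perturbation (adjacent character swaps) to claims and measuring how often a fact-verification predictor's verdict flips. The perturbation and the toy `verify` function are hypothetical placeholders and do not reflect FactEval's actual API or perturbation set; see the linked repository for the real benchmark.

```python
# Illustrative sketch of character-level robustness evaluation for fact
# verification. NOT FactEval's API; the perturbation and the toy verifier
# are hypothetical placeholders.
import random

def char_swap(claim: str, n_swaps: int = 2, seed: int = 0) -> str:
    """Swap adjacent characters at random positions (a simple typo-style perturbation)."""
    rng = random.Random(seed)
    chars = list(claim)
    for _ in range(n_swaps):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def label_flip_rate(verify, claims):
    """Fraction of claims whose predicted verdict changes under perturbation."""
    flipped = sum(verify(c) != verify(char_swap(c)) for c in claims)
    return flipped / len(claims)

if __name__ == "__main__":
    # Toy stand-in for an LLM-based fact verifier returning SUPPORTED / REFUTED.
    verify = lambda c: "SUPPORTED" if "paris" in c.lower() else "REFUTED"
    claims = ["Paris is the capital of France", "The Moon is made of cheese"]
    print(f"Label flip rate under character swaps: {label_flip_rate(verify, claims):.2f}")
```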