XAIvsDisinfo: eXplainable AI Methods for Categorisation and Analysis of COVID-19 Vaccine Disinformation and Online Debates

Lead Research Organisation: University of Sheffield
Department Name: Computer Science

Abstract

UK vaccination rates are in decline and experts believe that vaccine disinformation, widely spread
in social media, may be one of the reasons. Recent surveys have established that vaccine
disinformation is negatively impacting citizen trust in COVID-19 vaccination specifically. As a
response, the UK Government agreed measures with Twitter, Facebook, and YouTube to limit the
spread of disinformation. However, simply removing disinformation from platforms is not enough,
as the government also needs to monitor and respond to the concerns of vaccine-hesitant citizens.
Moreover, manual detection and tracking of disinformation, as currently practiced by many
journalists, is infeasible, given the scale of social media.

XAIvsDisinfo aims to address these gaps through novel research on explainable AI-based models for
large-scale analysis of vaccine disinformation. Specifically, vaccine disinformation will be classified
automatically into the six narrative types defined by First Draft. A second model will categorise
vaccine statements as pro-vaccine, anti-vaccine, vaccine-hesitant, or other.
We will investigate explainable machine learning approaches that are human interpretable: both
in detecting errors and weaknesses of the models and in providing human-readable explanations
of the models' decisions.

XAIvsDisinfo will also create two new multi-platform datasets and organise a new community
research challenge on cross-platform analysis of vaccine disinformation, as a follow-up to our
RumourEval challenge.

Our XAI models and tools will be integrated into the open-source InVID-WeVerify plugin, for
take-up by journalists and fact-checkers. The project outputs will also contribute to evidence-based
policy activities by the UK government on improving citizen perception of COVID-19 vaccines.
 
Description We have studied the spread of vaccine narratives online, including vaccine hesitancy. The research is now complete. It involved journalists and data scientists, as well as AI researchers developing intelligent methods for detecting and classifying vaccine-hesitant and anti-vaccine posts and for categorising vaccine narratives into 6 topical categories.
Exploitation Route The research papers and datasets arising from the project have been of great interest to the research community and have been downloaded and cited multiple times already. Software and datasets have been made available on Zenodo and GitHub for replicability.

The grant helped strengthen our reputation as world-leading researchers on online misinformation and enabled us to provide timely input and advice to government departments on COVID-19 misinformation, and vaccine misinformation in particular.

The misinformation detection and analysis research is now continuing with a focus on the forthcoming elections, and we plan to extend it towards climate misinformation in future.
Sectors Communities and Social Services/Policy

Digital/Communication/Information Technologies (including Software)

Government

Democracy and Justice

 
Description Provided input to DCMS to help shape the Online Harms Bill, and helped with guidance during the pandemic response. Used by journalists at First Draft for their work on analysing vaccine narratives. Numerous researchers have downloaded the dataset and used the web services to replicate our research and analyse COVID-19 data.
First Year Of Impact 2021
Sector Digital/Communication/Information Technologies (including Software), Government, Democracy and Justice
Impact Types Societal

Policy & public services

 
Description Participation in DCMS task force and virtual round table on tackling COVID-19 and vaccine misinformation
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
Impact Learnings from the task force have been used to inform the Online Harms bill going through Parliament.
URL https://www.gov.uk/government/news/social-media-giants-agree-package-of-measures-with-uk-government-...
 
Description Participation in the DCMS College of Experts
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
Impact Research on online disinformation and online abuse has played a key role in informing the latest Online Harms bill which is going to be put through Parliament.
 
Description vera.ai: Verification Assisted by Artificial Intelligence
Amount £900,000 (GBP)
Funding ID https://www.veraai.eu/home 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 08/2022 
End 09/2025
 
Title COVID-19 Claim Categoriser 
Description A machine learning classifier trained to categorise claims about COVID-19 into 10 categories. These were proposed by the Reuters Institute for the Study of Journalism: Public authority actions, policy, and communications; Community spread and impact; Medical advice and self-treatments; Claims about prominent actors; Conspiracy theories; Virus transmission; Virus origin and properties; Public preparedness, protests, and civil disobedience; Vaccines, medical treatments, and tests; Other.
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact The corresponding PLOS One journal paper has 61 citations as of 23 Feb 2024 (according to Google Scholar): https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0247086 Since its launch, the web service has been widely used to replicate the research results.
 
Title COVID-19 Vaccine Narrative Categoriser 
Description A machine learning classifier trained to categorise COVID-19 vaccine text into 6 categories. These are: Liberty/Freedom; Development, Provision and Access; Safety, Efficacy and Necessity; Politics and Economics; Conspiracy; Morality, Religiosity and Ethics.
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact Used by First Draft in their research on vaccine misinformation. 
URL https://cloud.gate.ac.uk/shopfront/displayItem/covid19-vaccine
 
Title Vaccine Hesitancy Text Classifier 
Description This service classifies documents based on the stance towards COVID-19 vaccines expressed within the text. The classifier was trained using the VaxxHesitancy dataset we released. Full details of the dataset can be found in the associated ICWSM 2023 paper. Given a document, the service assigns one of the following four classes: pro-vaccine, anti-vaccine, vaccine-hesitant, and irrelevant. For details please see the accompanying paper: https://ojs.aaai.org/index.php/ICWSM/article/view/22213/21992 and dataset https://zenodo.org/records/7601328 
Type Of Material Improvements to research infrastructure 
Year Produced 2024 
Provided To Others? Yes  
Impact See citations of the accompanying paper: https://ojs.aaai.org/index.php/ICWSM/article/view/22213/21992 and dataset https://zenodo.org/records/7601328 The dataset has so far been downloaded 247 times and viewed over 800 times (as of 12 March 2024). 
URL https://cloud.gate.ac.uk/shopfront/displayItem/vaccine-hesitancy-classifier
 
Title COVID-19 Misinformation Dataset 
Description The dataset and the annotation codebook from "Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of the COVID-19 Infodemic" (accepted at RANLP 2023) 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://zenodo.org/record/8131933
 
Title COVID-19 Misinformation Dataset 
Description The dataset and the annotation codebook from "Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of the COVID-19 Infodemic" (accepted at RANLP 2023) 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact Citations of the paper and dataset for replicability. 
URL https://zenodo.org/record/8131932
 
Title Classifying COVID-19 vaccine narratives 
Description We release the augmented Twitter dataset of 355 vaccine-related narratives, created for the following paper. The tweets are labelled as one of four classes: Conspiracy (Cons), Moral, Religious, and Ethical Concerns (MRE), Liberties and Freedom (LF), and Animal Vaccines (AnimalVac).
@article{li2022classifying, title={Classifying COVID-19 vaccine narratives}, author={Li, Yue and Scarton, Carolina and Song, Xingyi and Bontcheva, Kalina}, journal={arXiv preprint arXiv:2207.08522}, year={2022} }
The paper has been accepted by RANLP 2023. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Dataset released for download and replicability. 
URL https://zenodo.org/record/8192130
 
Title Classifying COVID-19 vaccine narratives 
Description We release the augmented Twitter dataset of 355 vaccine-related narratives, created for the following paper. The tweets are labelled as one of four classes: Conspiracy (Cons), Moral, Religious, and Ethical Concerns (MRE), Liberties and Freedom (LF), and Animal Vaccines (AnimalVac).
@article{li2022classifying, title={Classifying COVID-19 vaccine narratives}, author={Li, Yue and Scarton, Carolina and Song, Xingyi and Bontcheva, Kalina}, journal={arXiv preprint arXiv:2207.08522}, year={2022} }
The paper has been accepted by RANLP 2023. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://zenodo.org/record/8192131
 
Title MMTweets Dataset 
Description Multilingual Misinformation Tweets (MMTweet) Dataset. This repository contains datasets and scripts for the Multilingual Misinformation Tweets (MMTweet) Dataset.
- Data Annotation - Tweet Classification.pdf: Data annotation guidelines for tweet classification.
- Data Annotation - Claim Matching.pdf: Data annotation guidelines for claim matching.
- MMTweets_full_dataset.csv: CSV file containing the full Multilingual Misinformation Tweets (MMTweets) dataset.
- MMTweets_test.csv: CSV file containing the test subset of the MMTweets dataset.
- MMTweets_train.csv: CSV file containing the training subset of the MMTweets dataset.
- debunk_corpus.json: JSON file containing the corpus of debunked narratives.
- get_tweet_text.py: Python script for extracting tweet text from tweet IDs. Please contact us if you need the debunk information fields.
- load_dataset.ipynb: Example Python notebook for loading dataset.
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://zenodo.org/doi/10.5281/zenodo.7144807
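Because the dataset ships separate training and test CSV files, a quick sanity check before reusing the split is to confirm that no tweet appears in both. The sketch below is illustrative only: the `tweet_id` column name is an assumption (the record does not list the CSV headers), and the in-memory rows are made up to stand in for MMTweets_train.csv and MMTweets_test.csv.

```python
import csv
import io


def read_ids(handle, id_column="tweet_id"):
    """Return the set of IDs found in an open CSV file.

    `id_column` is a hypothetical column name; check the real header first.
    """
    return {row[id_column] for row in csv.DictReader(handle)}


def split_overlap(train_handle, test_handle, id_column="tweet_id"):
    """Return the IDs that appear in both the training and test subsets."""
    train_ids = read_ids(train_handle, id_column)
    test_ids = read_ids(test_handle, id_column)
    return train_ids & test_ids


# Made-up stand-ins for MMTweets_train.csv and MMTweets_test.csv.
train = io.StringIO("tweet_id,label\n101,a\n102,b\n")
test = io.StringIO("tweet_id,label\n102,b\n103,a\n")
print(sorted(split_overlap(train, test)))
```

With the real files, the same functions can be called on handles opened with `open(path, newline="", encoding="utf-8")`; an empty result would indicate that the two subsets are disjoint.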
 
Title MMTweets Dataset 
Description Multilingual Misinformation Tweets (MMTweet) Dataset. This repository contains datasets and scripts for the Multilingual Misinformation Tweets (MMTweet) Dataset.
- Data Annotation - Tweet Classification.pdf: Data annotation guidelines for tweet classification.
- Data Annotation - Claim Matching.pdf: Data annotation guidelines for claim matching.
- MMTweets_full_dataset.csv: CSV file containing the full Multilingual Misinformation Tweets (MMTweets) dataset.
- MMTweets_test.csv: CSV file containing the test subset of the MMTweets dataset.
- MMTweets_train.csv: CSV file containing the training subset of the MMTweets dataset.
- debunk_corpus.json: JSON file containing the corpus of debunked narratives.
- get_tweet_text.py: Python script for extracting tweet text from tweet IDs. Please contact us if you need the debunk information fields.
- load_dataset.ipynb: Example Python notebook for loading dataset.
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
URL https://zenodo.org/doi/10.5281/zenodo.10637161
 
Title MMTweets Dataset 
Description Multilingual Misinformation Tweets (MMTweet) Dataset. This repository contains datasets and scripts for the Multilingual Misinformation Tweets (MMTweet) Dataset.
- codebooks: Contains PDF documents providing data annotation guidelines for tweet classification and claim matching.
- MMTweets_full_dataset.csv: CSV file containing the full Multilingual Misinformation Tweets (MMTweets) dataset.
- MMTweets_test.csv: CSV file containing the test subset of the MMTweets dataset.
- MMTweets_train.csv: CSV file containing the training subset of the MMTweets dataset.
- debunk_corpus.json: JSON file containing the corpus of debunked narratives.
- scripts: Python scripts for extracting tweet text from tweet IDs. Please contact us if you need the debunk information fields.
- load_dataset.ipynb: Example Python notebook for loading dataset.
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
URL https://zenodo.org/doi/10.5281/zenodo.7144808
 
Title VaxxHesitancy: A Dataset for Studying Hesitancy Towards COVID-19 Vaccination on Twitter 
Description Please do not use this version. Please use the V2 version via https://zenodo.org/record/7601328
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://zenodo.org/record/7535228
 
Title VaxxHesitancy: A Dataset for Studying Hesitancy Towards COVID-19 Vaccination on Twitter 
Description We create a publicly available dataset of over 3,100 COVID-19 vaccine-related tweets labeled as one of four stance categories: pro-vaxx, anti-vaxx, vaxx-hesitant, or irrelevant. *** Please use the V2 version. *** We split our dataset into two separate files: (1) VaccineHesitancy_train_v2.csv (Single + Double annotated) (2) VaccineHesitancy_test.csv (Double annotated) We present the details of this dataset here: VaxxHesitancy: A Dataset for Studying Hesitancy Towards COVID-19 Vaccination on Twitter (ICWSM 2023) @inproceedings{mu2023vaxxhesitancy, title={VaxxHesitancy: A Dataset for Studying Hesitancy Towards COVID-19 Vaccination on Twitter}, author={Mu, Yida and Jin, Mali and Grimshaw, Charlie and Scarton, Carolina and Bontcheva, Kalina and Song, Xingyi}, booktitle={Proceedings of the International AAAI Conference on Web and Social Media}, volume={17}, pages={1052--1062}, year={2023} } 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact The dataset has had 225 downloads and 771 views as of 23 February 2023.
URL https://zenodo.org/doi/10.5281/zenodo.7601328
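As a rough illustration of how the released CSV files might be inspected, the sketch below tallies rows per stance category. It is not the authors' code: the `label` column name is an assumption (the record does not give the CSV header), the label strings are taken from the description above, and the in-memory rows are made up to stand in for VaccineHesitancy_train_v2.csv.

```python
import csv
import io
from collections import Counter

# The four stance categories named in the dataset description.
STANCES = ("pro-vaxx", "anti-vaxx", "vaxx-hesitant", "irrelevant")


def stance_counts(handle, label_column="label"):
    """Count rows per stance label in an open CSV file.

    `label_column` is a hypothetical column name; check the real header.
    Labels outside STANCES are tallied under "unknown".
    """
    counts = Counter({stance: 0 for stance in STANCES})
    for row in csv.DictReader(handle):
        label = (row.get(label_column) or "").strip().lower()
        counts[label if label in STANCES else "unknown"] += 1
    return dict(counts)


# Tiny in-memory stand-in for VaccineHesitancy_train_v2.csv (made-up rows).
sample = io.StringIO(
    "text,label\n"
    "example tweet one,pro-vaxx\n"
    "example tweet two,vaxx-hesitant\n"
    "example tweet three,pro-vaxx\n"
)
print(stance_counts(sample))
```

Grouping unexpected labels under "unknown" rather than raising an error is a design choice for exploratory use; a stricter pipeline might reject such rows instead.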
 
Title VaxxHesitancy: A Dataset for Studying Hesitancy Towards COVID-19 Vaccination on Twitter 
Description We create a publicly available dataset of over 3,100 COVID-19 vaccine-related tweets labeled as one of four stance categories: pro-vaxx, anti-vaxx, vaxx-hesitant, or irrelevant. *** Please use the V2 version. *** We split our dataset into two separate files: (1) VaccineHesitancy_train_v2.csv (Single + Double annotated) (2) VaccineHesitancy_test.csv (Double annotated) We present the details of this dataset here: VaxxHesitancy: A Dataset for Studying Hesitancy Towards COVID-19 Vaccination on Twitter (ICWSM 2023) Our Pre-trained model (GateNLP/covid-vaccine-twitter-bert) : https://huggingface.co/GateNLP/covid-vaccine-twitter-bert Paper: https://ojs.aaai.org/index.php/ICWSM/article/view/22213/21992   @inproceedings{mu2023vaxxhesitancy, title={VaxxHesitancy: A Dataset for Studying Hesitancy Towards COVID-19 Vaccination on Twitter}, author={Mu, Yida and Jin, Mali and Grimshaw, Charlie and Scarton, Carolina and Bontcheva, Kalina and Song, Xingyi}, booktitle={Proceedings of the International AAAI Conference on Web and Social Media}, volume={17}, pages={1052--1062}, year={2023} } 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact The dataset has been downloaded over 240 times and viewed over 800 times as of 12 March 2024. 
URL https://zenodo.org/doi/10.5281/zenodo.7535227
 
Description Collaboration with ICFJ 
Organisation International Center for Journalists
Country United States 
Sector Charity/Non Profit 
PI Contribution Computational analysis of online abuse towards female journalists worldwide
Collaborator Contribution Qualitative research, journalistic expertise, paper writing, joint discussions and research
Impact All joint publications already listed - see those co-authored with Julie Posetti
Start Year 2021
 
Title GATE Teamware 
Description A web application for collaborative document annotation. GATE Teamware provides a flexible web app platform for managing classification of documents by human annotators.
Type Of Technology Software 
Year Produced 2023 
URL https://zenodo.org/record/7899193
 
Title GATE Teamware 
Description A web application for collaborative document annotation. GATE Teamware provides a flexible web app platform for managing classification of documents by human annotators.
Type Of Technology Software 
Year Produced 2023 
URL https://zenodo.org/record/8400054
 
Title GATE Teamware 
Description A web application for collaborative document annotation. GATE Teamware provides a flexible web app platform for managing classification of documents by human annotators.
Type Of Technology Software 
Year Produced 2024 
URL https://zenodo.org/doi/10.5281/zenodo.11151005
 
Title GATE Teamware 
Description A web application for collaborative document annotation. GATE Teamware provides a flexible web app platform for managing classification of documents by human annotators.
Type Of Technology Software 
Year Produced 2024 
URL https://zenodo.org/doi/10.5281/zenodo.11151036
 
Title GATE Teamware 2 
Description A web application for collaborative document annotation. GATE Teamware provides a flexible web app platform for managing classification of documents by human annotators.
Type Of Technology Software 
Year Produced 2023 
URL https://zenodo.org/record/7899194
 
Title GATE Teamware 2 
Description A web application for collaborative document annotation. GATE Teamware provides a flexible web app platform for managing classification of documents by human annotators.
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
URL https://zenodo.org/record/7821718
 
Title GATE Teamware 2 
Description A web application for collaborative document annotation. GATE Teamware provides a flexible web app platform for managing classification of documents by human annotators.
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
URL https://zenodo.org/record/7821719
 
Title Text Classification Web Annotation Tool 
Description A web application which enables a team of researchers to jointly annotate vaccine narratives for vaccine hesitancy and narrative category, for the purpose of training machine learning models. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Impact The tool is still being developed and the award plans to use it for the development and release of two datasets.
 
Description Engagement with the EDMO network of disinformation researchers and fact-checkers 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I spoke about COVID-19 disinformation and AI at EDMO meetings and events between 2022 and 2024.
Year(s) Of Engagement Activity 2022,2023,2024
 
Description Invited talk at IFDaD 2022 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Presented the research outcomes to an international multi-stakeholder audience.
Year(s) Of Engagement Activity 2022
URL https://docs.google.com/presentation/d/1W8yGeWymG8Gg5cuTapi7cNaR6gUAuqCa/edit#slide=id.p1