Learning from COVID-19: An AI-enabled evidence-driven framework for claim veracity assessment during pandemics

Lead Research Organisation: University of Warwick
Department Name: Computer Science

Abstract

The term 'infodemic', coined by the WHO, refers to misinformation during pandemics that can create panic, fragment the social response, affect rates of transmission and encourage trade in untested treatments that put people's lives in danger. The WHO and government agencies have to divert significant resources to combat infodemics. Their scale makes it essential to employ computational techniques for claim veracity assessment. However, existing approaches largely rely on supervised learning. Present accuracy levels fall short of what is required for practical adoption, as training data are small and performance tends to degrade significantly on claims/topics unseen during training: current practices are unsuitable for addressing the scale and complexity of the COVID-19 infodemic.

This project will research novel supervised/unsupervised methods for veracity assessment of claims that are unverified at the time of posting, by integrating information from multiple sources and building a knowledge network that enables cross-verification. Key originating sources/agents will be identified through patterns of misinformation propagation, and results will be presented via a novel visualisation interface for easy interpretation by users.

This high-level aim gives rise to the following objectives:
RO1. Collect COVID-19 related data from social media platforms and authoritative resources.
RO2. Develop automated methods to extract key information on COVID-19 from scientific publications and other relevant sources.
RO3. Develop novel unsupervised/supervised approaches for veracity assessment by incorporating evidence from external sources.
RO4. Analyse dynamic spreading patterns of rumours in social media; identify the key sources/agents and develop effective containment strategies.
RO5. Validate the methods via a set of new visualisation interfaces.
 
Description Our key findings are summarised below:

- Our project has created new datasets and developed a number of novel approaches for fact checking and claim verification, which have been applied to these datasets in a range of experiments. The experiments have provided new knowledge on the architecture of such models and on how different elements of information (such as inference relationships and similarity scores) can be combined for more effective veracity assessment, as well as a deeper understanding of the generalisability of fact verification models, opening promising paths for improvement such as few-shot learning or updating of embeddings. These findings have advanced our knowledge of claim veracity assessment specifically related to COVID-19.
Exploitation Route We have organised interviews with BBC Monitoring and Full Fact to gain a better understanding of the fact checking processes currently adopted by journalists at the BBC and by the fact checking organisation. The findings from the interviews will guide the development of a fact checking tool that meets the requirements of journalists and fact checkers.
Sectors Digital/Communication/Information Technologies (including Software), Government, Democracy and Justice, Other

URL https://panacea2020.github.io/index.html
 
Title BERT-based Text and Image multimodal model with Contrastive learning (BTIC) 
Description The BERT-based Text and Image multimodal model with Contrastive learning (BTIC) has been developed for unreliable multimodal news detection. It captures both textual and visual information from unreliable articles using a contrastive learning strategy. The contrastive learner interacts with the unreliable news classifier to push similar credible news (or similar unreliable news) closer while moving news articles with similar content but opposite credibility labels away from each other in the multimodal embedding space (an illustrative sketch of this contrastive objective follows this entry). 
Type Of Material Computer model/algorithm 
Year Produced 2021 
Provided To Others? Yes  
Impact NA 
URL https://github.com/WenjiaZh/BTIC
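As a rough illustration of the contrastive objective described above (not the released BTIC code, which is available at the URL above), the following PyTorch sketch computes a supervised contrastive loss over fused text-image embeddings so that articles sharing a credibility label are pulled together and articles with opposite labels are pushed apart; the function name, batch construction and embedding dimension are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
        # Illustrative sketch: pull same-label (credible / unreliable) articles together
        # in the multimodal embedding space and push different-label articles apart.
        # embeddings: (batch, dim) fused text + image representations
        # labels:     (batch,)    credibility labels, e.g. 0 = credible, 1 = unreliable
        z = F.normalize(embeddings, dim=1)                    # unit-length embeddings
        sim = z @ z.T / temperature                           # scaled cosine similarities
        self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, float("-inf"))       # ignore self-similarity
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        pos_counts = pos_mask.sum(dim=1).clamp(min=1)
        per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_counts
        return per_anchor[pos_mask.any(dim=1)].mean()

    # Hypothetical usage with random tensors standing in for fused BERT + image features.
    fused = torch.randn(8, 256)
    credibility = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
    print(supervised_contrastive_loss(fused, credibility))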
 
Title COVID-RV - a novel COVID-19 dataset of false claims and relevant Twitter conversations 
Description To facilitate generalisability evaluation of rumour verification models, we introduce the COVID-RV (COVID-Rumour Verification) dataset. It extends CovidLies (Hossain et al., 2020), a manually curated dataset of claims about COVID-19, by associating claims with social media conversations from Twitter. COVID-RV is carefully curated and manually annotated in two stages, for tweet relevance and for stance towards the claim. Unlike datasets containing only individual posts, COVID-RV matches claims with relevant tweets that are the sources of conversations, together with the associated conversation threads. This makes it possible to evaluate rumour verification models that make use of conversation threads (a hypothetical sketch of such a record layout follows this entry). 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? No  
Impact This dataset will be released publicly and used to evaluate existing models for rumour verification and develop novel ones. 
URL https://panacea2020.github.io/
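Since the dataset had not yet been released at the time of reporting, the following minimal sketch only illustrates how a COVID-RV style record (a claim, its relevant source tweet, the associated conversation, and the two-stage relevance and stance annotations) could be represented; all field and label names are hypothetical and do not describe the released schema.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Tweet:
        tweet_id: str
        text: str
        reply_to: Optional[str] = None    # parent tweet id within the conversation thread

    @dataclass
    class ClaimRecord:
        claim: str                        # false COVID-19 claim drawn from CovidLies
        source_tweet: Tweet               # relevant tweet that starts a conversation
        conversation: List[Tweet] = field(default_factory=list)
        relevance: str = "relevant"       # stage 1 annotation
        stance: str = "support"           # stage 2 annotation: support / deny / discuss

    def usable_for_verification(records: List[ClaimRecord]) -> List[ClaimRecord]:
        # Keep only relevant source tweets that actually have a conversation thread,
        # which is what conversation-based rumour verification models require.
        return [r for r in records if r.relevance == "relevant" and r.conversation]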
 
Title NLI-SAN 
Description A veracity assessment approach for automated fact-checking of claims. It uses Natural Language Inference (NLI) and contextualised representations of the claims and evidence. NLI-SAN combines the inference relations between claims and evidence with attention techniques (an illustrative sketch of this idea follows this entry). 
Type Of Material Computer model/algorithm 
Year Produced 2022 
Provided To Others? No  
Impact The description of the approach, as well as an online platform implementing it, will soon be published (material currently under review). 
URL https://panacea2020.github.io
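The NLI-SAN implementation itself is still under review; purely to illustrate the general idea of combining claim-evidence inference relations with attention, the sketch below attends over evidence representations conditioned on the claim, aggregates per-evidence NLI probabilities with the same attention weights, and predicts a verdict. Class and parameter names, dimensions and the fusion scheme are assumptions, not the actual NLI-SAN architecture.

    import torch
    import torch.nn as nn

    class InferenceAttentionVerdict(nn.Module):
        # Illustrative sketch only (not the NLI-SAN architecture): attend over evidence
        # conditioned on the claim, aggregate per-evidence NLI relation probabilities
        # with the same attention weights, and predict a verdict.
        def __init__(self, hidden_dim=768, num_labels=3):
            super().__init__()
            self.attn = nn.MultiheadAttention(hidden_dim, num_heads=8, batch_first=True)
            self.classifier = nn.Linear(hidden_dim + 3, num_labels)

        def forward(self, claim_repr, evidence_reprs, nli_probs):
            # claim_repr:     (batch, 1, hidden)     contextualised claim representation
            # evidence_reprs: (batch, n_ev, hidden)  contextualised evidence representations
            # nli_probs:      (batch, n_ev, 3)       P(entail / neutral / contradict) per evidence
            attended, weights = self.attn(claim_repr, evidence_reprs, evidence_reprs)
            relation = torch.bmm(weights, nli_probs)          # attention-weighted inference relations
            fused = torch.cat([attended, relation], dim=-1)   # (batch, 1, hidden + 3)
            return self.classifier(fused.squeeze(1))          # verdict logits, e.g. support / refute / NEI

    # Hypothetical usage with random tensors standing in for BERT and NLI model outputs.
    model = InferenceAttentionVerdict()
    claim = torch.randn(2, 1, 768)
    evidence = torch.randn(2, 5, 768)
    nli = torch.softmax(torch.randn(2, 5, 3), dim=-1)
    print(model(claim, evidence, nli).shape)                  # torch.Size([2, 3])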
 
Title PANACEA Dataset 
Description The dataset aggregates a heterogeneous set of COVID-19 claims categorised as True or False. Aggregation of heterogeneous sources involved a careful deduplication process to ensure dataset quality (a toy illustration of this kind of near-duplicate filtering follows this entry). Fact-checking sources are provided for veracity assessment, as well as additional information sources for True claims. Additionally, claims are labelled with sub-types (Multimodal, Social Media, Questions, Numerical, and Named Entities). The LARGE version of the dataset contains 5,143 claims and the SMALL version contains 1,709 claims. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? No  
Impact The dataset will soon be published (currently under review) so that it can be used by the research community. 
URL https://panacea2020.github.io
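The deduplication step mentioned above can be pictured with the toy sketch below, which drops a claim when its TF-IDF cosine similarity to an already kept claim exceeds a threshold; this is only an illustration of near-duplicate filtering, not the actual PANACEA curation pipeline, and the function name and threshold value are assumptions.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def deduplicate_claims(claims, threshold=0.9):
        # Toy near-duplicate filter (not the PANACEA pipeline): a claim is dropped if its
        # TF-IDF cosine similarity to any already kept claim exceeds the threshold.
        vectors = TfidfVectorizer().fit_transform(claims)
        sims = cosine_similarity(vectors)
        kept = []
        for i in range(len(claims)):
            if all(sims[i, j] < threshold for j in kept):
                kept.append(i)
        return [claims[i] for i in kept]

    print(deduplicate_claims([
        "5G networks spread the coronavirus.",
        "The coronavirus is spread by 5G networks.",
        "Vitamin C cures COVID-19.",
    ]))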
 
Title Stance-Augmented VAE Disentanglement model (SAVED) 
Description The SAVED model has been proposed for Twitter rumour veracity assessment. It incorporates a Variational Auto-Encoder (VAE) with adversarial learning to disentangle topics that are informative for stance classification from those that are not. Tweet representations are derived from the word representations learned in the latent stance-dependent topic space, and are then used to train a veracity classifier that labels the veracity of an input tweet as true, false or unverified (a simplified sketch of this disentanglement idea follows this entry). The model achieves state-of-the-art accuracy on the commonly used PHEME dataset for Twitter veracity assessment. 
Type Of Material Computer model/algorithm 
Year Produced 2021 
Provided To Others? Yes  
Impact The developed SAVED model achieves state-of-the-art accuracy on the commonly used PHEME dataset for Twitter veracity assessment. 
URL https://github.com/JohnNLP/SAVED
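The full SAVED implementation is available at the URL above; the simplified sketch below only illustrates the style of disentanglement described: the VAE latent topic space is split into a stance-informative half, read by stance and veracity heads, and a stance-agnostic half on which an adversary (via gradient reversal) is prevented from recovering stance. Layer sizes, the bag-of-words input and the class names are assumptions rather than the released model.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        # Gradient reversal: identity on the forward pass, negated gradient on the backward
        # pass, so the encoder learns to remove stance information from the stance-agnostic
        # topics while the adversary tries to recover it.
        @staticmethod
        def forward(ctx, x):
            return x.view_as(x)
        @staticmethod
        def backward(ctx, grad_output):
            return -grad_output

    class StanceTopicVAE(nn.Module):
        # Simplified sketch (not the released SAVED model).
        def __init__(self, vocab_size=5000, n_topics=50, n_stances=4, n_veracity=3):
            super().__init__()
            self.encoder = nn.Linear(vocab_size, 2 * n_topics)      # topic mean and log-variance
            self.decoder = nn.Linear(n_topics, vocab_size)
            self.half = n_topics // 2
            self.stance_head = nn.Linear(self.half, n_stances)      # stance-informative topics
            self.veracity_head = nn.Linear(self.half, n_veracity)   # true / false / unverified
            self.adversary = nn.Linear(n_topics - self.half, n_stances)

        def forward(self, bow):
            mu, logvar = self.encoder(bow).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
            recon = self.decoder(z)                                   # reconstruct the tweet bag-of-words
            stance_logits = self.stance_head(z[:, :self.half])
            veracity_logits = self.veracity_head(z[:, :self.half])
            adv_logits = self.adversary(GradReverse.apply(z[:, self.half:]))
            kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
            return recon, stance_logits, veracity_logits, adv_logits, kl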
 
Title Topic-Aware Evidence Reasoning and Stance-Aware Aggregation model (TARSA) 
Description TARSA was proposed for more accurate fact verification, with the following four key properties: 1) checking topical consistency between the claim and evidence; 2) maintaining topical coherence among multiple pieces of evidence; 3) ensuring semantic similarity between the global topic information and the semantic representation of evidence; 4) aggregating evidence based on its implicit stance towards the claim (a toy sketch of this stance-based aggregation follows this entry). 
Type Of Material Computer model/algorithm 
Year Produced 2021 
Provided To Others? Yes  
Impact Since the paper was published in 2021, it has been cited by authors from Checkstep Research, Sofia University in Bulgaria, University of Copenhagen in Denmark, Qatar Computing Research Institute, Fudan University in China, ByteDance AI Lab and the University of California, Santa Barbara in the US. 
URL https://github.com/jasenchn/TARSA
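As a toy illustration of property 4, stance-based evidence aggregation, the sketch below scores each evidence representation against the claim with a bilinear interaction, treats the softmax of those scores as implicit stance weights, and pools the evidence accordingly before predicting a verdict. It is not the TARSA model (which also covers the topic-related properties 1-3 and is available at the URL above); the class name, dimensions and scoring function are assumptions.

    import torch
    import torch.nn as nn

    class StanceAwareAggregator(nn.Module):
        # Toy sketch of stance-based evidence aggregation (property 4), not the TARSA model:
        # each evidence representation gets an implicit stance score from a bilinear
        # interaction with the claim, and evidence is pooled with those scores.
        def __init__(self, hidden=768, num_labels=3):
            super().__init__()
            self.stance_scorer = nn.Bilinear(hidden, hidden, 1)
            self.classifier = nn.Linear(2 * hidden, num_labels)

        def forward(self, claim, evidence):
            # claim:    (batch, hidden)         claim representation
            # evidence: (batch, n_ev, hidden)   per-evidence representations
            n_ev = evidence.size(1)
            claim_exp = claim.unsqueeze(1).expand(-1, n_ev, -1).contiguous()
            stance = self.stance_scorer(claim_exp, evidence).squeeze(-1)   # implicit stance scores
            weights = torch.softmax(stance, dim=-1)
            pooled = torch.einsum("be,beh->bh", weights, evidence)         # stance-weighted evidence
            return self.classifier(torch.cat([claim, pooled], dim=-1))     # verdict logits

    # Hypothetical usage with random tensors in place of claim / evidence encoders.
    model = StanceAwareAggregator()
    print(model(torch.randn(2, 768), torch.randn(2, 4, 768)).shape)        # torch.Size([2, 3])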
 
Description Collaboration with University of California, Irvine (UCI) 
Organisation University of California, Irvine
Department Donald Bren School of Information and Computer Sciences (ICS)
Country United States 
Sector Academic/University 
PI Contribution With the goal of creating a dataset of Twitter conversations discussing misconceptions about COVID-19, we have been collecting Twitter conversations related to the given claims and have provided manual relevance annotations. We then performed experiments on methods for claim-tweet matching, as well as on the generalisability of rumour verification models to the new dataset.
Collaborator Contribution We have been collaborating with researchers from the University of California, Irvine (UCI): Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte and Sameer Singh, the authors of the CovidLies dataset. They collaborated on the stance detection task, which asks whether a tweet spreads a known misconception about COVID-19. They provided a manually assessed set of misconceptions and annotated relevant instances in the dataset we created for stance towards the claim, as supporting, denying or discussing it.
Impact A manually annotated dataset of Twitter conversations discussing misconceptions about COVID-19; the relevant publication is under review.
Start Year 2020
 
Title Code for CIKM Short Paper "Supervised Contrastive Learning for Multimodal Unreliable News Detection in COVID-19 Pandemic" 
Description The code for CIKM short paper "Supervised Contrastive Learning for Multimodal Unreliable News Detection in COVID-19 Pandemic". In this work, we propose a BERT-based multimodal unreliable news detection framework, which captures both textual and visual information from unreliable articles utilising the contrastive learning strategy. The contrastive learner interacts with the unreliable news classifier to push similar credible news (or similar unreliable news) closer while moving news articles with similar content but opposite credibility labels away from each other in the multimodal embedding space. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact NA 
URL https://zenodo.org/record/6342230
 
Description Featured in Futurum, an online magazine 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Schools
Results and Impact Yulan He was featured in Futurum Careers, an online magazine, discussing her work on teaching computers to understand human language and offering guidance to young people interested in AI and NLP. Futurum Careers is a free online resource and magazine aimed at introducing 14-19-year-olds worldwide to the world of work in science, tech, engineering, maths, medicine, social sciences, humanities and the arts for people and the economy.
Year(s) Of Engagement Activity 2022
URL https://futurumcareers.com/teaching-computers-to-understand-our-language
 
Description Invited talk, AI4Media Workshop on "Human- and Society-centred AI", online event 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The workshop included presentations from AI4Media partners and two invited speakers. As one of the invited speakers, I presented a review of existing methods and challenges related to automated rumour verification. The presentation sparked questions and discussion afterwards.
Year(s) Of Engagement Activity 2021
URL https://www.vision4ai.eu/ai4media-workshop-human-society-ai/
 
Description Invited talk, USC Viterbi, Information Sciences Institute, online event 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I was invited to be a guest speaker at the ISI AI seminar. The title of my talk was "State of the Art and Challenges of Automated Rumour Verification in Social Media Conversations". The seminar had approximately 30-40 participants and the talk sparked questions and discussion afterwards. Additionally, I had individual meetings with members of their department.
Year(s) Of Engagement Activity 2021
 
Description Mediate workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We organised the Mediate workshop (https://digitalmediasig.github.io/Mediate2021/) as part of the International AAAI Conference on Web and Social Media (ICWSM), on the topic of "Misinformation: automation, uptake, and digital governance". The main goal of the workshop was to bring together media practitioners and technologists to discuss new opportunities and obstacles that arise in the modern era of information diffusion, including the challenges and discoveries related to COVID-19 misinformation.
Year(s) Of Engagement Activity 2021,2022
URL https://digitalmediasig.github.io/Mediate2021/
 
Description Mediate workshop "Misinformation: automation, uptake, and digital governance" at the International AAAI Conference on Web and Social Media (ICWSM), 2021 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I was a co-organiser of the second Mediate workshop, which was held virtually on June 7 as part of the International AAAI Conference on Web and Social Media (ICWSM). The main goal of the workshop was to bring together media practitioners and technologists to discuss new opportunities and obstacles that arise in the modern era of information diffusion. We had six invited keynote speakers who shared their perspectives on the three main themes of the workshop. We had two contributions on automated methods for tackling misinformation and three contributions on the uptake of automation, discussing potential solutions that social media platforms could implement to combat the spread of misinformation.
Year(s) Of Engagement Activity 2021
URL https://digitalmediasig.github.io/Mediate2021/
 
Description Talk at QMUL EECS department event 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact The EECS department held an event in which everyone gave a short talk about their research, to foster collaboration.
Year(s) Of Engagement Activity 2021
 
Description Truth and Trust Online Conference (TTO) 2021 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I was a publicity chair for the TTO 2021 conference. The annual Conference for Truth and Trust Online is organised as a unique collaboration between practitioners, technologists, academics and platforms, to share, discuss, and collaborate on useful technical innovations and research in the space.
Year(s) Of Engagement Activity 2021
URL https://truthandtrustonline.com/tto-2021/conference-for-truth-and-trust-online-2021/
 
Description Tutorial in the Oxford Machine Learning Summer School 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Yulan He was invited to give a tutorial on recent developments in sentiment analysis at the Oxford Machine Learning Summer School, held in August 2021. The tutorial attracted over 200 participants. As participants highly praised the tutorial, Yulan was invited to give a tutorial again at the Summer School in August 2022.
Year(s) Of Engagement Activity 2021
URL https://www.oxfordml.school/2021