Neural-Symbolic Reasoning for the Verification of Complex Claims

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

This PhD focuses on the automatic verification of complex textual claims. This task, also known as fact-checking, has traditionally been carried out manually, for instance by companies or in the news industry. Recent developments in social media enable the distribution of information from a much larger variety of sources. Distributing unverified information in such a dynamic environment facilitates the spread of misinformation and has already affected the world's socio-political landscape. Manual fact-checking is very time-intensive and thus ill-suited to address this issue at scale, which has led to increased interest in the automated verification of claims. Yet, while most real-world claims require complex reasoning, current fact-checking approaches focus only on very simple and short claims. They reduce the task to one of textual entailment (Thorne and Vlachos, 2018), directly comparing the claim with potential evidence. However, to assess the truthfulness of most complex claims correctly, it is necessary to aggregate and combine information from multiple answers and sources. Moreover, interpreting these models' decisions remains extremely challenging.

The initial aim is to apply the heavily researched task of question answering to assessing the veracity of complex claims. This approach consists of three steps. First, the claim is decomposed into smaller pieces by formulating appropriate questions that each play a part in verifying the claim, as described in Vlachos and Riedel (2014). Second, question answering (QA) models with incorporated knowledge bases are employed to generate interpretable answers to the posed questions. Finally, the answers to the individual questions are aggregated and combined to assess the claim's truthfulness. For instance, given the claim "Patients with diabetes have different treatment plans depending on the patient's age", answers to questions such as "What types of diabetes exist?" or "What does a treatment plan for diabetes look like?" provide intermediate results for assessing the complex claim itself.
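To make the intended division of labour concrete, the Python sketch below outlines the three steps as a pipeline. All function bodies are hypothetical placeholders standing in for the models to be developed in this project, not an existing implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QAResult:
    question: str
    answer: str


def decompose(claim: str) -> List[str]:
    # Step 1 (placeholder): generate sub-questions for the claim,
    # e.g. with a learned question-generation model.
    return [
        "What types of diabetes exist?",
        "What does a treatment plan for diabetes look like?",
    ]


def answer(question: str) -> QAResult:
    # Step 2 (placeholder): answer each question with a (multi-hop) QA
    # system backed by text collections and knowledge bases.
    return QAResult(question=question, answer="<answer from QA system>")


def aggregate(claim: str, results: List[QAResult]) -> str:
    # Step 3 (placeholder): combine the intermediate answers into a verdict.
    return "SUPPORTED" if results else "NOT ENOUGH INFO"


claim = ("Patients with diabetes have different treatment plans "
         "depending on the patient's age")
verdict = aggregate(claim, [answer(q) for q in decompose(claim)])
print(verdict)
```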

To tackle the task of generating answerable questions for a given claim, the research will focus on neural sequence-to-sequence models (Sutskever et al., 2014). Finding the most suitable questions for a given claim constitutes a new task, as each generated question should relate to answers that contribute new knowledge. The first step towards solving this task will be the creation of a dataset that enables direct supervision. Given a collection of non-trivial claims and their respective answers, selected from different fact-checking projects, a set of relevant questions whose answers address sub-claims of the given claim can be manually created. By using crowd-sourcing and a dedicated annotation interface to speed up the process, the creation of a dataset with several thousand samples appears feasible.
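As a rough illustration of what such a question-generation model might look like once the dataset exists, the sketch below uses an off-the-shelf Hugging Face sequence-to-sequence model. The checkpoint name, the "generate questions:" prompt prefix, and the decoding settings are assumptions for illustration only; a model fine-tuned on the proposed claim-to-question dataset would be needed to produce useful sub-questions.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder base checkpoint; assume it has been fine-tuned for
# claim-to-question generation on the dataset described above.
model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

claim = ("Patients with diabetes have different treatment plans "
         "depending on the patient's age")
inputs = tokenizer("generate questions: " + claim, return_tensors="pt")

# Sample several candidate sub-questions for the claim via beam search.
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    num_beams=4,
    num_return_sequences=4,
)
for ids in outputs:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```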

Answering these generated questions requires sophisticated QA systems. To incorporate reasoning processes into these models, we aim to use multi-hop QA, where multiple documents must be combined to reach the correct answer, as our starting point. To further incorporate structured knowledge in the form of knowledge bases, we plan to explore neural-symbolic reasoning (d'Avila Garcez et al., 2002), which combines explicit reasoning with neural learning methods to produce interpretable answers to queries or input triples. Enabling these models to handle natural language input will be a major challenge.
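The toy sketch below is only meant to convey the flavour of pairing an explicit, traceable inference rule over knowledge-base triples with a (placeholder) neural scoring function: derived facts carry their proofs, which is what makes the answers interpretable. The triples, the rule, and the scorer are invented for illustration and do not represent the actual neural-symbolic architecture to be developed.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical knowledge-base fragment.
kb: List[Triple] = [
    ("type_1_diabetes", "is_a", "diabetes"),
    ("type_2_diabetes", "is_a", "diabetes"),
    ("type_2_diabetes", "treated_with", "metformin"),
]


def neural_score(triple: Triple) -> float:
    # Placeholder for a learned model scoring the plausibility of a triple.
    return 0.9


def infer_treatments(kb: List[Triple]) -> List[Tuple[Triple, List[Triple], float]]:
    # Symbolic rule: if X is_a diabetes and X treated_with T,
    # then diabetes treated_with T. Each derived triple keeps its proof.
    derived = []
    for s, r, o in kb:
        if r == "is_a" and o == "diabetes":
            for s2, r2, t in kb:
                if s2 == s and r2 == "treated_with":
                    new_fact = ("diabetes", "treated_with", t)
                    proof = [(s, r, o), (s2, r2, t)]
                    derived.append((new_fact, proof, neural_score(new_fact)))
    return derived


for fact, proof, score in infer_treatments(kb):
    print(fact, "derived from", proof, "with score", score)
```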

We then hope to also use neural-symbolic learning to combine the answers to the individual questions into an interpretable final verdict on the initial claim. This high degree of interpretability should also make it easier to detect model biases, which we then aim to explore.
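To illustrate what an interpretable aggregation step could look like, the following sketch derives a final verdict from per-question labels and confidences using a simple, inspectable rule. The labels, threshold, and verdict scheme are assumptions for illustration, not the proposed learned aggregation.

```python
from typing import List, Tuple

# (sub-question, label in {"supports", "refutes", "neutral"}, confidence)
evidence: List[Tuple[str, str, float]] = [
    ("What types of diabetes exist?", "supports", 0.8),
    ("What does a treatment plan for diabetes look like?", "supports", 0.7),
]


def final_verdict(evidence: List[Tuple[str, str, float]],
                  threshold: float = 0.6) -> str:
    # A confident refutation of any sub-question refutes the whole claim;
    # the claim is supported only if every sub-question confidently supports it.
    if any(label == "refutes" and conf >= threshold for _, label, conf in evidence):
        return "REFUTED"
    if all(label == "supports" and conf >= threshold for _, label, conf in evidence):
        return "SUPPORTED"
    return "NOT ENOUGH INFO"


print(final_verdict(evidence))  # the per-question rows above double as the explanation
```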

In addition to its relevance to computational journalism, the results of this work will hopefully help to increase factuality in other areas, such as scientific publications, (insurance) contracts, and medical records.

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513180/1                                   01/10/2018  30/09/2023
2495733            Studentship   EP/R513180/1  01/10/2020  16/07/2024  Rami Aly
EP/T517847/1                                   01/10/2020  30/09/2025
2495733            Studentship   EP/T517847/1  01/10/2020  16/07/2024  Rami Aly