Readers: Evaluation and Development of Reading Systems

Lead Research Organisation: University of Edinburgh

Department Name: Sch of Informatics

Abstract

Machine reading aims to extract knowledge from unstructured text with little human effort. It has been a major goal of AI since its early days. The ever-growing amounts of textual data available over the internet further increase the importance and urgency of computer-based methods for knowledge extraction. The success of machine reading will not only help breach the knowledge acquisition bottleneck in AI, but also revolutionize Web search, information extraction, and the automatic construction of resources such as Wikipedia.

In the past, there has been considerable progress in automating many subtasks of machine reading using standard NLP technology such as tagging and parsing. However, end-to-end solutions are still rare, and existing systems typically require substantial human effort in manual engineering and/or labeling examples. As a result, they often target restricted domains and only extract limited types of knowledge (e.g., a pre-specified relation).

In this project we aim to develop an end-to-end system that operates over raw text, extracts knowledge, and is able to answer questions and support other end tasks. A key insight in our approach is the use of unsupervised methods that do not rely on large amounts of hand annotation for the acquisition of background knowledge, its linking to existing knowledge bases, and the creation of new ones. Our approach will acquire knowledge at Web scale and be open to arbitrary domains, genres, and languages. It will constantly integrate new information sources (e.g., new text documents) and learn from user questions and feedback (e.g., via performing end tasks).

Planned Impact

With the rapidly growing amounts of textual data being stored by businesses and available over the Internet, it is becoming increasingly important to develop improved computer-based methods for document access, filtering, and content extraction. The ability to automatically extract and meaningfully organize semantic knowledge carries much practical import for a wide range of applications including question answering, summarization, and information extraction and retrieval. The ultimate goal of this project is to develop Machine Reading technology through the acquisition and amalgamation of large amounts of background knowledge from unstructured text. Several key innovations render the scientific impact of the proposed work high: (1) the deployment of unsupervised methods for the knowledge extraction task in a multilingual setting, (2) the development of novel methods for the organization of knowledge in a systematic fashion, and (3) the use of a rigorous evaluation framework that will create benchmarks for the evaluation of Machine Reading technology.

The proposed project opens up new ground and challenges for the deployment of large-scale Machine Reading systems. As the underlying methodology will be primarily unsupervised, the methods developed in this project will be portable across domains, languages, and texts. Furthermore, we hope to make important inroads in the application of such methods at large scale. We will achieve this through technical innovation and dissemination of our technology to the scientific community. Besides Machine Reading, the research undertaken in this project will be beneficial to a range of related tasks such as Textual Entailment (Szpektor and Dagan, 2008), single and multi-document Automatic Summarization (Nenkova and McKeown, 2011), Semantic Parsing (Poon and Domingos, 2010), and Question Answering (Harabagiu et al., 2003).

Finally, there is a strong need for evaluation methodologies able to measure progress in technologies related to Machine Reading across domains and languages. We expect that the evaluation methodologies and protocols developed in this project, together with the coordination of an international evaluation program, will be of high value to the scientific community. As part of this evaluation exercise, we will release benchmark datasets which we anticipate will be useful to researchers and industrial practitioners alike.

Funded Value:

£296,868

Funded Period:

Mar 13 - May 16

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/K017845/1

Principal Investigator:

Mirella Lapata

Research Topic:

Unclassified

Organisations

University of Edinburgh (Lead Research Organisation)

People	ORCID iD
Mirella Lapata (Principal Investigator)

Publications

Lang J (2014) Similarity-Driven Semantic Role Induction via Graph Partitioning in Computational Linguistics

Lopez De Lacalle M (2016) Predicate Matrix: automatically extending the semantic interoperability between predicate resources in Language Resources and Evaluation

Reddy S (2014) Large-scale Semantic Parsing without Question-Answer Pairs in Transactions of the Association for Computational Linguistics

Roth M (2015) Context-aware Frame-Semantic Role Labeling in Transactions of the Association for Computational Linguistics

Roth M (2014) Composition of Word Representations Improves Semantic Role Labelling

Woodsend K (2014) Text Rewriting Improves Semantic Role Labeling in Journal of Artificial Intelligence Research

Woodsend K (2017) Text rewriting improves semantic role labeling in IJCAI International Joint Conference on Artificial Intelligence

Key Findings


Description	Using paraphrases can enhance the performance of semantic role labeling. Linguistic formalisms like CCG are key to large-scale, robust machine reading and question answering.
Exploitation Route	These findings have informed the creation of QA systems in academia and in industry.
Sectors	Digital/Communication/Information Technologies (including Software)
URL	http://nlp.uned.es/readers-project/demos.html
