Generating Descriptive Sentence Labels for Multinomial Sentiment-bearing Topics (GenSent)

Lead Research Organisation: University of Aberdeen
Department Name: Computing Science

Abstract

Sentiment-topic models are a suite of algorithms that aim to mine and uncover rich opinion structures from text. Their utility stems from the fact that the inferred hidden sentiment-bearing topics, each represented as a multinomial distribution over words, reflect the opinion information of a collection and can be used as a lens for exploring and understanding opinions in large archives of unstructured text. However, a major challenge in applying sentiment-topic models for exploratory purposes is interpreting the meaning of the discovered sentiment-bearing topics, which has so far relied entirely on manual interpretation. In addition, current sentiment-topic models do not support accurate opinion and sentiment understanding. For example, by examining the sentiment-bearing topic "amazon order return ship receive refund damaged disappointed policy unhappy", one can infer that this topic captures opinions relating to "unsatisfactory online shopping experience". But it is impossible to gain deeper insight into the opinion, i.e., whether the sentiment "unhappy" is directed only at the product being ordered or also at Amazon's policy.

A solution to the automatic interpretation and labelling of sentiment-bearing topics is timely because: (i) when applying sentiment-topic models for data exploration, users are forced to interpret the inferred sentiment-bearing topics manually, which is slow and impractical when analysing highly dynamic or large-scale data; and (ii) automated tools that facilitate accurate opinion understanding are crucial for many practical applications (e.g. cybersecurity and business intelligence), as they allow one to derive knowledge from large amounts of text data and to formulate decisions, converting data into actionable knowledge.

The project aims to push the frontier of sentiment-topic modelling through the development of a novel framework for the automated generation of sentence labels that accurately describe the opinions of multinomial sentiment-bearing topics and are well suited to human readers in terms of clarity, brevity and information-richness. The main challenges will be the accurate interpretation of the opinions encoded in sentiment-bearing topics and the generation of concise sentence labels that convey as much of the essence of those topics as possible. This is both ambitious and adventurous because: (i) existing evidence suggests that automatically labelling standard topics, which concern topical information alone, is already a challenging task. Labelling sentiment-bearing topics involves capturing and interpreting semantics from both the sentiment and topic dimensions and the dependencies between them, thus adding a further dimension of complexity to the labelling task; (ii) the two requirements for sentence label generation, i.e., maximal opinion coverage and high conciseness, naturally conflict with each other. How to optimise the trade-off between these two competing objectives to generate the most suitable sentence label is an important scientific question.
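The tension between coverage and conciseness can be illustrated with a minimal Python sketch. This is not the GenSent framework: the word probabilities, the candidate sentences, the function names (`coverage`, `score_label`, `best_label`) and the `length_penalty` weight are all hypothetical, chosen only to show how a single score might weigh the two objectives against each other.

```python
# Illustrative sketch only; this is NOT the method developed in the project.
# Assumption: a sentiment-bearing topic is available as a word -> probability
# mapping. The values below are made up and truncated to the top ten words,
# so they do not sum to 1.
topic = {
    "amazon": 0.12, "order": 0.10, "return": 0.09, "ship": 0.08,
    "receive": 0.07, "refund": 0.07, "damaged": 0.06,
    "disappointed": 0.05, "policy": 0.05, "unhappy": 0.04,
}


def coverage(sentence, topic):
    """Total probability of the topic words that occur in the sentence."""
    words = set(sentence.lower().split())
    return sum(prob for word, prob in topic.items() if word in words)


def score_label(sentence, topic, length_penalty=0.02):
    """Coverage minus a linear penalty on sentence length (hypothetical)."""
    return coverage(sentence, topic) - length_penalty * len(sentence.split())


def best_label(candidates, topic, length_penalty=0.02):
    """Return the candidate sentence with the highest score."""
    return max(candidates, key=lambda s: score_label(s, topic, length_penalty))


# Hypothetical candidate labels, from shortest to longest.
candidates = [
    "unhappy with the refund policy",
    "i was disappointed that amazon would not refund my damaged order",
    "i was so disappointed and unhappy because the order i had to return "
    "to amazon was damaged when i did receive it and the refund policy is bad",
]

if __name__ == "__main__":
    # Sweep the penalty to see how the preferred label changes.
    for penalty in (0.0, 0.02, 0.05):
        print(penalty, "->", best_label(candidates, topic, penalty))
```

With these made-up numbers, a penalty of 0 selects the longest candidate (highest coverage), 0.02 selects the mid-length one, and 0.05 selects the shortest, which shows why the weighting between the two objectives has to be chosen with care.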

Planned Impact

The project will run a programme to maximise the impact of the project's results and deliver three key impact objectives: (i) sustainable economic impact; (ii) increasing public awareness and understanding; and (iii) developing the project team's research and knowledge transfer skills.

Sustainable economic impact will be delivered through a two-stage approach. In the first stage, a series of dissemination activities (a magazine article, a business event, and attendance and presentations at industry conferences) will be run to engage with potentially interested businesses. We have already secured interest, support and collaboration from the BBC and Lincedo, both of which will also participate in these events. In the second stage, activities will be run with these already secured partners and possibly other selected partners from stage one. The BBC, apart from providing a real-world dataset for the project's use, has committed to providing a researcher to collaborate on the project, to push research on this topic forward and to establish a long-term, mutually beneficial basis for collaboration. The commercial exploitation of the framework to be developed in this project will be explored with Lincedo (following Aberdeen University's IP protection policy). GenSent will also utilise other grant mechanisms (e.g. KTP, TSB) for follow-up collaborations with these businesses.

Public awareness and understanding of computational tools for data analytics will be raised through a highly active social media strategy, disseminating educational videos and blog entries about how these tools can better support data exploration and understanding, as well as disseminating project results. We will present and discuss our research in informal settings, e.g. using the established format of Cafe Scientifique (cafescientifique.org). In addition, to reach a younger audience, we will participate in national science festivals, e.g. the Edinburgh International Science Festival or the British Science Festival. We also want to inspire young people to pursue careers in computing science and data science.

The team will enhance their knowledge transfer skills via regular meetings with the project partners, i.e., the BBC and Lincedo, and will participate in the knowledge transfer activities (business education event, etc.). The RF will work towards the completion of the specified objectives and will be given opportunities to gain academic networking experience by attending leading international conferences, and the PhD student (funded by AU) will gain research skills towards his PhD by working with the PI and other project researchers. The team will also learn new skills by participating in the activities described above to promote computational tools for data analytics.
 
Description Back in 1985, the BBC initiated an ambitious project called the Domesday Project, which aimed to document the daily life of people in the UK. Over a million people took part in this project. In the GenSent project funded by EPSRC, we developed a framework for automatically labelling sentiment-bearing topics extracted by sentiment-topic models, and we applied our framework to analyse the Domesday dataset provided by the BBC with respect to the following questions: (1) What were the key issues people were concerned about back in the 1980s? (2) What were people's opinions on those issues? (3) In what ways has our society changed? We discovered a wide range of interesting topics that people were concerned about, including war, unemployment, miners' strikes, etc.
Exploitation Route The framework we developed can be applied directly to text mining and automatic text summarisation. Our findings from the Domesday dataset should be useful to social scientists.
Sectors Communities and Social Services/Policy; Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Healthcare; Government, Democracy and Justice; Culture, Heritage, Museums and Collections

 
Description Supporting Security Policy with Effective Digital Intervention (SSPEDI)
Amount £756,644 (GBP)
Funding ID EP/P011829/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 03/2017 
End 03/2020
 
Description Aston-talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact I gave a presentation titled "Automatically Labelling Sentiment-bearing Topics with Descriptive Sentence Labels" at Aston University. This presentation disseminated the research outcomes of the GenSent project and generated interest among audiences from both academia and industry. Around 25 people attended the presentation.
Year(s) Of Engagement Activity 2018
 
Description Presentation at Tencent 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact I gave a presentation titled "Extractive and Abstractive Sentence Labelling of Sentiment-bearing Topics" to the NLP group of Tencent AI Lab in Seattle. This presentation disseminated the research outcomes of the GenSent project and generated interest among industry audiences. Around 15 people attended the presentation.
Year(s) Of Engagement Activity 2018