Generating Descriptive Sentence Labels for Multinomial Sentiment-bearing Topics (GenSent)

Lead Research Organisation: University of Aberdeen
Department Name: Computing Science

Abstract

Sentiment-topic models are a suite of algorithms whose aim is to mine and uncover rich opinion structures from text. The utility of sentiment-topic models stems from the fact that the inferred hidden sentiment-bearing topics, each represented as a multinomial distribution over words, resemble the opinion information of a collection, which can be used as a lens for exploring and understanding opinions from large archives of unstructured text. However, a major challenge in applying sentiment-topic models for exploratory purposes is interpreting the meaning of the discovered sentiment-bearing topics, which has so far relied entirely on manual interpretation. In addition, current sentiment-topic models are not able to facilitate accurate opinion and sentiment understanding. For example, by examining the sentiment-bearing topic "amazon order return ship receive refund damaged disappointed policy unhappy", one can interpret that this topic captures opinions relating to "unsatisfactory online shopping experience". But it is impossible to gain deep insight into the opinion, i.e., whether the sentiment "unhappy" is targeted only at the product being ordered, or whether it also relates to Amazon's policy.
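To make the representation concrete, the sketch below treats a sentiment-bearing topic as a multinomial distribution over words and lists its most probable words, which is the form a human reader normally has to interpret by hand. The probabilities and the helper name `top_words` are illustrative assumptions, not values or code from the project.

```python
from typing import Dict, List

# Toy sentiment-bearing topic: word -> probability.
# The numbers are made up for illustration and sum to 1.
topic: Dict[str, float] = {
    "amazon": 0.20, "order": 0.16, "return": 0.14, "ship": 0.12,
    "receive": 0.10, "refund": 0.09, "damaged": 0.07,
    "disappointed": 0.05, "policy": 0.04, "unhappy": 0.03,
}


def top_words(dist: Dict[str, float], k: int) -> List[str]:
    """Return the k most probable words, highest probability first."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# A multinomial distribution over the vocabulary must sum to 1.
assert abs(sum(topic.values()) - 1.0) < 1e-9

# Print the word list a user would otherwise have to interpret manually.
print(" ".join(top_words(topic, 5)))
```

The ranked word list shows which terms dominate the topic, but not which sentiment word attaches to which target; that is the gap the abstract describes.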

A solution to automatic interpretation and labelling of sentiment-bearing topics is most timely because: (i) when applying sentiment-topic models for data exploration, users are forced to interpret the inferred sentiment-bearing topics manually, which is slow and impractical when analysing highly dynamic or large-scale data; and (ii) automated tools facilitating accurate opinion understanding are crucial for many practical applications (e.g. cybersecurity and business intelligence), as they allow one to derive knowledge from large amounts of text data and to formulate decisions, converting data into actionable knowledge.

The project aims to push the frontier of sentiment-topic modelling through the development of a novel framework for automated generation of sentence labels that can accurately describe the opinions of multinomial sentiment-bearing topics and are optimally suitable for humans in terms of clarity, brevity and information-richness. The main challenges will be the accurate interpretation of opinions encoded in sentiment-bearing topics and the generation of concise sentence labels which convey as much of the essence of sentiment-bearing topics as possible. This is both ambitious and adventurous because: (i) automatically labelling standard topics, which concern topical information alone, has already been shown to be a challenging task. Labelling sentiment-bearing topics involves capturing and interpreting semantics from both the sentiment and topic dimensions and the dependencies between them, thus adding a further dimension of complexity to the labelling task; (ii) the two requirements for sentence label generation, i.e., maximal opinion coverage and high conciseness, naturally conflict with each other. How to optimise the trade-off between these two competing objectives to generate the most suitable sentence label is an important scientific question.
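The tension between opinion coverage and conciseness can be illustrated with a minimal sketch. It assumes a simple linear score, coverage minus a per-word length penalty; this scoring rule, the function names, the toy topic and the candidate sentences are all hypothetical and are not the project's method.

```python
from typing import Dict, List


def coverage(sentence: str, topic: Dict[str, float]) -> float:
    """Topic probability mass of the distinct topic words found in the sentence."""
    words = set(sentence.lower().split())  # naive whitespace tokenisation
    return sum(prob for word, prob in topic.items() if word in words)


def score(sentence: str, topic: Dict[str, float], length_penalty: float) -> float:
    """Assumed trade-off: reward coverage, penalise every word in the label."""
    return coverage(sentence, topic) - length_penalty * len(sentence.split())


def best_label(candidates: List[str], topic: Dict[str, float],
               length_penalty: float) -> str:
    """Pick the candidate sentence with the highest trade-off score."""
    return max(candidates, key=lambda s: score(s, topic, length_penalty))


# Toy topic with made-up probabilities.
toy_topic = {"order": 0.30, "refund": 0.25, "damaged": 0.20,
             "policy": 0.15, "unhappy": 0.10}

short_label = "unhappy with the refund policy"
long_label = "my order arrived damaged and i am unhappy with the refund policy"
candidates = [short_label, long_label]

# A small penalty favours coverage; a large penalty favours brevity.
print(best_label(candidates, toy_topic, length_penalty=0.02))
print(best_label(candidates, toy_topic, length_penalty=0.10))
```

Under these assumptions, the longer sentence wins when the penalty is small because it covers every topic word, while the shorter one wins once each extra word costs more than the coverage it adds.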

Planned Impact

The project will run a programme designed to maximise the impact of the project's results and to deliver three key impact objectives: (i) sustainable economic impact; (ii) increasing public awareness and understanding; and (iii) developing the project team's research and knowledge transfer skills.

Sustainable economic impact will be delivered by means of a two-stage approach. In the first stage, a series of dissemination activities (magazine article, business event, attendance at and presentation at an industry conference) will run to engage with potentially interested businesses. We have already secured interest, support and collaboration from the BBC and Lincedo, which will also participate in these events. In the second stage, activities will run with these already secured partners and possibly with other selected partners from stage one. The BBC, apart from providing a real-world dataset for the project's use, has committed to providing a researcher to collaborate in the project, to push research on this topic forward and to establish a long-term, mutually beneficial collaboration base. The commercial exploitation of the framework to be developed in this project will be explored (following Aberdeen University's IP protection policy) with Lincedo. GenSent will also utilise other grant mechanisms (e.g. KTP, TSB) for follow-up collaborations with these businesses.

Raising public awareness and understanding of computational tools for data analytics will be delivered by means of a highly active social media strategy to disseminate educational videos and blog entries about how these tools can better support data exploration and understanding, as well as to disseminate project results. We will present and discuss our research in an informal setting, e.g. using the established format of Cafe Scientifique (cafescientifique.org). In addition, to reach a younger audience, we will participate in national science festivals, e.g. the Edinburgh International Science Festival or the British Science Festival. We also want to inspire young people to pursue careers in computing science and data science.

The team will enhance their knowledge transfer skills via regular meetings with the project partners, i.e., the BBC and Lincedo, and will participate in the knowledge transfer activities (business education event, etc.). The RF will work towards the completion of the specified objectives and will be given opportunities to gain academic networking experience by attending leading international conferences, and the PhD student (funded by AU) will gain research skills towards his PhD by working with the PI and other project researchers. The team will also learn new skills by participating in the activities described above to promote computational tools for data analytics.
 
Description: The work of the project has already had a direct impact on our project partner, the BBC. The BBC published a high-impact dataset called Domesday (http://www.bbc.co.uk/history/domesday), generated by the Domesday Reloaded project launched in the 1980s, with the aim of recording a snapshot of everyday life across the UK for future generations. Using the opinion summarisation framework developed in this project, we analysed the Domesday dataset (for the first time since the dataset was released), addressing important research questions of interest to the BBC, such as "What were the key issues people were concerned about back in the 1980s?" and "How has our society changed over the past 30 years?".
First Year Of Impact: 2018
Sector: Digital/Communication/Information Technologies (including Software)
Impact Types: Societal

 
Description: Supporting Security Policy with Effective Digital Intervention (SSPEDI)
Amount: £756,644 (GBP)
Funding ID: EP/P011829/1
Organisation: Engineering and Physical Sciences Research Council (EPSRC)
Sector: Academic/University
Country: United Kingdom
Start: 03/2017
End: 03/2020
 
Description: Aston-talk
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme?: No
Geographic Reach: Regional
Primary Audience: Postgraduate students
Results and Impact: I gave a presentation titled "Automatically Labelling Sentiment-bearing Topics with Descriptive Sentence Labels" at Aston University. This presentation disseminated the research outcomes of the GenSent project and generated interest among audiences from both academia and industry. Around 25 people attended the presentation.
Year(s) Of Engagement Activity: 2018