Generating Descriptive Sentence Labels for Multinomial Sentiment-bearing Topics (GenSent)

Lead Research Organisation: University of Aberdeen

Department Name: Computing Science

Abstract

Sentiment-topic models are a suite of algorithms whose aim is to mine and uncover rich opinion structures from text. Their utility stems from the fact that the inferred hidden sentiment-bearing topics, each represented as a multinomial distribution over words, reflect the opinions expressed in a collection and can be used as a lens for exploring and understanding opinions in large archives of unstructured text. However, a major challenge in applying sentiment-topic models for exploratory purposes is interpreting the meaning of the discovered sentiment-bearing topics, which has so far relied entirely on manual interpretation. In addition, current sentiment-topic models cannot support accurate understanding of opinion and sentiment. For example, by examining the sentiment-bearing topic "amazon order return ship receive refund damaged disappointed policy unhappy", one can infer that the topic captures opinions relating to an "unsatisfactory online shopping experience". But it is impossible to gain deep insight into the opinion, i.e., whether the sentiment "unhappy" is targeted only at the product being ordered, or whether it also relates to Amazon's policy.
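As a rough illustration of the representation described above, a sentiment-bearing topic can be held as a mapping from words to probabilities, and the familiar "top words" view is just the highest-probability entries. The probability values and the helper name top_words below are hypothetical, chosen only for illustration; they are not taken from the project or from any fitted model.

```python
# A minimal sketch: one sentiment-bearing topic as a (partial) multinomial
# distribution over words. The probabilities are made-up illustrative values;
# the remaining mass is assumed to sit on the rest of the vocabulary.
topic = {
    "amazon": 0.12, "order": 0.11, "return": 0.10, "ship": 0.09,
    "receive": 0.08, "refund": 0.07, "damaged": 0.06,
    "disappointed": 0.05, "policy": 0.04, "unhappy": 0.03,
}


def top_words(topic, n):
    """Return the n most probable words, ties broken alphabetically."""
    ranked = sorted(topic.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


if __name__ == "__main__":
    # The top-word list is what a human would normally have to interpret by hand.
    print(" ".join(top_words(topic, 10)))
```

The word list alone shows which terms dominate the topic, but not how the sentiment words relate to the topical ones, which is the gap the abstract points to.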

A solution to the automatic interpretation and labelling of sentiment-bearing topics is timely because: (i) when applying sentiment-topic models for data exploration, users are forced to interpret the inferred sentiment-bearing topics manually, which is slow and impractical when analysing highly dynamic or large-scale data; and (ii) automated tools that facilitate accurate opinion understanding are crucial for many practical applications (e.g. cybersecurity and business intelligence), as they allow one to derive knowledge from large amounts of text data and to formulate decisions, converting data into actionable knowledge.

The project aims to push the frontier of sentiment-topic modelling through the development of a novel framework for the automated generation of sentence labels that accurately describe the opinions of multinomial sentiment-bearing topics and are well suited to human readers in terms of clarity, brevity and information-richness. The main challenges will be the accurate interpretation of opinions encoded in sentiment-bearing topics and the generation of concise sentence labels that convey as much of the essence of those topics as possible. This is both ambitious and adventurous because: (i) automatically labelling standard topics, which concern topical information alone, has already been shown to be a challenging task. Labelling sentiment-bearing topics involves capturing and interpreting semantics from both the sentiment and topic dimensions, as well as the dependencies between them, which adds a further dimension of complexity to the labelling task; (ii) the two requirements for sentence label generation, i.e., maximal opinion coverage and high conciseness, naturally conflict with each other. How to optimise the trade-off between these two competing objectives to generate the most suitable sentence label is an important scientific question.
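The tension between opinion coverage and conciseness can be sketched with a simple scoring function. This is only an illustrative assumption, not the framework the project developed: coverage is taken to be the total topic probability of the words a candidate sentence mentions, conciseness is modelled as a per-word length penalty, and the topic weights, candidate sentences and function names are all hypothetical.

```python
# A minimal sketch of trading off coverage against brevity when choosing a
# sentence label. All values and names here are hypothetical illustrations.

def coverage(sentence, topic):
    """Sum the topic probability of every topic word the sentence mentions."""
    tokens = set(sentence.lower().split())
    return sum(prob for word, prob in topic.items() if word in tokens)


def score(sentence, topic, length_penalty):
    """Reward coverage, penalise each word of length."""
    return coverage(sentence, topic) - length_penalty * len(sentence.split())


def best_label(candidates, topic, length_penalty):
    """Pick the candidate sentence with the highest trade-off score."""
    return max(candidates, key=lambda s: score(s, topic, length_penalty))


# Made-up topic weights and candidate labels, for illustration only.
topic = {"refund": 0.3, "damaged": 0.3, "policy": 0.2, "unhappy": 0.2}
short_label = "unhappy with the refund policy"
long_label = ("the order arrived damaged and i was unhappy "
              "with the refund policy of the shop")
candidates = [short_label, long_label]

if __name__ == "__main__":
    # A small penalty favours coverage; a larger one favours brevity.
    print(best_label(candidates, topic, length_penalty=0.01))
    print(best_label(candidates, topic, length_penalty=0.05))
```

With the small penalty the longer sentence wins because it mentions every topic word; with the larger penalty the shorter sentence wins even though it omits "damaged". Choosing the penalty is one simple way of framing the trade-off question raised above.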

Planned Impact

The project will run a programme designed to maximise the impact of the project's results and deliver three key impact objectives: (i) sustainable economic impact; (ii) increasing public awareness and understanding; and (iii) developing the project team's research and knowledge transfer skills.

Sustainable economic impact will be delivered by means of a two-stage approach. In the first stage, a series of dissemination activities (magazine article, business event, attendance and presentation at an industry conference) will run to engage with potentially interested businesses. We have already secured interest, support and collaboration from the BBC and Lincedo, which will also participate in these events. In the second stage, activities will run with these already secured partners and possibly other selected partners from stage one. The BBC, apart from providing a real-world dataset for the project's use, has committed to providing a researcher to collaborate in the project, to push research on this topic forward and to establish a long-term, mutually beneficial basis for collaboration. The commercial exploitation of the framework to be developed in this project will be explored with Lincedo (following Aberdeen University's IP protection policy). GenSent will also utilise other grant mechanisms (e.g. KTP, TSB) for follow-up collaborations with these businesses.

Raising public awareness and understanding of computational tools for data analytics will be delivered by means of a highly active social media strategy, disseminating educational videos and blog entries about how these tools can better support data exploration and understanding, as well as project results. We will present and discuss our research in an informal setting, e.g. using the established format of Cafe Scientifique (cafescientifique.org). In addition, to reach a younger audience, we will participate in national science festivals, e.g. the Edinburgh International Science Festival or the British Science Festival. We also want to inspire young people to pursue careers in computing science and data science.

The team will enhance their knowledge transfer skills via regular meetings with the project partners, i.e., the BBC and Lincedo, and will participate in the knowledge transfer activities (business education event, etc.). The RF will work towards the completion of the specified objectives and will be given opportunities to gain academic networking experience by attending leading international conferences, and the PhD student (funded by AU) will gain research skills towards his PhD by working with the PI and other project researchers. The team will also learn new skills by participating in the activities described above to promote computational tools for data analytics.

Funded Value:

£100,748

Funded Period:

Feb 17 - Aug 18

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/P005810/1

Principal Investigator:

Chenghua Lin

Research Subject:

Info. & commun. Technol. (60%)

Linguistics (40%)

Research Topic:

Artificial Intelligence (60%)

Comput./Corpus Linguistics (40%)

Organisations

People	ORCID iD
Chenghua Lin (Principal Investigator)

Publications

Barawi M (2017) Natural Language Processing and Information Systems

Barawi M (2019) Extractive and Abstractive Sentence Labelling of Sentiment-bearing Topics in Frontiers of Computer Science

Barawi M (2017) Automatically Labelling Sentiment-bearing Topics with Descriptive Sentence Labels

Ibeke E (2019) A unified latent variable model for contrastive opinion mining in Frontiers of Computer Science

Ibeke E (2017) Extracting and understanding contrastive opinion through topic relevant sentences

Mao R (2018) ABDN at SemEval-2018 Task 10: Recognising Discriminative Attributes using Context Embeddings and WordNet

Mao R (2018) Word Embedding and WordNet Based Metaphor Identification and Interpretation

Yusof N (2018) Natural Language Processing and Information Systems

Yusof N (2017) Encyclopedia of Social Network Analysis and Mining (2nd Edition)

Yusof N (2017) Analysing the Causes of Depressed Mood from Depression Vulnerable Individuals

Key Findings
Impact Summary
Further Funding
Engagement Activities


Description	Back in 1985, the BBC initiated an ambitious project called the Domesday Project, which aimed to document the daily life of people in the UK. Over a million people took part in this project. In the GenSent project funded by EPSRC, we developed a framework for automatically labelling sentiment-bearing topics extracted by sentiment-topic models, and we applied it to analyse the Domesday dataset provided by the BBC with respect to the following questions: (1) What were the key issues people were concerned about in the 1980s? (2) What were people's opinions on those issues? (3) In what ways has our society changed? We discovered a wide range of interesting topics that people were concerned about, including war, unemployment and miners' strikes.
Exploitation Route	The framework we developed can be applied directly to text mining and automatic text summarisation. Our findings from the Domesday dataset should be useful to social scientists.
Sectors	Communities and Social Services/Policy,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Healthcare,Government, Democracy and Justice,Culture, Heritage, Museums and Collections


Description	The outcomes of the project have been utilised by an SME specialising in data analytics, e.g., extracting insights and intelligence on a company's sales, marketing and recruitment activities from multi-channel data sources. There have been 10 publications from the grant (including a Best Paper Award from NLDB 2017) and a number of engagement activities with both industry and academia.
First Year Of Impact	2021
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Societal,Economic


Description	Supporting Security Policy with Effective Digital Intervention (SSPEDI)
Amount	£756,644 (GBP)
Funding ID	EP/P011829/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	03/2017
End	03/2020


Description	Aston-talk
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	I gave a presentation titled "Automatically Labelling Sentiment-bearing Topics with Descriptive Sentence Labels" at Aston University. This presentation disseminated the research outcomes of the GenSent project and generated interest among audiences from both academia and industry. Around 25 people attended the presentation.
Year(s) Of Engagement Activity	2018


Description	Presentation at Tencent
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	I gave a presentation titled "Extractive and Abstractive Sentence Labelling of Sentiment-bearing Topics" to the NLP group of Tencent AI Lab in Seattle. This presentation disseminated the research outcomes of the GenSent project and generated interest among industry audiences. Around 15 people attended the presentation.
Year(s) Of Engagement Activity	2018
