Natural Language Generation for Low-resource Domains

Lead Research Organisation: Edinburgh Napier University
Department Name: School of Computing

Abstract

It is expected that by 2021 the number of Artificial Intelligence (AI) based dialogue systems, such as Amazon's Alexa and Apple's Siri, will exceed the earth's population [1]. Such interactive technology products have already become prevalent in many aspects of everyday life, offering support for decision making, education, and health as well as entertainment, by communicating effectively in natural language to answer questions, describe or summarise data, and assist in multiple areas. Developing such systems, however, requires access to vast numbers of example dialogues, which can (1) be hard to obtain in many domains; and (2) pose privacy concerns, impacting user uptake [2]. Current response generation techniques rely heavily on pre-specified templates that limit language coverage, while generating naturally fluent responses depends heavily on example dialogues, which are scarce in many domains. To address these interlinked challenges, the project will firstly develop natural language generation techniques that are able to learn from limited resources by reusing the knowledge learnt in other data-rich domains, similar to the way the human brain learns new skills efficiently by building on prior knowledge. Secondly, we will develop novel privacy-preserving AI methods to address the second important challenge and eliminate the risk of de-anonymisation of data.

Although recent advances in understanding natural language have made it possible to accurately predict the meaning of users' utterances and hence accurately inform the personal assistants' actions, responding in natural language remains a bottleneck for the current generation of dialogue systems and personal assistants. As more interactive systems generating natural language become available, the need for natural variability and novelty in the generated text becomes significant in order to increase end-user satisfaction and engagement. Therefore, the project will also develop AI approaches that generate novel and varied text, enriching word choice while keeping the semantics of the generated text unchanged. Finally, many real-world applications that support health or education, such as personal assistants, chatbots and social robots, will benefit from generated responses that show empathy and adapt to users' psychological state. This requires a deep understanding of emotions expressed in text. The project will therefore, for the first time, develop and integrate innovative natural language 'concept' based approaches to understand user emotions from the underlying text and to inform novel text generation approaches. Practical case studies provided by our industrial partners will be used to validate the developed AI approaches throughout this ambitious project.

References:
[1] https://ovum.informa.com/resources/product-content/virtual-digital-assistants-to-overtake-world-population-by-2021
[2] https://www.independent.co.uk/life-style/gadgets-and-tech/news/amazon-alexa-echo-listening-spy-security-a8865056.html

Planned Impact

This multi-disciplinary project will have impacts beyond academia.

*Privacy-preserving personal assistants/NLG systems*

As the use of personal assistants grows globally, the need for privacy-preserving approaches to NLG increases. Most industries handle sensitive information such as personal and private data, and although data scientists strive to anonymise data records, sophisticated de-anonymisation approaches can pose a threat not only to privacy but also to compliance with current legislation. Therefore, the impact of innovative approaches that respect privacy and ethical constraints will be enormous.

*Increase productivity by automating the descriptions of products*

Additionally, industries that rely on an online presence will benefit from approaches that automatically generate text summaries from structured data, such as descriptions of products and services, since these approaches can increase productivity by automating repetitive and laborious tasks. In addition, diversity-enriched NLG approaches can make the content of automatically generated summaries more interesting and less repetitive, and hence improve the overall user experience.

*Automatic narrative and report generation*

Narrative generation from data, such as automatic news story generation, will benefit from more natural NLG approaches that offer variability and empathy, as stories can be enriched with emotion and with interesting, non-repetitive text. Business intelligence and analytics reporting will also benefit from the approaches developed here, as privacy is integral when communicating data and insights. In addition, NLG has been shown to enhance decision-making support [5].

*Support Health and Well-being*

AI-powered personal assistants, such as the Alli-chat developed by our project partner, have started to become prevalent in health support [1]. They can also be preferable for supporting specific groups, such as younger people, with stigma-attached conditions such as mental health problems. Younger people are particularly unlikely to seek help when facing mental health challenges [4]. It is therefore of vital importance to create a safe space that empowers younger people to seek private advice and information regarding mental well-being, which can be achieved through trusted, privacy-sensitive, empathy-enriched personal assistants. Promotion of mental well-being, together with the management and prevention of mental illness, has indeed been identified as a core priority of the World Health Organisation's mental health action plan 2013-2020 [2] as well as in the NHS's "Five Year Forward View" for mental health [3].

References:
[1] https://www.healthcareitnews.com/news/special-report-ai-voice-assistants-making-impact-healthcare
[2] https://www.who.int/mental_health/publications/action_plan/en/
[3] https://www.england.nhs.uk/wp-content/uploads/2014/10/5yfv-web.pdf
[4] Marcus, M. A., Westra, H. A., Eastwood, J. D., Barnes, K. L., & Mobilizing Minds Research Group (2012). What are young adults saying about mental health? An analysis of Internet blogs. Journal of Medical Internet Research, 14(1), e17. doi:10.2196/jmir.1868
[5] Gkatzia et al. (2017). Data-to-Text Generation Improves Decision-Making Under Uncertainty. IEEE Computational Intelligence Magazine, Special Issue on Natural Language Generation with Computational Intelligence.

 
Description Enhancing Labour Market Intelligence using Machine Learning
Amount £60,000 (GBP)
Organisation Skills Development Scotland 
Sector Public
Country United Kingdom
Start 08/2021 
End 10/2025
 
Description Scottish Gaelic Generation for Exhibitions
Amount £5,000 (GBP)
Organisation Arts & Humanities Research Council (AHRC) 
Sector Public
Country United Kingdom
Start 02/2022 
End 09/2022
 
Description Sentinel: Security alert level automation
Amount £5,000 (GBP)
Organisation Government of Scotland 
Department Scottish Funding Council
Sector Public
Country United Kingdom
Start 11/2021 
End 01/2022
 
Title CEC - Commonsense Evaluation Card 
Description The Commonsense Evaluation Card (CEC) aims to standardise human evaluation and reporting of commonsense-enhanced NLG systems, enabling researchers to compare models not only in terms of classic NLG quality criteria, but also by focusing on the core capabilities of such models. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact This tool has helped to better document experiments related to commonsense knowledge. 
URL https://nlgknowledge.github.io/commonsense/
 
Description Multi-party collaboration on Providing Recommendations of Error Analysis of NLG systems 
Organisation Charles University
Country Czech Republic 
Sector Academic/University 
PI Contribution This is a multi-partner collaboration between Edinburgh Napier, Heriot-Watt University, trivago, Charles University in Prague, and others. All partners worked together to analyse the state of error reporting for NLG systems and to provide recommendations so that future NLG publications discuss not only the benefits of the systems but also the errors they make, with the aim of improving these aspects.
Collaborator Contribution All partners worked together to analyse current trends in error reporting and provide recommendations on how error analysis in NLG systems should be performed with the aim to understand the limitations of current scientific advances.
Impact Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondrej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson and Luou Wen. (2021). Underreporting of errors in NLG output, and what to do about it. In INLG 2021.
Start Year 2021
 
Description Multi-party collaboration on Providing Recommendations of Error Analysis of NLG systems 
Organisation Georgetown University
Country United States 
Sector Academic/University 
PI Contribution This is a multi-partner collaboration between Edinburgh Napier, Heriot-Watt University, trivago, Charles University in Prague, and others. All partners worked together to analyse the state of error reporting for NLG systems and to provide recommendations so that future NLG publications discuss not only the benefits of the systems but also the errors they make, with the aim of improving these aspects.
Collaborator Contribution All partners worked together to analyse current trends in error reporting and provide recommendations on how error analysis in NLG systems should be performed with the aim to understand the limitations of current scientific advances.
Impact Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondrej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson and Luou Wen. (2021). Underreporting of errors in NLG output, and what to do about it. In INLG 2021.
Start Year 2021
 
Description Multi-party collaboration on Providing Recommendations of Error Analysis of NLG systems 
Organisation Heriot-Watt University
Country United Kingdom 
Sector Academic/University 
PI Contribution This is a multi-partner collaboration between Edinburgh Napier, Heriot-Watt University, trivago, Charles University in Prague, and others. All partners worked together to analyse the state of error reporting for NLG systems and to provide recommendations so that future NLG publications discuss not only the benefits of the systems but also the errors they make, with the aim of improving these aspects.
Collaborator Contribution All partners worked together to analyse current trends in error reporting and provide recommendations on how error analysis in NLG systems should be performed with the aim to understand the limitations of current scientific advances.
Impact Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondrej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson and Luou Wen. (2021). Underreporting of errors in NLG output, and what to do about it. In INLG 2021.
Start Year 2021
 
Description Multi-party collaboration on Providing Recommendations of Error Analysis of NLG systems 
Organisation Trivago NV
Country Germany 
Sector Private 
PI Contribution This is a multi-partner collaboration between Edinburgh Napier, Heriot-Watt University, trivago, Charles University in Prague, and others. All partners worked together to analyse the state of error reporting for NLG systems and to provide recommendations so that future NLG publications discuss not only the benefits of the systems but also the errors they make, with the aim of improving these aspects.
Collaborator Contribution All partners worked together to analyse current trends in error reporting and provide recommendations on how error analysis in NLG systems should be performed with the aim to understand the limitations of current scientific advances.
Impact Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondrej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson and Luou Wen. (2021). Underreporting of errors in NLG output, and what to do about it. In INLG 2021.
Start Year 2021
 
Description Multi-party collaboration on Providing Recommendations of Error Analysis of NLG systems 
Organisation University of Aberdeen
Country United Kingdom 
Sector Academic/University 
PI Contribution This is a multi-partner collaboration between Edinburgh Napier, Heriot-Watt University, trivago, Charles University in Prague, and others. All partners worked together to analyse the state of error reporting for NLG systems and to provide recommendations so that future NLG publications discuss not only the benefits of the systems but also the errors they make, with the aim of improving these aspects.
Collaborator Contribution All partners worked together to analyse current trends in error reporting and provide recommendations on how error analysis in NLG systems should be performed with the aim to understand the limitations of current scientific advances.
Impact Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondrej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson and Luou Wen. (2021). Underreporting of errors in NLG output, and what to do about it. In INLG 2021.
Start Year 2021
 
Description Multi-party collaboration on Providing Recommendations of Error Analysis of NLG systems 
Organisation University of Helsinki
Country Finland 
Sector Academic/University 
PI Contribution This is a multi-partner collaboration between Edinburgh Napier, Heriot-Watt University, trivago, Charles University in Prague, and others. All partners worked together to analyse the state of error reporting for NLG systems and to provide recommendations so that future NLG publications discuss not only the benefits of the systems but also the errors they make, with the aim of improving these aspects.
Collaborator Contribution All partners worked together to analyse current trends in error reporting and provide recommendations on how error analysis in NLG systems should be performed with the aim to understand the limitations of current scientific advances.
Impact Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondrej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson and Luou Wen. (2021). Underreporting of errors in NLG output, and what to do about it. In INLG 2021.
Start Year 2021
 
Description Multi-party collaboration on Providing Recommendations of Error Analysis of NLG systems 
Organisation University of Tilburg
Country Netherlands 
Sector Academic/University 
PI Contribution This is a multi-partner collaboration between Edinburgh Napier, Heriot-Watt University, trivago, Charles University in Prague, and others. All partners worked together to analyse the state of error reporting for NLG systems and to provide recommendations so that future NLG publications discuss not only the benefits of the systems but also the errors they make, with the aim of improving these aspects.
Collaborator Contribution All partners worked together to analyse current trends in error reporting and provide recommendations on how error analysis in NLG systems should be performed with the aim to understand the limitations of current scientific advances.
Impact Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondrej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson and Luou Wen. (2021). Underreporting of errors in NLG output, and what to do about it. In INLG 2021.
Start Year 2021
 
Description Multi-party collaboration on Providing Recommendations of Error Analysis of NLG systems 
Organisation University of Virginia (UVa)
Country United States 
Sector Academic/University 
PI Contribution This is a multi-partner collaboration between Edinburgh Napier, Heriot-Watt University, trivago, Charles University in Prague, and others. All partners worked together to analyse the state of error reporting for NLG systems and to provide recommendations so that future NLG publications discuss not only the benefits of the systems but also the errors they make, with the aim of improving these aspects.
Collaborator Contribution All partners worked together to analyse current trends in error reporting and provide recommendations on how error analysis in NLG systems should be performed with the aim to understand the limitations of current scientific advances.
Impact Emiel van Miltenburg, Miruna-Adriana Clinciu, Ondrej Dušek, Dimitra Gkatzia, Stephanie Inglis, Leo Leppänen, Saad Mahamood, Emma Manning, Stephanie Schoch, Craig Thomson and Luou Wen. (2021). Underreporting of errors in NLG output, and what to do about it. In INLG 2021.
Start Year 2021
 
Description Multi-party collaboration on Scottish Gaelic Language Generation 
Organisation University of Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution TBA
Collaborator Contribution TBA
Impact Funding from AHRC for data collection
Start Year 2022
 
Description Multi-party collaboration/study on Evaluation of Commonsense-enhanced NLG systems 
Organisation Heriot-Watt University
Country United Kingdom 
Sector Academic/University 
PI Contribution TBA
Collaborator Contribution TBA
Impact Miruna-Adriana Clinciu, Dimitra Gkatzia, Saad Mahamood. 2021. It's Commonsense, isn't it? Demystifying Human Evaluations in Commonsense-Enhanced NLG Systems. In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval) at EACL 2021.
Start Year 2021
 
Description Multi-party collaboration/study on Evaluation of Commonsense-enhanced NLG systems 
Organisation Trivago NV
Country Germany 
Sector Private 
PI Contribution TBA
Collaborator Contribution TBA
Impact Miruna-Adriana Clinciu, Dimitra Gkatzia, Saad Mahamood. 2021. It's Commonsense, isn't it? Demystifying Human Evaluations in Commonsense-Enhanced NLG Systems. In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval) at EACL 2021.
Start Year 2021
 
Description Invited seminar talk at the National Research Council of Canada. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact David Howcroft presented "Disentangling 20 years of confusion in NLG: toward standards for human evaluation" at the National Research Council of Canada's Natural Language Processing seminar, having been invited by Cyril Goutte. The discussion covered useful similarities between evaluation for Natural Language Generation (NLG) and for machine translation in particular, including gaps in designing studies to measure the preferences of individual target groups, as well as how to perform evaluation in low-resource settings.
Year(s) Of Engagement Activity 2021
 
Description Professorial Talk at Edinburgh Napier University open day 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Undergraduate students
Results and Impact Around 160 students attended my professorial talk on "How close are we to achieving Human-like AI? From Eliza to Alexa and beyond", which described the current state of dialogue systems and natural language generation, discussed the limitations of current systems, and addressed the "misinformation" about AI presented in the media. The talk sparked a lively discussion on the topic.
Year(s) Of Engagement Activity 2021