Affective Mechanisms for Modelling Lifelong Human-Robot Relationships

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

As robots become an integral part of human life, it is important that they are equipped with enhanced interaction capabilities. Human-Robot Interaction (HRI) research for Social Robots has thus gained momentum, with researchers focusing on making these interactions as smooth and natural as possible. It is important for robots to become natural extensions of their human environment, capable of holding extended and repeated interactions with users. Emotional Intelligence (EIQ) is central to human-human interactions, adding meaning and context. EIQ is therefore indispensable for naturalistic and engaging human-robot interactions, enabling robots to adapt their responses and provide their users with personalised interaction experiences.

Although most current HRI studies embed emotion recognition capabilities in robots, they rely on frame-based absolute annotations. These are limited to a handful of disjoint emotional categories such as anger, happiness or sadness, with little to no overlap amongst them. Such broad generalisation of emotions seems counter-intuitive when we look at how humans interact with each other and express emotions. Human emotions develop over time and vary with individuals, interaction partners and environments. It is thus beneficial to adopt a continuous view of emotions that allows us to map the valence (the positive or negative nature of an emotion) as well as its intensity, providing smoother transitions. It is also important to model emotions in a developmental and evolving manner, where a series of evaluations over time yields a robust model of the affective context of an interaction. This emotional understanding will enable a robot to form intrinsic affective responses as an evaluation of its state in the interaction. Based on these evaluations, it shall learn to interact with users while performing different tasks under various HRI scenarios.

To address these open questions, this research will focus on modelling long-term relationships between humans and companion robots using deep and hybrid neural architectures. Multi-modal emotion perception techniques will be devised, combining modalities such as vision and speech. The robot shall use this perception to incrementally learn the emotional context of its interactions with users by monitoring their responses. Evolving neural representations that model the short-term as well as long-term impact of such affective interactions shall form the basis for learning optimal behaviour under different environmental conditions. This understanding shall also develop as the robot interacts with different users, generalising its learning in the process. New reinforcement learning mechanisms shall be investigated to achieve lifelong adaptation of robot behaviour in various HRI contexts. Interacting with different user groups, the robot shall learn to assist or coach them in performing complex cognitive tasks, such as playing collaborative or competitive games for cognitive training, while focusing on their mental health and cognitive development.

This research, aligning itself with the primary supervisor's EPSRC grant Adaptive Robotic EQ for Well-being (ARoEQ), aims to develop a holistic and autonomous system for emotional understanding and robot behaviour modelling, attempting to move away from Wizard-of-Oz approaches. Equipping companion robots with such an affective understanding will enable them to engage users in cognitive tasks using affective interaction capabilities. Inspired by the central principles of affective computing, this PhD project shall (i) bridge the gap between feature-dependent computational models and the deeper psychological and cognitive understanding of human factors and (ii) build a holistic model for actualising affective behaviour in robots for assisting humans.
 
Description This research investigates Continual or Lifelong Learning (CL) as a learning paradigm for affective robots to develop long-term Human-Robot Interaction (HRI) solutions. This requires not only learning to be sensitive towards individual user behaviour but also adapting robot responses to complement such personalisation.

Towards this aim, our first work focused on developing a novel CL-based learning framework (CLIFER) for Facial Expression Recognition (FER) that enables person-specific learning and adaptation. To the best of our knowledge, the proposed framework was the first implementation of the continual learning paradigm for facial expression recognition. The framework was able to incrementally learn to recognise different expressions for each subject in the dataset. Our key findings from this work were as follows:

1. CL offers an effective learning paradigm for developing personalised facial expression recognition systems. The proposed framework is able to learn to recognise facial expressions, one expression class at a time, without forgetting previously learnt expressions.
2. Simulating synthetic person-specific data using a generative model greatly enhances the model's performance on both new and previously learnt expression classes (a minimal sketch of this replay scheme follows the list).
3. The order in which the model learns the expression classes has a significant impact on model performance. Learning to recognise neutral (no expression) faces first and then learning different expressions enhances the model's performance.
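
The class-incremental idea with generative replay can be illustrated with a minimal, self-contained Python sketch. The Gaussian "generator" and nearest-mean classifier below are toy stand-ins chosen for brevity, not the actual CLIFER architecture; only the training-loop structure reflects the approach described above.

    import numpy as np

    rng = np.random.default_rng(0)

    class GaussianGenerator:
        """Toy stand-in for the generative model: fits a Gaussian per learnt
        expression class and 'imagines' synthetic samples from it."""
        def __init__(self):
            self.stats = {}                       # class label -> (mean, std)

        def fit(self, label, X):
            self.stats[label] = (X.mean(axis=0), X.std(axis=0) + 1e-6)

        def sample(self, label, n):
            mu, sd = self.stats[label]
            return rng.normal(mu, sd, size=(n, mu.size))

    class NearestMeanFER:
        """Toy incremental classifier holding one prototype per class."""
        def __init__(self):
            self.prototypes = {}

        def partial_fit(self, X, y):
            for label in np.unique(y):
                self.prototypes[label] = X[y == label].mean(axis=0)

        def predict(self, X):
            labels = list(self.prototypes)
            protos = np.stack([self.prototypes[l] for l in labels])
            dists = ((X[:, None, :] - protos[None]) ** 2).sum(-1)
            return np.array(labels)[dists.argmin(axis=1)]

    def learn_class_incrementally(model, generator, data_per_class):
        """Learn one expression class at a time, rehearsing earlier classes
        with imagined samples so they are not forgotten."""
        seen = []
        for label, X in data_per_class.items():   # e.g. 'neutral' first
            generator.fit(label, X)
            Xs = [X] + [generator.sample(l, len(X)) for l in seen]
            ys = [np.full(len(x), l) for x, l in zip(Xs, [label] + seen)]
            model.partial_fit(np.concatenate(Xs), np.concatenate(ys))
            seen.append(label)

Ordering the keys of data_per_class with 'neutral' first mirrors finding 3 above.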

One of the key challenges identified from our experience working on the CLIFER framework was the lack of formal definitions that could be leveraged to implement and evaluate CL solutions for affective robots. Thus, in our second work, we focused on providing a formalisation of CL as a learning paradigm for affective robotics research. We reformulated personalisation towards individual affective expression and context-specific behavioural learning in robots as CL problems. We proposed a theoretical framework that can allow affective robots to continually learn and adapt as they interact with their users. Such lifelong learning is essential for modelling long-term human-robot interactions, where agents learn and adapt towards individual users based on their preferences. Our key findings from this work were as follows:
1. Modelling user-specific attributes such as user preferences, contextual attributions or personality-specific traits can allow models to form semantic groupings of users and learn appropriate robot behaviours.
2. Conducting interactions with the robot under context-neutral settings during introductions allows the robot to form normative baselines. These baselines can act as anchors for user behaviour and enable the robot to easily sense deviations from this norm (see the sketch after this list).
3. Real-world application of CL solutions entails a memory vs. compute trade-off. To balance this, robots can make use of cloud-based solutions or utilise forgetting mechanisms on unused memory locations or parts of the model.
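
The normative-baseline idea from point 2 can be sketched as follows; the affect features, calibration interface and deviation threshold are illustrative assumptions rather than a published specification:

    import numpy as np

    class UserBaseline:
        """Per-user normative baseline formed during a context-neutral
        introduction; later observations are compared against this anchor."""
        def __init__(self, threshold=2.0):
            self.threshold = threshold        # tolerance in std units (assumed)
            self.mean = None
            self.std = None

        def calibrate(self, intro_features):
            # intro_features: (n_frames, n_dims), e.g. valence/arousal estimates
            self.mean = intro_features.mean(axis=0)
            self.std = intro_features.std(axis=0) + 1e-6

        def deviation(self, features):
            # z-scored distance from the user's own norm
            z = (features - self.mean) / self.std
            return np.linalg.norm(z, axis=-1)

        def is_atypical(self, features):
            return self.deviation(features) > self.threshold

During introductions the robot would call calibrate(); in later sessions, is_atypical() flags behaviour that departs from the user's norm and may warrant adapting the robot's response.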

Another challenge for machine learning-based models for affect perception is handling biases. Fair and unbiased analysis and interpretation of human affective behaviour are among the factors that can contribute to the realisation of effective long-term HRI. Successful long-term HRI can be used to provide physical and emotional support to users, engaging them in a variety of application domains that require personalised human-robot interaction, including healthcare, education and entertainment. Yet, as the training of FER models becomes heavily data-dependent, these models may be prone to biases originating from imbalances in the data distribution with respect to attributes like gender, race, age or skin colour, implicitly encoded in the data. To address this, in our next work, we proposed the novel application of CL, benefiting from its capacity for lifelong learning, to develop fairer FER models for affective robots that can balance learning with respect to different attributes of gender and race. The key outcomes from this work were:
1. Applying continual learning principles to learn facial expression recognition one domain group (gender or race) at a time enhances fairness in the models (see the sketch after this list).
2. Continual Learning models tend to balance learning across different domain groups rather than striving for high accuracy on one group at the cost of others. This helps mitigate the effects of biases arising from skewed training data distributions.
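
A minimal sketch of the domain-incremental training and the fairness check it supports is given below. The learner is assumed to be any incremental model exposing a scikit-learn-style partial_fit(X, y) method (the NearestMeanFER toy above would do), and accuracy parity is just one of several possible fairness measures:

    import numpy as np

    def train_domain_incrementally(model, data_by_group):
        """Learn one demographic domain group (e.g. a gender or race
        category) at a time, rather than pooling a skewed dataset."""
        for group, (X, y) in data_by_group.items():
            model.partial_fit(X, y)
        return model

    def group_accuracies(model, data_by_group):
        """Per-group expression recognition accuracy."""
        return {g: float((model.predict(X) == y).mean())
                for g, (X, y) in data_by_group.items()}

    def fairness_score(accs):
        """Accuracy parity: worst-group accuracy over best-group accuracy.
        1.0 is perfectly balanced; values near 0 indicate strong bias."""
        vals = list(accs.values())
        return min(vals) / max(vals)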

Most facial affect perception models rely on evaluating static image frames, encoding a snapshot of heightened facial activity. In real-world interactions, however, facial expressions are more subtle and evolve over time, requiring models to incorporate temporal information and evaluate how expressions are formed. In our next work, we focused on combining spatial (image-based) and spatio-temporal (sequence-based) analyses to encode the temporal evolution of facial expressions. The key outcomes from this work were:
1. During apex frames, where facial expressions are fully formed at peak intensity, attending to relative spatial changes within frames provides the information most relevant to understanding the expression.
2. When the expression is in the onset (beginning to form from neutral) or offset (returning to neutral from peak intensity) phase, it is more important to focus on temporal changes across frames, understanding which parts of the face move and how.
3. Combining both spatial and spatio-temporal features, and learning when to selectively focus on each, improves the ability of facial affect perception models to capture subtle changes in facial expressions (a sketch of such gated fusion follows the list).
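
A minimal PyTorch sketch of such gated fusion is shown below. The encoders, feature sizes and gating mechanism are illustrative assumptions rather than the published architecture; the point is the learned trade-off between spatial and spatio-temporal features:

    import torch
    import torch.nn as nn

    class GatedAffectFusion(nn.Module):
        """Toy sketch: fuse per-frame spatial features with spatio-temporal
        features via a learned gate, letting the model emphasise spatial
        cues at apex frames and temporal cues during onset/offset phases."""
        def __init__(self, feat_dim=128, n_classes=7):
            super().__init__()
            self.spatial = nn.Sequential(       # per-frame appearance encoder
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, feat_dim))
            self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
            self.gate = nn.Sequential(nn.Linear(2 * feat_dim, 1), nn.Sigmoid())
            self.head = nn.Linear(feat_dim, n_classes)

        def forward(self, clip):                # clip: (B, T, 3, H, W)
            B, T = clip.shape[:2]
            frames = self.spatial(clip.flatten(0, 1)).view(B, T, -1)
            seq, _ = self.temporal(frames)      # spatio-temporal features
            s, t = frames[:, -1], seq[:, -1]    # last-frame summaries
            g = self.gate(torch.cat([s, t], dim=-1))
            return self.head(g * s + (1 - g) * t)  # gated spatial/temporal mix

    # Example: logits = GatedAffectFusion()(torch.randn(2, 8, 3, 64, 64))
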
Exploitation Route The outcomes of this project can enable the development of fully autonomous affective robots that are able to sense and adapt to user behaviour, engaging users in long-term interactions. Such learning capabilities can help develop companion robots for use in the education and healthcare domains. In education, continually learning robots can act as instructors or coaches that learn and adapt to the needs of their pupils, sensing their affective responses to provide assistance when necessary. In healthcare, such robots can help promote well-being by acting as companions that interact with individuals on a daily basis, caring for them and providing daily assistance.
Sectors Digital/Communication/Information Technologies (including Software), Education, Healthcare, Other

 
Description Theoretical Framework for Continual Learning for Affective Robotics: Why, What and How? 
Organisation Middle East Technical University
Country Turkey 
Sector Academic/University 
PI Contribution Developed and published a theoretical framework for the application of Continual Learning for Affective Robotics. We contributed to the conception and development of the framework and were the primary contributors to the resulting manuscript.
Collaborator Contribution We collaborated with Dr Sinan Kalkan from METU, Turkey on the development of a theoretical framework for the application of Continual Learning for Affective Robotics. Dr Kalkan provided expertise in Continual Learning, computer vision and robotics that helped shape the theoretical framework. He also contributed to the writing of the resulting manuscript in a secondary role.
Impact As a result of this collaboration, a conference article was published describing the theoretical framework developed with Dr Kalkan. The details of the framework can be found here: https://doi.org/10.1109/RO-MAN47096.2020.9223564
Start Year 2019
 
Description McMenemy Seminar talk at Trinity Hall, Cambridge 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact McMenemy seminars at Trinity Hall, Cambridge bring together students from different departments and encourage different perspectives on research projects. The talk was attended by 20+ postgraduate students as well as some faculty members, and prompted active discussions on the real-world applicability of the research.
Year(s) Of Engagement Activity 2019
 
Description Short Presentation at the Department 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact As a part of the one-minute research presentations, I made an `elevator-pitch' style presentations about the goals of my research to the postgraduate students and faculty at my department.
Year(s) Of Engagement Activity 2019