Discovering Individual and Social Preferences through Inverse Reinforcement Learning

Lead Research Organisation: University of Essex
Department Name: Computer Sci and Electronic Engineering

Abstract

Organisations that provide services and create products often base their decisions on questionnaires and/or other explicit forms of communication with their user base (e.g. patients, customers, citizens). The aim of this information exchange between providers and users is to uncover the users' "reward function", i.e. what users actually want from their interactions and what issues exist with the current product/service line-up. Explicit forms of information exchange can be cumbersome and expensive for organisations to design and are intrusive to the user. Furthermore, response bias is a well-known problem for survey-based methods, particularly around sensitive topics, where respondents may be unwilling to engage due to social or cultural concerns. Some practical solutions to response bias are provided by indirect questioning methods (item count and randomized response techniques). However, none of these solutions is practical for large-scale, real-time settings.
We postulate that ideally an organisation should try to elicit the reward function of its user base (i.e. what states are preferred by users) by using observational data generated from user activity. Inspired by recent literature in AI research, we propose a three-facet programme that aims to directly attack the problem of what users want by a) trying to infer the user reward function through the collection of behavioural data (e.g. website clicks, traffic behaviour, movie preferences); b) creating short, non-intrusive online questionnaires that will remove any uncertainties; and c) exploiting user preferences in order to improve service and product provision.
The proposed research aims to contribute to developing methods that can be embedded in artificial intelligence systems which must elicit and understand preferences by interacting with humans in order to adapt their behaviour and allow for a more natural experience and interaction.
Through this research we have four key objectives: (a) understand user preferences and develop methods to uncover and learn the reward function through data and behaviours; (b) develop interactive and conversational methods for eliciting responses and interactions from users that allow for a more natural user experience with automatic systems; (c) explore the social limitations of our approach (for instance, to what extent are personal rewards dictated not by individual preferences but by social coercion?); and (d) investigate what steps can be taken to fully automate the procedure of provisioning new services and products through eliciting preferences via the methods developed under (a) and (b).
This Fellowship provides a unique opportunity to bring together artificial intelligence techniques and social science to tackle problems faced by a range of businesses and organisations that deal with clients and customers and attempt to elicit their preferences and needs through behaviours and interactions. We will be working closely with our industry partners in this project, British Telecom (BT) and the Essex County Council (ECC), to investigate the issues and challenges of eliciting and understanding preferences as faced in their own contexts, to inform and shape the programme of work.
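As an illustration of facet (a) above, inferring a reward function from behavioural data, the following is a minimal sketch rather than the project's own method. It assumes a simplified one-step setting in which each user picks one option from a small set, that the reward is linear in hand-chosen option features, and that users choose with softmax (Boltzmann) probabilities; full Inverse Reinforcement Learning extends this idea to sequential decisions. All function names, feature vectors, and parameter values are hypothetical, and the data are simulated.

```python
import math
import random


def choice_probabilities(weights, options):
    """Softmax choice probabilities for a linear reward w . features."""
    scores = [sum(w * x for w, x in zip(weights, feats)) for feats in options]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_reward_weights(observations, n_features, lr=0.5, epochs=300):
    """Maximum-likelihood reward weights from observed choices.

    observations: list of (options, chosen_index) pairs, where options is a
    list of feature vectors. The log-likelihood gradient is the chosen
    option's features minus the expected features under the current model.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for options, chosen in observations:
            probs = choice_probabilities(weights, options)
            for k in range(n_features):
                expected = sum(p * feats[k] for p, feats in zip(probs, options))
                grad[k] += options[chosen][k] - expected
        # Gradient ascent on the average log-likelihood.
        weights = [w + lr * g / len(observations) for w, g in zip(weights, grad)]
    return weights


def simulate_choices(true_weights, n_obs, n_options, seed=0):
    """Generate synthetic choices from a known (hypothetical) reward."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_obs):
        options = [[rng.uniform(-1, 1) for _ in true_weights]
                   for _ in range(n_options)]
        probs = choice_probabilities(true_weights, options)
        chosen = rng.choices(range(n_options), weights=probs)[0]
        data.append((options, chosen))
    return data


if __name__ == "__main__":
    # Hypothetical ground truth: users like feature 0 and dislike feature 1.
    data = simulate_choices([2.0, -1.0], n_obs=500, n_options=3)
    print(fit_reward_weights(data, n_features=2))
```

With enough simulated choices, the recovered weights should have the same signs and rough ordering as the weights used to generate the data. Real behavioural data would also need the feature design and social-context questions raised in objectives (a) and (c).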

Planned Impact

The project has been conceived in partnership with the collaborating business partners with an emphasis on practical deployment of the developed methods and results of the project during the latter stages of the research. The business partners will be working with us during the Fellowship to guide and co-design the research and ensure a two-way feedback loop that will maximise the impact of the work undertaken and also facilitate dissemination beyond academia. Close engagement with stakeholders will also ensure the relevance of the generated results and outcomes and their take-up and embedding in the business practices of organisations. Our user (partner) and stakeholder activities consist of a number of elements designed to enable cross-fertilisation of ideas and truly participatory research and dissemination of outcomes:
Initial scoping workshop. To help shape the research questions and context of the work to be undertaken, the Fellowship will commence with a focused workshop to involve key people from the business partners, the Fellow, and the Academic Team (PI and CIs). The purpose of the workshop will be three-fold: (i) the Fellow and Academic Team will have the opportunity to meet the key people to be involved in the project from the business partners' side and form a close working relationship; (ii) the business partners will have the opportunity to describe the problems that they face and provide the context of the work; (iii) the mode of collaboration will be discussed and agreed. This will then help the Fellow to work more closely with the business partners in the next step to refine the questions that the research will aim to address and position them in the context of the problems that these organisations face. Following the workshop, a detailed workplan with the research questions, milestones, and timelines will be drafted with further input from the partners.
Continuous engagement and refinement of research questions and approach. The research questions will continue to be refined as part of the short-term placements of the Fellow within the respective Data Science teams of the partners. Both organisations have committed to hosting the Fellow within their Data Science teams for short periods of time to enable the research to remain aligned and relevant to the problems that they face and to enable true knowledge exchange, cross-fertilisation of ideas and imparting new skills. These short placements will also enable the Fellow to build close working relationships and obtain a more comprehensive understanding of the business problem and needs in-situ. Presentations at meetings of the Data Science teams and other teams within the business partners will facilitate raising awareness of the project and dissemination of intermediate project outcomes.
Dissemination of project outcomes to the business partners. At the end of the Fellowship a workshop will be organised to disseminate the results of the project. This will bring together the two business partners as well as the other businesses that have expressed an interest in being involved in the dissemination of the project outcomes (as well as new business partners and organisations that the Fellow may develop links with during the Fellowship) in order to discuss and evaluate the results and see how these can be embedded in their business practices. One of the project outputs at the end will be a report for each partner on how the project results could be embedded within their business practices and recommendations on how this can be taken forward.
 
Description This project explores preference elicitation methods to extract information from settings where users may not have a clear idea of their requirements.
Exploitation Route The work that has been carried out can be built upon further to develop more advanced artificial intelligence based methods.
Sectors Communities and Social Services/Policy, Digital/Communication/Information Technologies (including Software), Retail

 
Title Matlab Inverse Reinforcement Learning Tool, with personal additions and modifications
Description This software toolbox contains 9 of the most common Inverse Reinforcement Learning algorithms. I have included some modifications and all the papers this software is based on. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This is a correctly functioning example of the implementation of IRL. 
 
Description BT Challenge Lab 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact 15 senior members of the BT research and development team, headed by Prof. Detlef Nauck, attended this challenge lab at the University of Essex. From the University of Essex, 10 Senior Research Officers from the Institute of Analytics and Data Sciences and the Centre for Business and Local Government also attended. The subjects discussed were the pursuit of a Knowledge Transfer Partnership (KTP) and how BT and the University of Essex may use cutting-edge research done by my colleagues and me to develop more efficient recommender systems and engage with BT's nation-wide customer base.
Year(s) Of Engagement Activity 2020
 
Description ESRC Business and Local Government Data Research Centre: Making changes stick in complex collaborations 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Keynote speaker, honorary Prof. Stephen Kavanagh, discussed problems faced by the police, such as low cost, austerity, failure in sharing data, and short-term management objectives. The need for a new definition of efficiency was discussed. Problems arise as new communities are being defined that are not based on shared geography. The very strict hierarchy in the police and limited training also cause difficulties. I discussed using human experts to define the features of risk so that Inverse Reinforcement Learning can assign values to each category from the data and support more accurate assessments of risk in the future, as only high-risk cases are passed on to be investigated.
Year(s) Of Engagement Activity 2019
 
Description Healthwatch Essex Limited, Annual meeting 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Patients, carers and/or patient groups
Results and Impact During this event, it was discussed that there were 12500 cases of safeguarding referrals in Essex last year. The work of the "Collaboration Essex Forum" was discussed. Carers talked about how the most helpful thing to people in need is compassionate human interaction. The current method of preference elicitation in use is initially through interviews and then designing questionnaires. This puts a heavy load on carers, who need relief from brain-dead paperwork. The focus of questionnaires was on free interviews. There is a noticeable lack of structure to avoid bias. I discussed the potential of Inverse Reinforcement Learning to replace traditional questionnaires.
Year(s) Of Engagement Activity 2019
 
Description Healthwatch Essex Introduction to research opportunities at the University of Essex
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact During this event the structure of available internships and student placements was discussed. The ESRC Business and Local Government Centre presented its senior research fellows and their respective areas of research. This was followed by an introduction to EIRA and collaborative opportunities. The next speaker represented "Understanding Society" and provided an introduction to the Institute for Social and Economic Research. Through further collaboration with the "Understanding Society" organisation I gained access to a potentially valuable dataset for my research on Inverse Reinforcement Learning. Finally, the UK Data Archive, which has an active safe at the University of Essex, presented the scope of their activities. Following this event I went through the formal safe training to gain access to the UK Data Archive and its many valuable datasets for my research.
Year(s) Of Engagement Activity 2019
 
Description Institute for Analytics and Data Sciences Keynote, Honorary Prof. Stephen Kavanagh 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact A chance to speak with Stephen Kavanagh, Essex Chief Constable, police ICT company, and recipient of the Queen's Police Medal (QPM). We discussed examples of data science being used on already gathered data to identify activities that were invisible or meaningless to a human observer and how Inverse Reinforcement Learning could be used. The example Mr Kavanagh gave was on detecting, through their linguistic patterns, pedophiles pretending to be younger than they are. I discussed the need for real-world understanding of the dynamics of a system to accurately define the parameters of a reward function (individual preferences) for predictive modelling.
Year(s) Of Engagement Activity 2019
 
Description Pioneer Sailing Trust, a consultation and setting up future collaboration 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact During this event, the CEO of Pioneer Sailing Trust discussed their need for some clear indicator of their impact. The discussion resulted in an update of their questionnaire to gather more useful information from their customers and trainees. The new questionnaire will include categories on improvements in mental and physical wellbeing, improvement in social interaction, better healing and recovery for patients, and opportunities for growth for young people.
Year(s) Of Engagement Activity 2020