Multi-Objective Reinforcement Learning and Fairness

Lead Research Organisation: University of Warwick

Department Name: Mathematics

Abstract

In Reinforcement Learning (RL), an agent seeks a policy that maximises the expected reward. A significant number of RL algorithms have been proposed in the literature for policy optimisation, with important practical applications. However, in many scenarios optimising for a single scalar reward is not sufficient. For example, consider a scenario where an agent has to monitor a traffic network in which traffic lights must coordinate cars arriving from roads with different volumes of traffic. In this context, the agent needs to learn an optimal policy that treats all the parties involved fairly. Many real-world control problems therefore call for a Multi-Objective RL (MORL) framework that can balance the needs of all parties in this way. The aim of this project is to construct a new RL framework that optimises a policy with respect to multiple reward functions in a fair manner. Although the last decade or so has seen significant achievements in the development of planning and RL algorithms for multi-objective problems, MORL remains a niche area compared to the amount of research on single-objective agents. In addition, a number of challenges arise in the context of multiple objectives which do not exist in the single-objective domain. Thus, this project will tackle the following challenges:
1) A limited number of benchmark problems have been proposed for MORL research, and many of those are quite simple. These benchmark problems lack the complexity of many real-world problems that deal with multiple conflicting objectives. Therefore, there is a need for more benchmarks with complex state and action spaces and many objectives, so that the quality of proposed MORL algorithms can be thoroughly tested.
2) Within the field of MORL, the task of handling problems with many objectives (usually defined as four or more objectives) has emerged as a distinct sub-field, in recognition that algorithms that work well for a small number of objectives may scale poorly to many objectives. Most of the work on many-objective RL problems focuses on bandit problems and on the use of Principal Component Analysis, in which the original objectives are mapped to a lower-dimensional space. Thus, the development of a broader suite of algorithms to handle many-objective problems remains an open challenge.
3) Numerous real-world problems involve both multiple actors and multiple objectives that should be considered when making a decision. Multi-objective multi-agent systems represent an ideal setting to study such problems. However, despite its high relevance, this remains an understudied domain, perhaps due to the increasingly complex dimensions involved. There are many open challenges in this sub-field of MORL, ranging from how to develop negotiation strategies for selecting between multiple potential solutions, to how equilibria are affected by the choice of the optimisation criteria and utility functions of the agents.
This research will use existing MORL benchmark suites, which contain a number of MORL environments on which the quality of different algorithms can be compared. Additionally, this project will benefit greatly from collaboration with the AI Lab at the Vrije Universiteit Brussel (VUB). The VUB AI Lab is part of the EUTOPIA network and has a large number of connections with industry. Thus, there will be an opportunity to work on concrete case studies with industrial partners and use real-world datasets.
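
As an illustration of what optimising "in a fair manner" over several reward functions could mean, the sketch below compares three standard welfare functions applied to a vector of expected returns: the utilitarian sum, the egalitarian minimum, and the Nash welfare (sum of logarithms). It is a minimal sketch and not the project's method. The policy names, the per-road return values, and the helper `best_policy` are all hypothetical, chosen only to echo the traffic-light example in the abstract.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical expected returns per road (one objective per road) for three
# candidate traffic-light policies. The numbers are invented for illustration.
CANDIDATE_RETURNS: Dict[str, List[float]] = {
    "favour_main_road": [9.0, 1.0, 1.0],
    "balanced": [4.0, 3.0, 3.0],
    "favour_side_roads": [2.0, 4.0, 4.0],
}


def utilitarian(returns: Sequence[float]) -> float:
    """Total return across objectives; ignores how it is distributed."""
    return sum(returns)


def egalitarian(returns: Sequence[float]) -> float:
    """Return of the worst-off objective (max-min fairness)."""
    return min(returns)


def nash_welfare(returns: Sequence[float]) -> float:
    """Sum of log-returns; assumes every return is strictly positive."""
    return sum(math.log(r) for r in returns)


def best_policy(
    candidates: Dict[str, List[float]],
    welfare: Callable[[Sequence[float]], float],
) -> str:
    """Name of the candidate whose return vector maximises the welfare function."""
    return max(candidates, key=lambda name: welfare(candidates[name]))


if __name__ == "__main__":
    for welfare in (utilitarian, egalitarian, nash_welfare):
        print(welfare.__name__, "->", best_policy(CANDIDATE_RETURNS, welfare))
```

With these made-up numbers, the utilitarian sum prefers the policy that favours the main road, while both fairness-aware criteria prefer the balanced policy. This shows how the choice of utility function changes which policy counts as optimal.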

Planned Impact

In the 2018 Government Office for Science report, 'Computational Modelling: Technological Futures', Greg Clark, the Secretary of State for Business, Energy and Industrial Strategy, wrote "Computational modelling is essential to our future productivity and competitiveness, for businesses of all sizes and across all sectors of the economy". With its focus on computational models, the mathematics that underpins them, and their integration with complex data, the MathSys II CDT will generate diverse impacts beyond academia. These include impacts on skills, on the economy, on policy and on society.

Impacts on skills.
MathSys II will produce a minimum of 50 PhD graduates to support the growing national demand for advanced mathematical modelling and data analysis skills. The CDT will provide each of them with broad core skills in the MSc, a deep knowledge of their chosen research specialisation in the PhD and a complementary qualification in transferable skills integrated throughout. Graduates will thus acquire the profiles needed to form the next generation of leaders in business, government and academia. They will be supported by an integrated pastoral support framework, including a diverse group of accessible leadership role models. The cohort-based environment of the CDT provides a multiplier effect by encouraging cohorts to forge long-lasting professional networks whose value and influence will long outlast the CDT itself. MathSys II will seek to maximise the influence of these networks by providing topical training in Responsible Research and Innovation, by maintaining a robust Equality, Diversity & Inclusion policy, and by integration with Warwick's global network of international partnerships.

Economic impacts.
The research outputs from many MathSys II PhD projects will be of direct economic value to commercial, public sector and charitable external partners. Engagement with CDT partners will facilitate these impacts. This includes co-supervision of PhD and MSc projects, co-creation of Research Study Groups, and a strong commitment to provide placements/internships for CDT students. When commercial innovations or IP are generated, we will work with Warwick Ventures, the commercial arm of the University of Warwick, to commercialise/license IP where appropriate. Economic impact may also come from the creation of new companies by CDT graduates. MathSys II will present entrepreneurship as a viable career option to students. One external partner, Spectra Analytics, was founded by graduates of the preceding Complexity Science CDT, thus providing accessible role models. We will also provide in-house entrepreneurship training via Warwick Ventures and host events by external start-up accelerator Entrepreneur First.

Impacts on policy.
The CDT will influence policy at the national and international level by working with external partners operating in policy. UK examples include the Department of Health, Public Health England and DEFRA. International examples include the World Health Organisation (WHO) and the European Commission for the Control of Foot-and-mouth Disease (EuFMD). MathSys students will also utilise the recently announced UKRI policy internships scheme.

Impacts on society.
Public engagement will allow CDT students to promote the value of their research to society at large. Aside from social media, suitable local events include DataBeers, Cafe Scientifique, and the Big Bang Fair. MathSys will also promote a socially-oriented ethos of technology for the common good. Concretely, this includes the creation of open-source software, the integration of software and data carpentry into our computational and data-driven research training, and championing open access to research. We will also contribute to the 'innovation culture and science' strand of Coventry's 2021 City of Culture programme.

Student:

Sotirios Stamnas

Period of Study:

Oct 22 - Sep 26

Funder:

EPSRC

Project Status:

Active

Project Category:

Studentship

Project Reference:

2737772

Research Topic:

Unclassified

Organisations

People	ORCID iD
Sotirios Stamnas (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/S022244/1			01/10/2019	31/03/2028
2737772	Studentship	EP/S022244/1	03/10/2022	30/09/2026	Sotirios Stamnas