Conditions and methods for Decentralised Reinforcement Learning

Lead Research Organisation: Imperial College London
Department Name: Electrical and Electronic Engineering

Abstract

Advances in reinforcement learning (RL) over the last decade have made it a very active research topic. Improvements in hardware performance and the combination of RL with neural networks have enabled algorithms that achieve state-of-the-art performance in many control problems, including computer games in which they beat human champions. However, several open questions remain in the field: how to learn in more complex environments, how to learn more efficiently from limited samples, and how to learn more general tasks. One approach to learning in very complex environments is to decentralise the control task across multiple agents rather than a single centralised one. This can greatly reduce the complexity of learning for each agent, at the possible expense of limiting the set of policies (action plans) that the group of agents can enact, and of introducing technical issues that affect the stability of the training process. Decentralisation arises quite naturally in many scenarios: in self-driving cars, each agent can be one car; in a resource-assignment task on a cluster of computers, each agent can control the tasks assigned to one machine. Decentralised agents can have varying levels of communication and synchronisation with each other, which affects the size of the set of possible joint policies. My research deals with agents that communicate implicitly, meaning that they do not directly share their status (state information) with each other, but observe common features of the environment that allow them to infer information about the status of the other agents.

The first question that I aim to answer is: in which scenarios can such a decentralised RL system achieve the same level of performance as a centralised single agent? This involves deriving mathematical conditions on the states, the rewards, and the policy. It is done by modelling the decentralised solution as a decentralised partially observable Markov decision process (dec-POMDP), a framework that captures both the decentralisation of the agents and the partial observability of the environment from each agent's perspective (the standard formalism is recalled below).

Next, I want to investigate the effect of decentralisation in more general scenarios. Can theoretical bounds be derived on the performance loss due to decentralisation under certain conditions? Are there special conditions under which decentralisation is especially useful? Subsequently, I want to use these conditions to develop an algorithm that can easily distinguish between tasks that are decentralisable and those that are not. Depending on the form of the mathematical conditions, this may be possible directly from the derived formula, but it could also require extensive computation; in that case, it would be useful to develop approximations that make it easy to test how decentralising the solution of a task affects the theoretical performance bounds.
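For reference, the dec-POMDP formalism used in this modelling step is standard (see, e.g., Bernstein et al., 2002); the notation below is the conventional one, not specific to this project. A dec-POMDP is a tuple

\[
\mathcal{M} = \langle I, S, \{A_i\}_{i \in I}, T, R, \{\Omega_i\}_{i \in I}, O, \gamma \rangle
\]

where $I = \{1, \dots, n\}$ is the set of agents; $S$ is the set of environment states; $A_i$ is the action set of agent $i$, with joint action $a = (a_1, \dots, a_n)$; $T(s' \mid s, a)$ is the state-transition probability; $R(s, a)$ is the common reward shared by all agents; $\Omega_i$ is the observation set of agent $i$; $O(o \mid s', a)$ is the joint-observation probability; and $\gamma \in [0, 1)$ is the discount factor. Each agent $i$ selects its actions using only its own observation history, while the team jointly maximises the expected discounted return $\mathbb{E}\big[\sum_t \gamma^t R(s_t, a_t)\big]$. Note that every decentralised joint policy is also a feasible policy for a centralised controller with access to all observations, so the optimal decentralised value can never exceed the centralised optimum; the first research question above asks under which conditions the two coincide.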
While much existing research on decentralised RL algorithms aims to achieve maximum performance in every kind of scenario, the relationship between centralised and decentralised RL solutions has not been explored in depth. My PhD research aims to provide a theoretical foundation for this relationship, along with novel tools in the form of algorithms that allow the designer of a decentralised solution to know the maximum theoretical performance that a given decentralised design can achieve. This research has the potential to be applied to the control of many systems whose components must cooperate to achieve optimal performance; examples include self-driving cars, robotics, and communication networks. My research is aligned with the EPSRC research areas "Artificial intelligence technologies" and "ICT networks and distributed systems".

Publications

Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/T51780X/1                                      01/10/2020   30/09/2025
2619847             Studentship    EP/T51780X/1   01/10/2021   30/06/2025   Edoardo David Santi