Learning to Efficiently Plan in Flexible Distributed Organizations

Lead Research Organisation: Delft University of Technology

Department Name: Intelligent systems - INSY

Abstract

Teams of robots are expected to revolutionise industry and other parts of society. However, decision making in such so-called multiagent systems (MASs) under uncertainty is computationally very complex. The decentralized partially observable Markov decision process (Dec-POMDP) framework facilitates principled formulation of such decision making problems, but currently there are no scalable solution methods that provide guarantees on task performance. To simplify coordination in MASs, agent organisations assign an abstracted, easier problem to each agent. Typically only the most rigid organisations, which completely decouple the agents, have led to clear computational benefits. However, these come at the expense of task performance: full decoupling means that agents can no longer collaborate to divide the workload.

This project will focus on flexible distributed organisations (FDOs) for Dec-POMDPs, which restrict considered interactions to spatially nearby agents without imposing full decoupling. Currently no scalable decision making methods with guarantees on task performance exist for FDOs: the main goal of the project is to develop such methods along with the theory that supports their formalisation. To accomplish this goal, it will investigate the use of deep learning techniques to learn representations of 'influence' in FDOs and use those representations to develop novel planning methods. If successful, this will provide the proof-of-concept that learned influence representations can enable principled decision making in large-scale MASs. This will be the basis for a larger research program investigating such influence representations for different forms of abstraction and will spark applied research that investigates deployment of the developed algorithms in real robotic teams.

Planned Impact

This is a relatively short project that pursues basic research in the field of AI. As such, the expected short-term impact will mostly be in the form of knowledge (developed techniques), the scientific output (articles and software) and the influence on research questions picked up by peers (of which citations to those outputs are an indication). These academic impacts are a crucial link in the pathways to longer-term impact on society and economy. In particular, the project aims to make simulation-based planning dramatically more efficient, and thus more effective or even feasible in cases where it was not before. This could have a great mid-to-long-term impact since the potential of application of these methods is huge; they are advanced versions of model-predictive control (MPC) methods which are widely applied in industry. The recent application of simulation-based planning in the mastering of the game of Go is likely to attract attention in many areas.

Another reason that simulation-based planning methods have not seen more application yet is their large computational costs. However, this is precisely what we address for the class of problems that admit flexible distributed organizations. For such problems, the proposed research will have a crucial impact by making it feasible at all to apply simulation-based planning to these domains. For instance, we expect that this could be the case for robotic teams collaborating in future factories or warehouses, but also for other problems that can be modelled as spatial task allocation problems, such as dispatching of emergency vehicles, or for application areas such as optimizing traffic control by simulating large traffic networks, optimizing routing policies by simulation of communication networks, optimizing UAV patrolling policies in security domains or for law enforcement, etc.

Therefore, a large part of the longer term impact will be of the economic kind (new companies, improved products and services) with, given the possible application areas, the potential to improve quality of life. The first companies to adopt these techniques will be logistics and manufacturing companies since these already are starting to use robotic teams. Many of the other aforementioned applications (UAV patrolling etc.) will probably need more time and are likely to be tackled by AI and robotics start-up companies. The discipline of high-performance computing (HPC) is involved in optimizing both hardware and software, e.g., to make simulations maximally efficient. The basic idea put forth in this proposal---that one can mostly use 'local model simulations' that consider only a small subset of the variables considered in 'full model' simulations, while still converging to the same behaviour---could potentially make an impact on this community, and thus may affect the design of computers dedicated to doing simulations.

I will contribute to the impact of the developed methods by contributing to awareness (interacting with potential industrial partners, as well as with academic peers, and organising a workshop), clarifying potential gains (an expected output of the research), and facilitating adoption by releasing open source software.

Funded Value:

£40,802

Funded Period:

Oct 18 - Nov 19

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/R001227/2

Principal Investigator:

Frans Oliehoek

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Artificial Intelligence (100%)

Organisations

People	ORCID iD
Frans Oliehoek (Principal Investigator)

Publications


Castellini Jacopo (2019) Analysing Factorizations of Action-Value Networks for Cooperative Multi-Agent Reinforcement Learning in arXiv e-prints

Czechowski A (2020) Decentralized MCTS via Learned Teammate Models

Katt S (2019) Bayesian Reinforcement Learning in Factored POMDPs

Katt Sammie (2018) Bayesian Reinforcement Learning in Factored POMDPs in arXiv e-prints

Oliehoek F (2018) Interactive Learning and Decision Making: Foundations, Insights & Challenges

Oliehoek Frans A. (2018) Beyond Local Nash Equilibria for Adversarial Networks in arXiv e-prints

Key Findings
Impact Summary
Collaboration


Description	When planning for agents (e.g., robots) in a joint task (such as cleaning a large space, or picking/dropping items in a warehouse), a key difficulty is predicting the behavior of the other team members, especially if those team members use complex AI strategies. We introduced a method called "Alternating maximization with Behavioral Cloning" and showed that it is possible to combine learning models of other teammates with optimizing each teammate's behavior in turn, in a way that the behaviors are guaranteed to converge.
Exploitation Route	The results that we did obtain (under submission currently) make a first step towards some of the goals that were envisioned in the proposal. As such, I believe that the objectives of the original proposal can still be realized. I myself am still actively exploring this, and other researchers will be able to do so too if our paper gets accepted.
Sectors	Digital/Communication/Information Technologies (including Software), Manufacturing, including Industrial Biotechnology, Retail, Transport
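
The alternating-maximization idea described above can be illustrated with a minimal sketch. This is not the project's method: it assumes two agents, a single shared team payoff given as a small table, and exact knowledge of the teammate's current choice, so the behavioral-cloning step (learning a model of the teammate) is left out entirely. The names `alternating_maximization` and `TEAM_PAYOFF` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical shared team payoff: rows are agent 1's actions,
# columns are agent 2's actions. Values are made up for illustration.
TEAM_PAYOFF = [
    [5, 0, 0],
    [0, 10, 0],
    [0, 0, 8],
]


def alternating_maximization(
    payoff: List[List[float]],
    start: Tuple[int, int] = (0, 0),
    max_rounds: int = 100,
) -> Tuple[Tuple[int, int], float]:
    """Improve each agent's action in turn while the other is held fixed.

    The shared payoff never decreases from one step to the next, so the
    procedure settles on a joint action that neither agent can improve
    alone. That joint action may be only a local optimum.
    """
    a, b = start
    for _ in range(max_rounds):
        # Agent 1 best-responds to agent 2's current action.
        new_a = max(range(len(payoff)), key=lambda i: payoff[i][b])
        # Agent 2 best-responds to agent 1's updated action.
        new_b = max(range(len(payoff[0])), key=lambda j: payoff[new_a][j])
        if (new_a, new_b) == (a, b):
            break  # no agent changed its action: converged
        a, b = new_a, new_b
    return (a, b), payoff[a][b]


if __name__ == "__main__":
    for start in [(0, 0), (0, 1), (1, 2)]:
        print(start, "->", alternating_maximization(TEAM_PAYOFF, start))
```

Different starting points can end at different joint actions in this toy table, which shows that convergence of the behaviors does not by itself imply that the best joint behavior has been found.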


Description	Based on the online planning approach developed, we have started a collaboration with the Dutch bridge association, who are interested in developing AI tools to further bring insight into their plays.
First Year Of Impact	2020
Sector	Leisure Activities, including Sports, Recreation and Tourism
Impact Types	Cultural


Description	Oliehoek&Amato
Organisation	Northeastern University - Boston
Country	United States
Sector	Academic/University
PI Contribution	As part of the proposed research, I made a research visit to NEU where my postdoc and I collaborated with Dr. Amato and his research team. In the lead-up to this visit, I had already collaborated with Dr. Amato and his student Sammie Katt, leading to a conference paper. During the research visit we explored some further ideas to combine this prior work with some of the current ideas that we are investigating in our project. These will be further pursued in the remainder of the project.
Collaborator Contribution	This has been a true joint collaboration where we have shared ideas back and forth. As such I can only repeat the above explanation.
Impact	Sammie Katt, Frans A. Oliehoek, and Christopher Amato. Bayesian Reinforcement Learning in Factored POMDPs. arXiv e-prints, pp. arXiv:1811.05612, November 2018. This paper has also been accepted to AAMAS.
Start Year	2018


Description	Oliehoek&Savani
Organisation	University of Liverpool
Department	Friends of the University of Liverpool
Country	United Kingdom
Sector	Academic/University
PI Contribution	As part of its commitments, the department of computer science has supported my research with a PhD student who is performing research in a closely related topic. This student, Jacopo Castellini, is co-supervised by Rahul Savani. As such, this has led to a longstanding, ongoing collaboration with Prof. Savani.
Collaborator Contribution	We have not tracked individual contributions to our collaboration, but overall Prof. Savani brings his expertise on game theory, while I bring expertise on machine learning, reinforcement learning and multiagent planning.
Impact	Frans A. Oliehoek, Rahul Savani, Jose Gallego-Posada, Elise van der Pol, and Roderich Gross. Beyond Local Nash Equilibria for Adversarial Networks. In Proceedings of the 27th Annual Machine Learning Conference of Belgium and the Netherlands (Benelearn), November 2018. And a forthcoming paper accepted at AAMAS 2019.
Start Year	2017
