Automated Plan-Based Policy-Learning for Surveillance Problems

Lead Research Organisation: King's College London
Department Name: Informatics

Abstract

Surveillance problems give rise to many challenges including the management of uncertainty in an unpredictable environment, the management of restricted resources and the communication of commitments and requests between multiple heterogeneous agent "observers". At the heart of surveillance problems lies the need to plan complex sequences of behaviour that achieve surveillance goals. These goals are typically expressed in terms of gathering as much information as possible given constraints, and communicating findings to a human operator. Planning is combinatorially hard, and planning problems involving metric resources, continuous time and concurrency, as would be required in the solution of non-trivial surveillance problems, are time-consuming to solve. This complexity is greatly exacerbated if uncertainty is captured explicitly within the planning domain models. Although online planning, and plan repair in the case of failure, are feasible in stable situations, they take too long in situations that are changing rapidly. Online planning also requires significant on-board computational resources, which are often not available in surveillance vehicles. Planning under uncertainty cannot therefore be done online in situations typical of many surveillance problems, where computational resources are limited and rapid responses are frequently required. On the other hand, forward planning is certainly required in order to avoid the observers behaving in a purely reactive (and therefore easily distracted) manner.

Since online planning, and planning under uncertainty, are both unrealistic for large-scale, fast-moving surveillance problems, we propose an alternative approach based on plan-based policy-learning. We assume that time and resources are available offline to train effective policies. Our approach is based on Monte Carlo sampling: we sample many instances of the stochastic problem, each instance being a challenging temporal and metric planning problem. We solve each instance using a high-performing planner, and then apply a classifier to the set of solutions to learn a policy as a mapping from states to actions. We have already demonstrated the effectiveness of this approach in two single-agent cases: management of the loading of multiple batteries, and the control of an autonomous underwater vehicle following the edge of a patch (distinguished by high chlorophyll or high temperature readings) in the coastal waters of Monterey Bay. We know from our work in both cases that the resulting policies can be very high-performing in terms of robustness to the high degree of uncertainty that often occurs in the physical execution environment. We now propose to scale up the approach we took in the batteries and patch-following cases to the multi-agent coordination problem, addressing the challenges that arise when many agents are coordinating in solving a surveillance problem that requires the integration of multiple policies.
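The sample-solve-classify loop described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the project: the toy one-dimensional domain, the function names (sample_instance, solve, features, learn_policy, train), the trivial stand-in for a temporal and metric planner, and the majority-vote lookup table used in place of a real classifier.

```python
import random
from collections import Counter, defaultdict


def sample_instance(rng, size=10):
    """Draw one deterministic instance of the stochastic problem.

    Hypothetical toy domain: an observer and a target on a line of cells.
    """
    return {"start": rng.randrange(size), "target": rng.randrange(size)}


def solve(instance):
    """Stand-in for a high-performing planner (assumption, not a real planner).

    Returns the plan as a list of (state, action) pairs, where a state is
    the pair (observer position, target position).
    """
    pos, target = instance["start"], instance["target"]
    trace = []
    while pos != target:
        action = "right" if target > pos else "left"
        trace.append(((pos, target), action))
        pos += 1 if action == "right" else -1
    trace.append(((pos, target), "stay"))
    return trace


def features(state):
    """Abstract a state to the sign of the offset between target and observer."""
    pos, target = state
    return (target > pos) - (target < pos)


def learn_policy(traces):
    """Fit a majority-vote table from state features to actions.

    This table is a deliberately simple placeholder for a trained classifier.
    """
    votes = defaultdict(Counter)
    for trace in traces:
        for state, action in trace:
            votes[features(state)][action] += 1
    table = {f: counts.most_common(1)[0][0] for f, counts in votes.items()}

    def policy(state):
        # Fall back to "stay" for feature values never seen in training.
        return table.get(features(state), "stay")

    return policy


def train(num_samples=200, seed=0):
    """Offline phase: sample instances, solve each one, learn from the solutions."""
    rng = random.Random(seed)
    traces = [solve(sample_instance(rng)) for _ in range(num_samples)]
    return learn_policy(traces)


if __name__ == "__main__":
    policy = train()
    print(policy((2, 7)))
```

At execution time the learned policy is only a table lookup, which is the point of moving the planning effort offline: the on-board cost is a feature computation and a dictionary access rather than a search.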

Planned Impact

In the last few years there has been increasing interest, amongst companies concerned with robotics, in the automation of intelligent reasoning. We have contributed to the development of autonomous decision-making in collaboration with Scottish Power and National Grid, SciSys Ltd and, most recently, BAE Systems. In recent months there has been an explosion of interest in the role that planning can play in these developments, and we have been approached by Thales Ltd, USAF, and Mako Surgical Corp., all concerned with automating different aspects of the control of their advanced robotic systems. The names of the companies behind the Call to which this proposal is addressed demonstrate that industry is ready for the benefits of automated plan-based reasoning.

The impact of the proposed work will be achieved through close collaboration with the industrial sponsors named in the Call. In order to ensure the impact of our approach across the scenarios they outlined, we will work closely with the companies to derive accurate models of their problem domains and appropriate techniques for sampling the space of problem instances in each case. As outlined in the case for support, we will produce a number of deliverables (domain models, planning software, learned policies and simulation results) that will inform discussion and enable refinement of our approaches towards more closely tailored solutions. We will then work with the companies to field the technology we develop, first at the level of demonstrations and then as components in deployed systems.

Our solutions will have wider application than to surveillance scenarios, as the features of Intelligence-Gathering, Tracking and Hazard Investigation arise in many application domains including search and rescue, domestic robotic support, healthcare, computer-aided learning, future power systems, future transport systems and many others. The class of surveillance problems we have defined raises many new research challenges, in modelling, planning and machine learning, in which we are experts. We expect that, in addressing the specific needs of the scenarios described in the Call, we will produce technology that moves the AI Community forward in terms of fielding fast, light-weight and adaptive autonomous deliberative reasoning.
 
Description: We have proposed a method for combining plan-based policy-learning with forward search planning to undertake search-and-track missions under uncertainty. We use a method based on search patterns, and plan search strategies over time. Our approach is more scalable than planning under uncertainty when the search area is large and heterogeneous. We have extended our approach to handle probabilistic rewards within a deterministic modelling framework. Our work presents an alternative way of planning under uncertainty by harnessing external probabilistic methods to the planner via a generic interface.
Exploitation Route: The approach we have developed is relevant in any context where autonomous vehicles are engaged in search-and-track missions. Furthermore, the approach is of interest whenever a problem can be decomposed into a core planning part and a sub-component involving reasoning (numeric or probabilistic) that is not well-modelled in propositional and deterministic planning terms.
Sectors: Aerospace, Defence and Marine; Creative Economy; Security and Diplomacy; Transport; Other

 
Description: The project has attracted interest from the aerospace and underwater autonomy sectors. We have been working in collaboration with BAE Systems, and have a jointly authored paper. Findings from this project led to the funding of a Phase 1 ASUR project, with a new collaboration between KCL and Seebyte Ltd. New collaborations were also started with Rescue Global. Subsequently, two investigators moved into industry and were able to apply planning to real industrial problems, using approaches inspired by the sub-solver integration and search control methods developed during this project.
First Year Of Impact: 2016
Sector: Aerospace, Defence and Marine; Energy; Other
Impact Types: Economic