Application of Reinforcement Learning to the Flight Control of Unmanned Aerial Vehicles

Lead Research Organisation: University of Bristol
Department Name: Aerospace Engineering

Abstract

Project description:
Complex urban environments pose a significant challenge for the operation of Unmanned Autonomous Systems (UAS). To operate in such areas, vehicles must be able to change direction rapidly, avoid obstacles, and land in confined spaces. This is especially challenging for a fixed-wing platform, which must maintain a minimum airspeed to avoid stalling. Fixed-wing platforms nevertheless offer a number of advantages over rotary-wing vehicles, such as greater flight endurance and range, and greater payload capacity. As such, there is significant research into improving the agility of fixed-wing platforms so that they can operate in complex environments.

This research proposal aims to build upon previous research projects conducted by the University of Bristol Flight Lab [1] [2]. These projects used a variable-sweep, fixed-wing platform to perform a bio-inspired perched landing manoeuvre. This agile manoeuvre, which takes advantage of dynamic stall, enabled the UAS to land safely on a small landing site with minimal velocity, without the need for a long landing strip or arresting equipment. Such a manoeuvre is particularly applicable to challenging operational environments, such as complex urban settings or the deck of a ship. Several non-linear control strategies were evaluated for generating the necessary perching manoeuvres. Of the approaches evaluated, the reinforcement learning process, using a Deep Q-Network (DQN), generated the trajectories with the lowest cost, and showed the ability to generate trajectories from a range of starting conditions.
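The source does not describe the earlier DQN agent in detail. As a rough illustration only, the sketch below shows the two standard ingredients of DQN: the bootstrapped regression target y = r + gamma * max_a' Q_target(s', a') and epsilon-greedy selection over a discrete action set. The function names, the list-based data layout, and the idea that each discrete action is one actuator command are assumptions made for this example, not details of the Flight Lab implementation.

```python
import random


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Compute DQN regression targets y = r + gamma * max_a' Q_target(s', a').

    rewards:       one float per transition
    next_q_target: next_q_target[i][a] is the target network's Q-value for
                   discrete action a in the next state of transition i
    dones:         True if transition i ended the episode (e.g. touchdown)
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_target, dones):
        # Terminal transitions do not bootstrap from the next state.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    Each index is assumed (hypothetically) to stand for one discrete
    actuator command, e.g. a fixed elevator or sweep increment.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Because the agent can only choose among a fixed set of indices, actuator resolution is limited by the size of that set, which is one motivation for the continuous control outputs discussed next.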

Research in the first year aimed to modernise the perching UAV learning process by integrating and evaluating state-of-the-art reinforcement learning algorithms and frameworks. Compared to the DQN algorithm used previously, modern algorithms such as Proximal Policy Optimisation (PPO) have demonstrated the ability to attain higher rewards, with improved stability and convergence during learning [3]. This research has also explored the use of continuous control outputs, increasing the granularity of actuator control available to the learning agent. The project has also moved to state-of-the-art frameworks, such as OpenAI's Gym toolkit, to modernise and modularise the learning architecture, laying the foundation for simpler, faster implementation of alternative algorithms and scenarios. The next stage of this research is to transition to real-world flight testing of the perching manoeuvre using these improvements.
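PPO's stability is commonly attributed to its clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. The following is a minimal pure-Python sketch of that objective for a diagonal Gaussian policy, which is a common way to parameterise continuous actuator outputs. It is not the project's code: the function names and the default clip_eps=0.2 (the value used in the original PPO paper) are assumptions, and a practical implementation would use an automatic-differentiation library to optimise this quantity.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a continuous action under a diagonal Gaussian policy."""
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        std = math.exp(ls)
        # log N(a; mu, std) = -0.5*((a-mu)/std)^2 - log(std) - 0.5*log(2*pi)
        total += -0.5 * ((a - mu) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximised).

    For each sample: min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    where ratio = pi_new(a|s) / pi_old(a|s).
    """
    terms = []
    for lp_new, lp_old, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum makes the objective a pessimistic bound.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)
```

With a positive advantage the gain from raising an action's probability is capped at a ratio of 1 + clip_eps, while with a negative advantage the unclipped (more pessimistic) term is kept, so the policy has no incentive to take large steps in either direction.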

This research project will build on previous projects of the research group, incorporating state-of-the-art algorithms and techniques, to develop reinforcement learning-based flight controllers that can perform a number of agile flight manoeuvres. The current flight dynamics model of the model UAV will be improved to increase its accuracy during agile manoeuvres, and expanded to incorporate the lateral degrees of freedom into the current longitudinal-only model. Methods to improve the accuracy of the trained model will be evaluated and implemented, such as incorporating flight data into the offline, simulated learning process, and conducting online learning on the real-world vehicle. A number of agile flight manoeuvres applicable to operating in complex environments will be selected, tested and evaluated. Candidate manoeuvres include rapid changes of direction and minimum-distance 180° turns, which would allow the vehicle to avoid obstacles and navigate cluttered environments. A key focus of this research will be producing trained controllers, and the necessary software frameworks, that can be tested and used on real-world platforms.
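To show why a minimum-distance 180° turn is difficult for a fixed-wing vehicle, the sketch below computes the classical steady-turn bound. In a level, coordinated turn at bank angle phi the load factor is n = 1/cos(phi) and the radius is R = V^2/(g tan phi). Because the stall speed rises to V_s*sqrt(n), flying at that speed gives R = V_s^2/(g sin phi), and a 180° reversal consumes a lateral distance of 2R. The load-factor limit n_max, sea-level density, and all function names are illustrative assumptions. This steady-state analysis also ignores unsteady aerodynamics such as the dynamic stall exploited in the perching work, so it is a conservative baseline, not a limit on what a learned controller might achieve.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def stall_speed(mass_kg, wing_area_m2, cl_max, rho=1.225):
    """1 g level-flight stall speed: V_s = sqrt(2 m g / (rho S CL_max))."""
    return math.sqrt(2.0 * mass_kg * G / (rho * wing_area_m2 * cl_max))


def min_turn_radius(v_stall, n_max):
    """Smallest steady, level, coordinated turn radius for a load limit n_max.

    Bank phi gives n = 1/cos(phi) and raises the stall speed to
    v_stall*sqrt(n). Flying at that speed, R = V^2/(g tan phi)
    simplifies to v_stall^2/(g sin phi), with sin(phi) = sqrt(1 - 1/n^2).
    """
    if n_max <= 1.0:
        raise ValueError("a level turn needs n_max > 1")
    sin_phi = math.sqrt(1.0 - 1.0 / n_max ** 2)
    return v_stall ** 2 / (G * sin_phi)


def reversal_width(v_stall, n_max):
    """Lateral distance consumed by a steady 180-degree level turn (2R)."""
    return 2.0 * min_turn_radius(v_stall, n_max)
```

For example, `reversal_width(stall_speed(2.0, 0.5, 1.2), 2.0)` estimates the corridor width a hypothetical 2 kg aircraft with a 0.5 m² wing would need to reverse direction in steady flight. Reducing that width below the steady-turn value is the kind of gain an agile, learned manoeuvre would aim for.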

Planned Impact

Many sources have forecast rapid growth in the already burgeoning Robotics and Autonomous Systems (RAS) market. This growth is driven by socio-economic needs and enabled by advances in algorithms and technologies converging on robotics. The market potential for applications of robotics and autonomous systems is, therefore, of huge value to the UK. There are four major areas in which FARSCOPE will strive to deliver on the impact agenda.

1. Training: A coherent strategy for impact must recognise the value of the 'innovation pipeline': from the training of world-class researchers to novel products in the 'shop window'. The FARSCOPE training programme described in the Case for Support will produce researchers who are able to advance knowledge, expertise and skills in the many associated aspects of academic pursuit in the field. Crucially, they will be guided by FARSCOPE's industrial partners and BRL's Industrial Advisory Group, so that they are grounded in the real-world context of the many robotics and autonomous systems application domains. This means pursuing research excellence while embracing the challenges posed by a range of real-world factors.

2. Economic and Social Exploitation: The elevated position of advanced robotics in the commercial 'value chain' makes it imperative that we create graduates from our Centre who are acutely aware of this potential. BRL is centrally engaged in its regional SME and business ecology, as evidenced by its recent industry workshop and 'open lab' events, which attracted some 60 and 280 industrial delegates respectively. BRL is also a key contributor to regional economic innovation. BRL has engaged two business managers and allocated dedicated space specifically to support RAS-related industrial engagement, innovation and, importantly, technology incubation. BRL will be creating an EU-funded Robotics Innovation Facility to help coordinate a EUR 20m programme that specifically promotes and encourages direct links between academia and industry, with a focus on SMEs. All of these high-impact BRL activities will be fed directly into the FARSCOPE programme.

A critical mass of key industrial and end-user partners across a diverse array of sectors has given its support to the FARSCOPE centre. All have indicated their interest in engaging through the FARSCOPE mechanisms identified in the Case for Support. These partnerships demonstrate the impact of the FARSCOPE centre in engaging existing, and forming new, strategic partnerships in the RAS field.

3. Fostering links with other Research Institutions and Academic Dissemination: It is essential that FARSCOPE CDT students learn to share best practice with other RAS research centres, both in the UK and beyond. In addition to attendance and presentation at academic conferences nationally and overseas, FARSCOPE will use the following mechanisms to engage with the academic community. BRL has many strong links with the UK, EU and global RAS research communities. We will use these as a basis for cementing existing links and fostering new ones.

4. Engaging the Public: FARSCOPE will train and then encourage its student cohorts to engage with the general public: to educate them about the potential of these new technologies, to participate in debates on the ethics, safety and legality of autonomous systems, and to enthuse future generations to work in this exciting area. UWE and the University of Bristol, BRL's two supporting institutions, host the National Coordinating Centre for Public Engagement. In addition, UWE's Science Communication Unit is internationally renowned for its diverse and innovative activities that engage the public with science. FARSCOPE students will receive guidance and training in public engagement in order to act as worthy RAS research 'ambassadors'.
