Information Geometry and Reflexive Reinforcement Learning

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Planned Impact

The Centre will have immediate short-term impacts on people, skills and the talent pipeline, alongside advances in scientific knowledge and techniques. However, given the programme's strong training emphasis on innovation and social and societal challenges, we also target longer-term economic and societal benefits.
People: Centre graduates will be grounded in fundamental robotics and autonomous systems (RAS) topics and will acquire advanced specialist scientific knowledge of crucial interaction themes. They will be skilled at teamwork and will have a broader appreciation of RAS ethical issues. They will have international contacts and experience, as well as experience of public presentation. Most importantly, they will be Innovation Ready: skilled in the principles of how technical and commercial disruption occurs, and able to understand how finance and organization realize new products and services in startup, SME and corporate settings. Their economic impact will be as the industrial leaders of the future, instrumental in realizing new products and services. This impact will be accelerated by our #Cauldron training programme in the interlinked areas of Scientific Cohesion, Research and Creativity Skills, and Social and Societal Challenges, and by programmed engagements and activities with our User Partners, who shape the Centre's direction.
Science: The Centre will realize scientific advances, e.g. a greater understanding of AI versus biomimetic approaches to persistent autonomy, advanced empathetic multimodal interaction between people and machines in smart spaces, advanced robotic micro-sensing and computing in soft embodiments, adaptive compliant actuation at a multitude of scales and form factors, semantic understanding of environments from noisy sensor data, and more. Beyond the advances themselves, the Centre will also develop the research methods and practice needed to achieve them, e.g. hardware-in-the-loop architectures for reusability and easy, low-cost experimentation. The impact of these advances will be enhanced by strongly supported opportunities for dissemination, including conference presentations and publications (with training in presentation and writing skills), reciprocal secondments with Associate Research Partners, international student robot competitions, public outreach activities, and CDT-hosted visits by international researchers and workshops.
Society: Robotic and autonomous systems decrease cost and risk, increasing productivity while removing human operators from 'dull, dirty and dangerous' tasks across the industries of our User Partners. Centre graduates and technology will contribute to maintaining UK business competitiveness and exports in this emerging €15.5 billion market, whilst improving quality of life, for example through a) more interesting (and prestigious) day-to-day employment for workers, b) assisted healthcare for an ageing population (including the Centre Directors), and c) greater awareness of environmental impacts and changes, leading to policy and legislation.

Publications


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
NE/W50287X/1      |              |              | 31/03/2021 | 30/03/2022 |
1944359           | Studentship  | NE/W50287X/1 | 31/08/2017 | 14/05/2022 | William Lyons
 
Description

In the context of reinforcement learning, an agent encounters a great deal of information during the trial-and-error learning process. However, much of this information is lost when the agent reduces it to an "expert policy" that simply takes the optimal action in each state. We believed this information might have further uses in a dynamic environment, and that it should therefore be retained and used in some way rather than discarded.

In doing so, we felt we might be able to improve the efficacy of "Inverse Reinforcement Learning" (IRL), a subset of reinforcement learning in which an agent learns the reward function implied by the behaviour of an expert. As things stand, IRL has an inherent issue: multiple reward functions can explain the same observed behaviour, so the true reward cannot be uniquely identified from that behaviour alone. Using our novel informational approach, we have developed an agent capable of dynamically changing its behaviour, even to suboptimal behaviours, in order to better teach another agent the true values in an environment. We have a paper on this to submit shortly.
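The reward ambiguity mentioned above can be made concrete with a small sketch. This is an illustration chosen for this page, not the project's method. It assumes a hypothetical five-state chain MDP, and the names goal_bonus, step_cost, PHI, scaled_and_shaped and greedy_policy are invented for the example. Three different reward functions are tried: a goal bonus, a per-step cost, and a scaled goal bonus with potential-based shaping. All three yield the same optimal policy, so an observer who only sees that policy cannot tell which reward produced it. The all-zero reward, under which every policy is optimal, is the extreme case of the same problem.

```python
# Toy illustration of reward ambiguity in inverse reinforcement learning.
# Hypothetical 5-state chain MDP (not from the project): the agent moves left
# or right, walls clamp movement, and state 4 is an absorbing goal.

GAMMA = 0.9
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # -1 = move left, +1 = move right


def step(s, a):
    """Deterministic transition; the goal state is absorbing."""
    if s == GOAL:
        return s
    return min(max(s + a, 0), N_STATES - 1)


def greedy_policy(reward, iters=200):
    """Run value iteration for reward(s, a, s_next) and return the greedy
    action in each non-goal state (the goal's value is fixed at 0)."""
    def q(s, a, v):
        s_next = step(s, a)
        return reward(s, a, s_next) + GAMMA * v[s_next]

    v = [0.0] * N_STATES
    for _ in range(iters):
        # Synchronous update: the right-hand side uses the previous v.
        v = [0.0 if s == GOAL else max(q(s, a, v) for a in ACTIONS)
             for s in range(N_STATES)]
    return tuple(max(ACTIONS, key=lambda a: q(s, a, v)) for s in range(GOAL))


def goal_bonus(s, a, s_next):
    """+1 on entering the goal, 0 otherwise (hypothetical reward 1)."""
    return 1.0 if s_next == GOAL else 0.0


def step_cost(s, a, s_next):
    """-1 for every move made before the goal is reached (hypothetical reward 2)."""
    return -1.0


PHI = [3.0, -1.0, 2.0, 0.5, 0.0]  # arbitrary potential; PHI[GOAL] must be 0


def scaled_and_shaped(s, a, s_next):
    """10x the goal bonus plus the shaping term GAMMA * PHI[s'] - PHI[s]."""
    return 10.0 * goal_bonus(s, a, s_next) + GAMMA * PHI[s_next] - PHI[s]


policies = {f.__name__: greedy_policy(f)
            for f in (goal_bonus, step_cost, scaled_and_shaped)}
for name, pi in policies.items():
    print(name, pi)  # each line ends with (1, 1, 1, 1): always move right
```

Positive scaling and potential-based shaping are standard reward transformations that leave optimal policies unchanged. PHI[GOAL] is set to 0 here because the sketch fixes the absorbing goal's value at 0. The per-step cost is a genuinely different reward that happens to induce the same behaviour in this environment.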
Exploitation Route

My personal hope for this work is that, rather than having to retrain agents every time a new platform is developed, we will be able to train an agent once. Then, as rapid prototyping takes place, a new agent could observe the current expert agent in a "teaching" behaviour. In this behaviour, the expert performs both optimal and suboptimal actions to show the true value of the environment, reducing the time and training required.

I expect this to be particularly useful in highly dynamic environments.
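The following sketch illustrates why suboptimal actions can carry value information. It contrasts a purely greedy demonstrator with a Boltzmann (softmax) demonstrator. The softmax model is a stand-in chosen for illustration and is not necessarily the teaching behaviour developed in this project. The action values Q_A and Q_B, the temperature TAU, and the function names are all hypothetical.

Two experts with very different action values produce identical greedy demonstrations. Softmax demonstrations are different: a learner who knows TAU can recover each action's value gap from the best action via Q(a) - Q(a*) = TAU * log(pi(a) / pi(a*)).

```python
import math

# Hypothetical demonstrator temperature, assumed known to the learner.
TAU = 1.0


def greedy_demo(q):
    """Action distribution of a purely greedy expert: all mass on the argmax."""
    best = max(range(len(q)), key=q.__getitem__)
    return [1.0 if a == best else 0.0 for a in range(len(q))]


def softmax_demo(q, tau=TAU):
    """Boltzmann 'teaching' distribution: pi(a) proportional to exp(Q(a) / tau)."""
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp((x - m) / tau) for x in q]
    z = sum(w)
    return [x / z for x in w]


def recover_gaps(pi, tau=TAU):
    """Invert the softmax: Q(a) - Q(a*) = tau * log(pi(a) / pi(a*)).
    Requires every probability to be nonzero, so it fails on greedy demos."""
    p_best = max(pi)
    return [tau * math.log(p / p_best) for p in pi]


# Hypothetical action values for one state, from two different experts.
Q_A = [1.0, 0.9, 0.0]
Q_B = [1.0, 0.0, -5.0]

print(greedy_demo(Q_A) == greedy_demo(Q_B))  # True: greedy demos look identical
print([round(g, 6) for g in recover_gaps(softmax_demo(Q_A))])  # [0.0, -0.1, -1.0]
print([round(g, 6) for g in recover_gaps(softmax_demo(Q_B))])  # [0.0, -1.0, -6.0]
```

Only gaps relative to the best action are recoverable, so the values are known up to an additive constant. In practice the action probabilities would have to be estimated from sampled demonstrations rather than read off exactly.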
Sectors

Environment, Retail, Transport

URL

https://www.scitepress.org/PublicationsDetail.aspx?ID=9NOSxB6sHK4=&t=1