Data- and model-based Reinforcement Learning for Performance, Requirements, and Multi-Agent setups

Lead Research Organisation: University of Oxford
Department Name: Autonomous Intelligent Machines & Systems CDT


Brief description of the context of the research including potential impact:

Despite many recent successes in the field of AI, AI systems can still only solve a narrow set of tasks in a restricted environment. Reinforcement learning (RL) is a machine learning technique that holds promise for achieving generality because almost all real-world cognitive tasks can be cast as reinforcement learning problems. In this setting, an agent is coupled with an environment and receives reward according to the action it takes in each situation. The agent must learn a policy, a mapping from situations to actions, that maximises its expected cumulative future reward.
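The agent-environment loop described above can be sketched in a few lines of Python. This is a minimal illustration with a hypothetical two-action toy environment (not any specific benchmark): the agent's policy picks an action, the environment returns a reward, and the objective is the discounted sum of future rewards.

```python
import random

# Hypothetical toy environment: two states, two actions.
# Action 1 yields reward 1.0, action 0 yields 0.0.
def step(state, action):
    reward = 1.0 if action == 1 else 0.0
    next_state = (state + action) % 2
    return next_state, reward

def discounted_return(rewards, gamma=0.9):
    """Expected cumulative future reward: sum over t of gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def rollout(policy, horizon=10, seed=0):
    """Run the agent-environment loop for a fixed horizon."""
    rng = random.Random(seed)
    state, rewards = 0, []
    for _ in range(horizon):
        action = policy(state, rng)
        state, reward = step(state, action)
        rewards.append(reward)
    return rewards

# A policy maps each situation (state) to an action; this one always acts 1,
# which is optimal for this toy reward.
greedy = lambda state, rng: 1
print(discounted_return(rollout(greedy)))  # approx 6.51 for gamma = 0.9
```

In a full RL algorithm the policy would be updated from the observed rewards rather than fixed in advance; this sketch only makes the objective concrete.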
Two key shortcomings limit the application of current RL systems: reward misspecification and sample inefficiency. Reward misspecification refers to the difficulty a user faces in codifying exactly what they want in an objective function. This can result in negative side effects or 'reward hacking', where an agent learns to exploit a loophole in the objective function to gain reward for undesired behaviours. Sample inefficiency refers to the fact that RL agents must currently acquire vast amounts of experience before reaching any degree of competence at a task.
Inverse Reinforcement Learning (IRL) and Active Learning try to address these shortcomings. IRL seeks to recover the objective function from observations of optimal behaviour; several approaches have recently been put forward, including Maximum Entropy IRL, Cooperative IRL, and Bayesian IRL. The idea behind Active Learning is that by prioritising training on the data, trajectories, or samples that would yield the greatest learning effect, one can significantly increase the sample efficiency of learning systems (including RL agents and IRL algorithms). By addressing these shortcomings in existing RL systems, I will advance and expedite the project of creating safe and scalable RL systems that tackle real-world problems and benefit humanity.
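The Active Learning idea of prioritising the most informative samples can be made concrete with uncertainty sampling, one common acquisition strategy: query the unlabelled sample whose current prediction is most uncertain, measured here by Shannon entropy. The model confidences below are hypothetical placeholders, not drawn from the proposal.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a Bernoulli prediction p = P(label = 1)."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def most_informative(predictions):
    """Uncertainty sampling: return the index of the sample whose
    prediction has maximal entropy, i.e. is closest to 0.5."""
    return max(range(len(predictions)), key=lambda i: entropy(predictions[i]))

# Hypothetical model confidences for four unlabelled samples:
preds = [0.95, 0.55, 0.10, 0.80]
print(most_informative(preds))  # → 1, the prediction nearest 0.5
```

In an RL or IRL context the same principle applies with a different uncertainty measure, e.g. disagreement in an ensemble of reward models over candidate trajectories.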

Aims and Objectives:

- Develop novel approaches to combat reward misspecification and sample inefficiency.
- Extend existing frameworks to multi-agent settings.

Novelty of the research methodology:

AI safety is a nascent field that aims to address potential near-, medium-, and long-term risks of AI technologies. Current concerns include the societal effects of social media, algorithmic bias, security, and privacy; as the applications of AI become more powerful and pervasive, it is clear that research progress should be viewed through a safety lens. With an eye on safety, we hope to improve upon existing RL approaches and extend existing frameworks to multi-agent settings.

Alignment to EPSRC's strategies and research areas:

- Artificial Intelligence technologies
- Statistics and applied probability
- Theoretical Computer Science

Companies or collaborators involved: None



Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S024050/1                                   01/10/2019  31/03/2028
2242815            Studentship   EP/S024050/1  01/10/2019  30/09/2023  James Fox