Meta reinforcement learning

Lead Research Organisation: University of Oxford

Abstract

Although reinforcement learning (RL) has achieved great success in recent decades, its application in the real world remains limited by sample inefficiency and poor generalization. Meta-reinforcement learning (meta-RL) offers a potential solution: by meta-training on a set of tasks, the agent learns a good inductive bias that enables fast learning on unseen (meta-test) tasks.
However, existing meta-RL algorithms can handle only a narrow range of tasks. For example, many require the state and action spaces to be consistent between meta-training and meta-test, an assumption that may hold in carefully designed research benchmarks but seldom holds in real-world applications. By contrast, human beings seem to be very flexible and efficient at drawing on all kinds of previous experience to adapt quickly, even to very different tasks. As a step towards more intelligent and adaptive, human-like agents, this research aims to design meta-RL algorithms that learn fast on a broader range of tasks than existing state-of-the-art methods, which may in turn facilitate the application of meta-RL to real-world problems.

Aims and Objectives
The objective of this research is to design meta-RL algorithms that enable fast learning across a wide range of tasks, e.g., tasks with different state and action spaces, or tasks that require quite different skills to solve.

Novelty of the research methodology
Most existing meta-RL algorithms take a "parametric" approach, i.e., adaptation to a new task is achieved via parameter updates. For example, MAML [1] learns a policy initialization that enables fast adaptation to different tasks, so the range of tasks it can solve is expected to be limited to the policies reachable from the learned initialization within a few gradient steps. Context-based methods [2][3] parameterize their belief over the task being solved as a context vector and update the context based on historical experience. A universal policy is learned to solve different tasks by conditioning on both the state and the inferred context. The application range of these methods is limited to the tasks that the context encoder can represent correctly.
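The reachability limit of an initialization-based method can be illustrated with a toy sketch. This is not the MAML algorithm of [1] applied to RL; it is a one-parameter supervised analogue in which each task is the quadratic loss 0.5 * (theta - target)^2, so that the meta-gradient can be written by hand. The task targets, step sizes, and function names are all assumptions made for illustration.

```python
def inner_update(theta, target, alpha):
    """One gradient step on the task loss 0.5 * (theta - target) ** 2."""
    return theta - alpha * (theta - target)


def adapt(theta, target, alpha, steps):
    """Run a fixed number of inner gradient steps on a single task."""
    for _ in range(steps):
        theta = inner_update(theta, target, alpha)
    return theta


def meta_gradient(theta, target, alpha):
    """Gradient of the one-step post-adaptation loss w.r.t. the initialization."""
    adapted = inner_update(theta, target, alpha)
    # For this quadratic loss, d(adapted)/d(theta) = 1 - alpha.
    return (adapted - target) * (1.0 - alpha)


def meta_train(targets, alpha=0.1, beta=0.5, steps=200):
    """Outer loop: descend the average post-adaptation loss over training tasks."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(meta_gradient(theta, t, alpha) for t in targets) / len(targets)
        theta -= beta * grad
    return theta


# Hypothetical meta-training tasks, identified by their targets.
train_targets = [-1.0, 0.5, 2.0]
theta0 = meta_train(train_targets)

# Three inner steps from the learned initialization on a nearby and a distant task.
near_error = abs(adapt(theta0, 1.0, alpha=0.1, steps=3) - 1.0)
far_error = abs(adapt(theta0, 10.0, alpha=0.1, steps=3) - 10.0)
print(theta0, near_error, far_error)
```

In this toy setting the learned initialization settles at the mean of the training targets, and a task far from the training distribution is still poorly solved after the same small adaptation budget, which mirrors the limitation described above.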
In contrast to the parametric approach, a specific methodology this research plans to investigate is structured meta-RL [4]: we learn a set of primitive skill modules, and different tasks are solved by dynamically combining different modules (possibly together with fast parametric updates within each module). Intuitively, this approach can solve a wide range of tasks because of its combinatorial generalization, consistent with the principle that allows many very complex systems (such as languages and computer programs) to be built from simple primitives (such as words and computer instructions). However, a challenge is that both the high-level structure and the low-level primitives must be learned, which significantly increases the difficulty and instability of learning. This research aims to find solutions to this challenge and to design high-performance structured meta-RL algorithms with strong generalization.
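The idea of combinatorial generalization can be sketched with a toy example. It is not the method of [4]: the primitives here are hand-written rather than learned, the "environment" is a single integer position, and the high-level structure is found by brute-force search rather than by a learned controller. All names and values are hypothetical.

```python
from itertools import product

# Hypothetical primitive skill modules acting on an integer position.
PRIMITIVES = {
    "left": lambda s: s - 1,
    "right": lambda s: s + 1,
    "jump": lambda s: s + 3,
}


def run(plan, state=0):
    """Apply a sequence of primitive names to the start state."""
    for name in plan:
        state = PRIMITIVES[name](state)
    return state


def solve(goal, max_len=4):
    """Return a shortest plan that reaches the goal from state 0, or None."""
    for length in range(max_len + 1):
        for plan in product(PRIMITIVES, repeat=length):
            if run(plan) == goal:
                return list(plan)
    return None


# Three primitives and plans of length four already give 3 ** 4 = 81 compositions.
plan_for_five = solve(5)
print(plan_for_five, run(plan_for_five))
print(solve(100))  # out of reach within max_len steps, so no plan is found
```

A small set of modules covers many goals through composition, but the search space grows exponentially with plan length. This is one reason why learning the high-level structure jointly with the primitives is difficult.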

Alignment to EPSRC's strategies and research areas
Artificial intelligence technologies

Planned Impact

AIMS's impact will be felt across domains of acute need within the UK. We expect AIMS to benefit: UK economic performance, through start-up creation; existing UK firms, both through research and addressing skills needs; UK health, by contributing to cancer research, and quality of life, through the delivery of autonomous vehicles; UK public understanding of and policy related to the transformational societal change engendered by autonomous systems.

Autonomous systems are acknowledged by essentially all stakeholders as important to the future UK economy. PwC claim that there is a £232 billion opportunity offered by AI to the UK economy by 2030 (10% of GDP). AIMS has an excellent track record of leadership in spinout creation, and will continue to foster the commercial projects of its students, through the provision of training in IP, licensing and entrepreneurship. With the help of Oxford Science Innovation (investment fund) and Oxford University Innovation (technology transfer office), student projects will be evaluated for commercial potential.

AIMS will also concretely contribute to UK economic competitiveness by meeting the UK's needs for experts in autonomous systems. To meet this need, AIMS will train cohorts with advanced skills that span the breadth of AI, machine learning, robotics, verification and sensor systems. The relevance of the training to the needs of industry will be ensured by the industrial partnerships at the heart of AIMS. These partnerships will also ensure that AIMS will produce research that directly targets UK industrial needs. Our partners span a wide range of UK sectors, including energy, transport, infrastructure, factory automation, finance, health, space and other extreme environments.

The autonomous systems that AIMS will enable also offer the prospect of epochal change in the UK's quality of life and health. As put by former Digital Secretary Matt Hancock, "whether it's improving travel, making banking easier or helping people live longer, AI is already revolutionising our economy and our society." AIMS will help to realise this potential through its delivery of trained experts and targeted research. In particular, two of the four Grand Challenge missions in the UK Industrial Strategy highlight the positive societal impact underpinned by autonomous systems. The "Artificial Intelligence and data" challenge has as its mission to "Use data, Artificial Intelligence and innovation to transform the prevention, early diagnosis and treatment of chronic diseases by 2030". To this mission, AIMS will contribute the outputs of its research pillar on cancer research. The "Future of mobility" challenge highlights the importance that autonomous vehicles will have in making transport "safer, cleaner and better connected." To this challenge, AIMS offers the world-leading research of its robotic systems research pillar.

AIMS will further promote the positive realisation of autonomous technologies through direct influence on policy. The world-leading academics amongst AIMS's supervisory pool are well connected to policy formation; for example, Prof Osborne serves as a Commissioner on the Independent Commission on the Future of Work. Further, Dr Dan Mawson, Head of the Economy Unit, Economy and Strategic Analysis Team at BEIS, will serve as an advisor to AIMS, ensuring bidirectional influence between policy objectives and AIMS research and training.

Broad understanding of autonomous systems is crucial in making a society robust to the transformations they will engender. AIMS will foster such understanding through its provision of opportunities for AIMS students to directly engage with the public. Given the broad societal importance of getting autonomous systems right, AIMS will deliver core training on the ethical, governance, economic and societal implications of autonomous systems.

Student: Zheng Xiong
Period of Study: Oct 20 - Sep 24
Funder: EPSRC
Project Status: Active
Project Category: Studentship
Project Reference: 2420767
Research Topic: Unclassified

People	ORCID iD
Michael Osborne (Primary Supervisor)
Shimon Whiteson (Primary Supervisor)
Zheng Xiong (Student)

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/S024050/1			01/10/2019	31/03/2028
2420767	Studentship	EP/S024050/1	01/10/2020	30/09/2024	Zheng Xiong