Improved methods and analyses for high-dimensional online learning and reinforcement learning
Lead Research Organisation:
Imperial College London
Department Name: Mathematics
Abstract
Online learning and reinforcement learning are prominent branches of machine learning that aim to solve sequential decision-making problems. These techniques have a wide range of applications, including autonomous driving, robotics, recommendation systems, advertising, and healthcare. In some of these areas, the cost of making an incorrect decision can be severe. For example, in autonomous driving, human lives are at stake. Therefore, it is crucial that we fully understand how these methods operate and ensure they perform as intended.
In recent years, machine learning models have expanded significantly in both size and complexity, particularly with the rise of large-scale models such as large language models (LLMs). These models often contain billions of parameters and are trained on large datasets, making the optimization problem of training them very high-dimensional and introducing an additional set of challenges. Addressing these challenges in the context of online learning and reinforcement learning is essential for improving the performance, efficiency, and reliability of these methods, especially in real-world applications.
The goal of this project is to develop theoretical insights into high-dimensional online learning and reinforcement learning problems. Specifically, we aim to derive convergence and regret bounds for the algorithms in question, with the objective of determining how quickly an algorithm can arrive at a solution that is mathematically close to optimal. Improving methods and analyses in this area involves proving that algorithms can converge faster than previously established.
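To make the objective concrete, a standard (illustrative, not project-specific) notion of regret after T rounds compares the algorithm's cumulative loss to that of the best fixed decision in hindsight:

$$
R_T \;=\; \sum_{t=1}^{T} \ell_t(x_t) \;-\; \min_{x \in \mathcal{X}} \sum_{t=1}^{T} \ell_t(x),
$$

where $x_t$ is the decision made at round $t$, $\ell_t$ is the loss revealed at that round, and $\mathcal{X}$ is the decision set (notation introduced here purely for illustration). A regret bound then controls how quickly the average regret $R_T / T$ vanishes as $T$ grows.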
Our focus is threefold. First, we study existing methods to understand their performance in high-dimensional settings, providing convergence guarantees with a particular focus on how they scale with dimension. Second, we propose new methods or adapt existing ones to ensure optimal performance in these high-dimensional environments. Finally, we explore the fundamental limits of the high-dimensional setting by deriving lower bounds that describe how well any algorithm can solve a given problem to a specified accuracy, allowing us to establish optimality.
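As a sketch of the form such optimality statements typically take (a standard minimax formulation; the problem class $\mathcal{P}$ and rate functions below are introduced purely for illustration), a lower bound asserts that for every algorithm there is a problem in the class on which it incurs at least a certain regret:

$$
\inf_{\text{algorithm}} \; \sup_{P \in \mathcal{P}} \; \mathbb{E}\big[R_T\big] \;\geq\; c(d)\, f(T),
$$

where $c(d)$ captures the dependence on the dimension $d$ and $f(T)$ the growth with the horizon $T$. An algorithm whose upper bound matches this rate, up to constants, is minimax optimal for the class.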
This project is part of the StatML CDT, a joint initiative between Imperial College London and the University of Oxford, and falls within the EPSRC statistics and applied probability research area. While it has strong ties to optimization, the project remains fundamentally statistical in nature, as the methods we study rely on data that is inherently random.
Planned Impact
The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is also a range of further impacts:
- The CDT has a large number of high calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. It will be relevant research, addressing the key questions behind real-world problems. The research will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximize reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, protected by a patent or software copyright.
Organisations
| People | ORCID iD |
|---|---|
| Emmeran Johnson (Student) | |
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/S023151/1 | | | 31/03/2019 | 29/09/2027 | |
| 2602524 | Studentship | EP/S023151/1 | 01/10/2021 | 01/02/2026 | Emmeran Johnson |