Adaptive and Robust Methods in Statistical Machine Learning

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

In today's rapidly changing world, statistical machine learning models have become crucial to many
applications, ranging from recommendation systems to autonomous vehicles. However, two fundamental
challenges can compromise their effectiveness: the need for continual adaptation, and data scarcity. Continual
adaptation refers to the ability of a model to learn and evolve as its environment changes over time. For
instance, in financial markets, models must adapt quickly to changing market conditions. Data scarcity refers
to a lack of adequate data in specific domains, which can arise from privacy concerns, data unavailability,
or the high cost of data collection.
This project focuses on developing novel robust statistical machine learning methods that remain
reliable in dynamic and/or data-scarce environments. We place a strong emphasis on the robustness of
these methods, meaning that they should maintain their performance as conditions change. The main
objectives of this project are:
- To develop new statistical machine learning methods that adapt continually to evolving data
distributions while remaining robust and without compromising performance in dynamic environments.
- To investigate novel techniques for transferring knowledge between related tasks and domains, with an
emphasis on robust transfer, to mitigate the impact of data scarcity and enable more accurate learning
when data is limited.
- To study the theoretical foundations of these techniques, aiming to establish robustness guarantees and
theoretical properties that ensure the reliability and transparency of model predictions.
The proposed research is relevant across many fields. For instance, in healthcare, it can enable
diagnostic models to function effectively and provide more accurate diagnoses even when medical data is
limited. Similarly, in autonomous vehicles, these techniques can ensure safer and more reliable driving by
constantly adapting to shifting road conditions. Furthermore, in environmental monitoring, these methods
can improve the accuracy of climate models by incorporating new data as it becomes available, making
predictions more reliable for decision-making. These are just a few examples of how adaptive and robust
statistical machine learning can address complex challenges.

This project falls within the EPSRC Mathematical Sciences research area.

Planned Impact

The primary impact of the Centre for Doctoral Training (CDT) will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, healthcare, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. Further impacts include:
- The CDT has a large number of high-calibre external partners in government, healthcare, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting-edge methodology to a wide range of areas. Knowledge transfer will also be achieved through student internships and placements with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in statistics and statistical machine learning. This research will address the key questions behind real-world problems. It will be published in leading statistics journals and machine learning conferences and made available online. To maximise reproducibility and replicability, source code and replication files will be released as open-source software or, when relevant to an industrial collaboration, protected by a patent or software copyright.


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S023151/1       -             -             01/04/2019  30/09/2027  -
2748915            Studentship   EP/S023151/1  03/10/2022  30/09/2026  Guiomar Pescador Barrios