EPSRC Centre for Doctoral Training in Modern Statistics and Statistical Machine Learning

Lead Research Organisation: Imperial College London
Department Name: Dept of Mathematics

Abstract

The CDT will train the next generation of leaders in statistics and statistical machine learning, who will be able to develop widely applicable novel methodology and theory, as well as create application-specific methods, leading to breakthroughs in real-world problems in government, medicine, industry and science. The research will focus on developing applicable modern statistical theory and methods, and on the underpinnings of statistical machine learning, and will be strongly linked to applications.
There is an urgent national need for graduates from this CDT. Large volumes of complicated data are now routinely collected in all sectors of society, encompassing electronic health records, massive scientific datasets, governmental data, and data collected through the advent of the digital economy. The underpinning techniques for exploiting these data come from statistics and machine learning. Exploiting such data is crucial for future UK prosperity. However, several reports from government and learned societies have identified a lack of individuals able to exploit these data.
In many situations, existing methodology is insufficient. Off-the-shelf approaches may be misleading due to a lack of reproducibility or sampling biases that they do not correct. Furthermore, understanding the underlying mechanisms is often desired: scientifically valid, interpretable and reproducible results are needed to understand scientific phenomena and to justify decisions, particularly those affecting individuals. Bespoke, model-based statistical methods are needed, and these may have to be blended with statistical machine learning approaches to deal with large data. The individuals who can fulfil these more sophisticated demands are doctoral-level graduates in statistics who are well versed in the foundations of machine learning. Yet the UK graduates only a small number of statistics PhDs per year, and many of these graduates will not have been exposed to machine learning.
The Centre will bring together Imperial and Oxford, two top statistics groups, as equal partners, offering an exceptional training environment and the direct involvement of absolute research leaders in their fields. The supervisor pool will include outstanding researchers in statistical methodology and theory as well as in statistical machine learning.
We will use innovative and student-led teaching, focussing on PhD-level training. Teaching will cut across years, creating strong cohort cohesion not just within a year group but also between year groups. We will link theoretical advances to application areas through partner interactions as well as through placements of students with users of statistics.
The CDT has a large number of high-profile partners that helped shape our application priority areas (digital economy, medicine, engineering, public health, science) and that will co-fund and co-supervise PhD students, as well as co-deliver teaching elements.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high-calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting-edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. This research will be relevant, addressing the key questions behind real-world problems. It will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximise reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S023151/1 | | | 01/04/2019 | 30/09/2027 |
2247853 | Studentship | EP/S023151/1 | 01/10/2019 | 30/09/2023 | Stefanos Bennett
2247868 | Studentship | EP/S023151/1 | 01/10/2019 | 30/09/2023 | Jason Clarkson
2247701 | Studentship | EP/S023151/1 | 01/10/2019 | 30/09/2023 | Michael Hutchinson
2247869 | Studentship | EP/S023151/1 | 01/10/2019 | 30/09/2023 | Anna Menacher
2247906 | Studentship | EP/S023151/1 | 01/10/2019 | 30/09/2023 | Sahra Ghalebikesabi
2260831 | Studentship | EP/S023151/1 | 01/10/2019 | 30/09/2023 | Daniel Moss