Towards AI that is forever learning efficiently

Lead Research Organisation: University of Bristol
Department Name: Mathematics

Abstract

This project falls within the EPSRC Statistics and Applied Probability research area. It will research techniques that allow an AI model to learn continually from different datasets, which includes adapting to perform well on the current training data without 'forgetting' how to perform well on previous data. If the data a neural network has been trained on up to a given point in time are not independent and identically distributed, a problem arises: the gradient updates used to train the network do not, on average, move it towards performing optimally on all of the data. This problem arises in many areas, including when we train a network on a sequence of tasks containing different data. In this scenario the network will optimize for the current data and lose performance on previous data, a phenomenon known as catastrophic forgetting. The research area known as 'continual learning' studies how to train models in such a scenario, and one of its significant challenges is mitigating catastrophic forgetting. Research in this area could make it possible to continually update a model with new data whilst it is deployed, to teach models by interacting with them, and to train more easily on very large datasets.
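As a rough illustration of catastrophic forgetting, the following minimal sketch trains a one-parameter linear model with plain stochastic gradient descent on one task and then on a second, conflicting task. The tasks, the model and all names (task_a, task_b, sgd, mse) are hypothetical and chosen only for illustration; they are not taken from the project.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def sgd(w, data, lr=0.05, epochs=200):
    """Plain SGD on the squared error, one example at a time."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w


# Two toy "tasks" that need different weights (assumed for illustration).
xs = [-1.0, -0.5, 0.5, 1.0]
task_a = [(x, 2.0 * x) for x in xs]   # best weight is 2
task_b = [(x, -1.0 * x) for x in xs]  # best weight is -1

w = 0.0
w = sgd(w, task_a)                # train on task A only
loss_a_before = mse(w, task_a)    # close to zero

w = sgd(w, task_b)                # then train on task B only
loss_a_after = mse(w, task_a)     # task A performance has collapsed
loss_b_after = mse(w, task_b)     # close to zero

print(f"task A loss after training on A: {loss_a_before:.6f}")
print(f"task A loss after training on B: {loss_a_after:.6f}")
print(f"task B loss after training on B: {loss_b_after:.6f}")
```

Because the data arrive task by task rather than as one shuffled set, every gradient step in the second phase serves task B alone, so the weight that solved task A is overwritten. In this toy the two tasks conflict by construction; real networks have far more capacity, but sequential training can still erase earlier solutions in a similar way.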

It may also research techniques that allow for the efficient use of data. One way to use data more efficiently is to be able to work with more of it, i.e. data with fewer requirements. A classic example is using unlabelled data to achieve good performance on an image classification task for which only a small number of labelled images are available. Such techniques are good at learning ways to represent data that can be useful for a myriad of downstream applications. The area that focuses on training models to extract information from data in an unsupervised way that is useful for downstream applications is called representation learning. An advantage of (unsupervised) representation models is that they can be trained on much larger amounts of data and can therefore solve problems that were previously unsolvable because of data restrictions.
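The workflow described above (learn a representation from unlabelled data, then classify with very few labels) can be sketched on toy data. This sketch uses principal component analysis via power iteration as a stand-in for a representation learner; that choice, the synthetic two-cluster data and all function names are assumptions made for illustration and are not the project's method.

```python
import math


def make_points():
    """Two elongated 2-D clusters; labels are kept only for evaluation."""
    points, labels = [], []
    for label, (cx, cy) in enumerate([(-2.0, -1.0), (2.0, 1.0)]):
        for i in range(20):
            t = i / 19 - 0.5
            points.append((cx + 0.5 * t, cy - 1.0 * t))
            labels.append(label)
    return points, labels


def principal_direction(points, iters=50):
    """Unsupervised step: mean and top principal direction (power iteration)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    cxx = sum(x * x for x, _ in centred) / n
    cxy = sum(x * y for x, y in centred) / n
    cyy = sum(y * y for _, y in centred) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        vx = cxx * v[0] + cxy * v[1]
        vy = cxy * v[0] + cyy * v[1]
        norm = math.hypot(vx, vy)
        v = (vx / norm, vy / norm)
    return (mx, my), v


def encode(p, mean, v):
    """1-D representation: projection onto the learned direction."""
    return (p[0] - mean[0]) * v[0] + (p[1] - mean[1]) * v[1]


points, labels = make_points()
mean, v = principal_direction(points)  # uses no labels at all

# Supervised step: only two labelled examples, one per class.
labelled = [(encode(points[0], mean, v), labels[0]),
            (encode(points[20], mean, v), labels[20])]


def predict(p):
    """Nearest labelled example in the learned 1-D representation."""
    z = encode(p, mean, v)
    return min(labelled, key=lambda item: abs(item[0] - z))[1]


accuracy = sum(predict(p) == y for p, y in zip(points, labels)) / len(points)
print(f"accuracy with 2 labels: {accuracy:.2f}")
```

The representation is fitted on all 40 points without looking at any label, and the two labelled points are used only afterwards. The data here are deliberately easy, so the sketch shows the shape of the pipeline rather than any realistic gain from unlabelled data.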

The project may also look at the intersection of these two research areas, i.e. continual learning from data with fewer requirements, such as data without labels. This area has not yet been widely explored by researchers and could have a large impact on the kinds of problems to which deep learning can be applied. It could allow for models that, whilst deployed, train continually on raw incoming data, forever improving and learning without the need for lengthy data preparation.

Aims and objectives of this project include:
Develop new methods to facilitate learning in the continual setting.
Develop new methods for the efficient use of data for representation learning, including representation learning on data with fewer requirements (e.g. images without labels).
Develop new methods that facilitate learning in the continual setting that are efficient at using data.
Apply these methods to problems to validate their performance empirically.

Planned Impact

The COMPASS Centre for Doctoral Training will have the following impact.

Doctoral Students Impact.

I1. Recruit and train over 55 students and provide them with a broad and comprehensive education in contemporary Computational Statistics & Data Science, leading to the award of a PhD. The training environment will be built around a set of multilevel cohorts: a variety of group sizes, within and across year cohort activities, within and across disciplinary boundaries with internal and external partners, where statistics and computation are the common focus, but remaining sensitive to disciplinary needs. Our novel doctoral training environment will have a powerful impact on students, opening their eyes not only to a range of modern technical benefits and opportunities, but also to the power of team-working with people from a range of backgrounds to solve the most important problems of the day. They will learn to apply their skills to achieve impact by working collaboratively with internal and external partners, such as via our Rapid Response Teams, Policy Workshops & Statistical Clinics.

I2. As well as advanced training in computational statistics and data science, our students will be impacted by exposure to, and training in, important cognate topics such as ethics, responsible innovation, equality, diversity and inclusion, policy, effective communication and dissemination, enterprise, impact and consultancy skills. It is vital for our students to understand that their training will enable them to have a powerful impact on the wider world, so, e.g., AI algorithms they develop should not be discriminatory, statistical methodologies should be reproducible, and statistical results should be accurately and comprehensibly communicated to the general public and policymakers.

I3. The students will gain experience via collaborations with academic partners within the University in cognate disciplines, and a wide range of external industrial & government partners. The students will be impacted by the structured training programmes of the UK Academy of Postgraduate Training in Statistics, the Bristol Doctoral College, the Jean Golding Institute, the Alan Turing Institute and the Heilbronn Institute for Mathematical Sciences, which will be integrated into our programme.

I4. Having received excellent training, the students will go on to have a powerful impact on the world in their future careers, spreading excellence.

Impact on our Partners & ourselves.

I5. Direct impacts will be achieved by students engaging with, and working on projects with, our academic partners, with discipline-specific problems arising in engineering, education, medicine, economics, earth sciences, life sciences and geographical sciences, and our external partners Adarga, the Atomic Weapons Establishment, CheckRisk, EDF, GCHQ, GSK, the Office for National Statistics, Sciex, Shell UK, Trainline and the UK Space Agency. The students will demonstrate a wide range of innovation with these partners, will attract engagement from new partners, and often provide attractive future employment matches for students and partners alike.

Wider Societal Impact

I6. COMPASS will greatly benefit the UK by providing over 55 highly trained PhD graduates in an area that is known to be suffering from extreme, well-known shortages in the people pipeline nationally. COMPASS CDT graduates will be equipped for jobs in sectors of high economic value and national priority, including data science, analytics, pharmaceuticals, security, energy, communications, government, and indeed all research labs that deal with data. Through their training, they will enable these organisations to make well-informed and statistically principled decisions that will allow them to maximise their international competitiveness and contribution to societal well-being. COMPASS will also impact positively on the wider student community, both now and sustainably into the future.

People

Henry Bourne (Student)

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S023569/1 | | | 01/04/2019 | 30/09/2027 |
2740621 | Studentship | EP/S023569/1 | 01/10/2022 | 18/09/2026 | Henry Bourne