Exploring Efficient Automated Design Choices for Robust Machine Learning Algorithms

Lead Research Organisation: University of Liverpool
Department Name: Electrical Engineering and Electronics

Abstract

Are you familiar with machine learning? Do you have an aptitude for analysing and disseminating information from a variety of outcomes? Would you like to help GCHQ design and develop new algorithms that offer both time-efficient and energy-efficient solutions? Are you keen on developing novel approaches and using modern computing architectures that make it easy to apply Deep Learning and Gaussian Processes to real problems?

Applying Machine Learning (ML) currently requires the data scientist to make design choices. These choices might relate, for example, to: choosing the number of layers and the number of neurons in each layer of a Deep Neural Network; choosing which kernel family to use in a Gaussian Process. Since ML algorithms often involve time-consuming training regimes, data scientists often find it laborious to iterate between (re)-identifying candidate design choices and (re)-training the ML algorithms. Furthermore, different design choices can alter not only how many hyper-parameters (e.g. neuron weights or kernel widths and cross-covariance terms) need to be considered but also how challenging it is to optimise the hyper-parameters of the ML algorithm. Since practitioners have limited time to perform sensitivity analyses with respect to these parameters, design choices are typically based on estimated performance (calculated as an average over the test set) with very limited, if any, consideration for the variance in this estimate. It is important that this variance is considered since it will determine how likely it is that performance on the test set will accurately predict empirical performance when the algorithm is deployed operationally. Indeed, robust performance requires that we do not simply optimise the hyper-parameters (e.g. using stochastic gradient descent) but instead generate a set of hyper-parameter samples that are consistent with the data and then average across these sampled values.
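The contrast between optimising a hyper-parameter and averaging over samples of it can be illustrated with a toy sketch. The example below is an assumed minimal illustration (not the project's code): it fits a Gaussian Process to synthetic 1-D data, draws samples of the kernel length-scale via a simple Metropolis random walk on the log marginal likelihood (assuming a flat prior on the log length-scale and a fixed, known noise level), and then compares a prediction made at a single "best" length-scale with a prediction averaged across all sampled length-scales.

```python
# Assumed toy example: Bayesian averaging over a GP kernel length-scale,
# contrasted with a single point estimate. Not the project's implementation.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data: a noisy sine wave
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

def kernel(a, b, ell):
    """Squared-exponential (RBF) kernel with length-scale ell."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def log_marginal(ell, noise=0.1):
    """GP log marginal likelihood of the data for length-scale ell."""
    K = kernel(x, x, ell) + noise**2 * np.eye(x.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum()

# Metropolis sampling of the length-scale (multiplicative random walk,
# symmetric under a flat prior on log ell)
samples, ell = [], 0.2
logp = log_marginal(ell)
for _ in range(2000):
    prop = ell * np.exp(0.1 * rng.standard_normal())
    logp_prop = log_marginal(prop)
    if np.log(rng.uniform()) < logp_prop - logp:
        ell, logp = prop, logp_prop
    samples.append(ell)
samples = np.array(samples[500:])  # discard burn-in

def predict(xs, ell, noise=0.1):
    """GP predictive mean at test inputs xs for length-scale ell."""
    K = kernel(x, x, ell) + noise**2 * np.eye(x.size)
    return kernel(xs, x, ell) @ np.linalg.solve(K, y)

xs = np.array([0.25])
# Point estimate: predict at the single highest-likelihood length-scale seen
best = samples[np.argmax([log_marginal(s) for s in samples])]
point_pred = predict(xs, best)
# Bayesian average: mean prediction across all sampled length-scales
avg_pred = np.mean([predict(xs, s) for s in samples], axis=0)
```

The averaged prediction reflects the spread of length-scales that are consistent with the data, rather than committing to one value, which is precisely the robustness argument made above.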

Numerical Bayesian algorithms exist that can explore the design choices and the possible hyper-parameter values associated with each design choice. Mature variants of these algorithms exist and involve the use of Markov chain Monte Carlo (MCMC), with Reversible Jump MCMC (RJMCMC) being a variant applicable in contexts where the design choice alters the number of hyper-parameters that need to be considered. In general, and particularly in the case of RJMCMC, these mature algorithms are so slow and computationally demanding that they are widely assumed to be impractical in real-world scenarios.
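The defining feature of RJMCMC, that a move can change the number of parameters, can be sketched with a deliberately small example. The following is an assumed toy (not the project's algorithm): the sampler chooses between two nested regression models, M0 (intercept only) and M1 (intercept plus slope), using dimension-matching birth/death moves with equal model priors, a known noise level, and an identity Jacobian for the direct parameter mapping.

```python
# Assumed toy example of Reversible Jump MCMC: model choice between an
# intercept-only model (k=0) and an intercept-plus-slope model (k=1).
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 20)
y = 0.5 + 1.5 * x + 0.3 * rng.standard_normal(x.size)  # data favour M1

SIGMA = 0.3       # known observation noise (simplifying assumption)
PRIOR_SD = 2.0    # N(0, PRIOR_SD^2) prior on each coefficient
Q_SD = 1.0        # proposal sd for the slope in birth moves

def log_like(b0, b1):
    resid = y - (b0 + b1 * x)
    return -0.5 * np.sum(resid**2) / SIGMA**2

def log_prior(c):
    return -0.5 * c**2 / PRIOR_SD**2 - np.log(PRIOR_SD * np.sqrt(2 * np.pi))

def log_q(u):
    return -0.5 * u**2 / Q_SD**2 - np.log(Q_SD * np.sqrt(2 * np.pi))

k, b0, b1 = 0, 0.0, 0.0   # start in M0
visits = [0, 0]
for _ in range(5000):
    # Within-model move: random walk on the intercept (and slope, if present)
    prop0 = b0 + 0.2 * rng.standard_normal()
    cur = log_like(b0, b1 if k else 0.0) + log_prior(b0)
    new = log_like(prop0, b1 if k else 0.0) + log_prior(prop0)
    if np.log(rng.uniform()) < new - cur:
        b0 = prop0
    if k == 1:
        prop1 = b1 + 0.2 * rng.standard_normal()
        if np.log(rng.uniform()) < (log_like(b0, prop1) + log_prior(prop1)
                                    - log_like(b0, b1) - log_prior(b1)):
            b1 = prop1
    # Between-model move: birth (add a slope) or death (remove it).
    # Dimension matching draws u ~ q; the Jacobian is 1 for this mapping.
    if k == 0:
        u = Q_SD * rng.standard_normal()
        log_alpha = (log_like(b0, u) + log_prior(u)
                     - log_like(b0, 0.0) - log_q(u))
        if np.log(rng.uniform()) < log_alpha:
            k, b1 = 1, u
    else:
        log_alpha = (log_like(b0, 0.0) + log_q(b1)
                     - log_like(b0, b1) - log_prior(b1))
        if np.log(rng.uniform()) < log_alpha:
            k, b1 = 0, 0.0
    visits[k] += 1
# With a clear slope in the data, the chain spends most of its time in M1.
```

Even in this two-model toy, each proposed jump requires a fresh likelihood evaluation, which hints at why RJMCMC scales poorly when the model space and datasets grow.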

Recent advances at the University of Liverpool have identified that Sequential Monte Carlo (SMC) samplers are an alternative family of numerical Bayesian algorithms that offer the potential to improve on both the time-efficiency and energy-efficiency of MCMC algorithms. In this context, SMC samplers can be considered to comprise a team of sub-algorithms that collaborate to explore the space of design choices and associated hyper-parameters. By distributing the sub-algorithms across parallel computational resources, SMC samplers can improve time-efficiency. Since the sub-algorithms only need to avoid all failing at once, they can each be more adventurous in their exploration than the single MCMC algorithm: this can lead to energy-efficiency gains. Perhaps surprisingly, the potential for SMC samplers to automate design choices, while also exploring the associated hyper-parameter values, is largely unexplored. This PhD will investigate the significant potential to apply SMC samplers in this context.
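The "team of sub-algorithms" picture can be made concrete with a small sketch. The example below is an assumed toy (not Liverpool's implementation): a population of particles moves from a prior to a posterior over a single unknown mean via a tempered sequence of targets, with reweighting, resampling when the effective sample size collapses, and one Metropolis move per particle per step. Each particle's move is independent of the others, which is exactly what allows the inner loop to be mapped onto parallel hardware.

```python
# Assumed toy SMC sampler: particles anneal from the prior N(0, 9) to the
# posterior over an unknown mean, via a fixed tempering schedule.
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(1.0, 1.0, size=30)   # toy data: unknown mean, known sd=1

def log_like(theta):
    """Vectorised log likelihood of the data for each particle in theta."""
    return -0.5 * np.sum((data[None, :] - theta[:, None]) ** 2, axis=1)

N = 500
theta = rng.normal(0.0, 3.0, size=N)   # particles drawn from the prior
logw = np.zeros(N)
temps = np.linspace(0.0, 1.0, 11)      # tempering schedule

for t_prev, t in zip(temps[:-1], temps[1:]):
    # Reweight: incremental weight is the likelihood to the power (t - t_prev)
    logw += (t - t_prev) * log_like(theta)
    # Resample when the effective sample size collapses below N/2
    w = np.exp(logw - logw.max()); w /= w.sum()
    if 1.0 / np.sum(w**2) < N / 2:
        theta = theta[rng.choice(N, size=N, p=w)]
        logw = np.zeros(N)
    # Move: one Metropolis step per particle, targeting prior x likelihood^t.
    # These N updates are independent, so they parallelise trivially.
    prop = theta + 0.5 * rng.standard_normal(N)
    cur = t * log_like(theta) - 0.5 * theta**2 / 9.0
    new = t * log_like(prop) - 0.5 * prop**2 / 9.0
    accept = np.log(rng.uniform(size=N)) < new - cur
    theta = np.where(accept, prop, theta)

w = np.exp(logw - logw.max()); w /= w.sum()
posterior_mean = np.sum(w * theta)
```

Note that an individual particle can safely take an "adventurous" (even poor) move: it simply receives a low weight and is culled at the next resampling, whereas a single MCMC chain must protect itself against every bad step.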

Planned Impact

This CDT's focus on using "Future Computing Systems" to move "Towards a Data-driven Future" resonates strongly with two themes of non-academic organisations. In both themes, albeit for slightly different reasons, commodity data science is insufficient and there is a hunger both for the future leaders that this CDT will produce and the high-performance solutions that the students will develop.

The first theme is associated with defence and security. In this context, operational performance is of paramount importance. Government organisations (e.g., Dstl, GCHQ and the NCA) will benefit from our graduates' ability to configure many-core hardware to maximise the ability to extract value from the available data. The CDT's projects and graduates will achieve societal impact by enabling these government organisations to better protect the world's population from threats posed by, for example, international terrorism and organised crime.

There is then a supply chain of industrial organisations that deliver to government organisations (both in the UK and overseas). These industrial organisations (e.g., Cubica, Denbridge Marine, FeatureSpace, Leonardo, MBDA, Ordnance Survey, QinetiQ, RiskAware, Sintela, THALES (Aveillant) and Vision4ce) operate in a globally competitive marketplace where operational performance is a key driver. The skilled graduates that this CDT will provide (and the projects that will comprise the students' PhDs) are critical to these organisations' ability to develop and deliver high-performance products and services. We therefore anticipate economic impact to result from this CDT.

The second theme is associated with high-value and high-volume manufacturing. In these contexts, profit margins are very sensitive to operational costs. For example, a change to the configuration of a production line for an aerosol manufactured by Unilever might "only" cut costs by 1p for each aerosol, but when multiplied by half a billion aerosols each year, the impact on profit can be significant. In this context, industry (e.g., Renishaw, Rolls Royce, Schlumberger, ShopDirect and Unilever) is therefore motivated to optimise operational costs by learning from historic data. This CDT's graduates (and their projects) will help these organisations to perform such data-driven optimisation and thereby enable the CDT to achieve further economic impact.

Other organisations (e.g., IBM) provide hardware, software and advice to those operating in these themes. The CDT's graduates will ensure these organisations can be globally competitive.

The specific organisations mentioned above are the CDT's current partners. These organisations have all agreed to co-fund studentships. That commitment indicates that, in the short term, they are likely to be the focus for the CDT's impact. However, other organisations are likely to benefit in the future. While two (Lockheed Martin and Arup) have articulated their support in letters that are attached to this proposal, we anticipate impact via a larger portfolio of organisations (e.g., via studentships but also via those organisations recruiting the CDT's graduates either immediately after the CDT or later in the students' careers). Those organisations are likely to include those inhabiting the two themes described above, but also others. For example, an entrepreneurial CDT student might identify a niche in another market sector where Distributed Algorithms can deliver substantial commercial or societal gains. Predicting where such niches might be is challenging, though it seems likely that sectors that are yet to fully embrace Data Science while also involving significant turnover are those that will have the most to gain: we hypothesise that niches might be identified in health and actuarial science, for example.

As well as training the CDT students to be the leaders of tomorrow in Distributed Algorithms, we will also achieve impact by training the CDT's industrial supervisors.

Publications


Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/S023445/1                                      01/04/2019   30/09/2027
2748823             Studentship    EP/S023445/1   01/10/2022   30/09/2026   Dominika Soltysik