Optimization methods for deep learning: training and testing in machine learning

Lead Research Organisation: University of Oxford
Department Name: Mathematical Institute

Abstract

This project falls within the EPSRC Numerical Analysis research area.
This project is undertaken with the industrial partner NAG (Numerical Algorithms Group).

Optimization problems form the modelling and numerical core of machine learning and statistical methodologies, for example in the training of supervised machine learning classifiers. In such applications, a large amount of data (feature vectors) is available that has already been classified (i.e. labelled). A parameterised classifier is then selected and trained on this data; namely, values of the parameters are calculated so that the output of the classifier at the given features matches their labels in some optimal way. The resulting classifier is then used at test time to label/classify unseen data.

The training problem is formulated as an optimization problem that minimises, for example, the average error that the classifier makes on the training set. Various formulations are used, most commonly based on a continuous loss function that measures the error at each data point, with the total error expressed either as a (deterministic) finite sum or as a (probabilistic) expectation of the loss terms. The resulting objective may be convex (such as in binary classification with a linear model and convex loss) but, increasingly, it is nonconvex due to the prevalence of deep learning applications. The scale of the ensuing optimization problems is commonly huge, with millions of parameters and millions of terms in the objective sum, which makes even a single full function or gradient evaluation prohibitively expensive and leads to the need for inexact optimization algorithms that effectively exploit problem structure.

The practical method of choice in machine learning applications is the (mini-batch) stochastic gradient method (Robbins and Monro, 1951), which at each iteration computes the gradients of only a small, randomly chosen subset of the loss terms and controls both variance and convergence by means of a predefined stepsize. A grand challenge in this area is how to augment stochastic gradient methods with inexact second-order derivative information, so as to obtain more efficient methods, especially in the nonconvex case of deep learning, in terms of both achieving higher accuracy and robustness to ill-conditioning.
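To make the setting above concrete, the following sketch trains a binary linear classifier by minimising a finite-sum logistic loss with the mini-batch stochastic gradient method and a predefined stepsize. It is a minimal, hypothetical illustration only: the synthetic data, batch size, stepsize and iteration count are assumptions made here for demonstration, not choices taken from the project.

    # Minimal sketch: mini-batch SGD on the finite-sum training problem
    #   min_w (1/n) * sum_i loss(w; x_i, y_i),
    # here with a logistic loss for binary classification (illustrative data).
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic labelled data: n feature vectors x_i with labels y_i in {0, 1}.
    n, d = 1000, 20
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = (X @ w_true + 0.1 * rng.standard_normal(n) > 0).astype(float)

    def batch_gradient(w, idx):
        """Gradient of the average logistic loss over the sampled batch idx."""
        p = 1.0 / (1.0 + np.exp(-X[idx] @ w))      # predicted probabilities
        return X[idx].T @ (p - y[idx]) / len(idx)

    w = np.zeros(d)
    batch_size, stepsize = 32, 0.5                 # predefined (constant) stepsize
    for _ in range(2000):
        idx = rng.choice(n, size=batch_size, replace=False)   # random mini-batch
        w -= stepsize * batch_gradient(w, idx)                # stochastic gradient step

Only batch_size of the n loss terms are differentiated per iteration, which is what keeps the per-iteration cost independent of the size of the data set.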

In this project, we will investigate ways to approximate second-order information that exploit the finite-sum structure of machine learning optimization problems, ranging from subsampling second-order derivatives to approximating them by differences of batch gradients, as in block (stochastic) quasi-Newton and Gauss-Newton methods. In place of the usual predefined stepsize, we will consider the impact of more sophisticated stepsize techniques from classical optimization that adapt to the local optimization landscape, such as variable/adaptive trust-region radii, regularization and linesearch. We will also consider preconditioning techniques that incorporate inexpensive second-derivative information so as to improve the performance of first-order methods. Finally, we will investigate parallel and decentralised implementations of these methods, which is a challenge especially for higher-order techniques.
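As a sketch of one ingredient mentioned above (curvature from differences of batch gradients), the code below builds a memory-one quasi-Newton (BFGS-type) approximation of the inverse Hessian from a single (s, y) pair, with y computed as the difference of two gradients evaluated on the same batch, and uses it to precondition the stochastic gradient step. This is an illustrative assumption of how such a method might look, not the project's algorithm; the data, stepsize and safeguard threshold are again synthetic.

    # Minimal sketch: a stochastic quasi-Newton step with curvature obtained
    # from same-batch gradient differences (memory-one BFGS-type update).
    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 1000, 20
    X = rng.standard_normal((n, d))
    labels = (X @ rng.standard_normal(d) > 0).astype(float)

    def batch_gradient(w, idx):
        p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
        return X[idx].T @ (p - labels[idx]) / len(idx)

    w = np.zeros(d)
    w_prev = None
    stepsize, batch_size = 0.5, 64
    for _ in range(500):
        idx = rng.choice(n, size=batch_size, replace=False)
        g = batch_gradient(w, idx)
        direction = -g                                # fall back to plain SGD
        if w_prev is not None:
            s = w - w_prev                            # parameter displacement
            yv = g - batch_gradient(w_prev, idx)      # same-batch gradient difference
            if s @ yv > 1e-8 * np.linalg.norm(s) * np.linalg.norm(yv):  # curvature check
                rho = 1.0 / (yv @ s)
                gamma = (yv @ s) / (yv @ yv)          # scaled-identity initial H_0
                alpha = rho * (s @ g)                 # one-pair L-BFGS two-loop recursion
                r = gamma * (g - alpha * yv)
                direction = -(r + (alpha - rho * (yv @ r)) * s)
        w_prev = w.copy()
        w = w + stepsize * direction

The curvature check skips the quasi-Newton update whenever s'y is not sufficiently positive, the standard safeguard that keeps the inverse-Hessian approximation positive definite.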

Potential outcomes:
(i) State-of-the-art deep learning and optimization formulations and methods for machine learning.
(ii) Novel optimization methods that use inexact (deterministic/stochastic) problem information.
(iii) Evaluation of methods for deep neural net training and testing.

Studentship Projects

Project Reference   Relationship  Related To     Start        End          Student Name
EP/R513295/1        -             -              01/10/2018   30/09/2023   -
2282418             Studentship   EP/R513295/1   01/10/2019   30/09/2023   Constantin Puiu