Optimization for Machine Learning

Lead Research Organisation: University of Oxford

Department Name: Computer Science

Abstract

This project falls within the EPSRC Artificial Intelligence research area. It falls thereby within the Information and Communication Technologies (ICT) research theme.

Machine learning is currently a very active area of research. It is a pervasive field that has found applications in almost every area of knowledge. This means that the development of new ideas or the improvement of the current state of the art can have a rapid impact in many areas. Many heuristic algorithms are used in practice and a major challenge is understanding why these heuristics work and in which contexts they work. This would allow, among other things, to select better algorithms to tackle a particular problem and would facilitate the design of new ones that improve on the current state of the art or alternatively could lead to guarantees that show the impossibility to improve within a certain regime.

The aim of this project is to research open problems in optimization for machine learning. Unlike in other branches of optimization, the interest here is in computing only as an approximation of the optimum value of functions to be optimized. Moreover, these functions have a particular structure. A notable example is in the training of neural networks in supervised machine learning. Here, one is interested in learning (minimizing) a function over a labeled instance space, computed as an expectation of a loss function under an unknown distribution from which samples can be drawn. Usually one takes the Empirical Risk Minimization (ERM) approach and approximately minimizes the average of the loss function over several samples. Getting arbitrary close to a global optimum is impractical. Naturally, there arises the problem of finding algorithms that provably find local minima of smooth functions with smooth Hessians, possibly non convex, with high probability. It is an open problem to give an algorithm with a lower bound that matches the complexity of the algorithm, both in the case in which one has a deterministic oracle that returns the function and gradient evaluations of a point and the case in which one has a stochastic oracle which can be modeled by the deterministic oracle plus a zero mean noise with bounded
variance. The stochastic oracle model is particularly important, since it can model the computation of a noisy gradient by taking the sum of a random subset of the gradients of the functions comprising the ERM objective, which is convenient for efficiency and can be implemented in a distributed manner. In particular, algorithms whose complexity does not depend on the number of samples in ERM can be obtain using the stochastic oracle.

The methodology will make use of new theoretical tools and insights developed in this branch optimization. Some examples are: a better understanding of acceleration phenomena via a coupling of dual and primal algorithms, new solutions for the online eigenvector problem that allow for faster identification of negative curvature of functions and fast detection and avoidance of saddle points, and noise analysis to transform iterative optimization methods that use Hessian vector products into methods that use first order information only, without deteriorating the complexity of the original method.

My main collaborators are my supervisor and co-supervisor: Varun Kanade and Patrick Rebeschini.

Student:

David Martinez

Period of Study:

Sep 18 - Sep 22

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2053152

Research Topic:

Unclassified

Organisations

University of Oxford (Lead Research Organisation)

People	ORCID iD
David Martinez (Student)

Publications

Author Name

Title Publication Date Published

10 25 50

Criado F. (2022) Fast Algorithms for Packing Proportional Fairness and its Dual in Advances in Neural Information Processing Systems

Lezcano-Casado M. (2019) Cheap orthogonal constraints in neural networks: A simple parametrization of the orthogonal and unitary group in 36th International Conference on Machine Learning, ICML 2019

Martínez-Rubio D. (2022) Global Riemannian Acceleration in Hyperbolic and Spherical Spaces in Proceedings of Machine Learning Research

Martínez-Rubio D. (2019) Decentralized cooperative stochastic bandits in Advances in Neural Information Processing Systems

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509711/1			30/09/2016	29/09/2021
2053152	Studentship	EP/N509711/1	30/09/2018	29/09/2022	David Martinez
EP/R513295/1			30/09/2018	29/09/2023
2053152	Studentship	EP/R513295/1	30/09/2018	29/09/2022	David Martinez

Abstract

Organisations

People

ORCID iD

Publications

Studentship Projects