Generalisation and expressiveness for over-parameterised neural networks

Lead Research Organisation: University of Oxford

Department Name: Statistics

Abstract

More than half a century has passed since the mathematical analysis of learning processes truly began, that is when F. Rosenblatt proposed the first model of learning machine, the Perceptron. Since then, huge progresses have been achieved in learning theory, both from theoretical and practical perspectives.
However the development of deep neural networks, able to perform better than any theoretical expectation, has underlined the lack of a complete mathematical theory, capable of giving an enlightening insight on the mechanisms underlying the learning process.
The basic idea behind learning theory is to find a way to quantify the generalisation capacity of an algorithm, meaning the ability to extrapolate, from a finite and relatively small training dataset, some unknown rule allowing to coherently classify data which are not part of the original training set.
A learning algorithm is usually evaluated via the introduction of a loss function, measuring the discrepancy between the correct answer and the answer given by the learning machine. The expectation of this function on the whole set of existing data is called the risk functional. However, in practice there's no way to evaluate it directly, and one has to deal with its approximation evaluated on the training set only, the empirical risk.
An efficient learning theory should be able to give tight probability bounds on the difference (generalisation error) between the empirical risk and the risk functional.
The Vapnik-Chervonenkis theory, which has been the main direction of theorists' work for about three decades (from the 60's to the 90's), cannot explain the high generalisation capacity of modern deep neural networks, where the machine is able to infer many more parameters than the available training data.
Among the theory developed in the last 20 years, of relevant interest is the PAC-Bayes approach, PAC standing for Probably Almost Correct. The main idea in PAC-Bayes is to consider a probability distribution (prior) on the hypothesis space (meaning the space encoding all the possible "classification rules" that the algorithm might choose) which is independent of the training set data. Another distribution (posterior) has to be chosen afterwards and, thanks to a well known result by McAllester, it is then possible to quantify the distance between empirical risk and risk functional in terms of the KL divergence between prior and posterior.
In a recent work, Dziugaite and Roy have shown that by optimising a PAC-Bayes bound it is possible to compute nonvacuous numerical bounds for generalisation error.
The main goal of this doctoral project is to try to use the PAC-Bayes framework in order to get new theoretical insights allowing for a better understanding of the underlying learning mechanisms in deep neural networks.

Student:

Eugenio Clerico

Period of Study:

Oct 19 - Apr 23

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2278529

Research Topic:

Unclassified

Organisations

University of Oxford (Lead Research Organisation)

People	ORCID iD
GEORGIOS DELIGIANNIDIS (Primary Supervisor)	http://orcid.org/0000-0002-0821-4607
Eugenio Clerico (Student)

Publications

Author Name

Title Publication Date Published

10 25 50

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509711/1			01/10/2016	30/09/2021
2278529	Studentship	EP/N509711/1	01/10/2019	21/04/2023	Eugenio Clerico
EP/R513295/1			01/10/2018	30/09/2023
2278529	Studentship	EP/R513295/1	01/10/2019	21/04/2023	Eugenio Clerico

Abstract

Organisations

People

ORCID iD

Publications

Studentship Projects