Towards educational machine learning

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

Natural language processing (NLP) is undergoing a paradigm shift with the rise of Transformer-based models (e.g. BERT, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks via fine-tuning (Rogers et al., 2020). While these models dominate a plethora of standard benchmarks (Wang et al., 2019a), there remain central open questions on improving robustness (Bau et al., 2017; Madry et al., 2018), making training efficient (Kornblith et al., 2019; Raghu et al., 2020), and ensuring better generalisation (Zhang et al., 2017). A promising approach for addressing these issues is machine teaching (Zhu, 2015), in which meta-information about a task is provided to a model to better guide the training process and improve downstream performance.
For example, a common assumption in machine learning is that training data are generated by some uninformative process (i.e. i.i.d. sampling). This might be justified in certain cases (e.g. when collecting naturally occurring examples), but there are also instances where the generative process for the data is informative: we might want to teach a model by deliberately feeding it the most helpful examples. Shafto et al. (2014) demonstrated that such teaching data have very different properties and behaviour from uninformatively generated data. Educating a machine learning model is thus an attempt to optimise the learning process for the tasks of interest, for example by re-weighting examples or designing loss functions. In other words, if two models use the same learning mechanism, the one that receives the better education will perform better on the tasks of interest. There are therefore several compelling reasons to study machine teaching for training models (see the sketch after this list):
(i) it gives insights into the intrinsic difficulty of teaching certain tasks;
(ii) it provides a lower bound on the number of samples needed for training;
(iii) it can be used to design models that better leverage highly informative samples.
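As a toy illustration of points (i) and (ii), consider the classic threshold-learning example from the machine teaching literature (Zhu, 2015): an i.i.d. learner needs many samples to locate a decision boundary, whereas a teacher who knows the boundary can convey it with just two well-chosen examples. The following is a minimal sketch only; the ground-truth threshold, the helper fit_threshold, and all numbers are illustrative assumptions, not part of the proposal.

import numpy as np

rng = np.random.default_rng(0)

# Toy task: learn a 1-D threshold classifier f(x) = 1 if x >= theta else 0.
theta = 0.6  # illustrative ground-truth threshold (assumption)

def fit_threshold(xs, ys):
    # Estimate theta as the midpoint between the largest negative
    # example and the smallest positive example observed.
    neg, pos = xs[ys == 0], xs[ys == 1]
    lo = neg.max() if neg.size else 0.0
    hi = pos.min() if pos.size else 1.0
    return (lo + hi) / 2

# (a) Uninformative generation: 100 i.i.d. uniform samples.
xs = rng.uniform(0.0, 1.0, size=100)
ys = (xs >= theta).astype(int)
print("i.i.d. estimate from 100 samples:", fit_threshold(xs, ys))

# (b) Machine teaching: two examples tightly bracketing the boundary
# pin down theta to within eps, regardless of sampling luck.
eps = 1e-3
xs_teach = np.array([theta - eps, theta + eps])
ys_teach = np.array([0, 1])
print("taught estimate from 2 samples: ", fit_threshold(xs_teach, ys_teach))

Here a teaching set of size two already determines the hypothesis to within eps, whereas the i.i.d. estimate is only as good as the random samples that happen to fall nearest the boundary.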
The goal of this research proposal is therefore to gain insights into the learning dynamics of deep neural networks and improve downstream performance by utilising task-specific meta-information to deliver supervision signals that guide the training process. In doing so, we intend to explore a wide range of applications, from lifelong learning to automated data augmentation to faster learning with synthetic data. Ultimately, we argue that machine teaching deserves at least as much attention as the inductive biases of the model and the properties of the optimiser, which are usually viewed as central to the success of deep learning.
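To make the notion of delivering supervision signals concrete, the snippet below sketches one mechanism mentioned above, per-example loss re-weighting, in PyTorch. This is a minimal sketch under stated assumptions, not the proposal's method: the weights tensor stands in for teacher-provided meta-information (e.g. example difficulty or informativeness), and its source and all shapes are hypothetical.

import torch
import torch.nn.functional as F

# Dummy batch: model outputs and labels for an 8-example, 3-class task.
logits = torch.randn(8, 3, requires_grad=True)
targets = torch.randint(0, 3, (8,))

# Hypothetical teaching signal: per-example importance weights derived
# from task meta-information (difficulty, informativeness, curriculum stage).
weights = torch.rand(8)

# Standard cross-entropy averages all examples equally; a teacher instead
# re-weights the per-example losses before reduction.
per_example_loss = F.cross_entropy(logits, targets, reduction="none")
loss = (weights * per_example_loss).sum() / weights.sum()
loss.backward()  # gradients now emphasise the highly weighted examples

The same pattern generalises to the other levers named above: a designed loss function replaces F.cross_entropy, and curriculum-style teaching varies the weights over the course of training.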

Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
ES/P000738/1                                      01/10/2017   30/09/2027
2616041             Studentship    ES/P000738/1   01/10/2020   30/09/2023   Michail Korakakis