Towards educational machine learning

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

Natural language processing (NLP) is undergoing a paradigm shift with the rise of Transformer-based models (e.g. BERT, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks via fine-tuning (Rogers et al., 2020). While these models dominate a plethora of standard benchmarks (Wang et al., 2019a), there remain central open questions on improving robustness (Bau et al., 2017; Madry et al., 2018), making training efficient (Kornblith et al., 2019; Raghu et al., 2020), and ensuring better generalisation (Zhang et al., 2017). A promising approach for addressing these issues is machine teaching (Zhu, 2015), in which meta-information about a task is provided to a model to better guide the training process and improve downstream performance.
For example, a common assumption in machine learning is that training data are generated by some uninformative process (i.e. i.i.d. sampling). This might be justified in certain cases (e.g. when collecting naturally occurring examples), but there are also instances where the generative process for the data is informative: we might want to teach a model by deliberately feeding it the most helpful examples. Shafto et al. (2014) demonstrated that such teaching data have very different properties and behaviour from uninformatively generated data. Educating a machine learning model is thus an attempt to optimise the learning process for the tasks of interest, for example by re-weighting examples or designing loss functions. In other words, if two models use the same learning mechanism, the one that receives the better education will perform better on the tasks of interest. There are therefore several compelling reasons to study machine teaching for training models (see the sketch after this list):
(i) it gives insights into the intrinsic difficulty of teaching certain tasks;
(ii) it provides a lower bound on the number of samples needed for training;
(iii) it can be used to design models that better leverage highly informative samples.
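As a toy illustration of points (i) and (ii), consider the classic threshold-learning example from the machine teaching literature (Zhu, 2015): an i.i.d. learner needs many samples to locate a decision boundary, whereas a teacher who knows the boundary can convey it with just two well-chosen examples. The following is a minimal sketch only; the ground-truth threshold, the helper fit_threshold, and all numbers are illustrative assumptions, not part of the proposal.

import numpy as np

rng = np.random.default_rng(0)

# Toy task: learn a 1-D threshold classifier f(x) = 1 if x >= theta else 0.
theta = 0.6  # illustrative ground-truth threshold (assumption)

def fit_threshold(xs, ys):
    # Estimate theta as the midpoint between the largest negative
    # example and the smallest positive example observed.
    neg, pos = xs[ys == 0], xs[ys == 1]
    lo = neg.max() if neg.size else 0.0
    hi = pos.min() if pos.size else 1.0
    return (lo + hi) / 2

# (a) Uninformative generation: 100 i.i.d. uniform samples.
xs = rng.uniform(0.0, 1.0, size=100)
ys = (xs >= theta).astype(int)
print("i.i.d. estimate from 100 samples:", fit_threshold(xs, ys))

# (b) Machine teaching: two examples tightly bracketing the boundary
# pin down theta to within eps, regardless of sampling luck.
eps = 1e-3
xs_teach = np.array([theta - eps, theta + eps])
ys_teach = np.array([0, 1])
print("taught estimate from 2 samples: ", fit_threshold(xs_teach, ys_teach))

Here a teaching set of size two already determines the hypothesis to within eps, whereas the i.i.d. estimate is only as good as the random samples that happen to fall nearest the boundary.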
The goal of this research proposal is therefore to gain insights into the learning dynamics of deep neural networks and improve downstream performance by utilising task-specific meta-information to deliver supervision signals that guide the training process. In doing so, we intend to explore a wide range of applications, from lifelong learning to automated data augmentation to faster learning with synthetic data. Ultimately, we argue that machine teaching deserves at least as much attention as the inductive biases of the model and the properties of the optimiser, which are usually viewed as central to the success of deep learning.
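To make the notion of delivering supervision signals concrete, the snippet below sketches one mechanism mentioned above, per-example loss re-weighting, in PyTorch. This is a minimal sketch under stated assumptions, not the proposal's method: the weights tensor stands in for teacher-provided meta-information (e.g. example difficulty or informativeness), and its source and all shapes are hypothetical.

import torch
import torch.nn.functional as F

# Dummy batch: model outputs and labels for an 8-example, 3-class task.
logits = torch.randn(8, 3, requires_grad=True)
targets = torch.randint(0, 3, (8,))

# Hypothetical teaching signal: per-example importance weights derived
# from task meta-information (difficulty, informativeness, curriculum stage).
weights = torch.rand(8)

# Standard cross-entropy averages all examples equally; a teacher instead
# re-weights the per-example losses before reduction.
per_example_loss = F.cross_entropy(logits, targets, reduction="none")
loss = (weights * per_example_loss).sum() / weights.sum()
loss.backward()  # gradients now emphasise the highly weighted examples

The same pattern generalises to the other levers named above: a designed loss function replaces F.cross_entropy, and curriculum-style teaching varies the weights over the course of training.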

Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
ES/P000738/1                                      01/10/2017   30/09/2027
2616041             Studentship    ES/P000738/1   01/10/2020   30/09/2023   Michail Korakakis