Learning reliable representations when proxy objectives fail

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

Representation learning involves using an objective to learn a mapping from data space to a representation space. When the downstream task for which a mapping must be learned is unknown, or is too costly to cast as an objective, we must rely on proxy objectives for learning. In this thesis I focus on representation learning for images, and address three cases where proxy objectives fail to produce a mapping that performs well on the downstream tasks.

When learning neural network mappings from image space to a discrete hash space for content-based image retrieval, we need a proxy objective that captures the requirement that the hashes of relevant items lie nearer to the hash of any query than those of irrelevant items. At the same time, image hashes must be distributed evenly across the whole hash space so that the available bits are used efficiently and discrimination is high. Proxy objectives fail when they do not meet these requirements. I propose using a standard classifier to predict class labels and converting these predictions to a binary representation, which achieves state-of-the-art performance on the image retrieval task. I also propose a binary deep decision tree layer (DDTL) to model further intra-class differences and produce approximately evenly distributed hash codes. The DDTL requires no discretisation during learning and, compared with previous methods, produces hash codes that better discriminate between data in the same class while remaining robust to real-world augmentations in the data space.
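The classifier-to-hash idea and the "even distribution" requirement can be illustrated with a minimal sketch. It assumes, hypothetically, that each class is assigned a fixed binary codeword (here simply the binary expansion of the class index; the thesis's actual mapping is not specified above), that each image is hashed to the codeword of its most probable class, and that retrieval ranks database items by Hamming distance. The helper names `class_codes`, `hash_from_probs`, `hamming_rank` and `bit_balance` are illustrative only, and the DDTL itself is not sketched.

```python
import numpy as np


def class_codes(num_classes: int, num_bits: int) -> np.ndarray:
    """Hypothetical codebook: row k is the num_bits-bit binary expansion of k."""
    if num_classes > 2 ** num_bits:
        raise ValueError("num_bits is too small to give every class a distinct code")
    ks = np.arange(num_classes)[:, None]               # (K, 1) class indices
    shifts = np.arange(num_bits - 1, -1, -1)[None, :]  # (1, B) most significant bit first
    return ((ks >> shifts) & 1).astype(np.uint8)       # (K, B) binary codes


def hash_from_probs(probs: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Hash each image as the codeword of its most probable class."""
    return codebook[np.argmax(probs, axis=1)]


def hamming_rank(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Database indices sorted by Hamming distance to the query (ties keep order)."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")


def bit_balance(codes: np.ndarray) -> np.ndarray:
    """Fraction of ones per bit; values near 0.5 mean each bit splits the data evenly."""
    return codes.mean(axis=0)


# Usage: four classes, two-bit codes, softmax outputs for four database images.
codebook = class_codes(num_classes=4, num_bits=2)
probs = np.array([[0.70, 0.10, 0.10, 0.10],
                  [0.10, 0.80, 0.05, 0.05],
                  [0.05, 0.05, 0.10, 0.80],
                  [0.60, 0.20, 0.10, 0.10]])
db = hash_from_probs(probs, codebook)
print(hamming_rank(codebook[0], db))  # [0 3 1 2]
print(bit_balance(db))                # fraction of ones per bit: 0.25 and 0.5
```

Note that images 0 and 3 receive identical codes because they share a predicted class; a purely class-derived hash cannot separate them, which is the intra-class limitation the DDTL is described as addressing. The unbalanced first bit (0.25) shows how a code can fall short of the even-distribution requirement.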

In the scenario where we require a neural network to partition the data into clusters that correspond well with ground-truth labels, a proxy objective is needed to define how these clusters are formed. One such proxy objective involves maximising the mutual information between the cluster assignments a neural network makes for multiple views. In this context, views are different augmentations of the same image, and the cluster assignments are the representations computed by the network. I demonstrate that this proxy objective yields sub-optimal network parameters, in the sense that a better set of parameters can be found using the same objective with a different training method. I introduce deep hierarchical object grouping (DHOG) as a method to learn a hierarchy (in the sense of easy-to-hard orderings, not structure) of solutions to the proxy objective, and show how this improves performance on the downstream task.
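The mutual-information proxy objective can be made concrete with a minimal sketch. It follows the estimator popularised by Invariant Information Clustering (IIC): average the outer products of the two views' softmax outputs to estimate a joint distribution over cluster pairs, symmetrise it, and compute its mutual information. That the thesis uses exactly this estimator is an assumption, the function name `cluster_mutual_information` is illustrative, and the network, augmentations and training loop (which would maximise this quantity) are not shown. DHOG is not sketched.

```python
import numpy as np


def cluster_mutual_information(p1: np.ndarray, p2: np.ndarray, eps: float = 1e-12) -> float:
    """Mutual information between the cluster assignments of two views.

    p1, p2: (N, K) softmax outputs of the network for two augmentations of the
    same N images, so row n of p1 and row n of p2 come from the same image.
    """
    joint = p1.T @ p2 / p1.shape[0]          # (K, K) estimate of P(c1, c2)
    joint = (joint + joint.T) / 2.0          # symmetrise: view order is arbitrary
    joint = np.clip(joint, eps, None)        # avoid log(0)
    p_c1 = joint.sum(axis=1, keepdims=True)  # (K, 1) marginal for view 1
    p_c2 = joint.sum(axis=0, keepdims=True)  # (1, K) marginal for view 2
    return float(np.sum(joint * (np.log(joint) - np.log(p_c1) - np.log(p_c2))))


# Confident, consistent, evenly spread assignments give MI close to log K ...
labels = np.array([0, 1, 2, 3, 0, 1, 2, 3])
onehot = np.eye(4)[labels]
print(cluster_mutual_information(onehot, onehot))    # approximately log(4) = 1.386

# ... whereas uninformative uniform predictions give MI of (almost) zero.
uniform = np.full((8, 4), 0.25)
print(cluster_mutual_information(uniform, uniform))  # approximately 0.0
```

The maximum, log K, is reached by any assignment that is confident, agrees across views and uses every cluster equally, regardless of whether the clusters match the ground-truth labels. This is one way to see why several parameter settings can score well on this proxy objective while differing on the downstream task.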

When the training data contain features from which class predictions are easy to compute (e.g., background colour) alongside features from which they are relatively difficult to compute (e.g., digit type), standard classification objectives (e.g., cross-entropy) fail to produce robust classifiers. The problem is that a model that learns to rely on 'easy' features will also ignore 'complex' ones (easy and complex are purely relative here). I introduce latent adversarial debiasing (LAD) to decouple easy features from the class labels. LAD first models the underlying structure of the training data as a latent representation using a vector-quantised variational autoencoder (VQ-VAE), then uses a gradient-based procedure to adjust the features in this representation so as to confuse the predictions of a constrained classifier trained to predict class labels from the same representation. The adjusted representations are then decoded to produce an augmented training dataset that can be used for training in the standard manner.
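The gradient-based adjustment step in LAD can be illustrated with a minimal sketch under several stated assumptions: the constrained classifier is taken to be linear (weights `W`, bias `b`), "confusing" it is modelled as minimising cross-entropy against a uniform target, and latents are plain vectors. The names `quantise` and `confuse_latents` are hypothetical, whether adjusted latents are re-quantised before decoding is an assumption, and the VQ-VAE encoder and decoder are not shown.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def quantise(z: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Vector quantisation: replace each latent vector by its nearest codebook entry."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (N, M) squared L2
    return codebook[np.argmin(d2, axis=1)]


def confuse_latents(z: np.ndarray, W: np.ndarray, b: np.ndarray,
                    steps: int = 100, lr: float = 0.5) -> np.ndarray:
    """Adjust latents so a linear classifier's predictions move towards uniform.

    Hypothetical loss per example: cross-entropy against the uniform target
    u = 1/K, i.e. -(1/K) * sum_k log softmax(z W^T + b)_k. Its gradient with
    respect to the logits is (p - u), so with respect to z it is (p - u) @ W.
    """
    z = np.array(z, dtype=float)  # work on a float copy
    num_classes = W.shape[0]
    for _ in range(steps):
        p = softmax(z @ W.T + b)
        z -= lr * (p - 1.0 / num_classes) @ W
    return z


# Toy case: the classifier only reads the first latent dimension (the "easy" feature).
W = np.array([[1.0, 0.0], [-1.0, 0.0]])
b = np.zeros(2)
z0 = np.array([[3.0, 1.0]])
z_adj = confuse_latents(z0, W, b)
print(softmax(z_adj @ W.T + b))  # close to [[0.5, 0.5]]
print(z_adj[0, 1])               # 1.0: the dimension the classifier ignores is untouched
```

Because the gradient is (p - u) @ W, only the latent directions the classifier actually uses are changed; in this toy case the second dimension passes through unchanged, which mirrors the aim of removing the easy shortcut while keeping the rest of the representation.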

I show in the aforementioned scenarios that proxy objectives can fail, and demonstrate that alternative approaches can mitigate the associated failures. I suggest taking an analytic approach to understanding the limits of proxy objectives in each use case, so that the adjustments to the data or the objectives needed to ensure good performance on downstream tasks can be made.

Planned Impact

The proposed Centre has the potential to bring significant economic benefit to the UK. Data science has applications throughout industry, commerce, science, and the public sector. Methods based on data science are coming to underpin digital commerce, energy sustainability, and digital health care. The application areas that benefit from data science are truly diverse, ranging from genome sequencing to social media, from energy analytics to translational medicine. Our broad consortium of partners reflects the huge number and variety of users of data science methods. The Centre will help to address the immense skills need for data science (see summary, above), bringing about economic benefit to the UK. A deep talent pool of data scientists is likely to provide a strong incentive for companies that require these skills to expand their operations in the UK.

The UK government has recognized the need for increased university provision in data science. The Council for Science and Technology, part of the UK government's Department for Business, Innovation and Skills, recently recommended to Prime Minister David Cameron: "Computer science departments should work in partnership with other university departments and with the private sector to develop multidisciplinary courses with a suitable focus on building aptitude for the practical application of data science. Universities should be encouraged to develop new options including Data Science MSc and PhD programmes." (7 June 2013)

As an additional economic benefit, the concentration of excellent students will naturally lead to exciting startups and spinouts. The University of Edinburgh is number one in the UK for spin-out and start-up creation, having recorded 250 startups and spinouts since 2000, with 47 such companies arising from the School of Informatics in the past six years. We have a rich existing infrastructure to support students in commercializing their ideas, including business training and events for connecting students with potential business partners and investors.

Additionally, there is a large potential social benefit to data science. Many charities and public sector organizations have large data sets that they wish to understand in order to create social value. A prime example is our project partner, the City of Edinburgh Council, which wishes to combine a large number of disparate resources to build a unified view of a citizen that can be used to improve social services. The skilled cadre of data scientists that will be produced by our Centre will have the potential to bring new techniques to bear on these longstanding problems.
