Semi-supervised learning of deep hierarchical hidden representations

Lead Research Organisation: University of Bristol
Department Name: Computer Science


Until the end of the 20th century, most computer programs were written by hand to perform repetitive tasks that could be automated, thus alleviating human work. At the end of the century, however, the field of Machine Learning emerged to create algorithms that could generate programs automatically from data and examples. These methods, together with an exponential growth in available data and computational power, allowed the training of deep hierarchical models. Nowadays, deep hierarchical models are matching and occasionally surpassing human performance on a variety of tasks such as object recognition, automatic translation, speech recognition, autonomous transportation and medical applications.
One of the main problems of the current state-of-the-art models is that they need fully annotated data to solve any specific task. Problems of this type are known as Supervised Learning tasks. For this reason, one of the bottlenecks in training these models is the creation of good, large datasets, which requires a great deal of manual annotation.
To address this problem, the field of Semi-Supervised Learning uses unannotated data to help the supervised part. For example, when labels are scarce, unlabeled data can be used to learn hierarchical hidden representations that improve the performance of supervised models. New methods are still being investigated, and this is one of the main topics of this Ph.D. Another problem is that most of the current Machine Learning literature assumes that the data available during training follows the same distribution as the future data seen at deployment time. However, this assumption holds only in a few controlled scenarios, for example in a closed factory. By contrast, most real-world scenarios evolve and change with new objects, words or patterns. For this reason, it is important to give Machine Learning models the ability to flag when new patterns appear, thus avoiding possible mistakes.
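One common way unlabeled data can help, sketched below with a toy 1-D nearest-centroid classifier, is self-training (pseudo-labelling): the model labels the unlabeled points it is confident about and retrains on the enlarged dataset. This is only a minimal illustration of the general idea, not the method developed in this project; all data and thresholds are made up.

```python
# Minimal self-training (pseudo-labelling) sketch on illustrative data.

def centroids(points, labels):
    """Mean of each class in a 1-D, two-class problem."""
    return {y: sum(x for x, l in zip(points, labels) if l == y)
               / sum(1 for l in labels if l == y)
            for y in (0, 1)}

def predict(c, x):
    """Assign x to the nearest class centroid."""
    return min((abs(x - c[y]), y) for y in c)[1]

def confidence(c, x):
    """Margin between the distances to the two centroids."""
    return abs(abs(x - c[0]) - abs(x - c[1]))

# A few labelled examples and several unlabelled ones.
X_lab, y_lab = [0.0, 1.0, 9.0, 10.0], [0, 0, 1, 1]
X_unl = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]

c = centroids(X_lab, y_lab)
# Pseudo-label only the points the current model is confident about,
# then retrain on the enlarged dataset.
for x in X_unl:
    if confidence(c, x) > 2.0:
        X_lab.append(x)
        y_lab.append(predict(c, x))
c = centroids(X_lab, y_lab)
print(predict(c, 3.0))  # → 0: the left-hand class
```

The retrained centroids are estimated from ten points instead of four, which is the basic mechanism by which unlabeled data can sharpen the decision boundary.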
This Ph.D. proposes to address this problem by means of Semi-Supervised Learning techniques. Using new techniques, we want to enhance existing models by giving them the ability to discern between known and unknown patterns, so that models can express confidence in their predictions. This is important in order to make confident predictions on familiar patterns while being able to ask for further inspection otherwise.
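The "ask for further inspection" behaviour can be illustrated with a classifier that abstains when no class probability clears a threshold, often called classification with a reject option. The probabilities and threshold below are invented for the example; the project's actual method is more general.

```python
# Illustrative "reject option": answer only when confident enough.

def predict_or_abstain(probs, threshold=0.8):
    """Return the arg-max class index, or None to request inspection."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None

print(predict_or_abstain([0.05, 0.90, 0.05]))  # confident → 1
print(predict_or_abstain([0.40, 0.35, 0.25]))  # uncertain → None
```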
In conclusion, machine learning models that learn deep hierarchical hidden representations of the data are increasingly popular. These models are being applied to a wide range of problems, some of which have important implications. However, they need large amounts of annotated data and cannot output confidence values for their predictions. For that reason, the topic of this Ph.D. is to improve models using unlabeled data, making them aware of new situations and able to avoid uninformed decisions.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509619/1                                   01/10/2016  30/09/2021
1793885            Studentship   EP/N509619/1  19/09/2016  18/03/2020  Miquel Perello Nieto
Description In the field of Machine Learning, multi-class classification is a very important task. It consists of creating mathematical models that can classify instances into different categories (e.g. an autonomous car needs to distinguish pedestrians, cars, animals and traffic signs, among others). However, in most of the literature the data available during training is assumed to be a good representation of any future example. This assumption can lead to multiple issues, such as Google apologising for inappropriate automatic photo tagging. We have proposed a new generic method that equips arbitrary probabilistic classifiers with the ability to discern between predictions that are, or are not, similar to previous examples. We published an article at the 16th International Conference on Data Mining (ICDM 2016) demonstrating that our proposed method can be applied in multiple scenarios, and that it matches, and on occasion outperforms, state-of-the-art non-generic approaches.

Similarly, it is important that the probabilities output by a multi-class classifier are a good representation of the expected proportions of the corresponding classes. We have proposed a new method that improves on the current state of the art, presented at the Neural Information Processing Systems conference (NeurIPS 2019) under the title "Beyond Temperature Scaling: Obtaining well-calibrated multiclass probabilities with Dirichlet calibration".
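Temperature scaling, the baseline that the Dirichlet calibration work goes beyond, can be sketched in a few lines: dividing the logits by a temperature T > 1 softens over-confident softmax outputs. All numbers below are illustrative, and T would normally be fitted on a held-out validation set, which is skipped here.

```python
import math

def softmax(logits, T=1.0):
    """Softmax of logits scaled by temperature T."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.0]            # over-confident raw scores
print(max(softmax(logits)))          # ≈ 0.94 before scaling
print(max(softmax(logits, T=3.0)))   # ≈ 0.61 after scaling with T = 3
```

A single scalar T cannot reshape per-class miscalibration, which is one motivation for the richer Dirichlet parameterisation described in the paper.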

Another important concern when training multi-class classification models is the requirement for good annotations. These are most of the time human annotations, which are time-consuming and on occasion require expert annotators (e.g. deciding whether a mammography shows breast cancer). Related tasks such as Semi-Supervised Learning include non-annotated data during training to improve performance. However, this research is still under development, and it is not yet clear in which circumstances an improvement can be achieved. We propose instead to use annotations with different degrees of quality. In this scenario, we would be able to collect cheaper annotations by accepting a certain number of labelling mistakes. This could be done by crowd-sourcing the annotations, obtaining automatic annotations from machine learning models, or allowing coarse labels instead of finer ones (e.g. indicating that a picture contains a mammal or a plant, instead of specifying which animal or plant). For that reason, we are studying the empirical applicability of a set of theoretical results that allow the use of annotations with different levels of quality (namely weak labels). The empirical results showed that, with real data and different types of noise, it was possible to obtain good results. This finding was published in Advances in Intelligent Data Analysis XVI: 16th International Symposium, IDA 2017, London. We also show in our recent article "Recycling weak labels for multiclass classification", published in the journal Neurocomputing in 2020, that it is possible to aggregate labels of different quality into a larger dataset and obtain better performance than using a smaller but perfectly labelled dataset.
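One simple way to merge cheap, noisy annotations into a usable training set, shown below, is a per-instance majority vote over several weak annotators. This is only an illustrative baseline, not the weak-label method of the papers above; the votes and class names are made up.

```python
from collections import Counter

def majority_label(votes):
    """Most common vote; ties broken by first-seen order."""
    return Counter(votes).most_common(1)[0][0]

# Three cheap annotations per instance, e.g. from crowd-sourcing.
crowd = [
    ["cat", "cat", "dog"],   # two of three agree → "cat"
    ["dog", "dog", "dog"],
    ["cat", "dog", "cat"],
]
labels = [majority_label(v) for v in crowd]
print(labels)  # → ['cat', 'dog', 'cat']
```

More refined schemes weight each annotator by an estimate of their reliability, but even this vote shows how several imperfect labels can be cheaper than one expert label.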
Exploitation Route It is important to understand the limitations of common machine learning classification methods. With my current work, researchers can use the tools I created to improve the interpretability of these models. It is also possible to benefit from my work by reducing the cost of manual annotation, as I demonstrated empirically under which types of weak annotation it is still possible to train machine learning classifiers. These techniques have a broad span of applicability, from machine learning researchers to companies driven by current use cases.
Sectors Digital/Communication/Information Technologies (including Software)