From trivial representations to learning concepts in AI by exploiting unique data

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Engineering

Abstract

The prospect of an AI-based revolution and its socio-economic benefits is tantalising. We want to live in a world where AI learns effectively, performs well, and poses minimal risk. Such a world is an exciting one.

We tend to believe that AI learns higher-level concepts from data, but this is not what happens. Particularly in data such as images, AI extracts rather trivial (low-level) notions even when provided with millions of examples. We often hear that providing more, and more diverse, data should improve the information that AI can extract.

Amassing data in this way, however, has privacy and cost implications. Indeed, considerable cost also comes from the need to pre-process and sanitise data (i.e. remove unwanted information).
More critically, in several key applications (e.g. healthcare) some events (e.g. disease) can be rare or truly unique. Collecting more and more data will not change the relative frequency of such rare events. It appears that current AI is not data-efficient: it poorly leverages the goldmine of information present in unique and rare data.

This project aims to answer a key research question:
**Why does AI struggle with concepts, and what is the role of unique data?**

We suspect there are several reasons why AI struggles with concepts:

A) The mechanisms we use to extract information from data (known as representation learning) rely on very simple assumptions that do not reflect how real data exist in the world.
For example, we know that real data exhibit correlations, yet we currently make the simplifying assumption that there is no correlation at all.
We propose to introduce stronger assumptions of causal relationships among the concepts we want to extract. This should in turn help us extract better information.
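To make this concrete, the minimal NumPy sketch below (illustrative only, not the project's actual method) contrasts the usual independence assumption on latent concepts with a toy structural causal model in which one concept drives another; all variable names and numbers are hypothetical.

```python
# Illustrative sketch: factorised (independent) latent concepts vs. a toy causal prior.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Standard assumption: concepts are uncorrelated (factorised prior).
z_independent = rng.normal(size=(n, 2))

# Alternative: a toy structural causal model where concept z2 is caused by z1,
# e.g. "disease severity" partly determining "lesion size" in an image.
z1 = rng.normal(size=n)
z2 = 0.8 * z1 + 0.6 * rng.normal(size=n)   # z1 -> z2, plus independent noise
z_causal = np.stack([z1, z2], axis=1)

print("correlation under independence assumption:",
      np.corrcoef(z_independent.T)[0, 1].round(3))
print("correlation under the toy causal prior:   ",
      np.corrcoef(z_causal.T)[0, 1].round(3))
```

The point of the sketch is only that a factorised prior cannot express the dependence structure that real concepts have, whereas a causal prior makes that structure explicit.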

B) To learn any model, we have to use optimisation processes to find the model's parameters. We find a weakness in these processes: data that are unique and rare receive little attention, and when they do receive some, it happens by chance.
This leads to considerable inconsistency in the extraction of information. In addition, wrong information is sometimes extracted, either because we found suboptimal representations or because we latched onto data that escaped the sanitisation process (no such process can be guaranteed to be perfect).
We want to understand why this inconsistency exists and propose to devise methods that ensure that, when we train models, we can consistently extract information even from rare data.
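As an illustration of this weakness, the following hypothetical sketch shows how a plain average loss lets an imbalanced dataset drown out rare examples, and how a simple inverse-frequency reweighting (one possible remedy, not necessarily the method we will develop) restores their influence; the dataset and loss values are invented.

```python
# Illustrative sketch: rare examples barely influence a plain average loss.
import numpy as np

# Imbalanced toy dataset: 990 "common" examples, 10 "rare" ones.
labels = np.array([0] * 990 + [1] * 10)
losses = np.where(labels == 1, 2.0, 0.5)   # rare examples happen to be harder

# Plain empirical risk: rare examples contribute a tiny share of the objective.
plain = losses.mean()
rare_share = losses[labels == 1].sum() / losses.sum()

# Inverse-frequency weighting: each class contributes equally to the objective.
freq = np.bincount(labels) / labels.size
weights = 1.0 / freq[labels]
weights /= weights.sum()
reweighted = np.sum(weights * losses)

print(f"share of plain loss from rare data: {rare_share:.3f}")
print(f"plain average loss:                 {plain:.3f}")
print(f"class-balanced loss:                {reweighted:.3f}")
```

Under the plain average, the rare examples account for only a few percent of the objective, so the optimiser can largely ignore them; the reweighted objective is one simple way to make their influence consistent rather than accidental.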

There is a tight connection between B and A. Without new methods that better optimise learning objectives, we cannot extract representations reliably from rare data, and hence we cannot impose the causal relationships we need.

An additional element of this work helps answer the second part of the question: rare and unique data may actually reveal unique causal relationships. This is a tantalising prospect that the proposed work aims to investigate.

The work we propose offers considerable and broad rewards.

We lay herein the underpinnings for an AI that, because it is data-efficient, should not require the blind amassing of data, with all the privacy fears this engenders for the general public. Because it learns high-level concepts, it will be better suited to empower decision tools that can explain how decisions have been reached. And because we introduce strong causal priors in extracting these concepts, we reduce the risk of learning trivial data associations.

Overall, a major goal of the AI research community is to create AI that can generalise to new, unseen data beyond what was available at training time. We hope that our AI will bring us closer to this goal, further paving the way for broader deployment of AI in the real world.