Incisive Tagging: Humans-in-the-Loop in Selection and Labelling of Remote Sensing Data Sets

Lead Research Organisation: Swansea University
Department Name: College of Science

Abstract

We are developing a pre-trained deep neural network to function 'under the hood' of multiple solutions that extract geospatial information from remote sensing imagery. So far, better results have been achieved with larger data sets. However, we observe that much of the imagery contains little apparently unique information, and we are interested in developing a way to select only the pivotal examples for training. Further, we are keen to work more effectively with our in-house image interpretation experts - using their experience and specific abilities in ways that are rewarding and motivating. Within our work, image interpreters may be labelling data for training networks - and hence may be best deployed labelling the most pivotal examples. Another application is to present image interpreters with multiple examples of image clips that highly activate specific parts of the network and ask them to provide their own interpretation of the representations learned by the deep network.

Our questions (not all of which may be addressed in this PhD):
Can we improve sample efficiency, even with an unsupervised target?
Can we actively label data, presenting humans with the most pivotal examples for labelling?
Can we make labelling enjoyable, using our experts most effectively?
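As an illustration of presenting interpreters with image clips that highly activate a specific part of the network, the following minimal Python sketch ranks clips by the activation of one unit and returns a shortlist for review. The function name, the clip identifiers and the activation values are all hypothetical; the project does not specify how activations are recorded or which parts of the network would be inspected.

```python
from typing import Dict, List, Sequence


def top_activating_clips(
    activations: Dict[str, Sequence[float]], unit: int, k: int = 3
) -> List[str]:
    """Return the ids of the k clips with the highest activation for one unit."""
    ranked = sorted(
        activations,
        key=lambda clip_id: activations[clip_id][unit],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical data: clip id -> activation of each of three network units.
activations = {
    "clip_a": [0.1, 0.9, 0.2],
    "clip_b": [0.8, 0.1, 0.3],
    "clip_c": [0.4, 0.7, 0.9],
    "clip_d": [0.2, 0.3, 0.1],
}

# Clips an interpreter might be shown when asked what unit 1 responds to.
shortlist = top_activating_clips(activations, unit=1, k=2)
print(shortlist)
```

With these made-up values the shortlist is clip_a followed by clip_c, the two clips with the largest activation for unit 1.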

Aims and Intended Impact
More efficient training and updating of machine learning models with remote sensing data
More efficient and rewarding labelling of training examples
More human-interpretable neural networks

This project will be grounded in investigating the interplay of human creativity, intelligence and fulfilment with efficient ML tools. The work will begin with, and iterate around, a deep and intensive understanding of the labellers' perspectives on the task, leading to prototypes and evaluations. These prototypes might involve novel gamification (e.g. [1]); visualisation techniques; or even the use of multiple modalities - e.g. from simple gestures to emotional state recognition [2] - to provide input to ML tools. The interaction design of the labelling tool will inform and be informed by algorithmic innovations within the ML tool.

For instance, one approach to making the task more efficient, more immersive and less onerous would be to make spotting of pivotal examples easier. So, for example, we can present samples clustered by similarity as groups of thumbnails, allowing the labeller to spot outliers faster. Another strategy might be to use active labelling - i.e. given a small labelled data set, can we present larger sets to users and use their feedback to (a) label large amounts of data quickly and (b) as a result make the labelling task less onerous?

We might also consider how to improve sample efficiency - that is, reducing the number of samples required without reducing the efficacy of the approach. Some theoretical models of sample complexity have been investigated [3]. Monte Carlo techniques are one suggested research direction. For example, Importance Sampling has long been studied in Path Tracing, and more advanced techniques such as Hamiltonian Monte Carlo or Gradient Domain [4] demonstrate orders-of-magnitude performance gains through a great reduction in required samples. Ensembles, which are used to increase robustness and stability, lend themselves well to importance sampling. We could also examine the literature on robust statistics and M-estimators as methods for drawing samples (see [5] for a recent review). In these sorts of investigation, we will draw on the labellers' experience and insights to supplement any quantitative or theoretical evaluations of the power or limitations of the proposed approaches.

The human-centred improvements discussed above could also drive improvements in machine performance. Models currently need long training times. The current problem size could potentially be compressed so that it just fits into GPU memory, improving cache coherency during training.
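The active labelling strategy described above could take many forms. One common heuristic, used here purely as an assumed stand-in because the project does not name a specific method, is uncertainty sampling: a model trained on the small labelled set scores the unlabelled samples, and the samples whose predicted class distribution has the highest entropy are passed to the labeller first. The tile identifiers, probabilities and function names below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labelling(
    predictions: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Pick the `budget` samples whose predictions are most uncertain."""
    ranked = sorted(
        predictions,
        key=lambda sample_id: entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical class probabilities from a model trained on a small labelled set.
predictions = {
    "tile_001": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "tile_002": [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    "tile_003": [0.70, 0.20, 0.10],
    "tile_004": [0.50, 0.49, 0.01],
}

# Samples to show the labeller first, most uncertain first.
queue = select_for_labelling(predictions, budget=2)
print(queue)
```

With these made-up probabilities the queue is tile_002 followed by tile_003, so the labeller's effort goes to the samples the model is least sure about, while confidently predicted tiles such as tile_001 are deferred.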

Planned Impact

The Centre will nurture 55 new PhD researchers who will be highly sought after in technology companies and application sectors where data- and intelligence-based systems are being developed and deployed. We expect that our graduates will be nationally in demand for two reasons: firstly, because their training occurs in a vibrant and unique environment exposing them to challenging domains and contexts (that provide stretch, ambition and adventure to their projects and capabilities); and, secondly, because of the particular emphasis the Centre will put on people-first approaches. As one of the Google AI leads, Fei-Fei Li, recently put it, "We also want to make technology that makes humans' lives better, our world safer, our lives more productive and better. All this requires a layer of human-level communication and collaboration" [1]. We also expect substantial and attractive opportunities for the CDT's graduates to establish their careers in the Internet Coast region (Swansea Bay City Deal) and Wales. This demand will dovetail well with the lifetime of the Centre and provide momentum for its continuation after the initial EPSRC investment.

With the skills being honed in the Centre, the UK will gain an important competitive advantage which will be a strong talent-based pull, drawing in industrial investment to the UK as the recognition of and demand for human-centred interactions and collaborations with data and intelligence multiplies. Further, those graduates who wish to develop their careers in the academy will be a distinct and needed complement to the likely increased UK community of researchers in AI and big data, bringing an ability to lead insights and innovation in core computer science (e.g., in HCI or formal methods) allied to talents to shape and challenge their research agenda through a lens that is human-centred and that involves cross-disciplinarity and co-creation.

The PhD training will be the responsibility of a team which includes research leaders in the application of big data and AI in important UK growth sectors - from health and well being to smart manufacturing - that will help the nation achieve a positive and productive economy. Our graduates will tackle impactful challenges during their training and be ready to contribute to nationally important areas from the moment they begin the next steps of their careers. Impact will be further embedded in the training programme with cohorts involved in projects that directly involve communities and stakeholders within our rich innovation ecology in Swansea and the Bay region who will co-create research and participate in deployments, trials and evaluations.

The Centre will also impact by providing evidence of and methods for integrating human-centred approaches within areas of computational science and engineering that have yet to fully exploit their value: for example, while process modelling and verification might seem much removed from the human interface, we will adapt and apply methods from human-computer interaction, one of our Centre's strengths, to develop research questions, prototyping apparatus and evaluations for such specialisms. These valuable new methodologies, embodied in our graduates, will impact on the processes adopted by a wide range of organisations we engage with and who our graduates join.

Finally, as our work is fully focused on putting the human first in big data and intelligent systems contexts, we expect to make a positive contribution to society's understandings of and involvement with these keystone technologies. We hope to reassure, encourage and empower our fellow citizens, and those globally, that in a world of "smart" technology, the most important ingredient is the human experience in all its smartness, glory, despair, joy and even mundanity.

[1] https://www.technologyreview.com/s/609060/put-humans-at-the-center-of-ai/

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S021892/1                                   01/04/2019  30/09/2027
2440657            Studentship   EP/S021892/1  10/10/2020  30/09/2024  Michael Johns