Geospatial machine learning in safety critical systems

Lead Research Organisation: Swansea University
Department Name: College of Science

Abstract

To meet the project aim, the project will address the following objectives, framed as research questions.
What is the minimum volume of labelled data needed to achieve an accurate, geo-generalisable model? How many samples machine learning requires is an active and open research question. Recent results have examined the theoretical sample complexity, but in this project it is the reliability of the labels of the training data that is in question.
How can user interfaces facilitate the labelling of large datasets? Our prior expertise is in effective user interfaces for labelling large time-series datasets [3,4,5]. We will extend our research into this application area and integrate further tools to solve the problems below.
How can variation in the interpretation of imagery by labellers be measured, and what degree of confidence should we have in a labelled dataset? Non-expert labellers could produce adequate labels with fewer resources than experts, but a research question is how far we can rely on those labels, and whether we can track any uncertainty through the training to the model. Current approaches examine consensus labelling or label noise from various angles, but this is still an open area of research [6,7].
What impact do a label corpus's characteristics have on model accuracy? Once an initial model is trained, various metrics can be used to describe its accuracy and success. We need to analyse performance in the presence of label noise and label bias. Label noise, mislabelling and bias all affect models [8,9], and some mitigations are known. We need to ascertain what proportion of the model's performance is a result of inaccuracy, uncertainty or bias in the labelled data, and how bias, accuracy and uncertainty can be formally quantified and propagated through the different stages of a model's development. Model training, testing and refinement is not, in practice, a linear process; instead it is iterative, as each stage of model creation informs the next (and, in a loop, the previous steps). Techniques such as re-labelling, resampling and pseudo-labelling are available to update a labelled dataset. Based on assessments of initial model performance:
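
As an illustration of the labeller-variation question above, the following is a minimal sketch of one common way to measure agreement between several labellers: Fleiss' kappa, alongside a simple majority-vote consensus label. This is not the project's stated method; the abstract does not name a specific metric. The function names, the "flooded"/"dry" categories and the example tiles are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labelled by the same number of labellers.

    ratings    -- list of items; each item is a list of labels, one per labeller
    categories -- list of all possible labels
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of labels")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: for each item, the share of labeller pairs that agree.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall share of each category.
    shares = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    expected = sum(p ** 2 for p in shares)

    if expected == 1.0:
        return 1.0  # only one category was ever used, so agreement is trivially perfect
    return (observed - expected) / (1.0 - expected)


def majority_label(labels):
    """Return the most common label and the fraction of labellers who chose it.

    Ties are broken by whichever tied label appears first in the input.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)


# Hypothetical example: four image tiles, each labelled by three labellers.
tiles = [
    ["flooded", "flooded", "flooded"],
    ["flooded", "dry", "flooded"],
    ["dry", "dry", "dry"],
    ["dry", "flooded", "dry"],
]
print("Fleiss' kappa:", fleiss_kappa(tiles, ["flooded", "dry"]))
for tile in tiles:
    print(majority_label(tile))
```

A kappa near 1 indicates strong agreement beyond chance, a value near 0 indicates agreement no better than chance, and the per-item vote fraction gives a crude per-label confidence that could, under these assumptions, be carried forward into training.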

Planned Impact

The Centre will nurture 55 new PhD researchers who will be highly sought after in technology companies and application sectors where data- and intelligence-based systems are being developed and deployed. We expect that our graduates will be nationally in demand for two reasons: firstly, because their training occurs in a vibrant and unique environment that exposes them to challenging domains and contexts (providing stretch, ambition and adventure to their projects and capabilities); and, secondly, because of the particular emphasis the Centre will put on people-first approaches. As one of the Google AI leads, Fei-Fei Li, recently put it, "We also want to make technology that makes humans' lives better, our world safer, our lives more productive and better. All this requires a layer of human-level communication and collaboration" [1]. We also expect substantial and attractive opportunities for the CDT's graduates to establish their careers in the Internet Coast region (Swansea Bay City Deal) and Wales. This demand will dovetail well with the lifetime of the Centre and provide momentum for its continuation after the initial EPSRC investment.

With the skills being honed in the Centre, the UK will gain an important competitive advantage, which will act as a strong talent-based pull, drawing industrial investment into the UK as the recognition of and demand for human-centred interactions and collaborations with data and intelligence multiplies. Further, those graduates who wish to develop their careers in the academy will be a distinct and needed complement to the likely increased UK community of researchers in AI and big data, bringing an ability to lead insights and innovation in core computer science (e.g., in HCI or formal methods) allied to the talent to shape and challenge their research agenda through a lens that is human-centred and that involves cross-disciplinarity and co-creation.

The PhD training will be the responsibility of a team that includes research leaders in the application of big data and AI in important UK growth sectors, from health and wellbeing to smart manufacturing, that will help the nation achieve a positive and productive economy. Our graduates will tackle impactful challenges during their training and be ready to contribute to nationally important areas from the moment they begin the next steps of their careers. Impact will be further embedded in the training programme, with cohorts involved in projects that directly involve communities and stakeholders within our rich innovation ecology in Swansea and the Bay region, who will co-create research and participate in deployments, trials and evaluations.

The Centre will also have impact by providing evidence of and methods for integrating human-centred approaches within areas of computational science and engineering that have yet to fully exploit their value: for example, while process modelling and verification might seem far removed from the human interface, we will adapt and apply methods from human-computer interaction, one of our Centre's strengths, to develop research questions, prototyping apparatus and evaluations for such specialisms. These valuable new methodologies, embodied in our graduates, will impact on the processes adopted by the wide range of organisations that we engage with and that our graduates join.

Finally, as our work is fully focused on putting the human first in big data and intelligent systems contexts, we expect to make a positive contribution to society's understanding of and involvement with these keystone technologies. We hope to reassure, encourage and empower our fellow citizens, and people globally, that in a world of "smart" technology, the most important ingredient is the human experience in all its smartness, glory, despair, joy and even mundanity.

[1] https://www.technologyreview.com/s/609060/put-humans-at-the-center-of-ai/

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S021892/1       -             -             01/04/2019  30/09/2027  -
2284850            Studentship   EP/S021892/1  01/10/2019  30/09/2023  Tulsi Patel