Data Deletion in Machine Learning
Lead Research Organisation:
University of Bristol
Department Name: Computer Science
Abstract
Overfitting in supervised machine learning (ML) occurs when a model fits its training data too closely. An attacker can exploit this to infer whether a record was part of the training set (membership inference) or to recover private attributes of training records, so that ML models and their outputs become indirect stores of the training data. My research project attempts to address this vulnerability by splitting the interdisciplinary problem space into three phases: Phase 1: Legal and Research Investigation; Phase 2: Proposing a Framework for Deletion; and Phase 3: Development and Evaluation.
The first phase seeks to contextualise the problem of enforcing GDPR's Art.17 (Right to Erasure) within ML. Through a collaborative study with Bristol Law School, we consider a specific use case involving personal information in the training data set. The second phase involves a critical analysis of the state-of-the-art in ML data deletion techniques, as well as evaluating other methods including anonymisation and machine unlearning.
The key objective of these phases is to build a formal understanding of what could (or should) be expected of technical solutions for data deletion, both within the model's training set and within the ML model itself. This will inform the final phase, which aims to develop an ML data deletion solution that meets regulatory obligations and addresses the gaps identified in the earlier phases.
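To make the vulnerability concrete, the following is a minimal sketch of a loss-threshold membership inference attack against a deliberately overfitted model, followed by the simplest deletion baseline (retraining from scratch without the record). It is an illustration only and not the method proposed by this project: the 1-nearest-neighbour model, the synthetic data, the threshold and all function names are assumptions chosen to keep the example self-contained.

```python
import random


def sample(n, rng):
    """Draw n synthetic (x, y) records with y = x + Gaussian noise (assumed data)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, x + rng.gauss(0, 1)))
    return data


def train_1nn(points):
    """'Train' a 1-nearest-neighbour regressor: it memorises every record."""
    return list(points)


def predict(model, x):
    """Return the label of the stored record whose x is closest to the query."""
    nearest = min(model, key=lambda p: abs(p[0] - x))
    return nearest[1]


def squared_loss(model, x, y):
    return (predict(model, x) - y) ** 2


def infer_membership(model, x, y, threshold=1e-12):
    """Loss-threshold attack: guess 'member' when the loss is suspiciously low."""
    return squared_loss(model, x, y) <= threshold


rng = random.Random(0)          # fixed seed so the run is repeatable
members = sample(50, rng)       # records used for training
non_members = sample(50, rng)   # records the model never saw

model = train_1nn(members)

# Fraction of each group that the attack labels as "member".
member_rate = sum(infer_membership(model, x, y) for x, y in members) / len(members)
non_member_rate = sum(infer_membership(model, x, y) for x, y in non_members) / len(non_members)

# Naive deletion baseline: drop one record and retrain from scratch.
target = members[0]
retrained = train_1nn(p for p in members if p != target)

print("flagged as member (training records):", member_rate)
print("flagged as member (unseen records):  ", non_member_rate)
print("target flagged before deletion:", infer_membership(model, *target))
print("target flagged after retraining:", infer_membership(retrained, *target))
```

Because the model memorises its training records, their loss is exactly zero, which is what lets the attack separate members from non-members. Retraining without the record removes that signal here, but retraining from scratch is generally too costly for large models, which is part of the motivation for dedicated deletion and machine unlearning techniques.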
Planned Impact
Who will benefit?
The interdisciplinary doctoral graduates trained within the Centre for Doctoral Training (CDT) will play a key role in addressing the acute shortage of highly skilled workers in this area, hence meeting industry and government needs. The research they will conduct in the CDT and their future work will strongly impact industry, government, academia and society. Industrial applications cover those involving large-scale, socio-technical infrastructures where resilience-at-scale is a fundamental need, such as intelligent transportation, finance, digital healthcare, energy generation & distribution and advanced manufacturing. The globally unique capacity focusing on Trust, Identity, Privacy and Security at scale (TIPS-at-Scale) will position the UK as a world leader, offering major economic benefits by ensuring that the UK is a safe place in which to do business, and social benefits in terms of security and privacy of the individual.
More specifically, the CDT's research and training programme will provide graduates with capabilities to address socio-technical challenges of TIPS-at-Scale, including understanding of user and adversarial behaviours. This is of major importance to digital infrastructure providers, government agencies and law enforcement agencies. This is in addition to the wider business and health sectors where the protection of data and the physical processes controlled by large-scale infrastructure is vital. Research on resilience in partially-trusted environments will lead to new architectures and new technologies to significantly enhance integrity and resilience, including new authentication methods and trust models. Research on empirically-grounded assurances for TIPS will break new ground by providing new interdisciplinary techniques and design principles to underpin infrastructures of the future. Last, but by no means least, by embedding Responsible Innovation into the programme throughout, the CDT ensures that TIPS-at-Scale approaches take a values-based view that considers TIPS across the full lifecycle of digital infrastructures: from conception to design, implementation and deployment through to maintenance, evolution and decommissioning. Such a Responsible Innovation approach will benefit society-at-large.
How will they benefit?
There is a critical need within the UK for a new breed of researchers and future leaders, equipped with the breadth of interdisciplinary skills to tackle TIPS issues at play in future infrastructures and the depth of knowledge to develop novel and innovative solutions to address TIPS-at-Scale. The CDT will produce a pipeline of such researchers and leaders trained to PhD level. It will build on very strong existing links with organisations such as Vodafone, Google, HP, Airbus, Thales, Symantec, IBM, Babcock, NCC Group, Altran, Wessex Water, Cybernetica and Embecosm, all of which have contributed to co-creation of the CDT and are committed to close engagement with it. Both universities will use their business development teams to further engage with these and other relevant organisations. Major opportunities for generating economic and societal benefits exist with the planned Temple Quarter Enterprise Campus of the University of Bristol (due to open in 2021), which has a focus on co-creation of a suite of PG training programmes with industry, and with the Bath Innovation Centre. The CDT will also leverage the various impact channels of the three EPSRC-NCSC Research Institutes, the PETRAS Hub and the CREST Centre, in which the two universities play a major role. Both universities already have research and PhD studentships directly funded by industry and agencies such as DSTL, NCSC and GCHQ, as well as iCASE awards; hence close relationships already exist to maximise impact. The CDT will also organise public debates and social media campaigns to encourage public participation in, and shaping of, TIPS-at-Scale discussions and solutions.
Organisations
People | ORCID iD |
---|---|
Awais Rashid (Primary Supervisor) | |
Katie Hawkins (Student) | |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/S022465/1 | | | 31/03/2019 | 29/09/2027 | |
2440421 | Studentship | EP/S022465/1 | 30/09/2020 | 04/05/2025 | Katie Hawkins |