Missing Data as Useful Data
Lead Research Organisation:
University of Glasgow
Department Name: School of Geographical & Earth Sciences
Abstract
Timely yet safe decisions require real-time ingestion and assimilation of data on system dynamics, with due regard for incompleteness, uncertainty, inherent under/over-representation, and bias. This fellowship will devise and implement novel procedures for accommodating the sources and mechanisms of missingness and bias in "found" data or "new forms of data", such as social media and mobile phone data, and will propose new ways of triangulating them with traditional statistical sources. It will provide generic and novel methods for using "new forms of data" efficiently, effectively, and safely.
Our "digitalised" lives and the popularity of social media, ubiquitous sensors, and gadgets have provided us with an unprecedented opportunity to understand society, the economy, wellbeing, and the physical world at a much higher frequency than traditional surveys and polls allow. However, this information is normally obtained through uncontrolled recording mechanisms, e.g. observationally, which presents challenges around over/under-representation and biases, missingness, sparsity, and latent dependencies. These challenges inhibit such sources from being integrated effectively with traditional data sources to form a richer, more comprehensive resource for building and training the latest statistical and cutting-edge deep learning AI models. Any decisions, patterns, and models that arise from such data, even if the data cover the majority of the population, can overlook the needs of those who do not participate. The foundational statistics, which will be developed from proof-of-concept evidence delivered in the first phase of the fellowship, will be applied to a wide range of applications and disciplines, including policing (e.g. under-reported crime, hidden online harms), social care, public health, inclusive city planning, and the dynamic census, in alignment with Office for National Statistics (ONS) strategy.
This fellowship will develop novel models and frameworks based on a paradigm-shifting perspective that considers, or even uses, biases, sparsity, and missing data as useful data. It will provide solutions and mechanisms to reliably use "whole datasets" and to integrate user-generated data with traditional survey data in support of meaningful, realistic, and timely data-driven policies and decisions.
Novelty:
- Considering "new forms of data" as useful data to be integrated/triangulated with traditional data, providing a reliable, timely, and up-to-date understanding of the systems. This can open up a wide range of nationally important and strategic applications, including managing under-reported crime, better social care and protection of society, inclusive city planning, and a dynamic census using administrative and alternative data.
- Considering missingness as useful data, so that both available and unavailable data can be used to compensate for the missing data.
- Providing an effective procedure for combining new forms of data with traditional datasets, with quantifiable measures of quality and fitness for purpose.
- Addressing ethical, legal, and liability considerations of using new forms of data, such as the ethics of data we do not have. This can open a wider discussion about ethics, legality, security, fairness, reliability, safety, transparency, and accountability. While using such data improves inclusivity and makes the unheard more visible, it raises ethical questions regarding agency, privacy, and the wider benefit of data.
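As a purely illustrative aside, the idea of triangulating over/under-represented "found" data with a traditional statistical source can be sketched with standard post-stratification weighting. This is not the fellowship's method: the group labels, outcome values, and census shares below are hypothetical toy values, and the function names are invented for this example.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per record so that weighted group shares
    match the known population shares (e.g. from a census)."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical "found" sample: younger users are over-represented.
groups = ["young"] * 8 + ["old"] * 2
outcome = [1.0] * 8 + [0.0] * 2  # e.g. 1.0 = reports using a service

# Hypothetical population shares from a traditional source.
census_shares = {"young": 0.5, "old": 0.5}

weights = poststratification_weights(groups, census_shares)
naive = sum(outcome) / len(outcome)        # ignores representation bias
adjusted = weighted_mean(outcome, weights)  # reweighted to census shares

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

In this toy case the unweighted estimate reflects the over-represented group, while the reweighted estimate moves toward what the population shares imply. Such weighting only corrects for imbalance on the observed grouping variable; it cannot recover groups that are entirely absent from the sample.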
This fellowship will support me in establishing my growing team and my area of research to deliver world-class fundamental and applied research involving "new forms of data". In doing so, I will deliver a suite of methods and mechanisms that enable the effective use of non-standard data sources (potentially in conjunction with traditional data) to maximise benefits and deliver a (near) real-time understanding of cities and societies.