Data fusion and a 2021 global census

Lead Research Organisation: University of Liverpool
Department Name: Geography and Planning

Abstract

My project aims to develop a method for creating geodemographic classifications from open-source, high-resolution satellite imagery, reducing reliance on traditional methods that typically use Census data. Because Census data are not uniformly available across countries, it is difficult to determine geodemographic clusters in data-poor areas. This project aims to remove the need for rich Census data by generating classifications from features extracted from globally available satellite data. It will focus on three countries, the UK, China and India, which reflect distinctive pathways of development. Features such as building density, roof materials, green space density and road network composition will be extracted from satellite imagery to infer the socioeconomic status of local areas, on the assumption that socioeconomic status can be used as a proxy for geodemographics. This rests on the theory that the built environment can inform socioeconomics. For example, the composition of a housing estate as seen from space may indicate, among other things, whether it is affluent, with wider and less structured road networks, more green space and lower housing density, or less affluent, with narrower roads, fewer green spaces and perhaps terraced housing. However, the underlying theories of how socioeconomic status can be derived from these features differ between countries, as well as within them. Extensive research into these theories will therefore be key to understanding the correlations between the urban built environment and socioeconomic status across different areas.

Census data, where available, will first be used to create classifications using traditional methods, such as the Output Area Classification (OAC) in the UK. These will then be compared with classifications created from satellite imagery from the same year, or from a year as close as possible to the Census year. Once the two sets of classifications agree sufficiently well, the method will be applied to satellite data from years without Census data. Methods are expected to include object detection using machine learning, with both supervised and unsupervised learning used to create clusters.
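
The comparison step described above can be illustrated with a minimal sketch. It assumes that per-area features have already been extracted from imagery; the feature values, the two Census classes "A" and "B", the choice of k-means as the unsupervised method and the adjusted Rand index as the agreement measure are all illustrative assumptions, not details taken from the project.

```python
import math
import random
from collections import Counter

def standardise(rows):
    """Z-score each feature column so no single feature dominates distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    rng = random.Random(seed)
    centres = rng.sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each area to its nearest centre.
        labels = [min(range(k), key=lambda j: math.dist(row, centres[j]))
                  for row in rows]
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [r for r, l in zip(rows, labels) if l == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [sum(c) / len(c) for c in zip(*members)]
    return labels

def adjusted_rand_index(a, b):
    """Chance-corrected agreement between two labelings (1.0 = same partition)."""
    def comb2(n):
        return n * (n - 1) // 2
    sum_ij = sum(comb2(n) for n in Counter(zip(a, b)).values())
    sum_a = sum(comb2(n) for n in Counter(a).values())
    sum_b = sum(comb2(n) for n in Counter(b).values())
    expected = sum_a * sum_b / comb2(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Hypothetical per-area features:
# [building density, green space share, mean road width in metres]
features = [
    [0.20, 0.45, 9.0],
    [0.25, 0.40, 8.5],
    [0.22, 0.50, 9.5],
    [0.70, 0.10, 5.0],
    [0.75, 0.08, 4.5],
    [0.68, 0.12, 5.5],
]
# Hypothetical Census-based classes for the same six areas.
census_classes = ["A", "A", "A", "B", "B", "B"]

labels = kmeans(standardise(features), k=2)
ari = adjusted_rand_index(census_classes, labels)
print(labels, ari)
```

The adjusted Rand index is used here because cluster labels from an unsupervised method are arbitrary, so agreement has to be measured on the partitions rather than on the label names. A real study would need many more areas and a feature set validated for each country.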

This PhD project is important for better understanding the geodemographic structure of countries around the world, especially those that do not have rich data sources. It aims to tackle the issues of data sparsity and the spatial scale of data, producing a method that can in theory be applied to any country for any given year. Moreover, the interval between Censuses can be long, for example 10 years in the UK, so it is often difficult to obtain national-scale geodemographic information for the years between Censuses unless other expensive and time-consuming surveys are conducted. A method that can be applied to frequent high-resolution satellite imagery is therefore beneficial, as it removes the need for Census data and can provide estimates for the years without it. Furthermore, the outcomes of the project can be used in wider research and in the commercial sector, for example in urban and retail planning based on geodemographic clusters.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
ES/P000401/1      |              |              | 01/10/2017 | 30/09/2024 |
2108242           | Studentship  | ES/P000401/1 | 24/09/2018 | 20/10/2020 | Chloe Steele