Assessing spatial heterogeneity through random walks on graphs

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Mathematical Sciences

Abstract

This proposal addresses the problem of studying the heterogeneity of a variable of interest in spatially-distributed structures. Quite often the spatial domain under study can be coarsened and represented as a graph, and the variable of interest can take a discrete number of different values, indicated by labels or colours. Hence, the quantification of the spatial heterogeneity of a variable can be mapped onto the study of the distribution of labels in a graph with coloured nodes.

A typical example is the problem of identifying the social, economic, or ethnic segregation of an urban area. We say that the population of a city is segregated with respect to ethnicity or income level if people prefer to live in areas where other people of the same ethnicity or income level already live, avoiding areas that are inhabited by people of other ethnic groups or income levels. In this case, the system consists of a discrete set of regions (neighbourhoods, boroughs, etc), and can thus be represented by a planar graph of adjacency among regions, where each region (node) is assigned a label or colour according to the income level or most abundant ethnicity of its inhabitants. To determine the level of segregation of a city, one needs to quantify the heterogeneity of the distribution of those labels, under the assumption that higher heterogeneity usually corresponds to higher segregation. In general, the presence of high levels of segregation is the cause of economic and social tension. Consequently, assessing the level of segregation of a urban area is fundamental for policy makers, with potential repercussions on millions of lives. Despite being a long-standing question in urbanism and economics, there is currently no universal agreement about how segregation should be quantified, or on how to compare the levels of segregation measured in different areas.

A very different but relevant example is that of tumor growth. The cells of a tumor normally belong to several strains or sub-clones, due to the occurrence of genetic mutations, and some of those sub-clones often develop resistance to chemotherapy, accelerating the growth of the tumor. Recent research suggested that the number of distinct sub-clones of a tumor and the irregularity of their spatial distribution correlate with the speed at which the tumor grows. Also this system can be represented as a spatial graph, where each region of the tumor (node) is associated to a label or colour corresponding to the most abundant cancer sub-clone in that region. And also in this case, the quantification of the spatial heterogeneity of the distribution of labels on a graph can have important repercussions on the life of thousands of people.

This project proposes a novel way of quantifying heterogeneity in spatial systems, based on the trajectories of random walks over the associated adjacency graphs with coloured nodes. A random walk is the simplest way to explore a graph -if you are on a node, you jump to one if its neighbours chosen at random- yet its trajectories retain a lot of information about the correlations among node properties. The main aim of this project is to study the statistical properties of trajectories of random walks on graphs with coloured nodes, focusing in particular on inter-class first passage times (the number of jumps needed to a random walk to get from a node of a certain colour to a node of another colour) and cover times (the number of jumps needed to visit at least one node of each colour). The Principal Investigator will provide analytical expressions for inter-class passage and cover times in different real and synthetic spatial graphs with coloured nodes, will use those expressions to quantify the heterogeneity of a given colour distribution, and, thanks to the collaboration with several research groups in the UK and abroad, will apply those measures to urban segregation, plant biology, and cancer research.

Planned Impact

If the research objectives of the project are fully achieved, this proposal will have concrete impact in five areas, respectively:

i) The immediate research field. The first beneficiaries of the results of the project will be researchers in network science, graph theory, dynamical systems, and statistical physics. The analytical and computational methods developed in the project can be extended to non-spatial graphs and different flavours of non-interacting random walks, including biased and non-backtracking walks. The distributions of passage and cover times of such walks can be used to identify outliers in graphs, and the null-models developed in the project can be employed to detect communities and coarsen graphs with coloured nodes. The introduction of feasible walks will open an entirely new line of research, and the related concept of attraction basin of a node will provide a new geometrical interpretation of the distribution of node properties in a graph. This impact will be mainly realised through publication of papers in high-impact journals, presentations to conferences, organisation of a satellite conference on the topic, and publication of all the software developed in the project under Free Software licenses.

ii) Closely-related research fields. Being focused on spatial graphs, the project will have concrete impact on several related research problems, including the characterisation of street networks, the study of the structure and function of brain networks, and the quantification of geographical heterogeneity. In particular, the methods developed in the project will be useful in the analysis of a variety of large-scale spatial data sets, including those recently made available as Open Data in the UK and in other countries. This impact will be realised through the creation of a seminar on modelling of spatial systems at the Institute of Applied Data Science, and the organisation of a mini-school for young researchers in all branches of science.

iii) Application domains. The methods developed in the project will be readily made available to researchers in urbanism, plant biology, and oncology, respectively for the quantification of urban segregation, the prediction of the duplication probability of cells in the apical meristem of plants of different species, and the classification of the aggressiveness of tumors. This impact will be mainly realised by the collaboration with the scientific partners of the project, the publication of joint scientific papers in high-impact journals, and the presentation of the results in interdisciplinary conferences.

iv) The general public. The web application developed by the project will provide access to interactive maps of attraction basins from census data to citizens, businesses, and policy-makers. This will allow the immediate visualisation of complex information about different regions, and will allow to identify potential business opportunities and concrete areas of intervention for local authorities.

v) Competitiveness of UK research. With many world-leading groups across the nation, the UK is currently one of the most active countries in network science research and its applications. The research promoted by the current project is transversal to several cutting-edge research lines (spatial networks, random processes, data science) for which the UK is already well-known internationally, and the essential interdisciplinary nature of the project is perfectly aligned with the UKRI's aspiration of supporting interdisciplinary research excellence. The success of the project will foster the emergence and consolidation of national and international research collaborations, that will confirm the leading role played by the UK in the field. Moreover, the collaboration with the partners of the project will attest the ability of UK-funded research in Mathematics to be immediately usable for the solution of concrete problems in several domains.

Publications

10 25 50
 
Description The project has made very good progress towards all the objectives. We have obtained a mean-field theory of class mean first passage times and class coverage times of random walks on graphs with coloured nodes, with and without spatial constraints (several papers are in preparation at this stage), and we have applied it to a variety of practical problems as envisaged in the original project. These methods allow to assess if a specific patterns of classes or colours observed in a graph has a non-trivial structure. We have applied these methods to describe urban segregation, to link it with the disproportionate incidence of COVID-19 in certain social groups (Black Americans in the US), to quantify the level of polarisation in parliaments, and to help predicting the potential aggressiveness of a tumor by looking only at representative tissue samples.

These results have opened up more questions about the possibility of using random walks as a natural way to enrich graphs through the integration of meta-data about nodes. In this direction, I have been collaborating with colleagues at the University of Umea (Sweden), on the usage of partial-absorbing random walks for the integration of metadata in graph community detection tasks. In short, we associate node metadata to node classes, making it possible to incorporate metadata information when looking for relevant structural clusters and communities. The method we developed allows to strike a balance between purely structural information about node adjacency and metadata about any node property. This opens a relatively wide line of possible applications in urban studies, medicine, biology, sociology.

The project has produced and made available software to compute CMFPT and CCT on generic graphs, as well as an extensive data set of adjacency and commuting graphs among US census tracts in more than 130 urban areas.
Exploitation Route By means of further collaboration with practitioners in other fields. This is something I am currently working on.
Sectors Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Environment,Healthcare,Government, Democracy and Justice,Pharmaceuticals and Medical Biotechnology,Security and Diplomacy

URL https://medicalxpress.com/news/2021-01-commuting-patterns-higher-incidence-covid-.html
 
Description Impact Acceleration Award Large Grant Scheme
Amount £40,000 (GBP)
Organisation Queen Mary University of London 
Sector Academic/University
Country United Kingdom
Start 07/2020 
End 03/2021
 
Title Diffusion segregation and the disproportionate incidence of COVID-19 in African American communities 
Description Each network corresponds to a metropolitan area, where nodes represent census tracts. For each tract we report information about the number of people belonging to each of the seven high-level ethnic groups defined in the US Census. Physical adjacency networks are undirected and unweighted, and an edge between two tracts A and B indicates that A and B are bordering each other. Commuting flow graphs are undirected and weighted, and the weight of an the edge between A and B corresponds to average of the total number of work commuting trips from A to B and from B to A. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact The analysis of this data set allowed to show that the urban mobility patterns of African Americans in the US are strongly associated with the disproportionate incidence of COVID-19 in African American communities. The data set might be used by other studies to look for other strong socio-economic correlates of COVID-19 incidence disparity 
URL http://datadryad.org/stash/dataset/doi:10.5061/dryad.hqbzkh1f9
 
Title Programs to compute the dynamic segregation indices based on Class Mean First Passage Times (CMFPT) and Class Coverage Times (CCT) 
Description A set of C/Python programs to compute the dynamic segregation indices based on Class Mean First Passage TImes and Class Coverage Times as defined in the paper "Diffusion segregation and the disproportionate incidence of COVID-19 in African American communities". 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact The repository has been accessed about 200 times in the first three weeks since the publication of the paper to which the software bundle is associated. Since this resource has been published very recently, we don't know yet how widely it is being used. 
URL https://royalsocietypublishing.org/doi/10.1098/rsif.2020.0961
 
Title segregation-rw/ethnic-segregation-rw: First release of code and data used in the manuscript 
Description Software and datasets for the computation of ethnic segregation using Class Coverage Times, accompanying the paper "Quantifying ethnic segregation in cities through random walks". 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact The software provides a tangible way of assessing the ethnic and socio-economic segregation of cities, and has been used in the paper to correlate ethnic segregation and socio-economic deprivation. 
URL https://zenodo.org/record/6874241
 
Description Complex-Space - Analysis and Modelling of Spatial Complex Systems - Series of Satellite Workshops of the Conference on Complex Systems 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact This interdisciplinary workshop aims at putting together researchers and practitioners interested in spatial complex systems, allowing them to share problems, ongoing research, best practices, and results. In its first instalment (hosted by the prestigious International Conference on Complex Systems) the workshop hosted three invited talks by leading scientists in urban studies, random graphs, and human mobility, plus 8 contributed talks by researchers in different fields. The workshop was attended by about 45 people on the day, but the material of the workshop (slides and presentations) was circulated more widely to a list of email contacts interested in the outcome. The second instalment in 2020 took place remotely (via Zoom), and saw three invited speakers, 7 oral contributions, and about 40 participants. Website available at: http://ccs2020.complex-space.net/

The main outcome of the workshop series was the decision to make it become a stable event, which is what has been happening in the last few years (the third instalment will be hosted at the International Conference on Complex Systems 2021). Moreover, some research collaborations have started after interactions at the workshop, and in particular I am now in contact with at least two researchers who are applying for fellowships to come to work in my research group.
Year(s) Of Engagement Activity 2019,2020
URL http://ccs2020.complex-space.net/