Multi-scale Dynamical Community Detection for the Digital Economy: from analyzing to influencing policy through Open Government data

Lead Research Organisation: Imperial College London
Department Name: Institute for Mathematical Sciences


The digital age has brought with it an unprecedented gathering of detailed, real-time data from our daily lives, from mobile phone usage to specialized hospital sensors. The availability of such real-world data from a wealth of physical and digital infrastructures, coupled with increased computational power, offers a unique opportunity to interrogate social behaviour from the level of the individual to the emergence of group dynamics and traits at different levels. Recently, governmental initiatives (specifically in the US and the UK) have been designed to make such datasets available to the wider public. These initiatives offer the possibility to examine quantitatively the influence and effectiveness of policies on different aspects of social dynamics, as well as providing a route for the exercise of citizen participation and feedback. This could lead to improved quality of life in healthcare, traffic and security, or to the design of policies for public spending and usage of resources from the individual level to the collective of groups. These tantalising possibilities have led in the last year to a series of manifestos and even the declaration of the need for a new field, Computational Social Science. Although those contributions have arisen from different disciplines, they share the belief that the current lack of mathematical tools for the analysis of such datasets is the fundamental obstacle to translating the promise of integrated multi-modal, dynamic datasets into real interpretative results. In particular, there is a need to go beyond purely (static) statistical methods and to overcome the lack of mathematical, and eventually computational, methodologies that can formalise, interrogate and analyse the data such that hypotheses can be tested and conclusions can be drawn in a rigorous data-driven manner.
This proposal, however, goes beyond issues of accessibility and presentation of data and focuses on the development of mathematical tools for the analysis of data in two steps: (1) finding a faithful representation of the data in terms of multi-label, possibly dynamic, networks, and (2) the generation of simplified, intelligible reductions of such networks in terms of a multi-level dynamical hierarchy of communities that can uncover patterns of interaction in the data. The aim of this proposal is to develop robust methodologies for the analysis of networks derived from large, complex social datasets currently made available to the public through the Open Government initiative. Our mathematical tools will address the creation of representative networks from the data and the multi-scale and multi-label analysis of such networks leading to reduced descriptions in terms of dynamical community structures derived from the data without any a priori specification. The datasets chosen will be of current social interest but also exemplify three fundamental characteristics of social datasets that are linked to specific mathematical challenges for their analysis: (i) the multi-scale nature of social networks; (ii) the multi-label characterisation of social datasets; and (iii) the importance of dynamics and flows in social descriptions. The mathematical tools will be specifically applied to the following three areas of high interest for the Digital Economy: Neighbourhood statistics data, the redistricting problem and the recently released budget expenditure data.
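As an illustration of what a multi-level community description can look like, the following is a minimal sketch only. The proposal does not name a specific algorithm, so the sketch assumes a standard stand-in: modularity with a resolution parameter (called gamma here). The toy network, the two candidate partitions and all function names are hypothetical and are not drawn from any Open Government dataset.

```python
from collections import defaultdict


def modularity(edges, partition, gamma=1.0):
    """Resolution-parameter modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    partition: dict mapping node -> community label.
    gamma: resolution parameter (an assumption of this sketch);
           larger values favour smaller communities.
    """
    m = len(edges)
    degree_sum = defaultdict(int)  # total degree of each community
    internal = defaultdict(int)    # number of edges inside each community
    for u, v in edges:
        degree_sum[partition[u]] += 1
        degree_sum[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(
        internal[c] / m - gamma * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy network: four triangles, joined pairwise into two
# larger groups, with a single edge bridging the two groups.
edges = []
for base in (0, 3, 6, 9):
    edges += [(base, base + 1), (base, base + 2), (base + 1, base + 2)]
edges += [(2, 3), (1, 4), (8, 9), (7, 10), (5, 6)]

fine = {n: n // 3 for n in range(12)}    # four small communities
coarse = {n: n // 6 for n in range(12)}  # two large communities

for gamma in (0.5, 2.0):
    q_fine = modularity(edges, fine, gamma)
    q_coarse = modularity(edges, coarse, gamma)
    preferred = "coarse" if q_coarse > q_fine else "fine"
    print(f"gamma={gamma}: fine={q_fine:.3f} coarse={q_coarse:.3f} -> {preferred}")
```

With these toy edges, the two-group partition scores higher at gamma = 0.5 and the four-triangle partition scores higher at gamma = 2.0, so sweeping the resolution exposes two levels of one hierarchy rather than a single "best" grouping. The sketch only scores given partitions; a full method would also search for them and would need to handle the multi-label and dynamic aspects described above.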

Planned Impact

Multiscale, Dynamical Community Detection for the Digital Economy affords three broad categories of opportunities for significant impact:
(i) Academic Impact: Spurring Interdisciplinary Research
(ii) Government Transparency: Reducing Fraud, Waste and Abuse
(iii) Institutional Innovation: Improving Quality of Life in the UK and worldwide
The work outlined in this proposal has the potential to create dramatic new collaborations between the mathematical and social sciences and to lead to an outpouring of research as the methodology is applied to new and diverse social problems. Existing sociological network theory is built on a foundation of one-time snapshot data. Although we now have the technology to collect minute-by-minute accounts of life across whole nations, computational social science lacks a mathematical framework capable of extracting meaningful analysis from vast and seemingly unrelated data. Our multiscale and multilabel theory and algorithms will provide exactly this missing link, enabling new ways of conceiving human behaviour.
The analysis of the Open Government data can help to identify patterns of waste, fraud and abuse, enabling public officials to cut spending and save taxpayer money informed by empirical evidence. It will further enable citizen participation, as everybody can then engage with this process.
In the digital age, technology has enabled the unprecedented collection of real-time data from our daily lives. The ability to generate and then navigate and visualize complex data sets has the potential to inform policymaking that, in turn, could lead to improved quality of life. Our methodology goes beyond this to analysis that can be applied to stimulate data-driven policy innovations. We have identified two domains for initial focus in addition to the analysis of government spending.
First, we will seek to apply our method to the challenge of voter redistricting in an effort to strengthen citizens' right to participate in the democratic process. Second, we will look at new storehouses of neighbourhood data (quality of life information collected across local communities in the UK) in an effort to compare and visualize the outcome of citizen services at the local level.


Description We have developed a series of powerful multiresolution methods that can extract insight from complex big data. In particular, they can identify the different communities of people or topics involved, as well as the roles these nodes (or people) play, in such diverse areas as social media, neural science data or healthcare. Furthermore, we have developed a quantitative method for understanding cities. Our methods are an example of unsupervised learning and can provide good features for supervised machine learning.
Exploitation Route Our findings are used by industry, government and social policy organisations to better understand their data and hence make more informed decisions.
Sectors Agriculture, Food and Drink,Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Education,Financial Services, and Management Consultancy,Healthcare,Government, Democracy and Justice,Culture, Heritage, Museums and Collections,Pharmaceuticals and Medical Biotechnology

Description Our methods are now being used to inform policy, transport and healthcare decisions, as well as in branding. They are also being used in education, particularly online learning.
Sector Education,Healthcare,Pharmaceuticals and Medical Biotechnology
Impact Types Cultural,Societal,Economic,Policy & public services

Description EPSRC Centres for Mathematics in Healthcare
Amount £2,520,000 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 01/2016 
End 12/2019
Description EPSRC Knowledge Transfer Secondment
Amount £68,189 (GBP)
Organisation Imperial College London 
Sector Academic/University
Country United Kingdom
Start 10/2012 
End 09/2013
Description EPSRC Pathways to Impact
Amount £69,800 (GBP)
Organisation Imperial College London 
Sector Academic/University
Country United Kingdom
Start 01/2016 
End 12/2017
Description James S McDonnell Foundation Fellowship on Complex Systems
Amount $200,000 (USD)
Organisation James S. McDonnell Foundation 
Sector Charity/Non Profit
Country United States
Start 01/2012 
End 04/2015
Description Science Museum Lates 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Our work on cities was presented at the Science Museum Lates, reaching a general audience.
Year(s) Of Engagement Activity 2015