Developing Latent Hierarchical Network Models for Cross-Cultural Comparisons of Social and Economic Inequality

Lead Research Organisation: London School of Economics and Political Science
Department Name: Methodology

Abstract

Researchers from across the social sciences are increasingly using the tools of network analysis to represent the groups they study, mapping out friendships between students, business relations between companies, and supportive relationships between villagers. These networks are often based on surveys, where people's reports of their relationships are combined to represent the overall structure of entire communities. While these network datasets are increasingly fine-grained and complex, the tools used to study them often require simplifications and assumptions that social scientists are uncomfortable with (e.g., assuming that people's recollections are perfect, or treating people as isolated individuals rather than members of larger social groupings like households).

We will develop network models that fully exploit the various facets of the information typically contained in social network datasets. In doing this, we depart from prevalent models in contemporary social network analysis that treat an observed network dataset as the "true" network. Instead, we assume that the true network is "latent", and therefore not empirically observed, and we frame the observed network data as an imperfect measurement of what we are modelling. Within this probabilistic framework, we will first account for the various individual-level biases that shape whom people name (and whom they do not). We will then extend our model to allow nodes (here, people) to form hierarchically nested groups (for example, households), thus capturing units at the different levels present in the system. Finally, we will expand this model to account for changes over time in both the units at each level of the hierarchy and their relationships, thus capturing how the system evolves.
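As a minimal illustration of treating observed reports as imperfect measurements of a latent tie, the sketch below applies Bayes' rule to a single potential tie. It assumes a deliberately simple measurement model that is not taken from the project: each reporter names a true tie with their own probability (`tpr`), names a non-existent tie with probability `fpr`, and reports are independent given the latent tie. The function name `tie_posterior` and all parameter values are hypothetical.

```python
from math import prod


def tie_posterior(reports, tpr, fpr, prior):
    """Posterior probability that a latent tie exists, given binary reports.

    Hypothetical toy measurement model (an assumption, not the project's model):
      reports[k] -- 1 if reporter k named the tie, 0 otherwise
      tpr[k]     -- P(reporter k names the tie | tie exists)
      fpr[k]     -- P(reporter k names the tie | tie does not exist)
      prior      -- P(tie exists) before seeing any reports
    Reports are assumed conditionally independent given the latent tie.
    """
    like_tie = prod(t if r else 1 - t for r, t in zip(reports, tpr))
    like_no_tie = prod(f if r else 1 - f for r, f in zip(reports, fpr))
    numerator = prior * like_tie
    return numerator / (numerator + (1 - prior) * like_no_tie)


# Ego names the tie, alter does not: the same discordant pair of reports
# is weighed differently depending on which reporter tends to under-report.
print(tie_posterior([1, 0], tpr=[0.9, 0.6], fpr=[0.05, 0.05], prior=0.1))
print(tie_posterior([1, 0], tpr=[0.6, 0.9], fpr=[0.05, 0.05], prior=0.1))
```

With these made-up values the first call gives a posterior of about 0.46 and the second about 0.12: the reporter-specific parameters, rather than a fixed rule, decide how conflicting reports are reconciled.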

To do this, we will bring together a diverse team of researchers interested in the analysis and modelling of network data. We are complementing the skills and perspectives of the anthropologist (Power), psychologist (Redhead), and statistical physicist (De Bacco) co-investigators with the addition of: a mathematician studying complex networks (Prof Ginestra Bianconi, Queen Mary University of London), a computer scientist working on applied causal inference (Dr Dhanya Sridhar, Columbia University), an engineer and computer scientist working on machine learning and probabilistic inference (Prof Isabel Valera, Saarland University), a statistician developing multilevel network models (Assoc Prof Tracy Sweet, University of Maryland), an anthropologist and statistician developing Bayesian statistical tools (Prof Dir Richard McElreath, Max Planck Institute for Evolutionary Anthropology), an anthropologist gathering and developing tools for social network data (Assoc Prof Jeremy Koster, University of Cincinnati), and a social statistician with expertise in longitudinal multilevel models (Prof Fiona Steele, London School of Economics).

With this diversity of perspectives (whether of discipline, application, or career stage), we are confident that our collaboration will result in the development of general, robust generative network models with wide potential for application across the sciences. We are committed to facilitating the uptake of these models, so we will be hiring a research officer to develop user-friendly R and Python packages for their use.

This project is grounded in the analytical needs of the "ENDOW project," a US National Science Foundation-funded project that is primarily examining how network structure, and people's position within that network, is associated with the distribution of wealth inequality both within and between societies. Over forty researchers are collecting social network data from rural communities around the world for this project. The models we develop here will help us understand (and potentially then rectify) some of the drivers of social and economic inequality around the world.

Publications

De Bacco C (2023) Latent network models to account for noisy, multiply reported social network data in Journal of the Royal Statistical Society Series A: Statistics in Society

 
Description Social network data are gathered and analysed across many social science disciplines and are used to make fundamental observations about social mobility, community cohesion, social capital, and more. However, there is often a disjuncture between the complexity of people's social relations and the simplifying assumptions that must be made when gathering and analysing such data. With this project, we have brought together social scientists, statisticians, and computer scientists to develop methods that better represent that complexity.

We have now published two articles and released associated code and tools to help better represent the complexity of such data, focusing most specifically on how to deal with imperfect recollections and biases in how people report their social relations. Our tools use new computational methods to adjudicate between people's potentially conflicting responses and arrive at principled probabilistic representations of the underlying social network. We show how prior standard practice risks seriously misrepresenting some key measures of network structure (e.g., how reciprocal people's relationships are), and how our approach can resolve these issues.
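To illustrate how a fixed aggregation rule can distort a measure such as reciprocity, the sketch below combines double-sampled reports (each directed tie i→j can be reported by both i and j) under two simple rules assumed here for illustration: "union" (keep a tie if either party reports it) and "intersection" (keep it only if both do). The function names and the three-person toy data are hypothetical, not the project's data or code.

```python
def aggregate(reports, rule):
    """Combine per-reporter tie reports into a single directed edge set.

    reports -- dict mapping reporter -> set of directed ties (i, j) that the
               reporter claims; the reporter is always i or j
    rule    -- "union": keep a tie if either party reports it
               "intersection": keep it only if both i and j report it
    """
    candidates = set().union(*reports.values())
    if rule == "union":
        return candidates
    if rule == "intersection":
        return {
            (i, j)
            for (i, j) in candidates
            if (i, j) in reports.get(i, set()) and (i, j) in reports.get(j, set())
        }
    raise ValueError(f"unknown rule: {rule!r}")


def reciprocity(edges):
    """Share of directed ties whose reverse tie is also present."""
    if not edges:
        return 0.0
    return sum((j, i) in edges for (i, j) in edges) / len(edges)


# Hypothetical reports: each person lists the directed ties they are part of.
reports = {
    "a": {("a", "b"), ("b", "a"), ("a", "c")},
    "b": {("a", "b")},
    "c": {("a", "c"), ("c", "a")},
}
for rule in ("union", "intersection"):
    print(rule, reciprocity(aggregate(reports, rule)))
```

In this toy example the union network is perfectly reciprocal (1.0) while the intersection network has no reciprocated ties (0.0), even though both are built from exactly the same reports.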

We are now tackling a second representational issue: incorporating higher-order social groupings (such as the individuals who make up a household) into our network representations. Here again, existing methods often force simplifications that we know do not account for the complexity of people's (and households') actual relations. We are currently finalizing several approaches and papers that address this issue.
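One common simplification can be sketched as follows: individual-level ties are collapsed into a weighted household-to-household network. The helper below is hypothetical and is not one of the approaches the project is developing; it simply shows what such a projection discards, namely which members hold each tie and every tie within a household.

```python
from collections import Counter


def collapse_to_households(edges, household_of):
    """Project directed individual-level ties onto households.

    Returns a Counter mapping (from_household, to_household) to the number of
    individual ties between them. Ties between members of the same household
    are dropped entirely.
    """
    counts = Counter()
    for i, j in edges:
        h_i, h_j = household_of[i], household_of[j]
        if h_i != h_j:
            counts[(h_i, h_j)] += 1
    return counts


# Hypothetical example: two members of household H1 each support one member of H2.
household_of = {"ann": "H1", "bob": "H1", "cal": "H2"}
edges = [("ann", "cal"), ("bob", "cal"), ("ann", "bob")]
print(collapse_to_households(edges, household_of))
```

Here the two distinct ties from Ann and Bob become a single H1→H2 tie of weight 2, and the tie between Ann and Bob disappears from the household-level network.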

To help other social scientists adopt the approaches we are developing, we have released public versions of our code and tools, alongside extensive tutorials that give practical introductions to their use. These are implemented in the two most common programming languages in the social sciences, Python and R, to allow for maximum uptake. The resources are already available through a dedicated website that walks practitioners through installing and using the tools with real-world open-access datasets, and we will be running a workshop at the pre-eminent conference for social network analysis this summer to share them with others.

This project has built new collaborations between researchers, which are already expanding in new and promising directions. We have brought together social scientists, statistical physicists, social statisticians, and computer scientists to collectively tackle questions of measurement and modelling that are at the heart of social network analysis. There are active plans to continue this collaboration, with a number of spin-off projects already in the works.
Exploitation Route The methods we have developed will be of use to social scientists and others working with social network data gathered through surveys. They are already being used by a large collaboration of anthropologists and economists gathering demographic and social network data from communities around the world to understand social and economic inequality (the "ENDOW" project, funded by the US National Science Foundation). Uptake by this large group should help to demonstrate the value and utility of these tools and so encourage their wider use.
Sectors Communities and Social Services/Policy, Education, Other

 
Title VIMuRe (Variational Inference for Multiply Reported data) 
Description Social network data are often constructed by combining reports from multiple individuals. However, it is not obvious how to reconcile discordant responses. Multiply reported data may carry particular risks if people's responses reflect normative expectations, such as an expectation of balanced, reciprocal relationships. We created a probabilistic model (which we call "VIMuRe", for Variational Inference for Multiply Reported data) that incorporates ties reported by multiple individuals to estimate the unobserved network structure. In addition to estimating a parameter for each reporter that captures their tendency to over- or under-report relationships, the model explicitly incorporates a term for 'mutuality', the tendency to report ties in both directions involving the same alter. The model's algorithmic implementation is based on variational inference, which makes it efficient and scalable to large systems. Alongside a publication introducing the model, we have released a public code repository to facilitate uptake of the method. Researchers can use either Python or R, making this tool widely accessible to practitioners. 
Type Of Material Computer model/algorithm 
Year Produced 2023 
Provided To Others? Yes  
Impact None so far. 
URL https://latentnetworks.github.io/vimure/
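As a rough illustration of estimating a per-reporter tendency to under-report alongside the latent network, the sketch below fits a much simpler model than VIMuRe. It uses expectation-maximisation (EM) rather than variational inference and omits the mutuality term. All of its assumptions are hypothetical: ties are binary; each ordered pair (i, j) is reported by both i and j; reporter k names a true tie with probability `theta[k]`; any reporter names a non-existent tie with a known false-positive rate `eta` (held fixed to keep the toy model identifiable); and ties exist independently with probability `rho`. The function names are illustrative only; for the project's actual model and code, see the repository at the URL above.

```python
import random


def simulate(n_people, theta, eta, rho, seed=0):
    """Simulate a latent directed network and double-sampled binary reports.

    Each ordered pair (i, j) is a potential tie reported by both i and j.
    Returns a list of ((i, j), true_tie, {reporter: report}) records.
    """
    rng = random.Random(seed)
    records = []
    for i in range(n_people):
        for j in range(n_people):
            if i == j:
                continue
            tie = rng.random() < rho
            reports = {}
            for k in (i, j):
                p = theta[k] if tie else eta  # detection vs false-positive rate
                reports[k] = int(rng.random() < p)
            records.append(((i, j), tie, reports))
    return records


def fit_em(records, n_people, eta, n_iter=100):
    """EM for the tie prior rho and per-reporter detection rates theta.

    eta is treated as known. Returns (rho, theta, q), where q holds the
    posterior probability that each potential tie exists.
    """
    rho, theta = 0.2, [0.7] * n_people
    for _ in range(n_iter):
        # E-step: posterior probability of each latent tie given its reports.
        q = []
        for _, _, reports in records:
            like1, like0 = rho, 1.0 - rho
            for k, r in reports.items():
                like1 *= theta[k] if r else 1.0 - theta[k]
                like0 *= eta if r else 1.0 - eta
            q.append(like1 / (like1 + like0))
        # M-step: update the prior and each reporter's detection rate.
        rho = sum(q) / len(q)
        num = [0.0] * n_people
        den = [0.0] * n_people
        for q_ij, (_, _, reports) in zip(q, records):
            for k, r in reports.items():
                num[k] += q_ij * r
                den[k] += q_ij
        theta = [num[k] / den[k] if den[k] > 0 else 0.5 for k in range(n_people)]
    return rho, theta, q


# Hypothetical simulation: even-numbered reporters rarely miss ties, odd ones often do.
theta_true = [0.9 if k % 2 == 0 else 0.4 for k in range(40)]
records = simulate(40, theta_true, eta=0.01, rho=0.1, seed=1)
rho_hat, theta_hat, q = fit_em(records, 40, eta=0.01)
print(round(rho_hat, 3))
```

On data simulated this way, reporters given a high detection rate should receive clearly higher estimates of `theta` than those given a low one, and the posterior probabilities `q` stand in for a probabilistic representation of the underlying network rather than a single aggregated one.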