Using multi-level multi-source auxiliary data to investigate nonresponse bias in UK general social surveys

Lead Research Organisation: City, University of London
Department Name: School of Social Sciences

Abstract

This project will explore the extent to which the predictive power of various forms of "Big Data" can be harnessed to overcome the impact of poor response to surveys - one of the major challenges facing social research today. Social surveys are a key tool used by the media, policy makers, and academics to understand more about public attitudes and behaviour. However, the value of surveys is put at risk by the fact that a large and growing number of those selected to take part in surveys do not respond. As non-respondents may be very different from respondents, nonresponse can introduce significant bias into the conclusions drawn from survey data. There is a pressing need therefore to understand more about the extent and sources of nonresponse bias. This requires having information about both respondents and non-respondents. In the absence of interview data being available for non-respondents, this information must be obtained from other, external, sources.

The growth in "Big Data", i.e. routinely generated data arising from commercial transactions, online communication or public administration, provides exciting new opportunities to supplement survey data with data from other sources. As opportunities for data linkage increase, there is a need for a detailed investigation into how such data can be used to understand and hopefully correct for nonresponse bias in general social surveys. This project will conduct such an investigation by adding pre-existing data from multiple sources to UK data from the European Social Survey (ESS), a methodologically rigorous survey of public attitudes.

The project, drawing on the expertise of an inter-disciplinary team of survey researchers, statisticians and geographic information (GI) specialists, has three strands: First, the project will explore the opportunities that exist for matching data from three different sources to survey data. These include: small-area administrative data; commercial marketing data and geocoded information from the Ordnance Survey. Each data source will be evaluated in terms of: what information it can provide which may be matched to the survey records of respondents and non-respondents; the accuracy and completeness of this information; and the challenges that matching data presents in terms of the increased risk of individuals or households being identified from the combination of data held about them.

Second, we will see how the matched data can provide information about potential biases that may be present in the survey data as a result of nonresponse. This will involve identifying any external variables associated both with the likelihood of nonresponse and with the attitudes and behaviour the survey intends to measure. The project will consider how sources of nonresponse bias may vary geographically across the UK.

Finally, we will assess whether using these external variables to create nonresponse weights, which adjust for the possible over- or under-representation of certain types of respondent in the dataset, has a significant effect on survey estimates and reduces bias in the data.
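
As an illustration of what a nonresponse weight does, the following is a minimal sketch of a weighting-class adjustment, one common way to build such weights. It is not the project's own method: the field names (`aux_class`, `responded`, `y`) and the toy sample are hypothetical, and the auxiliary variable is assumed to be known for every sampled address, respondent or not.

```python
from collections import defaultdict


def nonresponse_weights(sample):
    """Weighting-class adjustment: each respondent is weighted by the
    inverse of the response rate in its auxiliary class; nonrespondents
    get weight 0."""
    sampled = defaultdict(int)
    responded = defaultdict(int)
    for unit in sample:
        sampled[unit["aux_class"]] += 1
        if unit["responded"]:
            responded[unit["aux_class"]] += 1

    weights = []
    for unit in sample:
        if unit["responded"]:
            rate = responded[unit["aux_class"]] / sampled[unit["aux_class"]]
            weights.append(1.0 / rate)
        else:
            weights.append(0.0)
    return weights


# Hypothetical toy sample: flats respond at 50%, houses at 100%,
# and the survey variable y differs between the two classes.
sample = (
    [{"aux_class": "flat", "responded": True, "y": 2.0}] * 2
    + [{"aux_class": "flat", "responded": False, "y": None}] * 2
    + [{"aux_class": "house", "responded": True, "y": 6.0}] * 4
)

weights = nonresponse_weights(sample)
respondents = [(u["y"], w) for u, w in zip(sample, weights) if u["responded"]]

# Respondent mean without and with the nonresponse adjustment.
unadjusted = sum(y for y, _ in respondents) / len(respondents)
adjusted = sum(y * w for y, w in respondents) / sum(w for _, w in respondents)
print(round(unadjusted, 3), round(adjusted, 3))
```

In this toy case the adjustment changes the estimate only because the auxiliary class is related both to the chance of responding and to the survey variable. If either link is missing, the weighted and unweighted means coincide, which is why the project looks for variables associated with both.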

This project has the potential to contribute significantly to our understanding not only of survey nonresponse bias but also of the statistical tools available to remedy this bias, to improve survey data collection and to generate more robust data to better understand public attitudes and behaviour. Lessons learnt will enhance general social surveys in the UK and internationally. This will have considerable benefits for the wide range of stakeholders involved in the funding, collection, and analysis of survey data and those who rely on the insights it provides. This includes academics, government agencies and other publicly funded bodies, third sector organisations, policy makers and, ultimately, the general public.

Planned Impact

The impact of this project extends well beyond the academic community. By providing new insights into how auxiliary data may enhance the value of survey data - specifically how it may be used to understand and correct for potential biases arising from nonresponse - this project has the potential to benefit all those involved in the funding, collection and analysis of survey data and/or auxiliary data and who are concerned about the possible effects of nonresponse bias. This includes commercial survey agencies, government agencies and other publicly funded bodies, and third sector organisations.

This impact will be realised in three main ways:
i) By leading to improvements in survey practice - including more efficient and cost-effective data collection - across the international survey community;
ii) By providing data users - through these improvements in survey practice - with more high quality survey data to inform their understanding of public attitudes and behaviour;
iii) By establishing mutually beneficial relationships and promoting opportunities for collaboration between auxiliary data providers and survey practitioners.

The most immediate impacts will be on organisations involved in survey data collection - including commercial social and market research agencies - who will benefit from the methodological insights this project will provide into how auxiliary data may be used i) to improve the efficiency of survey data collection through more effective targeting of fieldwork resources and ii) to enhance the quality of survey data by reducing nonresponse bias. The use of geographical information techniques to map spatial variation in nonresponse will allow more nuanced strategies for preventing and reducing bias. We will work closely with Ipsos MORI throughout the project and will also seek to share learning with other survey agencies in the UK, including NatCen Social Research, TNS BMRB, GfK NOP and Gallup, and in ESS-participating countries across Europe.

Longer term, the potential improvements to survey practice resulting from this project will have an impact on the wide range of non-academic stakeholders who use survey data to understand society and to inform policy decisions as well as those who experience the effect of these decisions. This includes the media, politicians, policy makers in local and central government, third sector organisations and, ultimately, the general public.

Also in the longer term, this project will have an impact on those who fund survey research. Linking auxiliary data to survey data has the potential to make survey data more cost-effective by increasing the efficiency and effectiveness of data collection and enhancing the quality of the inferences that can be drawn from the resulting data. A range of organisations frequently commission surveys including charities, local and central government and other publicly funded bodies such as the police. Given the large amount of publicly funded social research this ultimately benefits the taxpayer.

The project has the potential to benefit organisations involved in the production of auxiliary data by promoting the value of these data in survey research and forging new opportunities for collaboration between auxiliary data producers and the international survey community. This will allow data producers - especially those whose work is publicly funded - to demonstrate that these data provide good value for money. Ultimately this will benefit data users by helping to secure the continued generation of such data. During the project we will engage directly with commercial organisations such as Experian and CallCredit as well as government departments and other publicly funded bodies such as the Office for National Statistics.

Details of how lessons from this project will be shared with relevant non-academic stakeholders are given in the Pathways to Impact.
 
Description One of the main achievements of this project has been to identify a wide range of auxiliary data sources and append them successfully to the European Social Survey in the UK. This included small area administrative data from a variety of sources, data from commercial vendors and local geographic or "Points of Interest" data. This was a complex and labour intensive process which facilitated the exploration of several datasets - including data from commercial vendors and the Ordnance Survey Points of Interest dataset - not widely used by survey methodologists in the UK. Valuable lessons were learned about the quality and accessibility of the different data sources including: the importance of appending data at the time the sample is drawn rather than subsequently (so data covers the appropriate time periods and relevant assurances can be given to survey participants), the challenge of appending data for the whole of the UK given differences in data collection and availability across the devolved administrations, and concerns about the completeness, accuracy and transparency of commercial data. These findings have been carefully documented in a project technical report - which will be made available via the project website - to provide insights for other researchers interested in exploiting auxiliary data.

The starting hypothesis for this project was that auxiliary data would be useful in predicting and understanding survey response behaviour. In fact, the main contribution of this project - which provides one of the most thorough tests of the auxiliary data approach to date - is likely to be to urge caution over the commonly advocated approach of using external sources of auxiliary data to address the problem of survey nonresponse. Using the auxiliary data linked in this project it has proved difficult to identify variables which are significantly associated with both response propensity and substantive survey variables. This remains the case even after exploring auxiliary data from a range of sources, at different levels of aggregation, covering the major theories of survey response and using a variety of statistical techniques. The main predictors of response propensity are simple interviewer observations regarding the type of dwelling unit, the physical condition of the property and whether there are barriers to entry. Rather than pursuing external sources of auxiliary data, it may prove more fruitful to pursue further research into improving the potential of interviewer observations. Papers documenting the findings from nonresponse analysis are being prepared.

Although the project had limited success in predicting survey nonresponse when using global models applied to the full sample, some interesting findings emerged when applying spatial modelling techniques such as geographically weighted regression (GWR) to explore whether and how the drivers of nonresponse may vary spatially. For example, across the UK as a whole there is little evidence that survey response propensity varies with population density. However, when breaking the results down by geography we discovered that in some areas (mainly in the North West of England and Scotland) population density was negatively correlated with response propensity whereas in others (predominantly in the South West of England) population density was positively correlated with response propensity. Moving away from a "one size fits all" approach to modelling survey nonresponse and allowing models to vary spatially or across different population sub-groups may prove more instructive.
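
The idea behind GWR can be sketched as follows: a separate weighted least-squares fit is made at each location, with nearby observations weighted more heavily than distant ones. This is an illustrative simplification rather than the analysis carried out in the project. It uses a linear model with a single predictor and a Gaussian distance kernel, and the coordinates, bandwidths and data values are all hypothetical.

```python
import math


def local_slope(target, points, bandwidth):
    """Weighted least-squares slope of y on x at one location.
    Observations are down-weighted with distance from `target`
    using a Gaussian kernel of the given bandwidth."""
    tx, ty = target
    w = [
        math.exp(-0.5 * (math.hypot(p["east"] - tx, p["north"] - ty) / bandwidth) ** 2)
        for p in points
    ]
    sw = sum(w)
    xbar = sum(wi * p["x"] for wi, p in zip(w, points)) / sw
    ybar = sum(wi * p["y"] for wi, p in zip(w, points)) / sw
    sxy = sum(wi * (p["x"] - xbar) * (p["y"] - ybar) for wi, p in zip(w, points))
    sxx = sum(wi * (p["x"] - xbar) ** 2 for wi, p in zip(w, points))
    return sxy / sxx


# Hypothetical data: x stands for (scaled) population density and y for
# response propensity. In area A the relationship is negative, in area B
# it is positive, and the two cancel out when pooled.
area_a = [{"east": 0.0, "north": float(i), "x": float(i), "y": 0.8 - 0.1 * i} for i in range(4)]
area_b = [{"east": 100.0, "north": float(i), "x": float(i), "y": 0.5 + 0.1 * i} for i in range(4)]
points = area_a + area_b

# A very large bandwidth weights every point almost equally (a global model).
global_slope = local_slope((50.0, 0.0), points, bandwidth=1e9)
# A small bandwidth lets each area reveal its own relationship.
slope_a = local_slope((0.0, 0.0), points, bandwidth=10.0)
slope_b = local_slope((100.0, 0.0), points, bandwidth=10.0)
print(round(global_slope, 3), round(slope_a, 3), round(slope_b, 3))
```

With these made-up values the pooled slope is effectively zero while the local slopes are close to -0.1 and +0.1, mirroring the pattern described above in which opposite local relationships are hidden by a global model. A binary response outcome would more usually be modelled with a logistic variant; the linear form is used here only to keep the sketch short.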

This project has fostered new research collaborations between the social scientists and survey methodologists leading the project and researchers working in other disciplines. These collaborations with data visualisation specialists working in the giCentre at City University London and Prof. Chris Brunsdon, an expert in geocomputation, have had a number of positive results. The first has been to encourage survey researchers to think spatially and consider social survey data in their geographical context, thereby gaining a better understanding of the phenomena being studied. Brunsdon has trained members of the research team in geographically weighted regression, a statistical technique that can be used to explore a wide range of substantive and methodological research questions. A paper applying GWR to analyse the drivers of survey nonresponse is in preparation for the journal Environment and Planning B. The second result has been to start to develop new data visualisation tools to assist researchers in the statistical analysis of complex datasets. Using the dataset generated by the project and working alongside social researchers to understand their needs and workflow, researchers from the giCentre have been exploring how data visualisation can help researchers to select variables for analysis and evaluate and keep track of different models developed using both theory-driven and data-driven machine learning techniques. A paper is currently under review with the journal Neurocomputing.
Exploitation Route The failure to identify suitable auxiliary variables to predict nonresponse bias means that some of the proposed benefits from this project - e.g. an improved weighting strategy for the European Social Survey or a response predictor tool for use on UK surveys - will not be realised.

However, this null finding has important implications for the main non-academic audience for the project - survey practitioners. It suggests that, when deciding how to address the ongoing problem of survey nonresponse with limited resources, survey agencies may be better off not investing time or money in external sources of auxiliary data but instead pursuing other strategies such as investing in interviewer training or improved survey paradata which could then be used in responsive design or for post-hoc adjustments.

The findings also serve to advance academic debates among survey methodologists about the usefulness of auxiliary data for addressing nonresponse bias. Many methodologists have argued for the importance of obtaining auxiliary data for nonresponse analysis. This comprehensive study suggests that even when such data are available they may not be fit for purpose. Other strategies for studying nonresponse bias - such as surveys of nonrespondents or the use of benchmark surveys such as the Labour Force Survey - should be pursued instead.

Even though the main finding from this project appears to be that the types of auxiliary data investigated here are likely to be of limited use in tackling survey nonresponse, these auxiliary data may still be useful in other ways. Within academia, substantive researchers may be interested in using auxiliary data for studying contextual effects on behaviour, whilst survey practitioners can make use of auxiliary data to identify specific target groups for oversampling. The project's data scoping exercise - identifying a wide range of different sources of auxiliary data and the issues associated with them - will be useful to both groups in pursuing their own uses of auxiliary data.

The introduction of new analytical techniques such as geographically weighted regression to the survey community also opens up new lines of research. The technique has the potential to be used to analyse spatial variation in substantive survey variables and other aspects of respondent or interviewer behaviour.
Sectors Government, Democracy and Justice, Other

URL https://blogs.city.ac.uk/addresponse/
 
Description ADDResponse Narrative Impact Report The primary intended beneficiaries from the ADDResponse project are organisations involved in survey data collection - including commercial social and market research agencies and national statistical institutes - who could potentially benefit from the methodological insights this project provides into how auxiliary data may be used i) to improve the efficiency of survey data collection through more effective targeting of fieldwork resources and ii) to enhance the quality of survey data by reducing nonresponse bias. In practice, the scope of the project to influence future survey practice - and, through that, data quality - is limited. The main contribution of this project - which provides one of the most thorough tests of the auxiliary data approach to date - is likely to be to urge caution over the commonly advocated approach of using external sources of auxiliary data to address the problem of survey nonresponse. The failure to identify suitable auxiliary variables to predict nonresponse bias means that some of the proposed benefits from this project - e.g. an improved weighting strategy for the European Social Survey or a response predictor tool for use on UK surveys - will not be realised. Nevertheless, this null finding has important implications for survey practitioners and how they might best utilise scarce resources. It suggests that, when deciding how to address the ongoing problem of survey nonresponse with limited resources, survey agencies may be better off not investing time or money in external sources of auxiliary data but instead pursuing other strategies such as investing in interviewer training or improved survey paradata which could then be used in responsive design or for post-hoc adjustments. Most directly the findings have been used by the Core Scientific Team of the European Social Survey, one of the ESRC's key investments, to inform the survey's future research agenda.
In consultation with the ESS ERIC Methods Advisory Board it was agreed that the ESS would not, in the immediate future, pursue this line of research further or prioritise putting resources into using auxiliary data to address the problem of survey nonresponse in participating survey countries. There has also been interest in the findings, particularly lessons learned from the project's experiences of using commercial data, from other survey providers including the UK's large scale commercial organisations such as Ipsos MORI and Kantar Public. Following presentation of the project's findings at the end of project workshop in May 2016, the project team have had a number of email and face to face conversations with survey methodologists in these organisations about the project's findings. Ipsos MORI have confirmed that the "negative" finding that auxiliary data do not appear to add to our understanding of nonresponse has provided reassurance and given them confidence that their current approach to weighting survey data for nonresponse (relying on interviewer observations and a few key census variables such as population density) is the correct one. Collaboration with Ipsos MORI to agree on the procedures for archiving the potentially disclosive dataset which resulted from this project combining auxiliary and survey data raised important questions about sharing and reusing this type of data and the responsibilities of the primary data collector to survey participants, particularly nonrespondents who, by definition, have not given consent to participate in research. The exercise has informed Ipsos MORI's thinking on this topic and will be useful for them in future projects involving the study of survey nonrespondents. The ADDResponse project is being used to inform work being carried out under the EC-funded Horizon 2020 project Synergies for Europe's Research Infrastructures in the Social Sciences (SERISS) www.seriss.eu.
This cluster project, led by the European Social Survey and running to 2019, is intended to overcome fragmentation among Europe's major social science infrastructures and foster interoperability, harmonisation and innovation. In so doing it aims to play a vital role in increasing confidence in social science data, promoting the value of social science to the wider research community and to policymakers, and ensuring that national and European policymaking is built on a solid base of the highest quality socio-economic evidence. Findings from ADDResponse are feeding into the SERISS project in two ways: First, findings will inform a cross-European study on the potential for using auxiliary data to tackle survey nonresponse bias. The experience of ADDResponse will inform the design of the cross-national study as well as providing the evidence base for the UK. This study is being carried out in partnership with the Survey of Health, Ageing and Retirement in Europe (SHARE ERIC) and involves collaboration with ESS National Coordinators and SHARE country teams, representing survey methodologists and practitioners from a range of academic and non-academic (e.g. commercial survey agencies) organisations in more than 20 countries. Second, lessons learned during the data scoping phase of the ADDResponse project regarding some of the challenges in gaining access to different sources of auxiliary data, have fed into a SERISS workpackage exploring the legal and ethical challenges associated with using new forms of data in social research. The emergence of new forms of data, including administrative and social media data, to complement traditional survey data provides many opportunities for the social sciences to address new research questions and better understand our society. However, these new data also present major legal, ethical and practical challenges. 
The SERISS workpackage aims to produce new guidelines for survey practitioners on the use, storage and sharing of these new forms of data alongside survey data in the context of the new General Data Protection Regulation (GDPR). Preliminary findings will be discussed at a stakeholder workshop in late 2017 with the guidelines produced in 2018. Finally, learning from the ADDResponse project has fed into the design of a similar study, led by NORC (Chicago USA), exploring the potential benefits of appending multi-level multi-source auxiliary data to the US General Social Survey 2016. We anticipate further opportunities for collaboration and to explore joint pathways to impact for surveys in Europe and the US once the GSS data have been analysed.
First Year Of Impact 2016
Sector Other
Impact Types Economic

 
Title ADDResponse: European Social Survey Round 6 (UK) Plus Auxiliary Data, 2010-2014 
Description The dataset combines pre-existing data from the European Social Survey Round 6 (UK) with auxiliary data collated from a range of sources. The full dataset used by the research team covers 4,520 sampled addresses and combines ESS survey and paradata with administrative data from three sources: small-area administrative data, household level data from commercial databases and information on the local geographic context of the sampled addresses. Access conditions imposed by data owners and concerns over possible disclosure mean that not all of these data could be made more widely available. A subset of the data were archived with the UKDA and are available under secure access. Data are available for ESS respondents including: - Individual level ESS data on public attitudes and behaviour focusing on the topics of personal and social wellbeing and understanding and evaluations of democracy - ESS Contact Form data on the number, date, time and outcome of contact attempts made to obtain productive interviews - ESS interviewer questionnaire data - Geographic identifiers and small-area administrative data on a range of topics including: Census 2011 socio-demographics, crime rates, Indices of Deprivation, electricity consumption, school absences, benefit claimants, wellbeing, and voting in local elections 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact The dataset provides a uniquely rich combination of survey and auxiliary data which can be used to: a) Compare the characteristics of survey respondents and nonrespondents, thereby furthering understanding of survey nonresponse b) Study the effect of local context on attitudes and behaviour
URL https://discover.ukdataservice.ac.uk/catalogue/?sn=8066&type=Data%20catalogue
 
Description End of project workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact An end of project workshop was held at which findings from the project were presented to an audience of around 40 survey practitioners, students and fellow academics. The findings were then discussed by a panel of invited experts.
Year(s) Of Engagement Activity 2016
URL https://blogs.city.ac.uk/addresponse/tackling-survey-nonresponse-workshop/
 
Description European Social Survey Methods Advisory Board presentation 2015 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Discussed implications of the research for the ESS. Recommendation received that pursuing the auxiliary data approach to addressing nonresponse bias should not be a priority for the ESS or be extended to countries beyond the UK.

Decision reached on future research agenda and fieldwork practice for the ESS, i.e. not to pursue the auxiliary data approach.
Year(s) Of Engagement Activity 2015
 
Description International Household Nonresponse workshop 2015 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Informed further statistical analysis. Benefited from opportunities to compare methods and results with colleagues undertaking similar work in other European countries.

Impacts mainly internal to project team and development of project outputs.
Year(s) Of Engagement Activity 2015
URL http://www.nonresponse.org/c/519/2015_Leuven_Belgium/?preid=0
 
Description International Household Nonresponse workshop 2016 (Oslo) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Experiences from the ADDResponse project were presented as part of an expert panel on non-response, data protection and research ethics.
Year(s) Of Engagement Activity 2016
URL http://www.nonresponse.org/
 
Description Presentation to ESS National Coordinators (Vienna) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Findings from the project were presented to the National Coordinators responsible for overseeing the implementation of the European Social Survey in participating countries. The presentation was intended to spark a discussion on how feasible/productive carrying out similar investigations in countries other than the UK might be.
Year(s) Of Engagement Activity 2016
 
Description Project website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A project website has been set up to inform interested parties about the project, keep them up to date with planned engagement activities (e.g. the end of project workshop) and provide access to key outputs, e.g. the project technical report. There were 361 visitors to the site in the 12 months to March 2017.
Year(s) Of Engagement Activity 2014,2015,2016,2017
URL http://www.addresponse.org