Understanding [Offline/Online] Society: Linking Surveys with Twitter Data

Lead Research Organisation: Cardiff University
Department Name: School of Social Sciences


Understanding behaviours, attitudes and identities in online space is a key challenge for 21st-century social science. The opportunities provided by social media platforms such as Twitter are significant: between 300 and 500 million tweets are generated each day, representing interactions, networks, opinions and reactions at a highly granular temporal (and sometimes spatial) level. On average, around 4,500 tweets are posted every second, and this velocity of data offers real-time insight into the social world. However, the fly in the ointment for researchers is that we have a limited understanding of who (or, in the case of 'bots', what) is present in the online space, and of the extent to which the online representation of social actors can be taken to represent the social world. The fundamental questions of what can be known and how we can know it need to be addressed before social science can embrace Twitter, albeit with a healthy dose of caution, as a source of knowledge about the social world.

In light of this, this project sets out to establish what insights Twitter can offer into social phenomena by linking the content and metadata of tweets with survey data from three major UK surveys: British Social Attitudes 2015, the Understanding Society Innovation Panel 2017 and the NatCen Panel. In essence, this project is an exercise in method, calibration and verification: taking what we know about a respondent and exploring to what extent a given known characteristic manifests (or not) in the online setting, and vice versa. There is clear methodological value in this. Linking additional sources of data to survey responses, with respondents' consent, is increasingly used to enhance the value of survey data, validate survey measures and address issues with nonresponse. However, most previous research on consent has focused on administrative records, and a better understanding of consent to link newer forms of data is needed.
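The linkage and verification idea described above can be illustrated with a minimal Python sketch. It is not the project's pipeline: the record types (`SurveyResponse`, `Tweet`), the helper functions and the single "reported value" per respondent are hypothetical. The sketch links only respondents who consented and supplied a Twitter handle, and then measures how often a characteristic inferred from Twitter (for example, by a demographic proxy measure) agrees with the one reported in the survey.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SurveyResponse:
    # Hypothetical survey record; only consenting respondents with a handle are linked.
    respondent_id: str
    consented: bool
    twitter_handle: Optional[str]
    reported_value: str  # a known characteristic reported in the survey


@dataclass
class Tweet:
    handle: str
    text: str


def normalise_handle(handle: str) -> str:
    # Twitter handles are case-insensitive; also drop a leading "@".
    return handle.strip().lstrip("@").lower()


def link_responses(responses: List[SurveyResponse],
                   tweets: List[Tweet]) -> Dict[str, List[Tweet]]:
    """Map each consenting respondent's ID to the tweets posted from their handle."""
    tweets_by_handle: Dict[str, List[Tweet]] = {}
    for tweet in tweets:
        tweets_by_handle.setdefault(normalise_handle(tweet.handle), []).append(tweet)
    linked: Dict[str, List[Tweet]] = {}
    for r in responses:
        if r.consented and r.twitter_handle:
            linked[r.respondent_id] = tweets_by_handle.get(
                normalise_handle(r.twitter_handle), [])
    return linked


def agreement_rate(responses: List[SurveyResponse],
                   observed: Dict[str, Optional[str]]) -> float:
    """Share of respondents whose Twitter-inferred value matches the reported one.

    `observed` maps respondent_id to a value inferred from Twitter, or None when
    no inference could be made; None entries are excluded from the comparison.
    Returns NaN when there is nothing to compare.
    """
    reported = {r.respondent_id: r.reported_value for r in responses}
    pairs = [(reported[rid], value) for rid, value in observed.items()
             if value is not None and rid in reported]
    if not pairs:
        return float("nan")
    return sum(rep == obs for rep, obs in pairs) / len(pairs)
```

Keying the linkage on consent first means that respondents who decline are never matched to any tweets, which mirrors the emphasis on informed consent in RQ3; how a value is actually inferred from tweets is deliberately left out of this sketch.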

With novel methods, there are limits to working purely in theory: unforeseen limitations may emerge, and the value of the design may not be evident without a real research context. We therefore propose further data collection as part of a substantive case study of attitudes and behaviours towards ethnic minorities, which aims to uncover 'hidden' challenges, demonstrate how this methodology can be employed, and contribute to the substantive literature. To maximise the value of the research for the wider academic community, this work will in turn inform a work package focusing solely on archiving, sharing and re-use of the linked dataset and/or a derivative of it. Whilst Twitter is only one of many social media platforms, it is the most open and accessible, and it provides a proving ground on which issues of consent, linkage, archiving and sharing can be tested and evaluated. We anticipate that many of the lessons and protocols developed as part of this research will be operationally applicable to other social media platforms.

In summary, the research project seeks to answer the following research questions:

RQ1) How can Twitter data be used to enhance survey data?
RQ2) How can survey data be used to evaluate existing demographic proxy measures and develop new ones?
RQ3) How can we encourage informed consent to social media data linkage?
RQ4) Demonstrator Study: How can linked data (directly reported and indirectly observed) help us to understand public attitudes towards minority ethnic groups?
RQ5) How can social media data be collected, linked to survey data, analysed, archived, and shared in a legal and ethical manner that maintains utility?

Planned Impact

The impact of this project on the academic and non-academic communities stems from both the methodological work and the demonstrator study tackling a substantive issue.

Methodological Impact:

Any academic, government, third sector or private organisation that uses surveys has the potential to benefit from this research. Linked Twitter data offers a highly cost-effective, programmatic way of gathering observational behavioural data that can be used to augment survey datasets, allowing us to derive new variables of interest and to compare what a user reports with what they actually did. Such verification of survey responses is largely absent, leaving social researchers unable to verify the reliability of their survey instruments. Twitter data may also be useful for improving nonresponse adjustments. With the protocols arising from this project, anyone will be able to follow our secure and ethical precedents regarding informed consent, collection, linkage and archiving, allowing them to address new research questions at no additional cost. Twitter data is free, and the time and knowledge needed to collect it are significantly reduced by the ESRC COSMOS Platform, which is available on request to government, public and third sector agencies.

In addition to researchers, staff at data archives across the globe will benefit from the outputs on archiving and sharing linked data safely. Archiving specialists currently carry the burden of curating this complex data for further use in the absence of guidance or community norms. Our proposal plugs this gap. Best practice will be established in collaboration with the UKDA and GESIS and disseminated through the Consortium of European Social Science Data Archives and our steering group member from the US, ensuring the benefit is realised globally.

Substantive Impact:

WP4 will investigate the relationship between directly reported attitudes and behaviours and indirectly observed online attitudes and behaviour towards minority ethnic groups. These findings will be relevant to social researchers and psychologists interested in developing new measures of prejudice and in understanding the interaction between reported attitudes and online behaviour. They will also be of use to government departments and local authorities with a remit for assessing levels of community cohesion in different parts of the UK. The government has renewed its focus on racial inequality, recognised by the creation of the Racial Disparity Unit at the Cabinet Office, as reported levels of hate crime have increased and attention has turned to their relationship with wider social phenomena, such as the UK's decision to leave the European Union or the rising profile of Islamic extremism. The study's focus on behaviour on Twitter further broadens its relevance to those interested in the online social sphere, in debates about the role of social media, and in the responsibilities of platform providers to moderate their environments to protect people from online hate crime and to dampen potential 'echo chamber' effects for users with extreme views. To facilitate dissemination to these groups, a report will be written for both academic and policy audiences, to be launched at an event with representatives invited from the Home Office, Welsh Government, Office for National Statistics, the Met, MoJ, DCMS, EHRC, Government Equalities Office, Racial Disparity Unit, Stop Hate, Victim Support, Tell Mama, Runnymede, Hope not Hate, Community Security Trust, social media platform owners, and the national Local Government Association.

The new linked dataset will also be publicly archived and accessible to researchers (both academic and non-academic) concerned with the area of attitudes and behaviours towards ethnic minorities, and methodologists interested in exploring the relationship between the reported and the observed.


Description 'Data Linking' online course delivered for GESIS, June 2021
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Part of a programme of events offered by GESIS; very well received, with excellent feedback. Working primarily with researchers on the ethics and technicalities of linking social media and survey data.
Year(s) Of Engagement Activity 2021
Description 'Linking Surveys with Twitter Data' online course delivered for the University of Graz (Austria), March 2021
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Very well received. Working primarily with researchers on the ethics and technicalities of linking social media and survey data.
Year(s) Of Engagement Activity 2021
URL https://grazer-methodenkompetenzzentrum.uni-graz.at/de/archiv/workshops-2020-21/
Description Article promoting the newly funded project in the Social Research Association (SRA) magazine Research Matters, March 2020
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Article that promoted the project to a wider practitioner audience, signalling our intentions and the direction the research will take.
Year(s) Of Engagement Activity 2020
URL https://the-sra.org.uk/common/Uploaded%20files/Research%20Matters%20Magazine/sra-research-matters-ma...
Description Online Workshop on Linking Twitter & Survey Data 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Delivery of a workshop on linking Twitter and survey data, delivered in collaboration with colleagues at GESIS as part of a CESSDA project. Aimed at an international audience of researchers and academics. The learning objectives of the workshop were to:
1. Understand why and how to link survey and Twitter data
2. Be aware of the key practical and ethical challenges in linking survey and Twitter data
3. Be familiar with the types of disclosure risks associated with linked survey and Twitter data
4. Know strategies for minimising risk in linked survey and Twitter data projects
Year(s) Of Engagement Activity 2020
URL https://zenodo.org/record/4001700