Understanding [Offline/Online] Society: Linking Surveys with Twitter Data
Lead Research Organisation:
CARDIFF UNIVERSITY
Department Name: Sch of Social Sciences
Abstract
Understanding behaviours, attitudes and identities in online space is a key challenge for 21st Century Social Science. The opportunities provided by social media platforms such as Twitter are significant, with between 300 and 500 million tweets generated a day representing interactions, networks, opinions and reactions at a highly granular temporal (and sometimes spatial) level. On average 4,500 tweets are authored every second and this velocity of data offers us a real-time insight into the social world. However, the fly in the ointment for researchers is that we have a limited understanding of who (or what in the case of 'bots') is present in the online space and to what extent the online representation of social actors can be taken to represent the social world. The fundamental concerns of what can be known and how we can know it need to be addressed before social science can embrace, albeit with a healthy dose of caution, Twitter as a source of knowledge on the social world.
In light of this, the project sets out to establish what insights Twitter can offer us into social phenomena through the linkage of the content and metadata of tweets with survey data from three major UK surveys - British Social Attitudes 2015, Understanding Society Innovation Panel 2017 and the NatCen Panel. In essence, this project is an exercise in method, calibration and verification, through taking what we know about a respondent and exploring to what extent a given known characteristic may manifest (or not) in the online setting, and vice versa. There is clear methodological value in this - gaining consent to link additional sources of data to survey responses is increasingly used to enhance the value of survey data, validate survey measures, and address issues with nonresponse. However, most previous research on consent has focused on administrative records, and understanding consent relating to other new forms of data is needed.
With novel methods there are limitations to working theoretically - unpredicted limitations may become apparent, and the value of the design may not be evident without a real research context. We therefore propose further data collection as part of a substantive case study concerning attitudes and behaviours toward ethnic minorities that will aim to uncover 'hidden' challenges and demonstrate how this methodology can be employed, as well as contributing to the substantive literature. To maximise the value of the research for the wider academic community, this work will in turn inform a work package focusing solely on archiving, sharing and re-use of the linked dataset and/or a derivative of it. Whilst Twitter is only one of many social media platforms, it is the most open and accessible and provides a proving ground on which issues of consent, linkage, archiving and sharing can be tested and evaluated. We anticipate that many of the lessons and protocols developed as part of this research will be operationally applicable to other social media platforms.
In summary, the research project seeks to answer the following research questions:
RQ1) How can Twitter data be used to enhance survey data?
RQ2) How can survey data be used to evaluate existing demographic proxy measures and develop new ones?
RQ3) How can we encourage informed consent to social media data linkage?
RQ4) Demonstrator Study: How can linked data (direct reported and observed indirect) help us to understand public attitudes towards minority ethnic groups?
RQ5) How can social media data be collected, linked to survey data, analysed, archived, and shared in a legal and ethical manner that maintains utility?
Planned Impact
The impact to the academic and non-academic community of this project stems from both the methodological work and the tackling of a substantive issue as a demonstrator study.
Methodological Impact:
Any academic, government, third sector or private enterprise that uses surveys has the potential to benefit from this research. Linked Twitter data is a highly cost-effective way of efficiently and programmatically gathering observational behavioural data that can be used to augment survey datasets, allowing us to derive new variables of interest and to compare what a user reports against what they did. Such verification of survey responses is largely absent, leaving social researchers unable to verify the reliability of their survey instruments. Twitter data may also be useful for improving non-response adjustments. With the protocols arising from this project, anyone will be able to follow our secure and ethical precedents regarding informed consent, collection, linkage and archiving, allowing them to address new research questions at no additional cost. Twitter data is free, and the time and knowledge needed to collect it is significantly reduced through the provision of the ESRC COSMOS Platform, which is available on request to government, public and third sector agencies.
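As a purely illustrative sketch of the linkage idea described above, the following Python snippet attaches an observed tweet count to survey records for consenting respondents only, so that a self-reported measure can be compared with an observed one. All field names, records and the `link` function are hypothetical and invented for illustration; this is not the project's actual protocol or the COSMOS Platform's interface.

```python
from collections import Counter

# Hypothetical survey records (illustrative only): respondent id, consent flag,
# Twitter handle supplied by the respondent, and a self-reported usage measure.
survey = [
    {"resp_id": "r1", "consented": True, "handle": "alice", "reported_daily_user": True},
    {"resp_id": "r2", "consented": False, "handle": "bob", "reported_daily_user": True},
    {"resp_id": "r3", "consented": True, "handle": "carol", "reported_daily_user": False},
]

# Hypothetical tweet metadata, assumed to have been collected already.
tweets = [
    {"handle": "alice", "day": "2020-01-01"},
    {"handle": "alice", "day": "2020-01-02"},
    {"handle": "bob", "day": "2020-01-01"},
]


def link(survey_rows, tweet_rows):
    """Return one linked record per consenting respondent.

    The handle is used only as the join key and is dropped from the output,
    a simple form of data reduction.
    """
    counts = Counter(t["handle"] for t in tweet_rows)
    linked = []
    for row in survey_rows:
        if not row["consented"]:
            continue  # respondents who did not consent are never linked
        linked.append({
            "resp_id": row["resp_id"],
            "reported_daily_user": row["reported_daily_user"],
            "observed_tweets": counts.get(row["handle"], 0),
        })
    return linked


linked = link(survey, tweets)
```

Each record in `linked` pairs what a respondent reported with what was observed, which is the kind of comparison the paragraph above describes; a real study would also need secure storage and access controls.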
In addition to researchers, archiving staff at data archives across the globe will benefit from the output regarding archiving and sharing linked data safely. Archiving specialists currently carry the burden of having to curate this complex data for further use in the absence of any guidance and community norms. Our proposal plugs this gap. Best practice will be established in collaboration with the UKDA and GESIS to be disseminated through the Consortium of European Social Science Data Archives and our steering group member from the US, ensuring the benefit is realised globally.
Substantive Impact:
WP4 will investigate the relationship between direct reported attitudes and behaviours and observed indirect online attitudes and behaviour towards minority ethnic groups. These findings will be relevant to social researchers and psychologists interested in developing new measures of prejudice and understanding the interaction between reported attitudes and behaviours online. They will also be of use to government departments and local authorities with a remit for assessing levels of community cohesion in different parts of the UK. There has been renewed focus from the government on racial inequality, as reported levels of hate crime have increased, and on the relationship between these and wider social phenomena such as the UK's decision to leave the European Union or the rising profile of Islamic extremism; this has been recognised by the creation of the Racial Disparity Unit at the Cabinet Office. The focus of the study on behaviour on Twitter broadens the relevance further to those interested in the online social sphere, debates about the role of social media, and the responsibilities of platform providers to moderate their environments to protect people from online hate crimes and to dampen the potential effects of the online 'echo chamber' for users with extreme views. To facilitate dissemination to these groups, a report will be written for both an academic and policy audience, to be launched at an event with representatives invited from the Home Office, Welsh Government, Office for National Statistics, the Met, MoJ, DCMS, EHRC, Government Equalities Office, Racial Disparity Unit, Stop Hate, Victim Support, Tell Mama, Runnymede, Hope not Hate, Community Security Trust, social media platform owners, and the national Local Government Association.
The new linked dataset will also be publicly archived and accessible to researchers (both academic and non-academic) concerned with the area of attitudes and behaviours towards ethnic minorities, and methodologists interested in exploring the relationship between the reported and the observed.
Publications
Addario G (2024) Public attitudes towards immigration and ethnic minorities
Al Baghal T (2024) Linking Survey and LinkedIn Data: Understanding Usage and Consent Patterns, in Journal of Survey Statistics and Methodology
Al Baghal T (2021) Linking Twitter and survey data: asymmetry in quantity and its impact, in EPJ Data Science
Breuer J (2021) Informed consent for linking survey and social media data - Differences between platforms and data types, in IASSIST Quarterly
Liu S (2024) Linking survey with Twitter data: examining associations among smartphone usage, privacy concern and Twitter linkage consent, in International Journal of Social Research Methodology
Sloan L (2022) The SAGE Handbook of Social Media Research Methods (2nd Ed)
| Description | We explored the factors that influence consent to data linkage in four ways. First, we looked at how several factors are associated with consent rates, including how often someone interacts with social media, the variety of activity they partake in, and their general level of technical competence when using a smartphone. We found that consent rates are higher for individuals who have more variety in their activity. We also noted that younger and employed individuals were more likely to have their consent behaviour affected by privacy concerns. Second, we conducted 25 in-depth interviews with the general public to explore how consent decisions are made and to what extent consent is informed. We found that there were no consistent preferences in how the consent question and information was presented or accessed, and no participants 'fully' understood what they were consenting to. However, participants did not change their minds regarding consent after more detailed discussions. People rely on short-cuts when making consent decisions that seem to be driven by four key factors: risk; benefit; trust; and control. Third, we conducted a survey experiment to see how the presentation of additional information and incentives impacts consent rates. We used a variety of modes for delivery across three studies, with help links for consent information in different positions and variable incentives. In summary, we found that changes to the wording of the consent question and positioning of additional information did not appear to impact consent rates, but we do not know to what extent they impacted how 'informed' that consent was. Incentivising consent to linkage improved response rates for a relatively small cost. Fourth, we explored to what extent socio-demographic characteristics and usage of Twitter impact the likelihood of consent. Across four different studies, the associations between consent and socio-demographics and internet usage were not consistently significant. We found some evidence that consent is less likely for older participants and for individuals who are less active on Twitter. We also learned how to negotiate access for sensitive and non-anonymous data between collaborators on a research project. Some of our Data Sharing Agreements took months to finalise as different partners had different priorities. We followed the principles of systematic processing, data reduction, controlled access, and data deletion. We found that it took time to come to a shared understanding between partners. One example is the need to ensure that data can be securely erased at the end of a project, and what it means to securely erase data on a cloud-based service such as Microsoft 365. The solution we used was to store data on dedicated, encrypted laptops solely for the purpose of this project; however, because our work was not backed up, the data could be lost if it was corrupted. The best practice principles were not always compatible, and we constantly reflected on our practice. |
| Exploitation Route | organisation who is conducting a linked data project. We have shared protocols, information and question wording both for ascertaining informed consent to link survey and Twitter data and how to share and analyse the data as part of a team in an ethical and legal manner. The principles of what we have found are applicable for other social media platforms. We will also be archiving our linked dataset and other researchers will be able to access it, allowing anyone to answer new research questions with linked survey and Twitter data. |
| Sectors | Communities and Social Services/Policy; Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Healthcare; Government, Democracy and Justice |
| URL | https://natcen.ac.uk/linking-survey-and-digital-trace-data |
| Description | 'Data Linking' online course delivered online for GESIS June 2021 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Part of a programme of events offered by GESIS, very well received with excellent feedback. Working primarily with researchers on the ethics and technicalities of linking social media and survey data. |
| Year(s) Of Engagement Activity | 2021 |
| Description | 'Linking Surveys with Twitter Data' online course delivered for the University of Graz (Austria) March 2021 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Very well received. Working primarily with researchers on the ethics and technicalities of linking social media and survey data. |
| Year(s) Of Engagement Activity | 2021 |
| URL | https://grazer-methodenkompetenzzentrum.uni-graz.at/de/archiv/workshops-2020-21/ |
| Description | Article promoting the newly funded project in Social Research Association (SRA) Research Matters March 2020 |
| Form Of Engagement Activity | A magazine, newsletter or online publication |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Article that promoted the project to a wider practitioner audience, signalling our intent and the area in which we are going to move. |
| Year(s) Of Engagement Activity | 2020 |
| URL | https://the-sra.org.uk/common/Uploaded%20files/Research%20Matters%20Magazine/sra-research-matters-ma... |
| Description | Linking Survey and Digital Trace Data (end of project event) |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Understanding online behaviors, attitudes, and identities was a key challenge for social science in the 21st century. At the same time, the opportunities provided by digital trace data were substantial as researchers could access huge quantities of precise observational data relatively quickly, easily, and cheaply. However, the fact that these data were not designed for social researchers created challenges: researchers had a limited understanding of who (or what in the case of 'bots') was included in the data and the biases it may have had, or control over what information was collected to ensure it answered their research questions. The event explored the feasibility, challenges, and opportunities of linking digital trace data with survey data, drawing on experiences and findings from the ESRC-funded 'Understanding (Online/Offline) Society' project. It focused specifically on experiences linking X (formerly known as Twitter) and LinkedIn data with survey data, although the findings could be applied to digital trace data more broadly. It was split into three sessions, each focusing on a key methodological question: How can digital trace data and survey data enhance each other? How can we maximise informed consent to link survey and digital trace data? How can digital trace data be collected, linked to survey data, and shared in a legal and ethical manner that maintains utility? The event consisted of three sessions and a total of seven presentations followed by questions. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://natcen.ac.uk/events/linking-survey-and-digital-trace-data |
| Description | Linking Surveys with Twitter Data |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Workshop in collaboration with colleagues at GESIS on the ethics of linking survey and Twitter data including informed consent and making the link |
| Year(s) Of Engagement Activity | 2022 |
| Description | Online Workshop on Linking Twitter & Survey Data |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Delivery of a workshop on linking Twitter and survey data, delivered in collaboration with colleagues at GESIS as part of a CESSDA project. Aimed at an international audience of researchers and academics. The learning objectives of the workshop were to: 1. Understand why and how to link survey and Twitter data 2. Be aware of the key practical and ethical challenges in linking survey and Twitter data 3. Be familiar with the types of disclosure risks associated with linked survey and Twitter data 4. Know strategies for minimising risk in linked survey and Twitter data projects |
| Year(s) Of Engagement Activity | 2020 |
| URL | https://zenodo.org/record/4001700#.YBrMTnf7RTY |
