A framework for linking and sharing social media data for high-resolution longitudinal measurement of mental health across CLOSER cohorts

Lead Research Organisation: University of Bristol
Department Name: Social Medicine

Abstract

Interactions on social media have the potential to help us to understand human behaviour, including the development of both good and poor mental health. However, to do the best science we need to know as much as possible about the people who are participating in our research. The CLOSER group of UK longitudinal cohorts includes people who have contributed their data to research since birth. By inviting participants in these cohorts to also allow us to derive information from their social media feeds, we will be able to relate this information to gold-standard measures of the behaviours we are trying to understand and to world-class data on other aspects of life. To work out the best way to do this, our project will engage with participants in the Children of the 90s cohort to find out what is acceptable to them in terms of collecting and using their interactions on social media. We will use what we have learnt to develop software that collects and codes social media data in a way that protects the anonymity of participants by scoring Tweets without making the text available to researchers. We will share this software with other CLOSER cohorts to make it easy for them to invite participants to contribute their Twitter data in a safe and secure way. The high-resolution data collected in this way will help us to understand human behaviour and how mental health changes over time. Collecting these data in well-characterised groups of people will also give scientists the information they need to improve the quality of all research using social media.

Planned Impact

We will achieve impact in two ways. First, online social media represent a vast untapped repository of data about cohort participants' lives, their behaviour and the environments they are exposed to. These data could be used to make extraordinary discoveries about human health and wellbeing that would be very difficult to achieve by other means. Second, cohort studies have information that is extremely valuable to the wider field of research using social media, because they can provide "ground truth": rich information collected directly from participants that can be used to validate and improve social media coding algorithms. Improving these algorithms could help achieve a step-change in the quality of all research conducted using social media data. We hope that the proof-of-principle dataset that we collect and share as part of this project will demonstrate this and highlight the potential value of linking social media data in other UK cohorts.
 
Description We conducted focus groups with two generations of participants in the Children of the 90s cohort to understand their views on linking all types of social media data. We explored several different possible scenarios with each group before we explained our proposed approach. The attitudes we encountered were similar to those identified in our previous work in the Twins Early Development Study (TEDS) and in a NatCen report on focus groups conducted with the general population. For example, photos are regarded as more sensitive than text data, and the sensitivity of the information depends on the social media platform: data from Facebook, where networks are by default closed, are generally regarded as more sensitive than data from Twitter, where interactions are by default open to the world. However, in contrast to the NatCen report, the focus group participants' long-term relationship with Children of the 90s and the trust they place in the study with their other sensitive data meant that they expressed a general willingness for Children of the 90s to link and code any of their own data from social media so long as their identifiable information was suitably protected when sharing with outside parties. This was true of both generations. All participants endorsed the approach that we had proposed for this project.

In consultation with CLOSER cohort leaders, we developed an open source software package to link and archive Twitter data that is easy to share between institutions as a series of Docker containers. Docker packages all the code and the associated computational environment into containers: lightweight, isolated environments that run in the same way on any computer with Docker installed. This approach has become very popular in industry, and has the advantage that the software is easy to share and deploy in new places. This will give each CLOSER cohort the ability to run the social media linkage software on their own computers, so that the identifiable data collected never leave a cohort's data safe haven. We have been working with cohorts to develop the best approaches for integrating this into data linkage pipelines. We have also developed a companion software package that scores each Tweet on several dimensions relevant to mental health, and returns these anonymous scores for inclusion in a cohort study's dataset.
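To illustrate the idea of scoring Tweets without releasing their text, the following is a minimal sketch and not the project's actual code. The `Tweet` class, the `score_tweet` function and the toy word lists in `LEXICON` are all hypothetical names invented for this example; a real deployment would use validated scoring methods and would run inside the cohort's data safe haven.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical toy word lists, for illustration only.
LEXICON = {
    "positive": {"happy", "great", "love"},
    "negative": {"sad", "tired", "lonely"},
}


@dataclass(frozen=True)
class Tweet:
    participant_id: str  # pseudonymous cohort ID, not the Twitter handle
    created_at: datetime
    text: str            # identifiable content; stays inside the safe haven


def score_tweet(tweet: Tweet) -> dict:
    """Return a record of scores for one Tweet, with the text left out."""
    # Split on whitespace, then drop surrounding punctuation and case.
    words = [w.strip(".,!?;:").lower() for w in tweet.text.split()]
    n_words = len(words) or 1  # avoid division by zero for empty text

    record = {
        "participant_id": tweet.participant_id,
        "created_at": tweet.created_at.isoformat(),
    }
    # Each dimension is the proportion of words found in its word list.
    for dimension, vocabulary in LEXICON.items():
        record[dimension] = sum(w in vocabulary for w in words) / n_words
    return record


example = Tweet("P001", datetime(2019, 5, 1, 12, 0), "So happy today, love it!")
scores = score_tweet(example)
```

Only the returned record (pseudonymous ID, timestamp and numeric scores) would be passed on for inclusion in the cohort dataset; the `text` field is read during scoring but never copied into the output.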

We obtained ethical approval to approach Children of the 90s participants for permission to link their publicly available Twitter data, and Children of the 90s staff have begun to link Twitter data from consenting participants across all generations of the cohort.

We have presented the outcome of this research to the CLOSER Leadership Team, and we plan to run a workshop later this year to train staff from interested cohorts in the use of the software.
Exploitation Route First, online social media represent a vast untapped repository of data about cohort participants' lives, their behaviour and the environments they are exposed to. These data could be used to make extraordinary discoveries about human health and wellbeing that would be very difficult to achieve by other means. Second, cohort studies have information that is extremely valuable to the wider field of research using social media, because they can provide "ground truth": rich information collected directly from participants that can be used to validate and improve social media coding algorithms. Improving these algorithms could help achieve a step-change in the quality of all research conducted using social media data. We hope that the proof-of-principle dataset that we are collecting and sharing as part of this project will demonstrate this and highlight the potential value of linking social media data in other UK cohorts.
Sectors Communities and Social Services/Policy, Digital/Communication/Information Technologies (including Software), Healthcare

 
Description Submission to Commons Science and Technology Committee inquiry into the impact of social media and screen-use on young people's health
Geographic Reach National 
Policy Influence Type Contribution to a national consultation/review
Impact CLOSER submitted this project as written evidence to the House of Commons Science and Technology Committee's inquiry into the impact of social media and screen-use on young people's health, as current work that is likely to produce evidence relevant to the inquiry.
URL https://www.parliament.uk/business/committees/committees-a-z/commons-select/science-and-technology-c...
 
Description Adolescence, digital technology and mental health care: exploring opportunity and harm.
Amount £100,809 (GBP)
Funding ID MR/T046716/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 04/2020 
End 03/2021
 
Description Exploring community resilience assets in Wales during the COVID-19 outbreak
Amount £180,000 (GBP)
Organisation The Health Foundation 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2021 
End 01/2022
 
Description UK Birth Cohorts as a Platform for Ground Truth in Mental Health Data Science
Amount £120,845 (GBP)
Organisation Alan Turing Institute 
Sector Academic/University
Country United Kingdom
Start 01/2019 
End 12/2020
 
Description Using social media linkage for high-resolution longitudinal measurement of mental health
Amount £1,500,000 (GBP)
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 03/2018 
End 03/2020
 
Description CLOSER social media linkage framework collaboration 
Organisation Cardiff University
Country United Kingdom 
Sector Academic/University 
PI Contribution This award is bringing together CLOSER cohort leaders from across institutions to develop a robust and secure software framework for linking social media data in UK birth cohorts.
Collaborator Contribution Our partners are contributing their expertise as leaders of UK cohorts to ensure that the software framework developed by the award is as relevant and easy to deploy as possible across UK birth cohorts.
Impact This is a multi-disciplinary collaboration, bringing together psychologists, data scientists, software engineers and epidemiologists to develop a software framework for linking social media data in UK cohorts.
Start Year 2018
 
Description CLOSER social media linkage framework collaboration 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution This award is bringing together CLOSER cohort leaders from across institutions to develop a robust and secure software framework for linking social media data in UK birth cohorts.
Collaborator Contribution Our partners are contributing their expertise as leaders of UK cohorts to ensure that the software framework developed by the award is as relevant and easy to deploy as possible across UK birth cohorts.
Impact This is a multi-disciplinary collaboration, bringing together psychologists, data scientists, software engineers and epidemiologists to develop a software framework for linking social media data in UK cohorts.
Start Year 2018
 
Description CLOSER social media linkage framework collaboration 
Organisation University of Essex
Country United Kingdom 
Sector Academic/University 
PI Contribution This award is bringing together CLOSER cohort leaders from across institutions to develop a robust and secure software framework for linking social media data in UK birth cohorts.
Collaborator Contribution Our partners are contributing their expertise as leaders of UK cohorts to ensure that the software framework developed by the award is as relevant and easy to deploy as possible across UK birth cohorts.
Impact This is a multi-disciplinary collaboration, bringing together psychologists, data scientists, software engineers and epidemiologists to develop a software framework for linking social media data in UK cohorts.
Start Year 2018
 
Title Social media linkage software 
Description The software facilitates the secure linkage and sharing of information derived from social media data in UK birth cohorts. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This software facilitates the secure linkage and sharing of information derived from social media data in UK birth cohorts, providing high-resolution time series data on participants' real-life social interactions that can be processed to provide anonymised datasets available to researchers. This enriches the data available to cohorts, but also facilitates the improvement of algorithms for inferring information from social media data through linkage to high quality ground truth data in well characterised epidemiological cohorts. 
URL https://github.com/DynamicGenetics/Epicosm
 
Description Science Centre exhibit 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact We won a competition funded by the Alan Turing Institute to develop a "curiosity toolkit" in collaboration with Bristol science centre We The Curious, as part of the programme for their new Open City Lab. The toolkit allows visitors to We The Curious to contribute to research using the tools used by scientists themselves. In this case, we developed a series of activities around the question, "Can machines understand emotion?". The centrepiece is a large high-resolution touch screen where visitors can interact with a machine learning model, attempting to teach it to recognise emotions in facial expressions, an example of human-in-the-loop machine learning. This stems directly from our research using machine learning to infer emotion from digital footprint data.
Year(s) Of Engagement Activity 2019,2020,2021,2022
URL https://jeangoldinginstitute.blogs.bristol.ac.uk/tag/we-the-curious/