A framework for linking and sharing social media data for high-resolution longitudinal measurement of mental health across CLOSER cohorts
Lead Research Organisation:
University of Bristol
Department Name: Social Medicine
Abstract
Interactions on social media have the potential to help us to understand human behaviour, including the development of both good and poor mental health. However, to do the best science we need to know as much as possible about the people who are participating in our research. The CLOSER group of UK longitudinal cohorts includes people who have contributed their data to research since birth. By inviting participants in these cohorts to also allow us to derive information from their social media feeds, we will be able to relate this information to gold-standard measures of the behaviours we are trying to understand and to world-class data on other aspects of life. To work out the best way to do this, our project will engage with participants in the Children of the '90s cohort to find out what is acceptable to them in terms of collecting and using their interactions on social media. We will use what we have learnt to develop software that collects and codes social media data in a way that protects the anonymity of participants by scoring Tweets without making the text available to researchers. We will share this software with other CLOSER cohorts to make it easy for them to invite participants to contribute their Twitter data in a safe and secure way. The high-resolution data collected in this way will help us to understand human behaviour and how mental health changes over time. Collecting these data in well-characterised groups of people will also give scientists the information they need to improve the quality of all research using social media.
Planned Impact
We will achieve impact in two ways. First, online social media represent a vast untapped repository of data about cohort participants' lives, their behaviour and the environments they are exposed to. These data could be used to make extraordinary discoveries about human health and wellbeing that would be very difficult to achieve by other means. Second, cohort studies have information that is extremely valuable to the wider field of research using social media, because they can provide "ground truth": rich information collected directly from participants that can be used to validate and improve social media coding algorithms. Improving these algorithms could help achieve a step-change in the quality of all research conducted using social media data. We hope that the proof-of-principle dataset that we collect and share as part of this project will demonstrate this and highlight the potential value of linking social media data in other UK cohorts.
Publications
Shiells K
(2022)
Participant acceptability of digital footprint data collection strategies: an exemplar approach to participant engagement and involvement in the ALSPAC birth cohort study.
in International Journal of Population Data Science
Shiells K
(2020)
Participant acceptability of digital footprint data collection strategies: an exemplar approach to participant engagement and involvement in the ALSPAC birth cohort study.
in International Journal of Population Data Science
Di Cara NH
(2023)
Methodologies for Monitoring Mental Health on Twitter: Systematic Review.
in Journal of Medical Internet Research
Di Cara NH
(2020)
Views on social media and its linkage to longitudinal data from two generations of a UK cohort study.
in Wellcome Open Research
Di Cara N
(2020)
Views on social media and its linkage to longitudinal data from two generations of a UK cohort study
in Wellcome Open Research
Description | We conducted focus groups with two generations of participants in the Children of the 90s cohort to understand their views on linking all types of social media data. We explored several different possible scenarios with each group before we explained our proposed approach. The attitudes we encountered were similar to those identified in our previous work in the Twins Early Development Study (TEDS) and in a NatCen report on focus groups conducted with the general population. For example, photos are regarded as more sensitive than text data, and the sensitivity of the information depends on the social media platform: data from Facebook, where networks are by default closed, are generally regarded as more sensitive than data from Twitter, where interactions are by default open to the world. However, in contrast to the NatCen report, the focus group participants' long-term relationship with Children of the 90s and the trust they place in the study with their other sensitive data meant that they expressed a general willingness for Children of the 90s to link and code any of their own data from social media so long as their identifiable information was suitably protected when sharing with outside parties. This was true of both generations. All participants endorsed the approach that we had proposed for this project. In consultation with CLOSER cohort leaders, we developed an open-source software package to link and archive Twitter data that is easy to share between institutions as a series of Docker containers. Docker packages the code and its associated computational environment into a container, a lightweight isolated environment that is designed to run in the same way on any computer with Docker installed. This approach is widely used in industry and makes the software easy to share and deploy in new places. 
This will give each CLOSER cohort the ability to run the social media linkage software on their own computers, so that the identifiable data collected never leaves a cohort's data safe haven. We have been working with cohorts to develop the best approaches for integrating this into data linkage pipelines. We have also developed a companion software package that scores each Tweet on several dimensions relevant to mental health, and returns these anonymous scores for inclusion in a cohort study's dataset. We obtained ethical approval to approach Children of the 90s participants for permission to link their publicly available Twitter data, and Children of the 90s staff have begun to link Twitter data from consenting participants across all generations of the cohort. We have presented the outcome of this research to the CLOSER Leadership Team, and we plan to run a workshop later this year to train staff from interested cohorts in the use of the software. |
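The scoring step described above can be illustrated with a minimal sketch. This is not the project's software or its scoring method: the `Tweet` and `TweetScore` types, the `score_tweet` function and the toy word lists in `LEXICONS` are all hypothetical names introduced here for illustration. The sketch only shows the general idea that the Tweet text is read inside the safe haven and that the record returned for the cohort dataset carries scores but no text.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical toy word lists, for illustration only.
# They are NOT the dimensions or the method used by the project's software.
LEXICONS = {
    "positive_affect": {"happy", "great", "love", "excited"},
    "negative_affect": {"sad", "tired", "lonely", "worried"},
}


@dataclass(frozen=True)
class Tweet:
    """Raw record that stays inside the cohort's data safe haven."""
    participant_id: str  # assumed to be a pseudonymous study ID
    created_at: datetime
    text: str


@dataclass(frozen=True)
class TweetScore:
    """Record returned for the cohort dataset: scores only, no text."""
    participant_id: str
    created_at: datetime
    scores: dict  # dimension name -> proportion of tokens matched


def score_tweet(tweet: Tweet) -> TweetScore:
    """Score one Tweet on each dimension and drop the text."""
    # Very simple tokenisation: split on whitespace, strip punctuation.
    tokens = [t.strip(".,!?;:\"'()").lower() for t in tweet.text.split()]
    tokens = [t for t in tokens if t]
    n = len(tokens) or 1  # avoid dividing by zero for empty text
    scores = {
        dim: sum(t in words for t in tokens) / n
        for dim, words in LEXICONS.items()
    }
    return TweetScore(tweet.participant_id, tweet.created_at, scores)


if __name__ == "__main__":
    example = Tweet("P0001", datetime(2019, 5, 1, 9, 30),
                    "I feel so happy and excited today!")
    print(score_tweet(example))
```

Keeping the raw and scored records as separate types makes it harder to pass text out of the safe haven by accident, because the output type has no field that could hold it.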
Exploitation Route | First, online social media represent a vast untapped repository of data about cohort participants' lives, their behaviour and the environments they are exposed to. These data could be used to make extraordinary discoveries about human health and wellbeing that would be very difficult to achieve by other means. Second, cohort studies have information that is extremely valuable to the wider field of research using social media, because they can provide "ground truth": rich information collected directly from participants that can be used to validate and improve social media coding algorithms. Improving these algorithms could help achieve a step-change in the quality of all research conducted using social media data. We hope that the proof-of-principle dataset that we are collecting and sharing as part of this project will demonstrate this and highlight the potential value of linking social media data in other UK cohorts. |
Sectors | Communities and Social Services/Policy; Digital/Communication/Information Technologies (including Software); Healthcare |
Description | Submission to Commons Science and Technology Committee inquiry into the impact of social media and screen-use on young people's health |
Geographic Reach | National |
Policy Influence Type | Contribution to a national consultation/review |
Impact | CLOSER submitted this project as written evidence to the House of Commons Science and Technology Committee's inquiry into the impact of social media and screen-use on young people's health, as current work that is likely to produce evidence relevant to the inquiry. |
URL | https://www.parliament.uk/business/committees/committees-a-z/commons-select/science-and-technology-c... |
Description | Adolescence, digital technology and mental health care: exploring opportunity and harm. |
Amount | £100,809 (GBP) |
Funding ID | MR/T046716/1 |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2020 |
End | 03/2021 |
Description | Exploring community resilience assets in Wales during the COVID-19 outbreak |
Amount | £180,000 (GBP) |
Organisation | The Health Foundation |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 01/2021 |
End | 01/2022 |
Description | UK Birth Cohorts as a Platform for Ground Truth in Mental Health Data Science |
Amount | £120,845 (GBP) |
Organisation | Alan Turing Institute |
Sector | Academic/University |
Country | United Kingdom |
Start | 01/2019 |
End | 12/2020 |
Description | Using social media linkage for high-resolution longitudinal measurement of mental health |
Amount | £1,500,000 (GBP) |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2018 |
End | 03/2020 |
Description | CLOSER social media linkage framework collaboration |
Organisation | Cardiff University |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | This award is bringing together CLOSER cohort leaders from across institutions to develop a robust and secure software framework for linking social media data in UK birth cohorts. |
Collaborator Contribution | Our partners are contributing their expertise as leaders of UK cohorts to ensure that the software framework developed by the award is as relevant and easy to deploy as possible across UK birth cohorts. |
Impact | This is a multi-disciplinary collaboration, bringing together psychologists, data scientists, software engineers and epidemiologists to develop a software framework for linking social media data in UK cohorts. |
Start Year | 2018 |
Description | CLOSER social media linkage framework collaboration |
Organisation | University College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | This award is bringing together CLOSER cohort leaders from across institutions to develop a robust and secure software framework for linking social media data in UK birth cohorts. |
Collaborator Contribution | Our partners are contributing their expertise as leaders of UK cohorts to ensure that the software framework developed by the award is as relevant and easy to deploy as possible across UK birth cohorts. |
Impact | This is a multi-disciplinary collaboration, bringing together psychologists, data scientists, software engineers and epidemiologists to develop a software framework for linking social media data in UK cohorts. |
Start Year | 2018 |
Description | CLOSER social media linkage framework collaboration |
Organisation | University of Essex |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | This award is bringing together CLOSER cohort leaders from across institutions to develop a robust and secure software framework for linking social media data in UK birth cohorts. |
Collaborator Contribution | Our partners are contributing their expertise as leaders of UK cohorts to ensure that the software framework developed by the award is as relevant and easy to deploy as possible across UK birth cohorts. |
Impact | This is a multi-disciplinary collaboration, bringing together psychologists, data scientists, software engineers and epidemiologists to develop a software framework for linking social media data in UK cohorts. |
Start Year | 2018 |
Title | Social media linkage software |
Description | The software facilitates the secure linkage and sharing of information derived from social media data in UK birth cohorts. |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | This software facilitates the secure linkage and sharing of information derived from social media data in UK birth cohorts, providing high-resolution time series data on participants' real-life social interactions that can be processed to provide anonymised datasets available to researchers. This enriches the data available to cohorts, but also facilitates the improvement of algorithms for inferring information from social media data through linkage to high quality ground truth data in well characterised epidemiological cohorts. |
URL | https://github.com/DynamicGenetics/Epicosm |
Description | Science Centre exhibit |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Public/other audiences |
Results and Impact | We won a competition funded by the Alan Turing Institute to develop a "curiosity toolkit" in collaboration with Bristol science centre We The Curious, as part of the programme for their new Open City Lab. The toolkit allows visitors to We The Curious to contribute to research using the tools used by scientists themselves. In this case, we developed a series of activities around the question, "Can machines understand emotion?". The centrepiece is a large high-resolution touch screen where visitors can interact with a machine learning model, attempting to teach it to recognise emotions in facial expressions, an example of human-in-the-loop machine learning. This stems directly from our research using machine learning to infer emotion from digital footprint data. |
Year(s) Of Engagement Activity | 2019, 2020, 2021, 2022 |
URL | https://jeangoldinginstitute.blogs.bristol.ac.uk/tag/we-the-curious/ |