Harnessing digital data to study 21st-century adolescence
Lead Research Organisation:
UNIVERSITY OF CAMBRIDGE
Department Name: MRC Cognition and Brain Sciences Unit
Abstract
Almost every child in the UK is now living in both offline and online worlds. Just as they go to school every day, they also engage in online environments on a daily basis. A recent data collection by Ofcom showed that an average UK 16-year-old reports spending about 4 hours and 54 minutes a day online. In the USA, 46% of young people aged 13-17 say they are online 'almost constantly' while another 48% say they are online 'several times a day' (Pew, 2022). If we want to understand adolescent health in this digital age, we need to collect adequate data about what they do, see, and experience when spending time online.
We need to do so because online activities can influence a range of mental and physical health factors. For example, seeing self-harm content online could impact the mental health of a vulnerable adolescent who has just had a very difficult day at school. Similarly, certain TikTok trends could change another adolescent's activity levels or what they eat, and so affect their physical health. Further, the communities young people find online can be a source of support, especially for those already struggling with their health.
However, currently no large UK data collection investment targeting adolescents is adequately collecting data about any of these processes. Many rely only on questionnaires to understand the online world. For example, they ask how much time teenagers spend on social media. However, time spent engaging with digital environments is not a good indicator of their impact, as we do not know what adolescents are seeing or doing online. Further, teenagers are very bad at estimating the time they spend on digital platforms, making current questionnaire measures woefully unreliable.
So how do we adequately capture interaction with digital environments to better understand adolescent health? One especially promising method, developed in recent years through a variety of international grants, is "Digital Data Donation". Digital Data Donation uses our data rights: as users of digital platforms, we are entitled to request our data from these platforms and are then free to share it with anyone we want. To allow us to exercise these rights, most digital platforms have introduced 'download your data' pages or services on their sites. Some in our team (Co-Is Boeschoten and Oberski) have developed infrastructure in the Netherlands to use these data download packages in an ethical and privacy-enhancing fashion to collect data for research. This offers an opportunity to explore whether this infrastructure could be used to collect important data in the Adolescent Health Study.
In this project we will complete four workstreams to achieve this goal. First, we will get feedback from young people about the data donation infrastructure and test its feasibility in a classroom data collection environment. We will explore a research question that young people, acting as our co-investigators throughout the project, will help choose. Second, we will explore how we can not only collect digital data from young people, but also feed it back to them in fun and interactive data visualisations. We will work with young people and software engineers to explore what is possible and interesting, and ultimately develop a prototype of this data feedback tool. Third, this data feedback tool could also be used in a classroom environment, e.g., when teachers provide digital literacy lessons. We will consult teachers and develop a draft lesson plan integrating digital data donation and the data feedback tool. This could be used to motivate schools and teachers to engage in the broader Adolescent Health Study. Fourth, we will use what we have learnt both in this project and in our other work to provide a briefing document setting out the pros and cons of a variety of digital data collection methods, which can be used to inform the Adolescent Health Study and other future work.
Technical Summary
A broad range of approaches has been developed to collect digital data. Platform-centric approaches include the use of an API or web scraping for data collection. The effectiveness of these approaches is often limited by privacy concerns and sample selectivity. An alternative user-centric approach is the use of tracking apps. Although these are typically expensive to develop and maintain, they offer a way to collect some types of data. For example, the US ABCD study and a project led by PI Orben have used an app to collect digital data from a subset of their participants, giving accurate information about time spent on screens. Yet both projects experienced limitations, with young people finding the app privacy-invasive and liable to glitches.
Recently, a more generic user-centric approach has gained popularity: the donation of digital trace data, also known as data donation. With data donation, participants request a digital copy of their personal archive (known as a 'Data Download Package') from a platform of interest, such as Instagram or WhatsApp. These platforms are legally obliged to provide this under the GDPR. Co-Is Boeschoten and Oberski have recently developed a workflow that allows researchers to analyse these Data Download Packages in a privacy-enhancing way, and have built this workflow into a reusable open-source software tool.
First, the participant visits a website where local extraction takes place. In practice, this means a Python script runs locally on the participant's device, in their web browser. It extracts only the data relevant for the current research project from their Data Download Package. After inspecting the extracted data, the participant can provide 'true' informed consent, after which the data is sent to the researcher. Because of this local extraction step, only information that is relevant for the research question, and no sensitive data, is shared.
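The local extraction step can be illustrated with a minimal sketch in plain Python. It is not the team's tool: the file path `activity/posts_viewed.json`, the `timestamp` field, and the function names are all hypothetical, and real Data Download Packages differ between platforms and change over time. The sketch assumes the research question only needs the number of logged events per day, so it reads timestamps and ignores everything else in the package.

```python
import io
import json
import zipfile
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Optional

# Hypothetical location of the activity log inside the package (an assumption).
ACTIVITY_FILE = "activity/posts_viewed.json"


def extract_daily_counts(package_bytes: bytes) -> Dict[str, int]:
    """Return the number of logged events per UTC day, and nothing else."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        # Only the one file needed for the research question is opened.
        with package.open(ACTIVITY_FILE) as handle:
            events = json.load(handle)
    counts: Counter = Counter()
    for event in events:
        # Only the timestamp is read; titles, text and usernames are never copied.
        moment = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        counts[moment.date().isoformat()] += 1
    return dict(sorted(counts.items()))


def build_donation(package_bytes: bytes, consent_given: bool) -> Optional[Dict[str, int]]:
    """Return the extracted summary only if the participant has consented."""
    summary = extract_daily_counts(package_bytes)
    # In a real workflow the participant would inspect `summary` before deciding.
    return summary if consent_given else None


def make_example_package() -> bytes:
    """Build a small in-memory package with made-up content for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(ACTIVITY_FILE, json.dumps([
            {"timestamp": 1700000000, "title": "made-up private text"},
            {"timestamp": 1700000500, "title": "more made-up text"},
            {"timestamp": 1700090000, "title": "another made-up entry"},
        ]))
        # A sensitive file that the extraction never opens.
        package.writestr("messages/inbox.json", json.dumps(["made-up message"]))
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_donation(make_example_package(), consent_given=True))
```

The design point is that the summary is computed before anything leaves the device, and that declining consent yields nothing to send.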
Description | British Academy Workshop Series on Social Media Data Sharing |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Description | Social Media Mechanisms Affecting Adolescent Mental Health (SoMe3) |
Amount | £1,985,735 (GBP) |
Funding ID | MR/X034925/1 |
Organisation | United Kingdom Research and Innovation |
Sector | Public |
Country | United Kingdom |
Start | 03/2024 |
End | 02/2028 |
Title | Data Donation |
Description | Infrastructure to allow kids to download data |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Not yet had big impact |
Description | Data Donation |
Organisation | Utrecht University |
Country | Netherlands |
Sector | Academic/University |
PI Contribution | We are working on bringing data donation methodology to the UK |
Collaborator Contribution | They are providing expertise in data donation |
Impact | We have had one successful grant application |
Start Year | 2023 |
Description | Adolescent advisory boards |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Patients, carers and/or patient groups |
Results and Impact | My team has run multiple advisory boards of young people for our studies and has presented to schools in the process. |
Year(s) Of Engagement Activity | 2023,2024 |
Description | Interview for national newspaper by Amy Orben as well as team members |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | My team gave multiple interviews over this submission period, e.g. getting a headline in the Guardian |
Year(s) Of Engagement Activity | 2023,2024 |