Cumulative Revelations in Personal Data

Lead Research Organisation: University of Strathclyde
Department Name: Computer and Information Sciences

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
 
Description Cumulative Revelations in Personal Data is a difficult and challenging research area due to the often sensitive nature of the data involved. This required the project to develop new methods to probe how people share data, along with new tools to raise awareness and provide training to end-users on how to identify and protect their personal data (through improved online sharing behaviors) and, by proxy, protect their employer or organization.

New methodologies for analyzing and understanding personal data narratives were developed to advance our understanding of everyday (and often mundane) data by exploring people's seemingly innocuous digital traces (evidenced by a series of studies involving persona-based inquiry). These techniques are applicable to a wide range of privacy and security based investigations that focus on human factors from a personal perspective. Adopting such techniques resulted in new insights into how people perceive and manage their data.

Training tools were developed based on persona-based narrative methods, in which participants, stakeholders and end-users engaged with scenarios tailored to raise awareness of personal data privacy and security. The tools highlighted to users how cumulative revelations can arise from small pieces of data innocently shared online which, when taken together, can have serious (negative) consequences. The tools used constructed personas ("Alex Smith" and "Taylor Addison") as the subject of interest: participants were asked to explore and inspect the personas' online profiles, posts, friends, etc., and to look for relationships between posts that led to cumulative revelations (posts that together reveal more than intended about the individual). Over 400 people participated in our user studies, with findings indicating significantly higher awareness of risks, as well as a desire from participants to engage in more secure information practices.
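
As a rough illustration of the idea of a cumulative revelation, the sketch below checks whether several posts, each harmless on its own, together expose a risky combination of attributes. It is a toy model only: the posts, attribute names, and risky combinations are hypothetical and are not taken from the project's personas, studies, or tools.

```python
# Toy model of cumulative revelation. All data below is hypothetical.

# Each post reveals a small set of attributes about its author.
POSTS = [
    {"platform": "photo sharing", "reveals": {"holiday_dates"}},
    {"platform": "professional network", "reveals": {"employer"}},
    {"platform": "fitness app", "reveals": {"home_area"}},
]

# Combinations of attributes assumed (for illustration) to be risky together.
RISKY_COMBINATIONS = {
    "home likely empty on known dates": {"holiday_dates", "home_area"},
    "targeted phishing of employer": {"employer", "job_role"},
}


def revealed_attributes(posts):
    """Return the union of attributes revealed across all posts."""
    revealed = set()
    for post in posts:
        revealed |= post["reveals"]
    return revealed


def cumulative_revelations(posts, risky_combinations):
    """Return labels of risky combinations fully covered by the posts."""
    revealed = revealed_attributes(posts)
    return sorted(
        label
        for label, needed in risky_combinations.items()
        if needed <= revealed  # every needed attribute has been revealed
    )


if __name__ == "__main__":
    for post in POSTS:
        print(post["platform"], "alone:", cumulative_revelations([post], RISKY_COMBINATIONS))
    print("all posts together:", cumulative_revelations(POSTS, RISKY_COMBINATIONS))
```

In this toy data no single post triggers a risky combination, but the posts taken together do, which is the pattern the training scenarios asked participants to look for.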

A series of demonstrators and tools were publicly released via our website/GitHub, enabling researchers to build upon our research and the training tools that we developed. In addition, the datasets from all studies, from the exploratory interviews on conceptions of personal data and the participatory design sessions used to develop the tools through to the data collected from user engagement with the tools themselves, were released to facilitate further research.

In addition to the methodological and technological contributions of this project, the research findings show that people (and organisations) largely underestimate and are unaware of the cumulative revelations that manifest as a result of sharing/engaging online. We developed several interventions (released as training tools) that raised awareness of such security vulnerabilities along with training to help people identify the weaknesses and threats that their own practices may result in -- and thus how they can update and change their information behaviours.

This project also opens up a number of challenging research directions such as exploring the value of going beyond "awareness nudges" by e.g. promoting reflection before sharing personal data, to enable informed choices about the information added to cumulative digital traces, and the development and evaluation of tools that enable the efficient curation of material posted online, avoiding the need to revisit every account and post to do so.
Exploitation Route The research performed in this project can be taken forward in a number of different ways. On the academic side, the new methodologies and methods developed can be used by other researchers to explore more deeply the nuanced and under-examined relationships between pieces of information that, when taken together, reveal more cumulatively than intended. The tools developed can further enable researchers to build upon our findings regarding this serious but under-investigated area of research.
On the educational side, these tools and methods can also be used to facilitate a better understanding of the implications of poor data sharing practices, and to raise participants' awareness and ability to protect themselves from the potential risks and harms that may arise. Training and educational tools can be further developed and expanded to encompass a wider variety of scenarios tailored to specific demographics (e.g., scenarios for young persons who are particularly vulnerable to cyberbullying and online abuse).
Sectors Digital/Communication/Information Technologies (including Software)

Education

Security and Diplomacy

Other

URL https://cumulative-revelations.github.io/revelations/
 
Description Research Ethics committee Technische Universität Graz
Geographic Reach Europe 
Policy Influence Type Participation in a guidance/advisory committee
Impact As an outcome of the event, a policy document was drafted for the senate of the university. As this is the first Austrian university to start such a process, hopes are that it could become a blueprint for other universities in the region.
 
Title Alex Smith method 
Description The Big Data & Society article Everyday Digital Traces (2023) https://journals.sagepub.com/doi/full/10.1177/20539517231213827 presents the replicable and contextually customisable "Alex Smith" method that we developed. We used a co-designed, fictional persona called Alex Smith to concretise and represent people's online information and to help participants (through role-playing) reflect on data and digital traces. Drawing together four fields of scholarly research concerning personal data (digital traces and the digital self; datafication and dataveillance; mundane, everyday data; and the data journey), we advanced understandings of personal data by exploring ordinary people's seemingly innocuous digital traces generated through everyday online interactions. The method enabled investigations into ordinary people's engagement with their data. It can be adapted for and used with different participant groups, and it also supports their awareness of the cumulative functions of personal data and its potential use by un/known actors.
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact Too early to quantify 
URL https://journals.sagepub.com/doi/full/10.1177/20539517231213827
 
Title Cumulative Revelations in Personal Data Study 1 
Description Data collected in respect of EPSRC Cumulative Revelations in Personal Data EP/R033889/1. This project was a major EPSRC funded study that sought to better understand the revelations that arise when pieces of an individual's personal information available online are connected over time and across multiple platforms. These more complete digital traces can give unintended insights into their life and opinions. Extensive fieldwork included an interview study (Study 1) with UK employees regarding their experiences of cumulative revelation of their data. We examined the risks and harms to individuals and employers when others joined the dots between their online information. Interviews employed a "digital narrative" technique where participants were asked to make drawings of their information and communication networks, the types of information shared, and details of to whom it was available or visible. Study 1 was conducted online in the period May 2020-August 2020, when much of the UK was in lockdown due to the Covid-19 pandemic. Interviews included questions addressing changes to information sharing behaviour occurring during lockdown conditions. The dataset contains:
• Transcripts of 26 interviews with the UK public
• Photographic images of drawings created by participants during the interviews
• Data from a technology survey completed by participants at the start of each interview regarding their use of devices, information channels and data storage
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact None yet 
 
Description Cum. Revelations 
Organisation Government of the UK
Department Government Office for Science
Country United Kingdom 
Sector Public 
PI Contribution Presentations at Home Office and ACE Vivace events
Collaborator Contribution Attendance at advisory board, and ad-hoc advice
Impact Recorded under other sections in Researchfish
Start Year 2019
 
Description Detecting Depression in Social Media 
Organisation University of Santiago de Compostela
Department Centro Singular de Investigación en Tecnoloxías Intelixentes
Country Spain 
Sector Academic/University 
PI Contribution Development of ML and AI methods to detect social media users that are at risk of depression, self harm and anorexia.
Collaborator Contribution Our partner sent a visiting researcher to work with our group, and during the visit we jointly developed new ML/AI methods for detecting at-risk behaviours in social media.
Impact Martínez-Castaño, Rodrigo and Htait, Amal and Azzopardi, Leif and Moshfeghi, Yashar (2020) Early risk detection of self-harm and depression severity using BERT-based transformers : iLab at CLEF eRisk 2020. In: Early Risk Prediction on the Internet, 2020-09-22 - 2020-09-25.
Start Year 2020
 
Description Royal Bank of Scotland 
Organisation Royal Bank of Scotland
Country United Kingdom 
Sector Private 
PI Contribution Project is in its early days, so no contribution yet.
Collaborator Contribution Membership of strategic advisory board, and provision of access to bank staff for research purposes.
Impact Project is in its early days, so no contribution yet.
Start Year 2019
 
Title Awessome: Sentiment Analysis Package 
Description Sentiment analysis (SA) is the key element for a variety of opinion and attitude mining tasks. While various unsupervised SA tools already exist, a central problem is that they are lexicon-based and the lexicons used are limited, leading to a vocabulary mismatch. In this paper, we present an unsupervised word embedding-based sentiment scoring framework for sentiment intensity scoring (SIS). The framework generalizes and combines past works so that pre-existing lexicons (e.g. VADER, LabMT) and word embeddings (e.g. BERT, RoBERTa) can be used to address this problem, with no training required, while providing fine-grained SIS of words and phrases. The framework is scalable and extensible, so that custom lexicons or word embeddings can be used with the core methods, and even to create new corpus-specific lexicons without the need for extensive supervised learning and retraining.
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact The package has just been released. 
URL https://github.com/cumulative-revelations/awessome
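
To give a feel for what unsupervised, embedding-based sentiment intensity scoring can look like, the sketch below scores a word by comparing its vector with a few positive and negative seed words drawn from a lexicon. It is a minimal illustration under stated assumptions, not the Awessome package's API or algorithm: the three-dimensional vectors, the seed lists, and all function names are hypothetical, and a real setup would use vectors from a model such as BERT and seeds from a lexicon such as VADER.

```python
import math

# Hypothetical toy "embeddings"; real ones would come from a trained model.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
    "awful": [-0.8, 0.2, 0.1],
    "superb": [0.85, 0.15, 0.05],    # not in the seed lists
    "dreadful": [-0.85, 0.15, 0.05], # not in the seed lists
}

# Hypothetical seed words standing in for entries of an existing lexicon.
POSITIVE_SEEDS = ["good", "great"]
NEGATIVE_SEEDS = ["bad", "awful"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_similarity(word, seeds):
    """Average cosine similarity between a word and a list of seed words."""
    return sum(cosine(EMBEDDINGS[word], EMBEDDINGS[s]) for s in seeds) / len(seeds)


def sentiment_intensity(word):
    """Positive-seed similarity minus negative-seed similarity (no training)."""
    return mean_similarity(word, POSITIVE_SEEDS) - mean_similarity(word, NEGATIVE_SEEDS)


if __name__ == "__main__":
    for word in ("superb", "dreadful"):
        print(word, round(sentiment_intensity(word), 3))
```

Because the score depends only on vector similarity to the seeds, words missing from the lexicon (here "superb" and "dreadful") still receive a score, which is how an embedding-based approach can reduce the vocabulary mismatch described above.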
 
Title DataMirror 
Description DataMirror is an initial prototype tool that enables social network users to aggregate their online data so that they can search, browse and visualise what they have put online. The aim of the tool is to investigate and explore people's awareness of the data self that they project online: not only the volume of information that they share, but also what it may mean when combined, what pieces of sensitive information may be gleaned from their data, and what machine learning may infer about them given their data.
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact The tool is being used to glean insights into people's online behaviours and how these may lead to security risks, threats and vulnerabilities.
URL https://github.com/cumulative-revelations/DataMirror
 
Description Demonstrator Booth 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Approx. 50 information retrieval and behaviour professionals visited our demonstration booth at ACM SIGIR Conference 2022 to view and interact with our persona based scenarios tool for raising awareness about the threats and harms of cumulative revelations in online data.
Year(s) Of Engagement Activity 2022
 
Description EDEN Community Webinar: Lawful Hacking within Investigations of Serious and Organized Crime 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Participation in a panel discussion organized for the Europol Data Protection Experts Network (EDEN) on the topic of lawful equipment interference. We discussed results of the TAS and Cumulative Disclosure research projects to warn that the privacy risks, and the risks to the safety of digital infrastructures, are significantly higher than is currently considered.
Year(s) Of Engagement Activity 2023
URL https://www.youtube.com/watch?v=v76h_t4WoDk
 
Description Presentation and stand at "Eyes Online: Understand your data, switch on your rights" Feb, 2020 at the Insight Institute in Edinburgh, which was an event open and free to the public. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Over 50 members of the public listened to our presentation on the project, and many visited our interactive stand during the event. The presentation and stand raised awareness of the potential threats that people face when online, and specifically from the information that they reveal over time. Several visitors signed up to participate in future studies, while others contributed questions regarding how they would like to visualise and inspect their social media data, as well as learn how organisations (commercial and/or criminal) can use machine learning and other techniques to draw inferences or conclusions about them from their data, and what vulnerabilities their data may expose.
Year(s) Of Engagement Activity 2020
 
Description Sprite+ Online Showcase : Cumulative revelations in personal data 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Raised awareness of the project's goals and outcomes among an audience of 30-40 participants from academia and industry. We received significant interest from the community in developing the methods and training materials further and in building a web-based demonstrator to showcase to larger audiences.
Year(s) Of Engagement Activity 2021