Cumulative Revelations in Personal Data

Lead Research Organisation: University of Strathclyde
Department Name: Computer and Information Sciences

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
 
Description Our findings from a systematic review of the literature indicate that cumulative revelations of personal data through social media present many risks to individuals, which can have serious financial, health and well-being, and social ramifications such as identity theft, hacked accounts, being stalked, and loss of employment opportunities. Work is ongoing on how to raise individuals' awareness of these threats and how to avoid or reduce the risks. Subsequent work has shown that it is possible to identify when social media users are revealing details about themselves online that indicate they may be at risk or vulnerable with respect to self-harm, depression, or anorexia.

This led to the development of several online tools and studies investigating how people reflect upon their own data (through a tool called DataMirror) and how they inspect and reflect upon other people's data (through a demonstrator that utilised fabricated personas and case studies). Using these tools, we performed user studies involving over 200 participants to explore different scenarios in which cumulative revelations could lead to hacking, identity theft, unwanted attention, loss of opportunities, etc. Participants reported higher awareness and understanding of the threats and harms that could arise as a consequence of their information behaviours online.
Exploitation Route These findings could be used to create information packs and educational resources informing social media users of the risks of revealing small pieces of information over time, and of how those pieces can be combined in ways that may lead to unintended and potentially harmful consequences for the individual. We have developed tools that make it possible to detect at-risk behaviours in social media posts.
Sectors Digital/Communication/Information Technologies (including Software),Education,Security and Diplomacy,Other
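
The idea of small disclosures accumulating into a harmful whole can be illustrated with a minimal sketch. It is not taken from the project's tools: the attribute labels, the risky combinations, and the example posts are all hypothetical, and the attributes are hand-annotated rather than detected automatically.

```python
from dataclasses import dataclass


@dataclass
class Post:
    day: int            # days since the first post
    text: str
    reveals: frozenset  # hand-annotated attribute labels (hypothetical)


# Hypothetical combinations of attributes that, taken together, enable a harm.
RISKY_COMBINATIONS = {
    "identity theft": {"full_name", "date_of_birth", "home_town"},
    "stalking": {"workplace", "daily_routine"},
}

# Invented example timeline: no single post looks dangerous on its own.
POSTS = [
    Post(1, "New profile, hello from Alex Example!", frozenset({"full_name"})),
    Post(30, "Turning 30 today!", frozenset({"date_of_birth"})),
    Post(90, "Back home for the weekend, then work on Monday.",
         frozenset({"home_town", "workplace"})),
]


def first_exposure(posts):
    """Return {risk: day} for the first day each risky combination is complete."""
    revealed = set()
    exposed = {}
    for post in sorted(posts, key=lambda p: p.day):
        revealed |= post.reveals  # disclosures accumulate over time
        for risk, needed in RISKY_COMBINATIONS.items():
            if risk not in exposed and needed <= revealed:
                exposed[risk] = post.day
    return exposed


print(first_exposure(POSTS))
```

In this invented timeline the "identity theft" combination only becomes complete with the third post, while the "stalking" combination never does because no post reveals a daily routine.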

 
Description Detecting Depression in Social Media 
Organisation University of Santiago de Compostela
Department Centro Singular de Investigación en Tecnoloxías Intelixentes
Country Spain 
Sector Academic/University 
PI Contribution Development of ML and AI methods to detect social media users who are at risk of depression, self-harm and anorexia.
Collaborator Contribution Our partner sent a visiting researcher to work with our group, and during the visit we jointly developed new ML/AI methods for detecting at-risk behaviours in social media.
Impact Martínez-Castaño, Rodrigo and Htait, Amal and Azzopardi, Leif and Moshfeghi, Yashar (2020) Early risk detection of self-harm and depression severity using BERT-based transformers : iLab at CLEF eRisk 2020. In: Early Risk Prediction on the Internet, 2020-09-22 - 2020-09-25.
Start Year 2020
 
Title Awessome: Sentiment Analysis Package 
Description Sentiment analysis (SA) is a key element of many opinion and attitude mining tasks. While various unsupervised SA tools already exist, a central problem is that they are lexicon-based and the lexicons they use are limited, leading to vocabulary mismatch. In this paper, we present an unsupervised, word embedding-based framework for sentiment intensity scoring (SIS). The framework generalizes and combines past work so that pre-existing lexicons (e.g. VADER, LabMT) and word embeddings (e.g. BERT, RoBERTa) can be used to address this problem, with no training required, while providing fine-grained SIS of words and phrases. The framework is scalable and extensible, so that custom lexicons or word embeddings can be used with the core methods, and even to create new corpus-specific lexicons without the need for extensive supervised learning and retraining.
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact The package has just been released. 
URL https://github.com/cumulative-revelations/awessome
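
The general idea of scoring sentiment intensity from a seed lexicon and word embeddings, without training, can be sketched as follows. This is an illustrative assumption about how such a scorer might work, not the Awessome package's API: the three-dimensional vectors, the seed lists, and the function names are invented for the example, and a real system would take its vectors from a pretrained model and its seeds from an existing lexicon.

```python
import math

# Toy 3-dimensional "embeddings". All values are invented for illustration;
# a real system would use vectors from a pretrained embedding model.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
    "awful": [-0.8, 0.2, 0.1],
    "superb": [0.7, 0.3, 0.2],
    "dreadful": [-0.7, 0.3, 0.2],
}

# Hypothetical seed words standing in for entries of a sentiment lexicon.
POSITIVE_SEEDS = ["good", "great"]
NEGATIVE_SEEDS = ["bad", "awful"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_similarity(vector, seeds):
    """Average cosine similarity between a vector and a list of seed words."""
    return sum(cosine(vector, EMBEDDINGS[s]) for s in seeds) / len(seeds)


def sentiment_intensity(word):
    """Similarity to positive seeds minus similarity to negative seeds."""
    vector = EMBEDDINGS[word]
    return (mean_similarity(vector, POSITIVE_SEEDS)
            - mean_similarity(vector, NEGATIVE_SEEDS))


for word in ("superb", "dreadful"):
    print(word, round(sentiment_intensity(word), 3))
```

Because the score depends only on similarity to seed words, words absent from the lexicon (here "superb" and "dreadful") still receive a signed intensity, which is how an embedding-based approach can reduce vocabulary mismatch.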
 
Title DataMirror 
Description DataMirror is an initial prototype tool that enables social network users to aggregate their online data so that they can search, browse and visualise what they have put online. The aim of the tool is to investigate and explore people's awareness of the data self that they project online: not only the volume of information that they share, but also what it may mean when combined, what pieces of sensitive information may be gleaned from their data, and what machine learning may infer about them from their data.
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact The tool is being used to glean insights into people's online behaviours and how they may lead to security risks, threats and vulnerabilities.
URL https://github.com/cumulative-revelations/DataMirror
 
Description Demonstrator Booth 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Approximately 50 information retrieval and information behaviour professionals visited our demonstration booth at the ACM SIGIR Conference 2022 to view and interact with our persona-based scenarios tool for raising awareness of the threats and harms of cumulative revelations in online data.
Year(s) Of Engagement Activity 2022
 
Description Presentation and stand at "Eyes Online: Understand your data, switch on your rights" Feb, 2020 at the Insight Institute in Edinburgh, which was an event open and free to the public. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Over 50 members of the public listened to our presentation on the project, and many visited our interactive stand during the event. The presentation and stand raised awareness of the potential threats that people face when online, and specifically those arising from the information that they reveal over time. Several visitors signed up to participate in future studies, while others contributed questions about how they would like to visualise and inspect their social media data, how organisations (commercial and/or criminal) can use machine learning and other techniques to draw inferences or conclusions about them from their data, and what vulnerabilities their data may expose.
Year(s) Of Engagement Activity 2020
 
Description Sprite+ Online Showcase : Cumulative revelations in personal data 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Raised awareness of the project's goals and outcomes among an audience of 30-40 participants from academia and industry. We received significant interest from the community in developing the methods and training materials further and in building a web-based demonstrator to showcase to larger audiences.
Year(s) Of Engagement Activity 2021