Databox: Privacy-Aware Infrastructure for Managing Personal Data

Lead Research Organisation: Imperial College London
Department Name: Design Engineering (Dyson School)

Abstract

Building privacy, trust and security into the evolving digital ecosystem is broadly recognised as a key societal challenge. Regulatory activities in the US, Europe and Japan are complemented by industry initiatives that seek to address "the crisis in trust" occasioned by widespread harvesting of personal data. All parties agree that increased accountability and control are key to this challenge. Accountability seeks not only to strengthen compliance but also to make the emerging ecosystem more transparent to consumers, while control seeks to empower consumers and give them the means to exercise choice actively. This proposal will develop the underlying technology infrastructure required to deliver both accountability and control.

Although personal data management is generally considered an intensely personal matter, it is also inherently social: it is impractical to withdraw from all online activity simply to protect one's privacy. The success of the modern Internet and the "free" services it supports rests largely on the ability of advertisers and analytics providers to make money, with the result that approaches that remove or diminish advertising revenue have been doomed to failure. The many motivations and uses for systems that let individuals manage their own personal data point to a need for tools that enable individuals to take more explicit control over the collection and use of their data and of the information inferred from their online activities, while addressing the challenges of Human-Data Interaction (HDI).

Working with partner organisations, we have refined our vision of just such a tool: a Databox, an on-demand personal data aggregation and query point, control over which rests directly with the user. The Databox vision is of an open-source personal networked device, augmented by cloud-hosted services, that collates, curates, and mediates access to our personal data. The Databox will enable, and in some cases may even host, third-party applications and services that process personal data. The Databox will form the heart of an individual's personal data processing ecosystem, providing a platform for managing secure access to these data and enabling authorised third parties to provide the owner with authenticated services while roaming outside the home environment.

Planned Impact

The proposed research will benefit society through numerous pathways: industry, academia, and several user communities, including open-source developers and Internet advocacy groups, as well as through engagement in the many live policy and other debates currently active in the personal data space. Fundamentally, however, realising the Databox as an open-source platform for the broader community will be of greatest benefit to all citizens. The combination of infrastructure that enables open-source development and drives critical mass with commercial and policy impact opportunities via our industrial and advocacy partners will add significant momentum to the growing community of HDI practitioners.

Perhaps the most critical pathway to impact is the Databox itself. Databox is a practical open-source platform whose methodology entails deploying working artefacts with users. These artefacts will form a comprehensive software platform that enables trusted service-to-user solutions across multiple market segments, giving individuals better control over their personal data, digital identity and privacy. The platform also opens up new possibilities for third-party applications to access personal data, enabling new businesses and allowing them to differentiate their products with innovative services.

There are a number of other impact channels:
- The Emerging HDI Community http://hdiresearch.org
- The Open Source Development Community
- Industry
- Advocacy Groups
- Broader Society
- Academics

Full details of the engagement plans are presented in the attached Pathways to Impact document.

Publications

Servia-Rodriguez S (2017) Privacy-Preserving Personal Model Training in arXiv e-prints

Crabtree A (2017) Repacking 'Privacy' for a Networked World in Computer Supported Cooperative Work (CSCW)

Zhang C (2019) Deep Learning in Mobile and Wireless Networking: A Survey in IEEE Communications Surveys & Tutorials

Osia S (2020) A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics in IEEE Internet of Things Journal

Shamsabadi A (2020) PrivEdge: From Local to Distributed Private Training and Prediction in IEEE Transactions on Information Forensics and Security

Osia S (2020) Deep Private-Feature Extraction in IEEE Transactions on Knowledge and Data Engineering

Urquhart L (2019) Demonstrably doing accountability in the Internet of Things in International Journal of Law and Information Technology

 
Description The award enabled major advances in secure, private personal data analytics and edge computing. This enabled the adoption of open-source standards and a number of related projects, such as the World Wide Web Consortium (W3C) Solid proposal, as well as a number of health-related platforms and approaches. The award also inspired a number of commercial entities to adopt edge-computing approaches.

For some further examples and publications visit:
https://github.com/me-box/databox
https://www.bbc.co.uk/rd/projects/databox
https://www.horizon.ac.uk/project/databox/
https://www.imperial.ac.uk/systems-algorithms-design-lab/research/ukdri-smart-home/

The findings and the platform have already led to a large number of projects in the area of IoT security and privacy, personal data analytics, and the new UK Hub on Trusted Autonomous Systems.
Exploitation Route The findings have been adopted by a number of commercial and research organisations. For a few examples, see:

https://www.bbc.co.uk/rd/projects/databox
https://www.horizon.ac.uk/project/databox/
https://www.imperial.ac.uk/systems-algorithms-design-lab/research/ukdri-smart-home/

The open-source code and platform are available at https://github.com/me-box/databox
Sectors Creative Economy, Digital/Communication/Information Technologies (including Software), Healthcare

URL https://github.com/me-box/databox
 
Description The Databox Project led to the creation of a large open-source community of personal data analytics experts and enthusiasts, alongside direct engagement with several communities in the privacy-preserving machine learning space. The award led to publications in top-tier journals and conferences but, most importantly, to code and approaches that were adopted by leading organisations such as the BBC, Telefonica Research and BT. Alongside academic and industry impact, the award was used to build communities and engage in several outreach activities, such as the annual MozFest festival. In addition, the award had several other impacts:
- It led to the Human-Data Interaction EPSRC NetworkPlus.
- It led to the Defence Against Dark Artefacts EPSRC grant.
- The research was showcased by the BBC at the Victoria and Albert Museum and at Tate.
- It led to the £20m UK Dementia Research Institute grant at Imperial (Smart Homes Program).
- The research led to the creation of one of the world's most advanced IoT labs at Imperial College London.
- It led to the EPSRC OpenPlus Fellowship (EP/W005271/1: Securing the Next Billion Consumer Devices on the Edge, £1.5m, 2022-2027).
Sector Digital/Communication/Information Technologies (including Software), Healthcare, Security and Diplomacy
Impact Types Societal,Economic

 
Description Input into the Government CDEI report on home assistants
Geographic Reach National 
Policy Influence Type Contribution to a national consultation/review
Impact Our research provided evidence for the "Smart Speakers and Voice Assistants" paper in the "CDEI Snapshot Series". The Centre for Data Ethics and Innovation (CDEI) is an advisory body set up by the UK government and led by an independent board of experts. It is tasked with identifying the measures needed to maximise the benefits of AI and data-driven technology for society and the economy. The CDEI has a unique mandate to advise government on these issues, drawing on expertise and perspectives from across society. The CDEI Snapshots are a series of briefing papers that aim to improve public understanding of topical issues related to the development and deployment of AI. These papers are intended to separate fact from fiction, clarify what is known and unknown, and suggest areas for further investigation.
URL https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/8311...
 
Description EPSRC OpenPlus Fellowship
Amount £1,283,043 (GBP)
Funding ID EP/W005271/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2022 
End 06/2027
 
Description EPSRC Trust, Identity, Privacy, and Security 2
Amount £1,249,510 (GBP)
Funding ID EP/R03351X/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2018 
End 06/2020
 
Description Human Data Interaction: Legibility, Agency, Negotiability
Amount £1,298,811 (GBP)
Funding ID EP/R045178/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2018 
End 06/2021
 
Description UK Dementia Research Institute
Amount £20,000,000 (GBP)
Funding ID WBDT_P80558 UK-DRI 
Organisation Imperial College London 
Sector Academic/University
Country United Kingdom
Start 07/2019 
End 06/2024
 
Title Advanced IoT Testbed 
Description Our state-of-the-art IoT testbed is instrumental to a number of research projects, government regulation reports, TV documentaries, and independent investigations into the security and privacy of smart devices. The testbed consists of over 140 consumer IoT devices, state-of-the-art network and device performance monitoring (through BatteryLab), and various automation techniques. In close collaboration with colleagues at Northeastern University, we also make our testbed configuration and the datasets from our publications available to researchers worldwide. Please see below for specific articles, papers, and datasets. We also make various IoT device signatures and destination lists available through the IoTrim Project. Our team has won one of the top 10 spots in the Telekom Challenge Development Stream. We have received a generous gift and an Innovate UK Cyber Security Academic Startup Accelerator Programme (CyberASAP) grant, supporting our efforts to accelerate IoT security and to develop IoTrim.
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact Publications:
- Oliver Thompson, Anna Maria Mandalari, Hamed Haddadi, "Rapid IoT Device Identification at the Edge", 2nd Workshop on Distributed Machine Learning (DistributedML 2021), co-located with CoNEXT 2021, December 7-10, 2021, Munich, Germany. (Paper available on arXiv)
- Anna Maria Mandalari, Daniel J. Dubois, Roman Kolcun, Muhammad Talha Paracha, Hamed Haddadi, David Choffnes, "Blocking without Breaking: Identification and Mitigation of Non-Essential IoT Traffic", in 21st Privacy Enhancing Technologies Symposium (PETS 2021), July 12-16, 2021, online. (Paper and code available on IoTrim)
- Said Jawad Saidi, Anna Maria Mandalari, Roman Kolcun, Hamed Haddadi, Daniel J. Dubois, David Choffnes, Georgios Smaragdakis, Anja Feldmann, "A Haystack Full of Needles: Scalable Detection of IoT Devices in the Wild", in ACM Internet Measurement Conference 2020, October 2020, Pittsburgh, Pennsylvania, USA. (Paper and data available)
- Daniel J. Dubois, Roman Kolcun, Anna Maria Mandalari, Muhammad Talha Paracha, David Choffnes, Hamed Haddadi, "When Speakers Are All Ears: Characterizing Misactivations of IoT Smart Speakers", in Proceedings of the 20th Privacy Enhancing Technologies Symposium (PETS 2020), July 14-18, 2020, Montréal, Canada. (Paper, webpage, code and dataset; coverage in The New York Times, The Independent, USA Today, BBC Panorama (YouTube link), BBC News, Channel 4 "The Truth About Amazon", NYT lead editorial, Vox, ZDNet, The Telegraph, Gizmodo, GeekWire, Forbes, Business Insider)
- Anna Maria Mandalari, Roman Kolcun, Hamed Haddadi, Daniel J. Dubois, David Choffnes, "Towards Automatic Identification and Blocking of Non-Critical IoT Traffic Destinations", Workshop on Technology and Consumer Protection (ConPro '20), co-located with the 41st IEEE Symposium on Security and Privacy, May 21, 2020, San Francisco, CA. (Paper available on arXiv)
- Ranya Aloufi, Hamed Haddadi, David Boyle, "Emotionless: Privacy-Preserving Speech Analysis for Voice Assistants", in Privacy Preserving Machine Learning, ACM CCS 2019 Workshop, November 2019, London, UK. (Available on arXiv; articles on Vice and Medium)
- Jingjing Ren, Daniel J. Dubois, David Choffnes, Anna Maria Mandalari, Roman Kolcun, Hamed Haddadi, "Information Exposure From Consumer IoT Devices: A Multidimensional, Network-Informed Measurement Approach", in ACM Internet Measurement Conference 2019 (IMC 2019), October 2019, Amsterdam, Netherlands. Community Contribution Award. (Paper, code and dataset; coverage in the Financial Times, The Times, Vice, BBC News, BBC Click, Ars Technica)

Media coverage:
- Consumer Reports: Connected Devices Share More Data Than Needed
- Channel 4: The Truth About Amazon
- BBC News: Why Amazon knows so much about you
- BBC One (Panorama): Amazon: What They Know About Us (YouTube link)
- BBC and BBC Click programme on the GDPR anniversary (YouTube link)
- USA Today: It's not you, it's them: Google, Alexa and Siri may answer even if you haven't called
- The Independent: Smart Speakers Could Accidentally Record Users up to 19 Times Per Day, Study Reveals
- The New York Times: Are Alexa and Google Assistant spying on us?
- Which?: Are Alexa and Google Assistant spying on us?
- Centre for Data Ethics and Innovation: first series of three snapshot papers on ethical issues in AI, including Smart Speakers and Voice Assistants
URL https://netsys.doc.ic.ac.uk/IoTLab.html
 
Description BBC Research and Development 
Organisation British Broadcasting Corporation (BBC)
Department BBC Research & Development
Country United Kingdom 
Sector Public 
PI Contribution In 2018 we will showcase a live engagement event and demonstrator of the 'Future of the Living Room' with BBC R&D at FACT in Liverpool, as part of the States of Play exhibition, and at the Western Balkans Culture Summit. The living room will be open to the public at States of Play at FACT, Liverpool, in May 2018 and at the Western Balkans Culture Summit in August 2018. We provided time, technical expertise and the Databox platform for the exhibition, and also ran a hackathon and exhibitions at the Mozilla Festival 2017.
Collaborator Contribution The BBC are designing and creating an exhibition for a future immersive living room experience. This novel experience will explore the relationship between object-based media (OBM) and Internet of Things (IoT) devices, taking advantage of the hyper-connected nature of our homes to provide new media experiences. Within this living room, we will demonstrate how a range of internet-connected objects can adapt and personalise to individuals and to groups of people in several different ways within the shared space. We aim to show how we can create a personalised, immersive and engaging environment that is both entertaining and educational. This public engagement event and demonstrator is a piece of research that aims to demonstrate and evaluate the concept of IoT-augmented experiences through broadcast media, and to identify the implications of bi-directional, immersive and social broadcast media. This research has the potential to help inform and establish a new framework for the future of media in a living room setting. In discussions of the IoT, there is a clear fear that personalised media (enabled by OBM) could damage the social experience. When personalisation occurs in a shared space, a range of research questions arise about how it will affect the people sharing the space. For example, how will it affect the social experience, and how will it affect issues around personal data, privacy and the exchange of data? This is a joint project between BBC R&D, the British Council, the Databox team and the Foundation for Art and Creative Technology (FACT) Liverpool to create an exhibition showcasing ideas for the living room. The live demonstrator will be open to the public and will capture real-time IoT data and feedback to inform our understanding and to answer important research questions about personal data ethics, privacy and the social impact of the experience.
Impact This partnership is part of an ongoing collaboration with BBC R&D, which has already led to joint activities at MozFest 2017 (https://www.databoxproject.uk/2017/09/02/databox-hackday-at-mozfest-2017-thu-26-oct/), MozFest 2016 and FACT in Liverpool. This
Start Year 2017
 
Description BT Research and Development 
Organisation BT Group
Country United Kingdom 
Sector Private 
PI Contribution We provided the Databox platform, a demonstration, and IoT exemplar applications and devices for the BT Innovation showcase 2017. This exhibition is part of an ongoing collaboration between the Databox team and BT R&D.
Collaborator Contribution BT provided exhibition space for the Databox team at the BT Innovation week at Adastral Park. This was part of our ongoing collaboration with BT, leading to further research in the IoT space in 2018-2019.
Impact Our team showcased the Databox platform at BT Innovation Week at Adastral Park, Ipswich, UK. There were nearly 5000 visitors over the 5 days of the show. Over the week, our team talked to a mix of organisations: a couple of banks, healthcare providers, a housing association, IoT developers, the BBC, Sky, EPSRC and BT researchers. We presented three use cases: fraud detection, personalised adverts and health insurance. Many attendees could see use cases for their own sectors; typical questions were "How much will it cost?", "When will it be ready/commercialised?", "How is a centralised local datastore model more secure than a distributed one?", "What would be the physical form factor of the product if deployed?", "Does it require dedicated hardware?", "Can it run on BT's Home Hub?" and "How would data usage be analysed?". In addition, many industry attendees raised concerns about the GDPR (the EU General Data Protection Regulation) and could see how Databox could help businesses address issues related to personal data storage. Most of the discussions were about the overall concept, centring on "how would I do this/that", and on potential new applications. Overall, the project received positive feedback and follow-up invitations from the audience.
Start Year 2017
 
Description Telefonica Research 
Organisation Telefonica S.A
Department Telefonica Research
Country Spain 
Sector Private 
PI Contribution Research on edge computing and privacy-preserving analytics
Collaborator Contribution Internships for PhD students, and industry expertise provided for open-source projects
Impact Fan Mo, Hamed Haddadi, Kleomenis Katevas, Eduard Marin, Diego Perino, Nicolas Kourtellis, "PPFL: Privacy-preserving Federated Learning with Trusted Execution Environments", in The 19th ACM International Conference on Mobile Systems, Applications, and Services (MobiSys 2021), online, July 2021. (Best Paper Award at MobiSys 2021)
Start Year 2017
 
Title The Databox Platform 
Description The code for the open-source platform and app ecosystem is in our development repository on GitHub. The Databox platform is an open-source personal networked device, augmented by cloud-hosted services, that collates, curates, and mediates access to an individual's personal data by verified and audited third-party applications and services. The Databox will form the heart of an individual's personal data processing ecosystem, providing a platform for managing secure access to data and enabling authorised third parties to provide the owner with authenticated services, including services that may be accessed while roaming outside the home environment.
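To make the idea of "mediated access" concrete, the following is a minimal sketch under stated assumptions: it is not taken from the Databox codebase, and the class and method names (PersonalDataStore, grant, revoke, read, AuditEntry) are hypothetical. It shows a single owner-controlled store that releases a datastore's records only to an application holding an explicit grant, and records every request, successful or refused, in an audit log that the owner could later inspect.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    """One record of an access attempt (hypothetical structure)."""
    app_id: str
    datastore: str
    allowed: bool
    timestamp: str


@dataclass
class PersonalDataStore:
    """Hypothetical mediator: data leaves only via owner-approved grants."""
    data: dict = field(default_factory=dict)      # datastore name -> list of records
    grants: dict = field(default_factory=dict)    # app_id -> set of datastore names
    audit_log: list = field(default_factory=list)

    def grant(self, app_id: str, datastore: str) -> None:
        # The owner explicitly authorises one app to read one datastore.
        self.grants.setdefault(app_id, set()).add(datastore)

    def revoke(self, app_id: str, datastore: str) -> None:
        # The owner can withdraw a grant at any time; unknown grants are ignored.
        self.grants.get(app_id, set()).discard(datastore)

    def read(self, app_id: str, datastore: str) -> list:
        allowed = datastore in self.grants.get(app_id, set())
        # Every request is logged, whether or not it succeeds, for later audit.
        self.audit_log.append(AuditEntry(
            app_id, datastore, allowed,
            datetime.now(timezone.utc).isoformat()))
        if not allowed:
            raise PermissionError(f"{app_id} has no grant for {datastore}")
        # Return a copy so the caller cannot mutate the owner's stored list.
        return list(self.data.get(datastore, []))


if __name__ == "__main__":
    box = PersonalDataStore(data={"smart-plug": [{"watts": 42}]})
    box.grant("energy-app", "smart-plug")
    print(box.read("energy-app", "smart-plug"))
    try:
        box.read("ad-app", "smart-plug")
    except PermissionError as err:
        print(err)
    print([(e.app_id, e.allowed) for e in box.audit_log])
```

Logging refused requests as well as granted ones is the design choice that supports accountability: the owner sees not only what was released but also what was attempted. A real deployment would additionally need authentication of apps, per-app isolation and persistent, tamper-evident logs, none of which this sketch attempts.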
Type Of Technology Software 
Year Produced 2017 
Impact Numerous individuals and organisations have used the Databox platform as part of their ongoing research, including ISPs (KDDI in Japan), the media industry (BBC R&D) and enthusiasts. The Databox papers and platform have already been cited in approximately 100 papers.
 
Description Royal Academy of Engineering report on Data Sharing 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Databox has been featured as a case study in the Royal Academy of Engineering report "Towards trusted data sharing: guidance and case studies"

Read all about it here: http://reports.raeng.org.uk/datasharing/cover/
Year(s) Of Engagement Activity 2019
URL http://reports.raeng.org.uk/datasharing/cover/
 
Description Victoria and Albert Museum exhibition of the Living room of the future 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Full details via https://www.bbc.co.uk/rd/projects/databox and https://www.eventbrite.co.uk/e/the-living-room-of-the-future-at-the-va-museum-tickets-48129449479#

The Living Room of the Future

BBC R&D and the Databox team are also, in collaboration with the Foundation for Art and Creative Technology (FACT) and the British Council, organising a public experiment called 'The Living Room of the Future', which seeks to explore the relationship between our hyper-connected homes and next generation broadcasting techniques in ways that enhance inhabitants' media experiences while protecting their privacy and security.

The HDI principles that underpin Databox development have also been applied in an innovative collaboration with BBC R&D centring on 'Object Based Media' (OBM). OBM adapts media to devices, environments, and people to create bespoke personalised experiences. BBC R&D and the Databox team undertook a public experiment at the 2016 Mozilla Festival to explore the potential relationship between OBM and Databox. The experiment leveraged the OBM 'Cook-Along Kitchen Experience' alongside Internet of Things technologies to engage members of the public in an innovative cooking experience. Mediated by the Databox, the experience used data generated by participants' interactions with Internet-enabled utensils and kitchen appliances to drive the timely delivery of recipe instructions.
Year(s) Of Engagement Activity 2018
URL https://www.bbc.co.uk/rd/projects/databox