ESPRESSO: Efficient Search over Personal Repositories - Secure and Sovereign

Lead Research Organisation: Birkbeck, University of London

Department Name: Computer Science and Information Systems

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Funded Value:

£353,353

Funded Period:

Aug 22 - Feb 27

Funder:

EPSRC

Project Status:

Active

Project Category:

Research Grant

Project Reference:

EP/W024659/1

Principal Investigator:

George Roussos

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Information & Knowledge Mgmt (70%)

Mobile Computing (20%)

Networks & Distributed Systems (10%)

Organisations

Birkbeck, University of London (Lead Research Organisation)

People	ORCID iD
George Roussos (Principal Investigator)
Alexandra Poulovassilis (Co-Investigator)

Publications


Ragab M (2024) ESPRESSO: A Framework to Empower Search on the Decentralized Web in Data Science and Engineering

Ragab M (2024) A Demonstration of Decentralized Search Over Solid Personal Online Datastores

Ragab M (2023) Web Information Systems Engineering - WISE 2023 - 24th International Conference, Melbourne, VIC, Australia, October 25-27, 2023, Proceedings

Ragab M (2024) Unlocking the Potential of Health Data with Decentralised Search in Personal Health Datastores

Ragab M (2024) The 1st Workshop on Decentralised Search and Recommendation

Key Findings
Software and Technical Products
Engagement Activities


Description	The project has been exploring algorithms and metadata structures that improve the performance of keyword search and queries across personal online datastores deployed on architectures such as Solid and Dataswyft. Experiments used health and well-being scenarios across up to 50 servers, each hosting thousands of personal online datastores. Key findings of the experimentation so far include:
1. Decentralised search can preserve privacy efficiently by encoding the access constraints that data owners impose on different searching parties.
2. Exhaustive decentralised keyword search across thousands of personal online datastores can involve long response times. However, metadata can significantly improve performance, and metadata that strike the right balance between privacy preservation and source selection are crucial for both top-k search and exhaustive search. Specifically, when searching across 475K pods on 50 Solid servers in health and well-being data scenarios, metadata can speed up search by up to 24 times (13.4 times on average). Searches for rare keywords benefit the most from metadata.
3. Architectures for decentralised storage such as Solid would benefit from compute components in addition to storage, because decentralised search can require significant computational power that may not be readily available to searching parties. Further, the Community Solid Server framework could support better performance for decentralised search by making use of multi-threading.
4. Bloom filters can provide adequate performance with more privacy safeguards than raw metadata, and still outperform exhaustive decentralised search without metadata.
Exploitation Route	There are large communities of developers building applications over decentralised datastores, especially around the Solid and Dataswyft ecosystems, who would benefit from the algorithms and metadata structures, either to support search and queries within applications or as independent services in those ecosystems. The research community will benefit from approaches to information retrieval across datastores where different searching parties may have different visibility of resources. This problem has received little attention so far because scenarios requiring such algorithms were rare. The health and well-being data collection and processing community will also benefit, as these approaches enable privacy-aware information discovery across datastores.
Sectors	Communities and Social Services/Policy, Creative Economy, Digital/Communication/Information Technologies (including Software), Healthcare
URL	https://espressoproject.org/
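Finding 4 above compares Bloom filters with raw metadata for source selection. The record does not describe how ESPRESSO builds or sizes its filters, so the following is only a minimal sketch under stated assumptions: each server publishes one Bloom filter over the keywords in its pods, bit positions come from salted SHA-256 digests, and `m_bits=1024` and `k_hashes=4` are arbitrary illustrative values. `BloomFilter`, `build_server_filter`, and `select_servers` are hypothetical names, not ESPRESSO code. A query is forwarded only to servers whose filter reports a possible match; since Bloom filters have no false negatives, no matching server is skipped, and false positives only cost extra requests.

```python
import hashlib

# Hypothetical sketch of Bloom-filter source selection; not ESPRESSO's implementation.


class BloomFilter:
    """A plain Bloom filter: k bit positions per item over an m-bit array."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 4) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive the k positions from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_server_filter(keywords, m_bits: int = 1024, k_hashes: int = 4) -> BloomFilter:
    """Summarise a server's keyword vocabulary as a bit array instead of a raw list."""
    bf = BloomFilter(m_bits, k_hashes)
    for kw in keywords:
        bf.add(kw.lower())
    return bf


def select_servers(query_keyword: str, server_filters: dict) -> list:
    """Return only the servers whose filter may contain the keyword."""
    kw = query_keyword.lower()
    return [name for name, bf in server_filters.items() if bf.might_contain(kw)]


# Illustrative data: three servers with made-up health keywords.
servers = {
    "server-a": build_server_filter(["insulin", "glucose", "sleep"]),
    "server-b": build_server_filter(["steps", "sleep", "heart-rate"]),
    "server-c": build_server_filter(["allergy", "vaccination"]),
}

if __name__ == "__main__":
    print(select_servers("insulin", servers))
```

The privacy gain comes from publishing only a bit array: a querier can test candidate keywords but cannot read off a server's vocabulary directly. A smaller filter adds more false positives, which means more noise for an observer but also more wasted requests, which is one form of the privacy/source-selection balance described in finding 2.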


Title	ESPRESSO Search System
Description	The software is released as open source under the AGPL-3.0 licence. The ESPRESSO project (espressoproject.org) researches, develops, and evaluates decentralised algorithms, meta-information data structures, and indexing techniques to enable large-scale data search across personal online datastores, taking into account varying access rights and caching requirements. The system involves a number of Solid servers (see solidproject.org) interconnected via a GaianDB overlay network (https://github.com/gaiandb/). The ESPRESSO system contains the following components, installed alongside each Solid server in the network:
- An indexing app (Brewmaster) that indexes the pods, creating and maintaining indexes inside each pod along with a meta-index for the Solid server as a whole.
- A search app (CoffeeFilter) that performs the local search on the pods of a Solid server.
- An overlay network (the prototype system uses a custom build of GaianDB) that connects the servers and routes and propagates the queries.
- A user interface app (Barista) that receives search queries from the user and presents the search results.
Type Of Technology	Webtool/Application
Year Produced	2024
Open Source License?	Yes
Impact	The software was released only a few weeks ago. It has enabled the project to run experiments that give the research community initial insights into the performance and trade-offs of decentralised search. The results of these experiments have informed the project's published work so far, as well as two further publications to be presented at the Web Conference 2024 and published in its proceedings. Publishing the software as open source is expected to enable researchers beyond the project to engage with this work.
URL	https://github.com/espressogroup/ESPRESSO
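The component list in the software description can be read as a pipeline: per-pod indexes, a server-level meta-index used to skip pods that cannot match, an access-filtered local search, and a fan-out that merges results across servers. The following is a minimal in-memory sketch under stated assumptions: each document carries a set of permitted readers (for example WebIDs), term frequency serves as the ranking score, and a plain loop over servers stands in for the GaianDB overlay. `Document`, `ServerIndex`, `local_search`, and `federated_top_k` are hypothetical names; they do not reproduce the Brewmaster, CoffeeFilter, or Barista APIs, and unlike Brewmaster the sketch keeps indexes in memory rather than inside the pods.

```python
import heapq
import re
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical sketch of access-aware decentralised keyword search; not ESPRESSO code.
PUBLIC = "*"  # assumed marker meaning "readable by any searching party"


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    readers: frozenset  # agents (e.g. WebIDs) allowed to read this document


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9-]+", text.lower())


class ServerIndex:
    """Per-pod inverted indexes plus a server-wide meta-index (keyword -> pods)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pod_index = {}                 # pod_id -> keyword -> [(doc_id, freq, readers)]
        self.meta_index = defaultdict(set)  # keyword -> {pod_id}

    def index_pod(self, pod_id: str, documents) -> None:
        postings = defaultdict(list)
        for doc in documents:
            for kw, freq in Counter(tokenize(doc.text)).items():
                postings[kw].append((doc.doc_id, freq, doc.readers))
        self.pod_index[pod_id] = postings
        for kw in postings:
            self.meta_index[kw].add(pod_id)

    def local_search(self, keyword: str, agent: str) -> list:
        """Return (freq, server, pod, doc) hits that the agent is allowed to see."""
        kw = keyword.lower()
        hits = []
        # The meta-index restricts the scan to pods that contain the keyword at all.
        for pod_id in self.meta_index.get(kw, ()):
            for doc_id, freq, readers in self.pod_index[pod_id][kw]:
                if PUBLIC in readers or agent in readers:
                    hits.append((freq, self.name, pod_id, doc_id))
        return hits


def federated_top_k(servers, keyword: str, agent: str, k: int = 3) -> list:
    """Send the query to every server and merge the local hits into a global top-k."""
    all_hits = []
    for server in servers:
        all_hits.extend(server.local_search(keyword, agent))
    return heapq.nlargest(k, all_hits)


# Illustrative agents and pods (made-up data).
alice = "https://alice.example/profile#me"
bob = "https://bob.example/profile#me"

s1 = ServerIndex("server-1")
s1.index_pod("pod-alice", [
    Document("d1", "glucose glucose insulin log", frozenset({alice})),
    Document("d2", "sleep diary", frozenset({PUBLIC})),
])
s2 = ServerIndex("server-2")
s2.index_pod("pod-bob", [
    Document("d3", "glucose reading after run", frozenset({bob, alice})),
])

if __name__ == "__main__":
    print(federated_top_k([s1, s2], "glucose", bob))  # [(1, 'server-2', 'pod-bob', 'd3')]
```

Checking readers at query time is what lets two searching parties see different results over the same pods: Bob's query for "glucose" returns only d3, while Alice's also returns d1. Running an exhaustive search instead would mean ignoring the meta-index and scanning every pod on every server.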


Description	ESPRESSO Workshop on Decentralised Search
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	In a world where health data is at the forefront of our digital lives, the need for data privacy and control has become paramount. Today's Web landscape confines user-generated health data within centralised data silos, limiting individuals' autonomy over, and insight into, how their own health information is managed and used. These developments have highlighted the importance of individual sovereignty over personal health data. This paradigm shift has given rise to new approaches to application development centred on personal online datastores known as "pods". In these systems, individuals have full authority over which applications can access their personal health data and for what purposes. However, the move toward decentralisation presents its own challenges, particularly in enabling secure and efficient search and distributed queries over decentralised platforms. To this end, our workshop seeks to blend research and industry engagement while encouraging active participation from the public. Our primary goal is to collectively explore viable solutions and discuss the challenges of decentralised web search and privacy-preserving information retrieval, especially in the context of health data. The workshop offers a structured, interactive forum for knowledge exchange and collaboration. We firmly believe that workshops bridging industry and academia will play a pivotal role in advancing our understanding and development of decentralised online services. Through these collaborative efforts, we aim to drive the creation of new techniques and technologies that can transform the way we store and process our personal (health) data.
Ultimately, this line of research could lead to innovative approaches that influence the global adoption of decentralised systems, changing how personal health data is managed and strengthening individuals' data sovereignty. Your participation is key to shaping the future of health data privacy and security.
Year(s) Of Engagement Activity	2024
URL	https://espressoproject.org/london-june-24/


Description	Research Visit and Workshop at the NExT Research Centre, National University of Singapore
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	The workshop explored the research challenges of decentralised information systems from the perspective of search and information retrieval, including the challenges of decentralised recommendation and of AI in decentralised information retrieval. Researchers from the project partners and other visiting researchers took part. A proposal for a workshop at the Web Conference 2024 was submitted to enable wider discussion of these topics. The proposal was accepted for inclusion in the Web Conference programme, and the workshop is to take place on 13 May 2024.
Year(s) Of Engagement Activity	2023


Description	The 1st Workshop on Decentralised Search and Recommendation (DeSeRe'24) at the Web Conference 2024, Singapore, Singapore
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	A workshop on decentralised search and recommendation, accepted and organised in collaboration with the National University of Singapore at the Web Conference 2024 in Singapore. It was the best-attended of the workshops running in parallel.
Year(s) Of Engagement Activity	2024
URL	https://desere.org/
