ESPRESSO: Efficient Search over Personal Repositories - Secure and Sovereign
Lead Research Organisation:
Birkbeck, University of London
Department Name: Computer Science and Information Systems
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Publications
Ragab M (2024) ESPRESSO: A Framework to Empower Search on the Decentralized Web, in Data Science and Engineering
Ragab M (2024) The 1st Workshop on Decentralised Search and Recommendation
| Description | The project has been exploring algorithms and metadata structures to improve the performance of keyword search and queries across personal online datastores (deployed on architectures such as Solid and Dataswyft). Health and well-being scenarios were considered for experimentation across up to 50 servers with thousands of personal online datastores each. Key findings of the experimentation so far include: 1. Decentralised search can preserve privacy efficiently by encoding the access constraints that data owners impose on different searching parties. 2. Decentralised keyword search can involve long response times when exhaustive search is run across thousands of personal online datastores. However, the use of metadata can significantly improve performance. Further, metadata that strike the right balance between privacy preservation and source selection are crucial for both top-k search and exhaustive search. Specifically, when searching across 475K pods on 50 Solid servers in health and well-being data scenarios, metadata can make search up to 24 times faster, and 13.4 times faster on average. Decentralised search using rare keywords benefits the most from metadata. 3. Architectures for decentralised storage such as Solid would benefit from compute components (in addition to storage). The reason is that decentralised search can require significant computational power that may not be readily available to searching parties. Further, the framework of the Community Solid Server could support better performance for decentralised search by making use of multi-threading. 4. The use of Bloom filters can provide adequate performance and more privacy safeguards than raw metadata, while still outperforming decentralised exhaustive search without metadata. |
| Exploitation Route | There are large communities of developers building applications over decentralised datastores, especially around the Solid and Dataswyft ecosystems, who would benefit from the algorithms and metadata structures to support search and queries within applications or as independent services in those ecosystems. The research community will benefit from approaches to the problem of information retrieval across datastores where different search parties may have different visibility of resources. This problem has not been sufficiently explored before because the scenarios requiring such algorithms were scarce. The community working on health and well-being data collection and processing will also benefit from these approaches, as they enable privacy-aware information discovery across datastores. |
| Sectors | Communities and Social Services/Policy Creative Economy Digital/Communication/Information Technologies (including Software) Healthcare |
| URL | https://espressoproject.org/ |
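The fourth key finding above (Bloom filters as a more privacy-preserving alternative to raw metadata) can be illustrated with a minimal sketch. It is not the project's implementation: the class `BloomFilter`, the helpers `build_pod_filters` and `select_pods`, the filter size, and the hashing scheme are all assumptions made for illustration. The idea shown is source selection: each pod publishes only a bit pattern instead of its keyword list, and a searching party contacts only the pods whose filter might contain the keyword.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter. The sizes are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, keyword):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{keyword}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, keyword):
        for pos in self._positions(keyword):
            self.bits |= 1 << pos

    def might_contain(self, keyword):
        # True can be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(keyword))


def build_pod_filters(pods):
    """Map each (hypothetical) pod id to a filter over its keywords."""
    filters = {}
    for pod_id, keywords in pods.items():
        bf = BloomFilter()
        for kw in keywords:
            bf.add(kw.lower())
        filters[pod_id] = bf
    return filters


def select_pods(filters, keyword):
    """Return the pods worth searching for this keyword (source selection)."""
    kw = keyword.lower()
    return [pod_id for pod_id, bf in filters.items() if bf.might_contain(kw)]
```

A Bloom filter never produces false negatives, so a pod that holds the keyword is always selected. It can produce false positives, which cost an unnecessary pod lookup but also make the published metadata less revealing than a plain keyword list.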
| Title | ESPRESSO Search System |
| Description | The software is released under the AGPL-3.0 open source license. The ESPRESSO project (espressoproject.org) researches, develops, and evaluates decentralised algorithms, meta-information data structures, and indexing techniques to enable large-scale data search across personal online datastores, taking into account varying access rights and caching requirements. This involves a number of Solid servers (see solidproject.org) that are interconnected via an overlay GaianDB network (https://github.com/gaiandb/). The ESPRESSO system contains the following components, which are installed alongside each Solid server in the network: - An indexing app (Brewmaster) that indexes the pods, and creates and maintains indexes inside each pod, along with a meta-index for the Solid server as a whole. - A search app (CoffeeFilter) that performs the local search on the pods of a Solid server. - An overlay network (the prototype system uses a custom build of GaianDB) that connects the servers, and routes and propagates the queries. - A user interface app (Barista) that receives search queries from the user and presents the search results. |
| Type Of Technology | Webtool/Application |
| Year Produced | 2024 |
| Open Source License? | Yes |
| Impact | The software release is only a few weeks old. It has enabled the project to run experiments that will benefit the research community with initial insights into the performance and trade-offs of decentralised search. The results of that experimentation have informed the project work published so far, as well as two additional publications that are to be presented at the Web Conference 2024 and published in its proceedings. The open source publication of the software is expected to enable members of the research community beyond the project to engage with this research. |
| URL | https://github.com/espressogroup/ESPRESSO |
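The division of labour between the indexing and search components described above can be sketched as follows. This is a toy, in-memory illustration under stated assumptions, not the ESPRESSO code: the function names `index_pod`, `build_meta_index`, and `local_search`, the data layout, and the reader identifiers are all hypothetical, and the overlay network and user interface are left out. It shows per-pod indexes that record who may read each document, a server-level meta-index used to skip pods with nothing visible to the searcher, and a local search that returns only permitted results.

```python
from collections import defaultdict


def index_pod(documents):
    """Per-pod inverted index: keyword -> {doc_id: set of allowed readers}.

    `documents` maps doc_id -> (text, allowed_readers); this layout is an
    assumption made for the sketch.
    """
    index = defaultdict(dict)
    for doc_id, (text, readers) in documents.items():
        for word in set(text.lower().split()):
            index[word][doc_id] = set(readers)
    return index


def build_meta_index(pod_indexes):
    """Server-level meta-index: keyword -> {pod_id: readers with any match}."""
    meta = defaultdict(dict)
    for pod_id, index in pod_indexes.items():
        for word, postings in index.items():
            meta[word][pod_id] = set().union(*postings.values())
    return meta


def local_search(meta, pod_indexes, keyword, searcher):
    """Return (pod_id, doc_id) pairs the searcher is allowed to see."""
    kw = keyword.lower()
    results = []
    for pod_id, readers in meta.get(kw, {}).items():
        if searcher not in readers:
            continue  # source selection: nothing visible in this pod
        for doc_id, allowed in pod_indexes[pod_id][kw].items():
            if searcher in allowed:
                results.append((pod_id, doc_id))
    return sorted(results)
```

In this sketch two searchers issuing the same keyword can receive different result sets, which mirrors the requirement that search respects varying access rights.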
| Description | ESPRESSO Workshop on Decentralised Search |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Industry/Business |
| Results and Impact | In a world where health data is at the forefront of our digital lives, the need for data privacy and control has become paramount. Today's Web landscape confines user-generated health data within centralized data silos, limiting individuals' autonomy and insight into the management and utilization of their own health information. These developments have highlighted the crucial importance of individual sovereignty when it comes to personal health data. This paradigm shift has given rise to new approaches in application development, focusing on personal online data stores known as "pods." In these systems, individuals have full authority over which applications can access their personal health data and specify the purposes for which such access is granted. However, the journey toward decentralization presents its own set of challenges, particularly when it comes to enabling secure and efficient search and distributed queries over such decentralized platforms. To this end, our workshop seeks to blend research and industry engagement, encouraging active participation from the public. Our primary goal is to collectively explore viable solutions and engage in fruitful discussions that address the challenges of decentralized web search and privacy-preserving information retrieval, especially in the context of health data. This workshop offers a structured and interactive platform, fostering knowledge exchange and collaboration within the specified timeframe and objectives. We firmly believe that workshops that bridge industry and academia will play a pivotal role in advancing our understanding and development of decentralized online services. Through these collaborative efforts, we aim to drive the creation of new techniques and technologies that can revolutionize the way we store and process our personal (health) data. 
Ultimately, this research trajectory has the potential to pave the way for innovative approaches that could influence the global adoption of transformative systems, revolutionizing how personal health data is managed and enhancing data sovereignty for individuals. Your participation is key to shaping the future of health data privacy and security. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://espressoproject.org/london-june-24/ |
| Description | Research Visit and Workshop at the NExT Research Centre, National University of Singapore |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | The workshop explored the research challenges of decentralised information systems from the perspective of search and information retrieval. The challenges of decentralised recommendation and AI decentralised information retrieval were discussed. Researchers from the project partners and other visiting researchers were involved. A proposal for a workshop at the Web Conference 2024 was submitted to enable wider discussion on the topics. The workshop proposal was accepted for inclusion in the Web Conference programme and is to take place on 13 May 2024. |
| Year(s) Of Engagement Activity | 2023 |
| Description | The 1st Workshop on Decentralised Search and Recommendation (DeSeRe'24) at the Web Conference 2024, Singapore, Singapore |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | This workshop on decentralised search and recommendation was accepted and organised in collaboration with the National University of Singapore at the Web Conference 2024 in Singapore. It was the most attended of the workshops running in parallel. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://desere.org/ |