Optimised Big Data Processing for a Pro-active Information System

Lead Research Organisation: University of Strathclyde
Department Name: Computer and Information Sciences

Abstract

Summary: In the era of Big Data, data volumes are ever increasing, and the production of information is decoupled in space and time from the users who have Information Needs. The need for dynamic, large-scale information systems with dynamic and scalable communication infrastructures is therefore apparent. Examples of such systems include online stock quotes, Internet games, sensor networks, financial workflows, process automation, transportation, algorithmic trading, datacentre platforms and business processes.

Dr Moshfeghi has a new and radical idea: to develop proactive information systems. Such systems will foresee users' Information Needs and proactively satisfy them, in the sense that relevant information matching those needs will be delivered to users without any submitted queries. Such a system is even more crucial in situations where time is of the essence. Because the amount of information that users must deal with can be very large, and decisions may have to be made promptly, only an autonomic, system-initiated Information Retrieval process (replacing user-initiated retrieval) may provide the time-critical information.

A proactive information system is an idea with several far-reaching advantages for users' everyday life choices, challenges and problems in areas such as well-being, energy, health, transportation, finance, trading and safety. This is a fundamental step forward. The potential of proactive information systems was demonstrated by his ACM-SIGIR 2016 paper, which won the best paper award at the top-tier conference in Information Retrieval, as well as by his papers at ACM-WWW 2018 and ACM-WWW 2019, which is the most important conference in Databases and Information Systems. This PhD project is part of his ongoing research programme for the next ten years and underpins his Chancellor's Fellowship proposal.

Background: Efficiency is an important aspect of this proposal. We address the efficiency problem by optimising performance, e.g. by using evolutionary algorithms. There is a wealth of past research on evolutionary algorithms, e.g. genetic algorithms, which use operators such as inheritance, mutation and selection to generate solutions to optimisation problems. The Parallel Processing Component is responsible for appropriately exploiting the system's scale-out and scale-up infrastructure. Current advances in big data processing technologies such as Hadoop, Spark, and Mahout will help to meet the high scalability requirements inherent in such a system. Finally, the system is required to deliver the final set of information to the interested users in real time.
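
As an illustration of the selection, inheritance and mutation operators mentioned above, the following is a minimal sketch of a genetic algorithm in Python. It is not the project's method: the toy objective (counting 1-bits, often called OneMax), the function names and all parameter values are assumptions chosen only to keep the example short and self-contained.

```python
import random


def fitness(bits):
    """Toy objective (an assumption for illustration): count the 1-bits."""
    return sum(bits)


def tournament_select(population, rng, k=3):
    """Selection: return the fittest of k randomly sampled individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(parent_a, parent_b, rng):
    """Inheritance: one-point crossover combining two parents."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(bits, rng, rate=0.02):
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(length=32, pop_size=40, generations=60, seed=0):
    """Run the evolutionary loop; return (initial_best, final_best) fitness."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)
    ]
    initial_best = max(fitness(ind) for ind in population)
    for _ in range(generations):
        # Elitism: carry the current best individual over unchanged,
        # so the best fitness never decreases between generations.
        elite = max(population, key=fitness)
        children = []
        for _ in range(pop_size - 1):
            parent_a = tournament_select(population, rng)
            parent_b = tournament_select(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = [elite] + children
    final_best = max(fitness(ind) for ind in population)
    return initial_best, final_best


if __name__ == "__main__":
    start, end = evolve()
    print(f"best fitness: {start} -> {end}")
```

In a real system the bit-string would be replaced by an encoding of whatever is being optimised (for example, a processing configuration), and the fitness function by a measured performance score; both choices are outside the scope of this sketch.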

Aims: This project will develop the core underlying technology for a high-performance processing infrastructure that optimises the identification of situations, the detection of latent Information Needs, and the high-level matching of data. Within the scope of this project, the student will investigate the development of efficient big data algorithms for storing, processing, linking, and analysing the large heterogeneous data sources involved in the project. This direction aligns with the Data to Knowledge priority area and with the University of Strathclyde's strategic directions on big data and, more broadly, data science.

Objectives: This project has three main objectives. The first objective is to identify and evaluate suitable (e.g., evolutionary) algorithms for a proactive information system. The second objective is to develop novel and innovative big data processing techniques (e.g., based on Spark/Mahout) that can promptly process the vast amount of data incorporated in such systems. The final objective is to investigate how the data matching a latent Information Need are delivered to users. This is particularly important since users might not even be aware of why they have received such information.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/R513349/1      |              |              | 01/10/2018 | 30/09/2023 |
2283975           | Studentship  | EP/R513349/1 | 01/10/2019 | 31/03/2023 | Francesco Meggetto