Using Machine Learning to Improve Data Systems

Lead Research Organisation: University of Warwick
Department Name: Computer Science

Abstract

Data is growing all around us and is changing the world drastically. Social media, the Internet of Things, bioinformatics and most other applications of the internet age produce massive volumes of rich data whose analysis can help us improve all facets of our lives. Ingesting, managing, processing and analyzing this data deluge requires large infrastructures, efficient file systems, precise resource managers and data-analytics processing engines. Significant progress has been made in this area, but users of these large infrastructures still face formidable challenges: response times to analytical queries are still too long, current solutions do not scale, analytical errors remain too high unless performance is sacrificed, and the infrastructures are costly to acquire and maintain.
We are pursuing a radical new approach to these problems, coined Data-less Big Data Analytics (DBDA). The main idea of DBDA is to process analytical queries without accessing any base data while still providing accurate answers, which is how we can obtain Scalable, Efficient and Accurate (SEA) systems. In this regard, our group has recently studied different kinds of aggregation queries that are central to in-DBMS analytics. The resulting algorithms achieve many orders of magnitude improvement in query-processing efficiency and near-perfect approximations of the underlying relationships among data attributes. However, much remains to be done: 1) Most of these approaches do not adapt to changes in the data or in users' interests, so the quality of their models degrades over time. 2) The approaches suffer from the cold-start problem: building a SEA system requires appropriate training data for the models, and sometimes this training data does not exist or is too costly to produce. 3) There is no general framework that could lead to more scalable, efficient and accurate systems sooner.
The research questions to be tackled are: How can machine learning techniques be used to improve big data systems? How can current methods be made adaptive to changes in data and users' interests? How can the cold-start problem be tackled for this kind of method? Can we propose a general framework for this problem?
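
To make the data-less idea concrete, the following is a minimal illustrative sketch, not the project's actual algorithm. It assumes a single numeric attribute, COUNT range queries, and a simple k-nearest-neighbour model trained on a log of past queries and their exact answers. All names (`exact_count`, `QueryAnswerModel`, `run_demo`) and the synthetic data are hypothetical. Once trained, the model answers a new query from the log alone, without touching the base data.

```python
import bisect
import random


def exact_count(sorted_values, lo, hi):
    """Exact COUNT of values in [lo, hi]; this is the step that touches base data."""
    return bisect.bisect_right(sorted_values, hi) - bisect.bisect_left(sorted_values, lo)


class QueryAnswerModel:
    """Hypothetical k-nearest-neighbour model over logged (query, answer) pairs."""

    def __init__(self, k=5):
        self.k = k
        self.log = []  # entries: (lo, hi, answer)

    def observe(self, lo, hi, answer):
        """Record one past query and the exact answer it received."""
        self.log.append((lo, hi, answer))

    def predict(self, lo, hi):
        """Estimate the answer to a new range query from the log alone."""
        if not self.log:
            raise ValueError("no past queries observed (the cold-start case)")
        # Rank past queries by squared distance between their bounds and the new ones.
        nearest = sorted(
            self.log,
            key=lambda e: (e[0] - lo) ** 2 + (e[1] - hi) ** 2,
        )[: self.k]
        return sum(answer for _, _, answer in nearest) / len(nearest)


def run_demo(seed=0, n_rows=10_000, n_past_queries=2_000):
    rng = random.Random(seed)
    # Synthetic base data (an assumption): one attribute, uniform on [0, 100].
    data = sorted(rng.uniform(0.0, 100.0) for _ in range(n_rows))

    # Training phase: each past query is answered exactly once and logged.
    model = QueryAnswerModel(k=5)
    for _ in range(n_past_queries):
        lo = rng.uniform(0.0, 100.0)
        hi = rng.uniform(lo, 100.0)
        model.observe(lo, hi, exact_count(data, lo, hi))

    # Query phase: the estimate uses only the model, not `data`.
    estimate = model.predict(20.0, 60.0)
    truth = exact_count(data, 20.0, 60.0)  # computed only to measure the error
    return estimate, truth


if __name__ == "__main__":
    estimate, truth = run_demo()
    print(f"estimate={estimate:.1f} exact={truth} "
          f"relative error={abs(estimate - truth) / truth:.3f}")
```

The sketch also hints at two of the open problems named above: an empty log leaves the model unable to answer (cold start), and a log collected before the data changed would yield stale estimates (lack of adaptivity).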

This research will put forward the basic principles for designing the next generation of intelligent data system infrastructures that realize this new analytics-processing paradigm, taking us closer to data-less big data analytics.

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513374/1                                   01/10/2018  30/09/2023
2191237            Studentship   EP/R513374/1  01/04/2019  30/09/2022  Ali Mohammadi Shanghooshabad