VADA: Value Added Data Systems -- Principles and Architecture

Lead Research Organisation: University of Oxford

Department Name: Computer Science

Abstract

Data is everywhere, generated by increasing numbers of applications, devices and users, with few or no guarantees on the format, semantics, and quality. The economic potential of data-driven innovation is enormous: the Centre for Economics and Business Research estimates that it could reach as much as £40B in 2017. To realise this potential, and to provide meaningful data analyses, data scientists must first spend a significant portion of their time (estimated at 50% to 80%) on "data wrangling" - the process of collecting, reorganising, and cleaning data.

This heavy toll is due to what are referred to as the four V's of big data: Volume - the scale of the data, Velocity - speed of change, Variety - different forms of data, and Veracity - uncertainty of data. There is an urgent need to provide data scientists with a new generation of tools that will unlock the potential of data assets and significantly reduce the data wrangling component. As many traditional tools are no longer applicable in the four V's environment, a radical paradigm shift is required. The proposal aims to achieve this paradigm shift by adding value to data, by handling data management tasks in an environment that is fully aware of data and user contexts, and by closely integrating key data management tasks in a way not yet attempted, but desperately needed by many innovative companies in today's data-driven economy.

The VADA research programme will define principles and solutions for Value Added Data Systems, which support users in discovering, extracting, integrating, accessing and interpreting the data of relevance to their questions. In doing so, such a system uses the context of the user, e.g., requirements in terms of the trade-off between completeness and correctness, and the data context, e.g., its availability, cost, provenance and quality. The user context characterises not only what data is relevant, but also the properties it must exhibit to be fit for purpose. Adding value to data then involves the best-effort provision of data to users, along with comprehensive information on the quality and origin of the data provided. Users can provide feedback on the results obtained, enabling changes to all data management tasks, and thus a continuous improvement in the user experience.

Establishing the principles behind Value Added Data Systems requires a revolutionary approach to data management, informed by interlinked research in data extraction, data integration, data quality, provenance, query answering, and reasoning. This will enable each of these areas to benefit from synergies with the others. Research has developed focused results within such sub-disciplines; VADA develops these specialisms in ways that both transform the techniques within the sub-disciplines and enable the development of architectures that bring them together to add value to data.

The commercial importance of the research area has been widely recognised. The VADA programme brings together university researchers with commercial partners who are in desperate need of a new generation of data management tools. They will be contributing to the programme by funding research staff and students, providing substantial amounts of staff time for research collaborations, supporting internships, hosting visitors, contributing challenging real-life case studies, sharing experiences, and participating in technical meetings. These partners are both developers of data management technologies (LogicBlox, Microsoft, Neo) and data user organisations in healthcare (The Christie), e-commerce (LambdaTek, PricePanda), finance (AllianceBernstein), social networks (Facebook), security (Horus), smart cities (FutureEverything), and telecommunications (Huawei).

Planned Impact

The economic impact of relevant activities is difficult to approximate, but the value of the sub-areas of Big Data, Data Integration and Data Quality is forecast to be over $50B by 2017:
- The International Institute of Analytics estimates the Big Data market at $16.1B in 2014, growing 6 times faster than the overall IT market. The projection for 2017 is ~$50B.
- Gartner (2014) estimates the Data Integration tool market at over $2.2B at end 2013, an increase of 9.4% from 2012. This growth rate is above average for the enterprise software market. By 2018 total revenue should be ~$3.6B.
- Gartner (2014) estimates the Data Quality market as $960M in software revenue at end 2012 ($2B by 2017), an increase of 12.3% from 2011.
Thus directly associated markets - with users across government, industry, health and commerce - are large and fast growing.
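
As a rough check on the figures above, the following sketch computes the constant annual growth rate implied by each pair of start and end estimates. The compound-growth model, the function name implied_cagr and the year counts are assumptions made for illustration; the cited reports do not state them.

```python
# Illustrative arithmetic only. The constant compound-growth model and the
# number of years between estimates are assumptions, not figures taken
# from the analyst reports quoted above.

def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1

# (start in $B, end in $B, assumed years between the two estimates)
markets = {
    "Big Data (2014 to 2017)": (16.1, 50.0, 3),
    "Data Integration (2013 to 2018)": (2.2, 3.6, 5),
    "Data Quality (2012 to 2017)": (0.96, 2.0, 5),
}

for name, (start, end, years) in markets.items():
    rate = implied_cagr(start, end, years)
    print(f"{name}: {rate:.1%} per year")
```

Under these assumptions the implied rates come out at roughly 46%, 10% and 16% per year respectively, which is consistent with the description of these markets as fast growing.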

Who will benefit from this research?

Data is central to the efficient operation of many technology development and user organisations, and is the raison d'être for many others. Here we categorise potential VADA beneficiaries into:
1. Technology providers of platforms and solutions for collecting, integrating, and aggregating data. Partner examples include LogicBlox, Microsoft, and Neo. New business opportunities are likely to emerge, where impact results from the development of techniques to enable more efficient and effective use of available data.
2. Organisations having a need for such platforms. This is almost every organisation; our partners include knowledge companies who work with product (LambdaTek, PricePanda), financial (AllianceBernstein), security (Horus), social networking (Facebook), telecommunications (Huawei), governmental (FutureEverything) and healthcare (The Christie) data.

All partners have highlighted the importance of this research in their support letters:
* VADA addresses fundamental questions that have great significance (Microsoft).
* The challenge addressed by VADA is a significant one (LogicBlox).
* VADA tackles several problems that are of great interest (Facebook).
* We need an automatic approach to reliable, timely and continuous collection and evaluation of sources against an ever-increasing amount of raw data. Current data collection technologies are neither reliable nor scalable enough (Horus).
* To remain competitive we need to enrich our product data with extended background data. No technology that currently exists can do this (LambdaTek).

How might they benefit from this research?

VADA's impact is in line with the RCUK priorities:
1. Contribute toward wealth creation and economic prosperity. VADA will develop techniques and methodologies informing the development of platforms to add value to data. Among the many mechanisms that can realise this, we propose a consultancy spin-out. We believe that this will ease the efficient transfer of knowledge from academia to UK industry, as previously demonstrated by similar successful ventures.
2. Shape/enhance effectiveness of public services. The UK has signed up to the Open Government Declaration, which should make travel easier and healthcare better, and create significant growth for UK industry (http://www.cabinetoffice.gov.uk/news/open-data-measures-autumn-statement). However, exploiting such data involves inter-relating it with other data sources, managing variety and veracity. SMEs such as FutureEverything will benefit from efficient techniques for adding value to such data.
3. Enhance training capacity, knowledge and skills of businesses and organisations. Within many organisations, efficient sharing and use of data is crucial for decision-making. VADA will directly train 11 PhD students, supporting exchange visits, workshops, and a summer school. VADA's academics will also be involved in the design of training courses on Value Added Data Systems for the next generation of higher education post-graduate programmes and skills training courses for industry.

Funded Value: £4,557,635

Funded Period: Mar 15 - Sep 20

Funder: EPSRC

Project Status: Closed

Project Category: Research Grant

Project Reference: EP/M025268/1

Principal Investigator: Georg Gottlob

Research Subject: Info. & commun. Technol. (100%)

Research Topic: Fundamentals of Computing (15%), Information & Knowledge Mgmt (85%)

Organisations

People	ORCID iD
Georg Gottlob (Principal Investigator)
Dan Olteanu (Co-Investigator)
Sebastian Maneth (Co-Investigator)
Oscar Buneman (Co-Investigator)
John Keane (Co-Investigator)
Alvaro Fernandes (Co-Investigator)	http://orcid.org/0000-0002-6100-7199
Paolo Guagliardo (Co-Investigator)	http://orcid.org/0000-0003-0756-5787
Norman Paton (Co-Investigator)
Leonid Libkin (Co-Investigator)
Thomas Lukasiewicz (Co-Investigator)
Wenfei Fan (Co-Investigator)
Andreas Pieris (Co-Investigator)
Giorgio Orsi (Researcher Co-Investigator)
Tim Furche (Researcher Co-Investigator)

Publications

Abboud R (2020) On the Approximability of Weighted Model Integration on DNF Structures

Abboud R (2022) Approximate weighted model integration on DNF structures in Artificial Intelligence

Abboud R (2020) Learning to Reason: Leveraging Neural Networks for Approximate DNF Counting in Proceedings of the AAAI Conference on Artificial Intelligence

Abboud R (2019) Learning to Reason: Leveraging Neural Networks for Approximate DNF Counting

Abboud R (2020) Learning to Reason: Leveraging Neural Networks for Approximate DNF Counting in AAAI 2020 - 34th AAAI Conference on Artificial Intelligence

Abboud R (2020) On the Approximability of Weighted Model Integration on DNF Structures in 17th International Conference on Principles of Knowledge Representation and Reasoning, KR 2020

Abel E (2020) Targeted evidence collection for uncertain supplier selection in Expert Systems with Applications

Abel E (2018) User driven multi-criteria source selection in Information Sciences

Abel E (2018) SOURCERY