Heterogeneous and Permanent Data

Lead Research Organisation: University of Edinburgh
Department Name: Lab. for Foundations of Computer Science

Abstract

Since its inception a little more than four years ago, the Database Group in the School of Informatics has grown into a leading database research group -- certainly the strongest in the UK and one of the strongest in the world. The group has also gained visibility by leading the research of the UK Digital Curation Centre. This application for a platform grant is to sustain that momentum and to provide the means to continue the group's interaction with the Digital Curation Centre.
The work of the group is based on the proposition that our data resources are valuable, that they are necessarily heterogeneous in structure, and that, in the case of research data, we need to preserve that value for future researchers and scholars. The main research themes of the group are data exchange and integration, provenance and data quality, security, distributed data, and data archiving. We have been particularly concerned with the advancement of these topics in relation to semistructured data such as XML and web data. It goes almost without saying that we cannot make much progress without understanding principles and building models, hence our involvement in database theory. Equally, database work is all about making things work efficiently, hence our extensive work in database systems and the many forms of optimisation related to storage and manipulation of data.
The work of the PIs over the first four years has been devoted to building up a critical mass of researchers and developing a good set of research topics. Having put our initial effort into this -- as well as into the concomitant effort of finding space, administrative support, hiring, teaching new courses, etc. -- it is now time to turn our energy to building new collaborative links with the UK and Europe.
To this end we are initiating a UK-based collaborative project on data quality, European collaborations on database preservation and dynamic web data, new ties with the financial sector in Edinburgh and some e-science collaborations with the Digital Curation Centre. A platform grant will provide the flexibility to move our researchers onto these new projects and will allow us to respond rapidly to new research problems that we expect to arise in connection with all these areas.

Publications

Acar U. (2010) A graph model of data and workflow provenance in 2nd Workshop on the Theory and Practice of Provenance, TaPP 2010

Amano S (2009) XML schema mappings

Barceló P (2014) Efficient Approximations of Conjunctive Queries in SIAM Journal on Computing

Benedikt M (2009) Schema-based independence analysis for XML updates in Proceedings of the VLDB Endowment

Buneman P (2008) Curated databases

Buneman P (2009) Curating the CIA World Factbook in International Journal of Digital Curation

 
Description: From a research perspective, the grant was instrumental in the transition and application of database research to graph databases. Rather than being kept in highly structured (and sometimes constraining) relational databases, much data is now stored in less structured or differently structured formats such as XML, JSON and RDF. This grant was key in the transition of database research to these formats.

In addition, the grant promoted new ideas in data cleaning and started the (computational) field of data citation.
Exploitation Route: The following topics have been taken forward by others:
Data cleaning
Data exchange
Data citation
Sectors: Digital/Communication/Information Technologies (including Software); Culture, Heritage, Museums and Collections; Other

 
Description: Data Citation. Work on this started during the project, and the topic is directly related to the themes of the project. It is now recognised as a major issue, even by the EPSRC itself in its blurbs about data. It is not clear whether this counts as "non-academic" impact, but it is pervasive in all forms of scholarship. The computational principles are now being implemented in various areas.
Rural Networks. This is only distantly related to this project, but since Researchfish seems to think that it is related (see related grants), it is worth pointing out that we have now built the biggest rural network in the UK, and literally thousands of people are benefiting from high-speed internet in the most remote parts of Scotland.
First Year Of Impact: 2010
Sector: Digital/Communication/Information Technologies (including Software); Other
Impact Types: Cultural; Societal; Economic; Policy & public services

 
Description: Carnegie UK Trust
Amount: £40,000 (GBP)
Organisation: Carnegie Trust
Sector: Charity/Non Profit
Country: United Kingdom
Start: 05/2012
End: 06/2013