DM4(B)T: Data Management for (Build)TEDDI(NET) using Semantic Technologies

Lead Research Organisation: University of Bath
Department Name: Computer Science

Abstract

CONTEXT

EPSRC funded 22 projects over two calls, in 2010 and 2012, to investigate 'Transforming Energy Demand through Digital Innovation' (TEDDI) as a means to understand how people use energy in homes and what can be done to reduce energy consumption. As a result, a large amount of data is being collected at different levels of detail across a variety of housing throughout the UK, but the mode, detail and quantity are largely defined by the needs of each individual project. At the same time, the research councils (RCUK) are defining guidelines for what happens to data generated by the projects they fund; universities are in turn defining policies, and researchers are taking concrete actions to store, preserve and document data for future reference.

The problem at present is that there is relatively little awareness, limited experience and only emerging practice of how to incorporate data management into much of (physical) science research. This is in stark contrast to the biosciences, where established procedures for data formats and sharing stem from international collaboration on the Human Genome Project, and to the social sciences, where data from national surveys, including census data, have been centrally archived for many years. Consequently, the solutions currently adopted by (Build)TEDDI projects may meet a minimal interpretation of the requirements, but are unlikely to deliver the desired data legacy, such as the means to execute trans-project queries or to cite the results of such queries for the sake of reproducibility.

AIMS AND OBJECTIVES

The challenges described above, which we address in DM4(B)T in the microcosm of the TEDDI projects, are tackled in three ways:

1. Raising awareness among those who are responsible for data management (principal investigators);

2. Developing a framework to guide the choices involved in implementing data management; and

3. Demonstrating example tools that will enable researchers to bring together and re-analyse data from different projects more easily,

which together will help researchers to (i) satisfy funding and institutional guidelines for data management, (ii) begin forming a data management culture in science research, and (iii) create a substantial case study in science data management that can inform the three primary stakeholders (researchers, institutions and research councils) across a range of issues (see Recommendations below).

Key activities and outputs:

1. Workshops to (i) gather information about current practice, (ii) present data management problems and outline analyses and solutions, and (iii) disseminate knowledge of tools and (new) practices that support effective data management.

2. Tools and techniques: to allow researchers to harness both the variety and the volume of data being collected within the (Build)TEDDI projects. The tools will be released as open source so that other researchers can extend and adapt them.

3. Recommendations: these will take the form of an online report to identify routes to facilitate a sustainable data legacy (management, curation and citation) for projects in the science and engineering domain.

APPLICATIONS AND BENEFITS

1. (Build)TEDDI projects will benefit directly from the above activities and outputs to meet institutional and research council requirements.

2. Other researchers will benefit from being able to access (Build)TEDDI data.

3. The outputs will benefit the wider research community in science and engineering through the provision of an easy-to-adopt (and adapt) data management methodology.

Planned Impact

Spreading outwards from the activities taking place in the project, the classes of beneficiaries and consequent impacts are:

1. The investigators and researchers in (Build)TEDDI projects. While some of these are in the investigators' immediate professional circle, the interdisciplinary nature of the portfolio means that many are not. This group is directly exposed to the problems of data management and data legacy creation, and can benefit from DM4(B)T's activities that help them analyse those problems and develop appropriate solutions, while fostering a culture among these projects and their researchers that begins to take account of (future) data issues. This will take place during the lifetime of the DM4(B)T project and the current (Build)TEDDI projects.

2. The institutions hosting (Build)TEDDI projects. The practices and culture emerging from the (Build)TEDDI projects can inform and contribute to the development of institutional data management guidelines and identification of the resources needed to support them. This can begin to occur during the lifetime of current (Build)TEDDI projects, but will continue afterwards.

3. The EPSRC, which funds the (Build)TEDDI projects. A direct benefit is improved compliance with funder data policies, which benefits both the research councils and the researchers within institutions who are tasked with ensuring compliance. Indirect benefits are enhanced knowledge and awareness of data issues at institutional level, enabling informed discussion between institutions and research councils over guidelines, compliance, resources and implementation. This too can begin during the period of the current (Build)TEDDI projects, but will gain momentum subsequently as the data management agenda gathers pace and as it is informed by the recommendations arising from DM4(B)T's engagement with the (Build)TEDDI project portfolio.

4. Sibling projects in the same institution. Transfer of knowledge and practice to other projects, either through common investigators or through local institutional enhancement activities, can increase data management skills among RCUK-funded researchers. This would include knowledge of how to curate their data for publication, where and how to publish data (particularly if an institutional data repository is not available), and which ethical issues and licence conditions must be respected. The expected time frame is during the lifetime of the current (Build)TEDDI projects and thereafter.

5. Policy-makers (e.g. DECC/DEFRA), non-governmental organisations (e.g. the Energy Saving Trust and the Carbon Trust, which often report on field data) and charities (e.g. the Centre for Sustainable Energy, which worked on the National Household Model). DM4(B)T will initiate the process, and prototype the tools, needed to increase access to publicly funded data and to extract more value from existing investment in research, by supporting and facilitating the publication and re-analysis of data from (Build)TEDDI projects. This in turn can increase the knowledge derived from the data, and hence the understanding of energy use in homes, which may help to inform policy analysis and policy-making. Impact will start with these stakeholders participating in the workshops held with TEDDINET (workshops 1-3) and WHOLESEM (enabled through the CEE, workshop 4), initiating the debate and increasing awareness of what can be achieved through enhanced data management practices. The expected time frame is the duration of the above projects and thereafter.

6. The wider public. Public participation in future studies of domestic energy use will benefit from researchers who are obliged to publish research outputs gaining a better understanding of ethical issues, ensuring privacy and transparency for participants whilst maximising the potential for data sharing. The time frame is after the end of the current round of projects.

 
Description We held two workshops in Bath, aimed primarily at the TEDDINET community but also more widely at researchers (primarily in STEM) who create and use datasets. Presentations and summaries are available on the project website at http://www.cs.bath.ac.uk/dm4t/.

We developed a prototype web tool that enables non-specialist users to create semantic annotations for conventional comma-separated values (CSV) datasets. It builds on existing ontologies for semantic sensor networks (W3C), quantities, units and datatypes (QUDT) and smart energy-aware systems (SEAS). This makes it possible to author a semantic description of a dataset after its creation and to use that description to answer semantic queries over its content, without the user needing to understand the dataset's structure or layout. A video demonstrating the tool is available at http://www.cs.bath.ac.uk/dm4t/ and an article is in preparation.
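The idea of querying by meaning rather than by layout can be illustrated with a minimal Python sketch. It assumes a simple, hypothetical annotation format (a dictionary mapping each CSV column to an observed property and a QUDT unit) and turns each annotated cell into W3C SOSA observation triples held as plain Python tuples. The SOSA and QUDT terms are real vocabulary terms, but the example.org property names, column names and sample values are invented for illustration and do not describe the DM4T tool's actual output format.

```python
import csv
import io

# Real vocabulary namespaces (RDF, W3C SOSA, QUDT); the example.org
# namespace and every property name in it are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOSA = "http://www.w3.org/ns/sosa/"
QUDT = "http://qudt.org/schema/qudt/"
UNIT = "http://qudt.org/vocab/unit/"
EX = "http://example.org/dm4t/"

# Hypothetical annotation a user might author for one dataset: which
# column holds timestamps, and what each data column measures.
ANNOTATION = {
    "time_column": "ts",
    "columns": {
        "temp_lr": {"property": EX + "LivingRoomAirTemperature", "unit": UNIT + "DEG_C"},
        "pwr_main": {"property": EX + "MainsElectricPower", "unit": UNIT + "W"},
    },
}


def annotate(csv_text, annotation, dataset):
    """Turn each annotated cell into SOSA observation triples (s, p, o)."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader):
        for column, meta in annotation["columns"].items():
            obs = f"{EX}{dataset}/obs/{row_no}/{column}"
            triples += [
                (obs, RDF_TYPE, SOSA + "Observation"),
                (obs, SOSA + "observedProperty", meta["property"]),
                (obs, SOSA + "resultTime", row[annotation["time_column"]]),
                (obs, SOSA + "hasSimpleResult", float(row[column])),
                (obs, QUDT + "unit", meta["unit"]),
            ]
    return triples


def select(triples, prop):
    """Return sorted (time, value, unit) tuples for observations of `prop`,
    regardless of which dataset or column they came from."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    return sorted(
        (d[SOSA + "resultTime"], d[SOSA + "hasSimpleResult"], d[QUDT + "unit"])
        for d in by_subject.values()
        if d.get(SOSA + "observedProperty") == prop
    )


if __name__ == "__main__":
    # Two hypothetical projects record the same quantity under different columns.
    house_a = "ts,temp_lr,pwr_main\n2014-01-01T00:00,19.5,410\n"
    house_b = "time,lounge_t\n2014-01-01T00:00,20.1\n"
    ann_b = {
        "time_column": "time",
        "columns": {"lounge_t": {"property": EX + "LivingRoomAirTemperature",
                                 "unit": UNIT + "DEG_C"}},
    }
    triples = annotate(house_a, ANNOTATION, "a") + annotate(house_b, ann_b, "b")
    for result in select(triples, EX + "LivingRoomAirTemperature"):
        print(result)
```

The query names only the property, so the same call retrieves readings from both datasets even though their column names differ. A fuller treatment would emit typed RDF literals and query them with SPARQL; the tuples here only show why a column-to-concept annotation is enough to support cross-dataset queries.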

We interviewed teams from TEDDI-funded projects at three universities, Strathclyde (APAtSCHE and REFIT), Loughborough (REFIT and DeFacto) and Bath (ENLITEN), along with representatives of each university's research data support team. In addition, we surveyed these and several other TEDDI projects, as well as these and several other university research data teams. This allowed us to assess the level of knowledge about research data management among those researchers, the levels of institutional support, and the institutional and other barriers that both groups encounter in seeking to publish research datasets and make them accessible. An article analysing the interviews and surveys is in preparation.
Exploitation Route UKRI: policy, guidance and support for the preservation of and access to STEM datasets, as requirements on grants and expectations of institutions.

Institutions: policy, guidance and support for the preservation of and access to STEM datasets, as implementation of UKRI expectations.

Research community and industry: practice in data management (for creators) and challenges in access to datasets (for consumers).
Sectors Communities and Social Services/Policy, Digital/Communication/Information Technologies (including Software), Energy, Manufacturing, including Industrial Biotechnology

URL http://www.cs.bath.ac.uk/dm4t/
 
Description ARCC webinar 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Presented the DM4T project as part of a webinar organized by the ARCC network (http://www.arcc-network.org.uk/)
Year(s) Of Engagement Activity 2017
 
Description CTECH symposium 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Presentation at the CTECH Symposium held at Innovate UK
Year(s) Of Engagement Activity 2017