Data for Digital Decarbonisation (3D): A FAIR approach to energy demand data in buildings

Lead Research Organisation: Loughborough University
Department Name: Architecture, Building and Civil Eng

Abstract

It is a salient truth that the better the clarity and structure of the data relating to a problem, the easier that problem will be to solve. In the words of Linus Torvalds, the creator of the Linux operating system upon which the majority of today's internet runs:

"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." - Linus Torvalds

This proposal is fundamental, early-stage research into how data for energy demand in buildings is structured and stored. The output will be a proof of concept of new data structures and techniques to greatly improve our ability to make design and policy decisions for zero-carbon buildings. The proposed work takes a selection of existing, established open-access energy datasets and converts them to highly structured versions using the FAIR open data guidelines (https://www.go-fair.org/fair-principles/). This is a small but crucial first step in moving the energy data community from 'closed-world' to 'open-world' data models to enhance openness, transparency and collaboration.

Why do this? Reducing the energy demand from buildings is of key national importance in delivering the zero carbon economy. This is largely a policy challenge: many proven building retrofit technologies already exist, yet successive governments have found it very difficult to develop long-lasting, successful policy initiatives for the sector. Perhaps inspiration can be found in the Government's recent policy response to the Covid-19 crisis. Here the combination of data and modelling from many cooperating epidemiology academic groups added weight to the arguments for action and helped drive the policy of the pandemic response. A similar initiative is needed in the energy demand response to climate change; rather than isolated academic and/or professional groups developing data and tools which only they understand, a fundamental shift is required to transform collaboration and reproducibility within the field so that significant momentum for policy design and implementation can be created.

Epidemiology, and the medical field in general, have been at the forefront of developing new data standards and structures because of the critical importance of avoiding misunderstandings in data interpretation. This greatly informed the development of the FAIR open data guidelines, which have been widely adopted by the Open Research movement. FAIR stands for Findable, Accessible, Interoperable and Reusable, and the guidelines give a set of broad principles for open research data provision. Although the guidelines are widely discussed, very few datasets in the energy demand field are fully compliant with them. Of particular note is the requirement to use Unique Identifiers to represent concepts and information.

Ultimately, meeting the FAIR guidelines requires a shift away from simple data structures such as Excel spreadsheets and CSV files to more refined, machine-readable and standardised data representations such as node-edge graphs and Resource Description Framework (RDF) structures. These more complex data structures are harder to create initially but contain much more embedded meaning and logic, both in the information and in the relationships between the points of information, so they are naturally better suited to complex analyses and easier for other users to understand and reuse. The premise of this work is that a sector-wide move to FAIR-compliant energy demand data is an underlying, necessary requirement for a truly collaborative and innovative environment for energy research informing future technology, design and policy developments.
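
As a rough illustration of the difference between a flat table and a graph representation, the sketch below turns one CSV row into subject-predicate-object triples in which concepts are identified by URIs. It uses only the Python standard library. The example.org URIs, the column names and the reading are hypothetical and are not taken from the project's datasets; a real RDF dataset would also use an established vocabulary and typed literals.

```python
import csv
import io

# Hypothetical flat data: one electricity reading per row.
CSV_TEXT = (
    "building_id,timestamp,electricity_kwh\n"
    "B001,2023-01-01T00:00:00Z,0.42\n"
)

# Hypothetical base URI; a real dataset would use persistent identifiers.
BASE = "https://example.org/"


def row_to_triples(row, index):
    """Return (subject, predicate, object) triples for one CSV row."""
    subject = f"{BASE}reading/{index}"
    return [
        # The building is itself a URI, so other datasets can refer to it.
        (subject, f"{BASE}def/building", f"{BASE}building/{row['building_id']}"),
        (subject, f"{BASE}def/timestamp", row["timestamp"]),
        (subject, f"{BASE}def/electricity_kwh", float(row["electricity_kwh"])),
    ]


def csv_to_triples(text):
    """Convert every data row of a CSV string into a flat list of triples."""
    triples = []
    reader = csv.DictReader(io.StringIO(text))
    for index, row in enumerate(reader, start=1):
        triples.extend(row_to_triples(row, index))
    return triples


if __name__ == "__main__":
    for triple in csv_to_triples(CSV_TEXT):
        print(triple)
```

The point of the sketch is that each cell becomes a statement whose subject and predicate are globally unique identifiers, which is what allows separately published datasets to be linked.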

Publications

Description The project has identified a novel route to publishing large time-series energy demand datasets which meet the FAIR data guidelines. The CSV on the Web (CSVW) standards, published by W3C and promoted by the UK Government Digital Service, represent a formalised method for publishing data as CSV files with detailed metadata in an accompanying JSON file (https://csvw.org). Crucially, the CSVW standards include a method for converting CSVW data to the RDF format, which can contain the unique identifiers required by the FAIR guidelines. The work has also identified the Semantic Sensor Network (SSN) ontology, and in particular its core SOSA (Sensor, Observation, Sample, and Actuator) vocabulary, as a suitable RDF ontology for storing the type of sensor measurements found in energy research datasets (www.w3.org/TR/vocab-ssn). The concept is that a large time-series energy demand dataset is published as a CSVW file with detailed metadata so that it could, if desired, be converted to a SOSA-based RDF dataset. This would meet the FAIR criteria while, because the dataset is published as CSVW, remaining accessible for analysis to users who understand CSV files but do not know how the RDF data model works.
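
The idea can be sketched in a few lines of standard-library Python. The metadata below uses property names from the CSVW metadata vocabulary (tableSchema, aboutUrl, propertyUrl, valueUrl, virtual) and terms from the SOSA namespace, but the file name, the example.org URLs and the readings are hypothetical. The to_triples function is a deliberately simplified stand-in for the CSVW RDF conversion, not the project's csvw_functions package and not a full implementation of the W3C algorithm: it handles only the {_row} template variable and integer datatypes.

```python
import csv
import io
import json

SOSA = "http://www.w3.org/ns/sosa/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical CSV content; normally a file such as readings.csv.
CSV_TEXT = (
    "timestamp,power_w\n"
    "2023-01-01T00:00:00Z,350\n"
    "2023-01-01T00:30:00Z,410\n"
)

# CSVW-style metadata, normally a separate JSON file next to the CSV.
METADATA = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "readings.csv",
    "tableSchema": {
        # One subject URI per row; {_row} is the row number.
        "aboutUrl": "https://example.org/observation/{_row}",
        "columns": [
            {"name": "timestamp", "datatype": "dateTime",
             "propertyUrl": SOSA + "resultTime"},
            {"name": "power_w", "datatype": "integer",
             "propertyUrl": SOSA + "hasSimpleResult"},
            # A virtual column has no cell in the CSV; it adds a fixed statement.
            {"name": "type", "virtual": True,
             "propertyUrl": RDF_TYPE, "valueUrl": SOSA + "Observation"},
        ],
    },
}


def to_triples(csv_text, metadata):
    """Simplified expansion of CSV rows into (subject, predicate, object) triples."""
    schema = metadata["tableSchema"]
    triples = []
    rows = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(rows, start=1):
        subject = schema["aboutUrl"].replace("{_row}", str(row_number))
        for column in schema["columns"]:
            if column.get("virtual"):
                value = column["valueUrl"]
            else:
                value = row[column["name"]]
                if column.get("datatype") == "integer":
                    value = int(value)
            triples.append((subject, column["propertyUrl"], value))
    return triples


if __name__ == "__main__":
    print(json.dumps(METADATA, indent=2))
    for triple in to_triples(CSV_TEXT, METADATA):
        print(triple)
```

A user who only wants the numbers can ignore the metadata and read the CSV directly, while a user who wants linked data can apply the metadata to obtain SOSA observations.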
Exploitation Route The main effort has been in developing a new implementation of the CSVW standards, in the Python package named csvw_functions. This was deemed necessary as the existing implementations did not have the level of transparency and reusability required. This should prove valuable to researchers working with and publishing CSVW data in all fields.
Sectors Construction, Digital/Communication/Information Technologies (including Software), Energy

URL https://github.com/stevenkfirth/3DFAIR
 
Title Python implementation of the CSVW standards 
Description This is a Python package which implements the following W3C standards: Model for Tabular Data and Metadata on the Web; Metadata Vocabulary for Tabular Data; Generating JSON from Tabular Data on the Web; Generating RDF from Tabular Data on the Web. Together these comprise the CSV on the Web (CSVW) standards. The package is written in pure Python and passes all the tests in the CSVW Test Suite.
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Still ongoing 
URL https://github.com/stevenkfirth/csvw_functions