Engineering Transformation for the Integration of Sensor Networks: A Feasibility Study - 'ENTRAIN'

Lead Research Organisation: UK Centre for Ecology & Hydrology
Department Name: Water Resources (Wallingford)

Abstract

There is a need to make use of new digital data analysis techniques to improve our understanding of the environment. Data from a new generation of environmental sensors, combined with analyses based on Artificial Intelligence, has the potential to help us understand how human influences and long-term change are affecting the environment around us. Artificial Intelligence approaches enable computers to identify trends and relationships across different streams of data, often picking out patterns that would be too difficult or time-consuming for humans to identify manually.

To realise these benefits, data from diverse sensor networks must be combined and analysed together. Currently many sensor networks are operated individually, and data are not readily combined due to differences in the way measurements are made (e.g. between weekly river samples and sub-second measurements of gases in the atmosphere). In addition, combining these data automatically, without human intervention, requires much finer and more consistent descriptions of the contents of data streams, so that machines can understand the content sufficiently. Links between sensors in space are also important, and machines will need an understanding of these links, not just in the sense of coordinates, but for example how sensors are linked along rivers. We can construct a digital representation of rivers in order to enable this.
We will describe the various elements of a future environmental analysis system that will be required in order to achieve these benefits, and will address some of the components that are currently missing. We will look at technologies, from databases to data transfer mechanisms, to understand how a system could be built.

We will use data from 3 NERC sensor networks measuring environmental variables from the atmosphere to river water quality, and show how this data can be automatically integrated in such a way that machines would be able to analyse it automatically.
A significant issue when monitoring with high-resolution sensors is how to handle problems in the data, which could include missing data, and erroneous values due to sensor failure. There is too much data for humans to manually view and check, and so automated approaches are needed. Currently these are often simple checks of individual data values against expected ranges, but again there are opportunities for artificial intelligence to improve this. AI approaches can look across multiple sensors, identify relationships, and find subtle changes in data signals, and this can be used to both identify data problems and to fix them through infilling. We will enhance the 3 NERC networks by testing and applying such approaches to data quality control.

We will investigate some fundamental limitations of high-resolution monitoring, the transfer of large amounts of data from the field site to the data centre, the security of such systems, and whether more processing could be done on the instruments themselves to reduce data transfer volumes.

We will meet with the public, with policy-makers, with industry and with researchers to discuss where there will be most to be gained from development of AI approaches to analysing environmental sensor data. We will develop ideas for future work to realise these gains, and will promote the benefits of an integrated system for environmental monitoring. These stakeholders are likely to include the Environment Agency, SEPA, Natural Resources Wales, Defra, Water companies, sensor network developers, and public organisations with an interest in the environment, including the National Trust, the Rivers Trusts, and local community groups.

Planned Impact

The Digital Environment programme will benefit from ENTRAIN's foundation work, which will provide requirements, methods, best practice advice and recommendations for integration and data modelling across multiple sensor networks and other datasets, such as EO. This information and these techniques will benefit other areas of science and industry, as well as public engagement.

Environmental practitioners, regulators, government, consultants, the water industry, agribusiness, insurance, and many others can benefit substantially from the joined-up evidence that ENTRAIN will start to generate.

The public, schools and colleges will benefit from access to meaningful data in a spatially aware context. There is strong public interest in environmental issues, and yet beyond weather forecasts, weather data and perhaps more recently air quality readings, there are few accessible data which have tangible meaning to the layperson.

The Earth Observation (EO) community will benefit from better access to connected in situ data for retrieval algorithm validation and development (e.g. for flooding extent, soil moisture and land cover products), and from new automated Phenocam greenness products output by ENTRAIN. Other Big Data projects (e.g. Data Labs) will benefit from a greater and easier connection to datasets, with common spatio-temporal linking requirements and proper metadata description already done.

Data modellers and environmental informatics practitioners will benefit from improved sensor metadata schemes, sensor registers/catalogues, and data structuring for interoperability of a network of networks. We will also disseminate the results of the ENTRAIN feasibility study through an environmental informatics paper describing the proposed methods and advances made in data modelling and data provenance, to ensure wide impact and uptake of these methods.

Observational scientists and regulatory observers will benefit from our website case studies, webinars, training workshops and online video tutorials, which will disseminate best practice and training in realtime data collection, cyber security, data vocabularies, metadata schemes and new deep learning QC techniques. These studies will benefit the global environmental, and wider, data communities such as the Committee on Data for Science and Technology (CODATA). Harmonisation of data networks to increase or facilitate interoperability, and production of spatio-temporally connected observations across environmental domains (e.g. Digital Rivers), will benefit the British Geological Survey (BGS), CEH, the UK Met Office, the Environment Agency (EA), the Scottish Environment Protection Agency (SEPA), Defra, water companies (and other utilities such as the power grid), agri-business, insurers, and public health. Other NERC and EPSRC funded programmes (e.g. ASSIST, Natural Flood Management, HydroJULES, Internet of Food Things) and Defra Air Quality Monitoring Networks will all benefit through higher data quality, more complete data and efficiency gains in analysing data across sensor networks, and by using developed data structures in other environmental monitoring and food chain domains. The spatially connected integrated data visualisations will be demonstrated at both academic and industry events and conferences, where practitioners can benefit from, for example, improved water quality alerts (e.g. algal blooms, nutrient levels etc.), and crucially see the connected drivers of those trends, and get decision support information. Similarly, this has the potential to provide interconnected, cross-discipline, environmental management information that can give government the evidence chain to implement, monitor and evaluate policy.

We will monitor and evaluate the success of our impacts by recording attendance at events and the number of website visits, and by using Twitter to promote activities while recording the number of followers and retweets. We will also log email enquiries and webinar and web video views.
 
Description Machine learning tools can be very effective for improving the quality of environmental measurements, allowing prediction of measurements using independent data that can be used to understand whether there are likely to be issues with any individual data value or series of values. Machine learning can also be used to infill data series to provide more complete datasets, even for complicated variables such as rainfall.

A number of new data streams can be produced through use of proxy measurements. These include snow water information from soil moisture measurements, and measures of field greenness from camera images. These new data streams can contribute to advances in understanding of the environment.

Environmental measurements from different monitoring networks can, despite significant differences between the data being measured, be harmonised through the use of data standards. These can capture detailed information about each measurement without reducing the readability and usability of the dataset.
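
As an illustration of the harmonisation described above, the following is a minimal sketch of mapping network-specific fields into one shared observation record. The network names, field names, property names and the `FIELD_MAP` table are all hypothetical and are not taken from the standard developed in the project.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Observation:
    """One measurement expressed in a shared, network-independent form."""
    network: str
    site: str
    observed_property: str
    unit: str
    timestamp: str  # ISO 8601 string, assumed to be UTC
    value: float


# Hypothetical mapping: (network, native field) -> (property, unit, scale, offset).
FIELD_MAP: Dict[Tuple[str, str], Tuple[str, str, float, float]] = {
    ("network_a", "TA"): ("air_temperature", "degC", 1.0, 0.0),
    ("network_b", "temp_K"): ("air_temperature", "degC", 1.0, -273.15),
}


def harmonise(network: str, site: str, field: str,
              timestamp: str, raw_value: float) -> Observation:
    """Convert a native reading into the shared record, converting units."""
    prop, unit, scale, offset = FIELD_MAP[(network, field)]
    return Observation(network, site, prop, unit, timestamp,
                       raw_value * scale + offset)


# Two networks reporting the same quantity under different names and units.
a = harmonise("network_a", "site_1", "TA", "2020-01-01T00:00:00Z", 10.0)
b = harmonise("network_b", "site_2", "temp_K", "2020-01-01T00:00:00Z", 283.15)
```

Keeping the original network and site on every record is one way to retain detail about each measurement while still allowing the two streams to be analysed together.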

Monitoring data across the freshwater environment can be linked up using digital representations of the river / lake network and new approaches to storing this data can enable rapid analysis of data up and downstream.
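
The idea of linking monitoring data through a digital river network can be sketched as a simple directed graph in which each river reach points to the reach it flows into. This is an illustrative sketch only: the reach identifiers, sensor names and dictionary layout are hypothetical, and the storage approach used in the project is not described here.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional

# Hypothetical network: each reach maps to the reach it drains into.
downstream_of: Dict[str, Optional[str]] = {
    "A": "C", "B": "C", "C": "D", "D": None,
}
# Hypothetical sensors attached to each reach.
sensors_on_reach: Dict[str, List[str]] = {
    "A": ["wq_1"], "B": [], "C": ["flow_7"], "D": ["wq_9"],
}


def build_upstream_index(network: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the downstream links so upstream neighbours can be looked up."""
    upstream: Dict[str, List[str]] = defaultdict(list)
    for reach, down in network.items():
        if down is not None:
            upstream[down].append(reach)
    return upstream


def upstream_reaches(reach: str, upstream: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk to all reaches draining into `reach`.

    Assumes the network is a tree (no loops), as each reach has one outflow.
    """
    found: List[str] = []
    queue = deque(upstream.get(reach, []))
    while queue:
        current = queue.popleft()
        found.append(current)
        queue.extend(upstream.get(current, []))
    return found


def downstream_reaches(reach: str, network: Dict[str, Optional[str]]) -> List[str]:
    """Follow the single downstream link until the outlet is reached."""
    path: List[str] = []
    nxt = network.get(reach)
    while nxt is not None:
        path.append(nxt)
        nxt = network.get(nxt)
    return path


upstream_index = build_upstream_index(downstream_of)
above_d = upstream_reaches("D", upstream_index)
sensors_above_d = sorted(s for r in above_d for s in sensors_on_reach[r])
below_a = downstream_reaches("A", downstream_of)
```

With an index like this, a question such as "which sensors lie upstream of this site" becomes a graph walk rather than a comparison of coordinates.
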
Exploitation Route Machine learning QA tools could be used in any monitoring project. They are generic and can be automated in order to build the best models for assessing and infilling a given variable. This could lead to significant improvements in the quality and completeness of environmental datasets. This could be undertaken by any monitoring group or research data centre.

To do this would require access to wider networks of environmental data. This can be achieved through the use of sensor data standards. The standard developed within this project could be applied to integrate sensor data streams from a number of networks to provide an effective network of networks.

Use of river networks to link freshwater monitoring and inputs could lead to more effective exploitation of data and application of statistical and machine learning models. An approach such as the US Internet of Water would enable this to be undertaken on a consistent basis, shared with government, research, and commercial organisations to improve understanding of the status of freshwaters.
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Environment

 
Description Constructing a Digital Environment short projects: "A NERC data service for integrating NERC sensor networks"
Amount £40,000 (GBP)
Organisation Natural Environment Research Council 
Sector Public
Country United Kingdom
Start 03/2023 
End 07/2023
 
Description Natural Capital Ecosystem Assessment
Amount £30,000 (GBP)
Organisation Department For Environment, Food And Rural Affairs (DEFRA) 
Sector Public
Country United Kingdom
Start 12/2021 
End 03/2022
 
Description Natural Capital Ecosystem Assessment pilot
Amount £80,000 (GBP)
Organisation Department For Environment, Food And Rural Affairs (DEFRA) 
Sector Public
Country United Kingdom
Start 10/2020 
End 03/2021
 
Title Automated machine-learning based algorithms for infilling of sensor network data 
Description This is a suite of code that runs within the operational COSMOS-UK data management system and improves the quality of the data available to end users by automatically infilling missing periods of data using an appropriate algorithm. A number of models exist for each sensor time series (i.e. each variable at each site), which use either linear interpolation, simple statistical infilling, or a machine learning model (using XGBoost). The statistical and machine learning models use inputs of other variables at the same site or variables from other sites. A number of models are pre-trained for each site with different sets of input features, to give robustness in case any inputs are missing. Each model, and each interpolation method, has a set of uncertainties pre-defined from the model training / testing. The system automatically chooses the method with the lowest uncertainty. The infilling method used is identified for every infilled value, and the uncertainties are also stored within the data system, providing a full audit trail and allowing users to select which infilled data (or none) to use.
Type Of Material Computer model/algorithm 
Year Produced 2021 
Provided To Others? No  
Impact The COSMOS-UK dataset now has more complete time series, meaning it is simpler and quicker for other researchers to use in, e.g. modelling applications, without having to assess quality control and undertake their own infilling. The infilled data is already in use, e.g. for running the JULES model at COSMOS-UK sites. 
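
The selection step described for this infilling system (choose the lowest-uncertainty method among those whose inputs are available, and record which method was used) could be sketched as follows. This is not the COSMOS-UK code: the class names, model names, input names and uncertainty values are hypothetical, and simple stand-in functions take the place of the trained XGBoost and statistical models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Inputs = Dict[str, Optional[float]]


@dataclass
class InfillModel:
    name: str
    required_inputs: List[str]         # variables the model needs
    uncertainty: float                 # assumed fixed from training / testing
    predict: Callable[[Inputs], float]


@dataclass
class InfilledValue:
    value: float
    method: str                        # stored so users can audit or filter
    uncertainty: float


def infill(models: List[InfillModel], available: Inputs) -> Optional[InfilledValue]:
    """Pick the usable model with the lowest uncertainty and apply it."""
    usable = [m for m in models
              if all(available.get(k) is not None for k in m.required_inputs)]
    if not usable:
        return None                    # leave the gap unfilled
    best = min(usable, key=lambda m: m.uncertainty)
    return InfilledValue(best.predict(available), best.name, best.uncertainty)


# Stand-in predictors; a real system would call trained models here.
models = [
    InfillModel("ml_all_inputs", ["air_temp", "neighbour_vwc"], 0.02,
                lambda d: 0.5 * d["neighbour_vwc"] + 0.001 * d["air_temp"]),
    InfillModel("ml_reduced_inputs", ["air_temp"], 0.05,
                lambda d: 0.3 + 0.001 * d["air_temp"]),
    InfillModel("linear_interpolation", ["previous", "next"], 0.08,
                lambda d: (d["previous"] + d["next"]) / 2),
]

# The neighbouring site is down, so the reduced-input model is chosen.
chosen = infill(models, {"air_temp": 10.0, "neighbour_vwc": None})
```

Pre-training several models with different input sets means a fallback is usually available when one input stream fails, at the cost of a higher stated uncertainty.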
 
Title Daily and sub-daily hydrometeorological and soil data (2013-2019) [COSMOS-UK] 
Description This dataset contains daily and sub-daily hydrometeorological and soil observations from the COSMOS-UK (cosmic-ray soil moisture) monitoring network from October 2013 to the end of 2019. These data are from 51 sites across the UK recording a range of hydrometeorological and soil variables. Each site in the network records the following hydrometeorological and soil data at 30 minute resolution: radiation (short wave, long wave and net), precipitation, atmospheric pressure, air temperature, wind speed and direction, humidity, soil heat flux, and soil temperature and volumetric water content (VWC), measured by point sensors at various depths. Each site hosts a cosmic-ray sensing probe, a novel sensor technology which counts fast neutrons in the surrounding atmosphere. In combination with the recorded hydrometeorological data, neutron counts are used to derive VWC over a field scale (COSMOS VWC), at two temporal resolutions (hourly and daily). The presence of snow leads to erroneously high measurements of COSMOS VWC due to all the extra water in the surrounding area. Included in the daily data are indications of snow days, on which the COSMOS VWC is adjusted and the snow water equivalent (SWE) is given. The potential evapotranspiration (PE), derived from recorded hydrometeorological and soil data, is also included at daily resolution. Two levels of quality control are carried out: first, data are run through a series of automated checks, such as range tests and spike tests, and then all data are manually inspected each week, when any other faults are picked up, including sensor faults or connection issues. Quality control flags are provided for all recorded (30 minute) data, indicating the reason for any missing data.
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact The dataset is widely used as the largest network of field scale soil moisture measurements in the UK by researchers interested in soil moisture, comparison with satellite data products, hydrological modelling, etc. The ENTRAIN project improved the dataset through addition of new Snow Water Equivalent measurements. 
URL https://catalogue.ceh.ac.uk/id/b5c190e4-e35d-40ea-8fbe-598da03a1185
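
The automated range and spike tests mentioned in the dataset description can be illustrated with a minimal sketch. The thresholds, the spike definition and the function names below are assumptions chosen for illustration; they are not the checks or limits used for COSMOS-UK.

```python
from typing import List, Optional

Series = List[Optional[float]]   # None marks a missing reading


def range_flags(values: Series, lower: float, upper: float) -> List[bool]:
    """Flag readings that fall outside the expected physical range."""
    return [v is not None and not (lower <= v <= upper) for v in values]


def spike_flags(values: Series, max_step: float) -> List[bool]:
    """Flag a reading that jumps away from both neighbours in the same direction."""
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if prev is None or cur is None or nxt is None:
            continue                     # cannot judge without both neighbours
        step_in, step_out = cur - prev, cur - nxt
        if abs(step_in) > max_step and abs(step_out) > max_step and step_in * step_out > 0:
            flags[i] = True
    return flags


# Hypothetical 30 minute air temperature readings in degrees Celsius.
air_temp: Series = [10.1, 10.3, 25.0, 10.4, None, 10.6, 70.0]
out_of_range = range_flags(air_temp, lower=-40.0, upper=60.0)
spikes = spike_flags(air_temp, max_step=5.0)
```

Checks of this kind look at one series at a time, which is why the project also explored methods that compare a sensor against related variables and sites.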
 
Title Greenness index from phenocams for 50 COSMOS-UK sites 
Description Field greenness metrics (as well as individual RGB time series) from the images from phenocams established at 50 COSMOS-UK sites. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? No  
Impact The dataset is being used to understand the use of the method in phenological research applications. 
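
One widely used greenness metric for phenocam images is the green chromatic coordinate, GCC = G / (R + G + B), computed from mean colour values over a region of the image. The record above does not state which metrics were produced, so treating GCC as the index is an assumption, and the function name and sample values below are hypothetical.

```python
from typing import List, Optional, Tuple

RGB = Tuple[float, float, float]


def green_chromatic_coordinate(r: float, g: float, b: float) -> Optional[float]:
    """Return G / (R + G + B), or None for an all-black region."""
    total = r + g + b
    if total == 0:
        return None
    return g / total


# Hypothetical daily mean RGB values for one camera's region of interest.
daily_mean_rgb: List[RGB] = [(50.0, 100.0, 50.0), (80.0, 80.0, 80.0), (0.0, 0.0, 0.0)]
gcc_series = [green_chromatic_coordinate(r, g, b) for r, g, b in daily_mean_rgb]
```

Dividing by total brightness makes the index less sensitive to changes in overall illumination than the raw green channel alone.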
 
Description Collaboration with NERC Environmental Data Services around sensors 
Organisation British Geological Survey
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing learning from the ENTRAIN project around sensor data integration, standards, and use cases from terrestrial and freshwater monitoring
Collaborator Contribution Contributing information on how different data centres work with sensor data and developing a vision for how this could be integrated.
Impact Successful proposal to NERC Constructing a Digital Environment programme, and small project
Start Year 2023
 
Description Collaboration with NERC Environmental Data Services around sensors 
Organisation National Oceanography Centre
Country United Kingdom 
Sector Academic/University 
PI Contribution Contributing learning from the ENTRAIN project around sensor data integration, standards, and use cases from terrestrial and freshwater monitoring
Collaborator Contribution Contributing information on how different data centres work with sensor data and developing a vision for how this could be integrated.
Impact Successful proposal to NERC Constructing a Digital Environment programme, and small project
Start Year 2023
 
Description Collaboration with NERC Environmental Data Services around sensors 
Organisation Science and Technology Facilities Council (STFC)
Country United Kingdom 
Sector Public 
PI Contribution Contributing learning from the ENTRAIN project around sensor data integration, standards, and use cases from terrestrial and freshwater monitoring
Collaborator Contribution Contributing information on how different data centres work with sensor data and developing a vision for how this could be integrated.
Impact Successful proposal to NERC Constructing a Digital Environment programme, and small project
Start Year 2023
 
Description Digital Environment Expert Network 
Organisation Natural Environment Research Council
Country United Kingdom 
Sector Public 
PI Contribution Attending regular meetings of the NERC Digital Environment Expert Network, contributing to development of a horizon scanning activity and report, and other programme idea generation sessions.
Collaborator Contribution Travel and subsistence budgets were available but remained unused as all meetings were online. Communications and coordinating support was provided by the Digital Environment Programme.
Impact None as yet
Start Year 2020
 
Description Industry-Research collaboration around sensor metadata standards 
Organisation Epimorphics
Country United Kingdom 
Sector Private 
PI Contribution The collaboration was born of a subcontract within the project to Epimorphics, a company with extensive expertise in linked data, including in the area of UK environmental data. We provided detailed information regarding our use of sensors, how we collect and process data, how we store data and the requirements we have for managing information around sensor networks and sensor measurements (e.g. instruments used, processing used, etc.). A main focus was the wider need for integration of sensor metadata and therefore standardisation of data and metadata between networks. The ENTRAIN project subsequently contributed use cases to the Research Data Alliance iADOPT (Interoperable Descriptions of Observable Properties) working group, aiming to standardise the way that environmental measurements are described.
Collaborator Contribution Epimorphics joined a number of calls with other organisations working on research sensor data networks, including the British Geological Survey, National Oceanography Centre, Met Office, and Marine Ireland. The aim of these was to develop some common threads of understanding around sensor data management, how others are managing large sensor networks and what their requirements are. This has resulted in a loose network of research sensor metadata operators, and ongoing discussions and collaborations around vocabularies and data management approaches for sensor network data and metadata. The specific output of the Epimorphics work was a review of existing sensor metadata standards and development of a sensor metadata standard to meet CEH, and wider, requirements. A significant part of this was discussion around definition of complex observed properties, and how work by NOC (through a collaboration within the Research Data Alliance iADOPT Working Group) should be taken forward to improve representation of environmental measurements to increase interoperability.
Impact Sensor metadata review (see url).
Start Year 2019
 
Description UKCEH - Alan Turing Institute environment and sustainability project 
Organisation Alan Turing Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution Matt Fry (PI) has a part-time secondment to the Turing Institute to further collaborations around environment and sustainability. A PGR data scientist is employed full time for 3 months on this project, furthering machine learning applications.
Collaborator Contribution Turing Institute have funded the secondment and PGR work, and provided networking opportunities to develop new data science applications.
Impact None as yet.
Start Year 2022
 
Description Presentation and hosting of discussion session around sensor data APIs at UKSCAPE engagement event 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact 42 people attended a one-day meeting in Manchester to help shape the development of the UKSCAPE Data Science Framework. Attendees included University researchers, representatives of Government bodies, Government agencies and non-Governmental public bodies. One of four areas of focus and engagement within the day's agenda was API access to environmental data. Matt Fry presented work on a number of APIs, including work within ENTRAIN on standardisation of network metadata to enable interoperability, and subsequently ran one of the workshop activities on the subject of APIs. This enabled discussion with stakeholders on the future of integration of environmental data through APIs and standards, and delivered understanding of user needs to CEH.
Year(s) Of Engagement Activity 2019
URL https://www.ceh.ac.uk/get-involved/events/shaping-development-ceh-uk-scape-data-science-framework
 
Description Presentation at the Constructing a Digital Environment workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Add description later
Year(s) Of Engagement Activity 2019
URL https://nerc.ukri.org/innovation/activities/environmentaldata/digitalenv/news/digital-workshop/
 
Description Presentation to Catchment Based Approach Catchment Data Evidence Forum 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Third sector organisations
Results and Impact Presentation and Q&A on new approaches to access to data for UK freshwaters for organisations involved in local catchment partnerships within the Catchment Based Approach (CaBA), as part of the CaBA Catchment Data and Evidence Forum. Title was "Webinar: Making catchment data more accessible to end users". The workshop was held remotely and was recorded, with the presentations held on YouTube.
Year(s) Of Engagement Activity 2020
URL https://catchmentbasedapproach.org/learn/catchment-data-and-evidence-forum-2020/
 
Description Workshop on opportunities for improving understanding of UK water quality using AI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact A small, focussed workshop of 12 people, largely researchers and potential users representing Defra and the Environment Agency, discussed requirements for improved understanding of freshwater quality, specific issues needing more evidence / analysis, datasets available to inform this, and machine learning / AI approaches to address these issues. Ideas were taken forward both for a future funding application, supported by Defra and the EA, and within an Environment Agency internal review of water quality monitoring data.
Year(s) Of Engagement Activity 2019