Engineering Transformation for the Integration of Sensor Networks: A Feasibility Study - 'ENTRAIN'

Lead Research Organisation: UK Ctr for Ecology & Hydrology fr 011219
Department Name: Water Resources (Wallingford)


There is a need to make use of new digital data analysis techniques to improve our understanding of the environment. Data from a new generation of environmental sensors, combined with analyses based on Artificial Intelligence, have the potential to help us understand how human influences and long-term change are affecting the environment around us. Artificial Intelligence approaches enable computers to identify trends and relationships across different streams of data, often picking out patterns that would be too difficult or time-consuming for humans to identify manually.

To realise these benefits, data from diverse sensor networks must be combined and analysed together. Currently many sensor networks are operated individually, and data are not readily combined due to differences in the way measurements are made (e.g. between weekly river samples and sub-second measurements of gases in the atmosphere). In addition, combining these data automatically, without human intervention, requires much finer and more consistent descriptions of the contents of data streams, so that machines can understand the content sufficiently. Links between sensors in space are also important, and machines will need an understanding of these links, not just in the sense of coordinates, but also, for example, of how sensors are linked along rivers. We can construct a digital representation of rivers in order to enable this.
We will describe the various elements of a future environmental analysis system that will be required in order to achieve these benefits, and address some of the components that are currently missing. We will look at technologies, from databases to data transfer mechanisms, to understand how a system could be built.

We will use data from 3 NERC sensor networks measuring environmental variables from the atmosphere to river water quality, and show how this data can be automatically integrated in such a way that machines would be able to analyse it automatically.
A significant issue when monitoring with high-resolution sensors is how to handle problems in the data, which could include missing data, and erroneous values due to sensor failure. There is too much data for humans to manually view and check, and so automated approaches are needed. Currently these are often simple checks of individual data values against expected ranges, but again there are opportunities for artificial intelligence to improve this. AI approaches can look across multiple sensors, identify relationships, and find subtle changes in data signals, and this can be used to both identify data problems and to fix them through infilling. We will enhance the 3 NERC networks by testing and applying such approaches to data quality control.

We will investigate some fundamental limitations of high-resolution monitoring, the transfer of large amounts of data from the field site to the data centre, the security of such systems, and whether more processing could be done on the instruments themselves to reduce data transfer volumes.

We will meet with the public, with policy-makers, with industry and with researchers to discuss where the most is to be gained from development of AI approaches to analysing environmental sensor data. We will develop ideas for future work to realise these gains, and will promote the benefits of an integrated system for environmental monitoring. These stakeholders are likely to include the Environment Agency, SEPA, Natural Resources Wales, Defra, water companies, sensor network developers, and public organisations with an interest in the environment, including the National Trust, the Rivers Trusts, and local community groups.

Planned Impact

The Digital Environment programme will benefit from ENTRAIN's foundation work, which will provide requirements, methods, best practice advice and recommendations for integration and data modelling across multiple sensor networks and other datasets, such as EO. This information and these techniques will benefit other areas of science and industry, as well as public engagement.

Environmental practitioners, regulators, government, consultants, the water industry, agribusiness, insurance, and many others can benefit substantially from the joined-up evidence that ENTRAIN will start to generate.

The public, schools and colleges will benefit from access to meaningful data in a spatially aware context. There is strong public interest in environmental issues, and yet beyond weather forecasts, weather data and perhaps more recently air quality readings, there are few accessible data which have tangible meaning to the layperson.

The Earth Observation (EO) community will benefit from better access to connected in situ data for the validation and development of retrieval algorithms (e.g. for flooding extent, soil moisture and land cover products), and from new automated Phenocam greenness products output by ENTRAIN. Other Big Data projects (e.g. Data Labs) will benefit from a greater and easier connection to datasets, with common spatio-temporal linking requirements and proper metadata description already done.

Data modellers and environmental informatics practitioners will benefit from improved sensor metadata schemes, sensor registers/catalogues, and data structuring for interoperability of a network of networks. We will also disseminate the results of the ENTRAIN feasibility study through an environmental informatics paper describing the proposed methods and advances made in data modelling and data provenance, to ensure wide impact and uptake of these methods.

Observational scientists and regulatory observers will benefit from our website case studies, webinars, training workshops and online video tutorials to disseminate best practice and training in realtime data collection, cyber security, data vocabularies, metadata schemes and new deep learning QC techniques. These studies will benefit the global environmental, and wider, data communities such as the Committee on Data for Science and Technology (CODATA). Harmonisation of data networks to increase or facilitate interoperability, and production of spatio-temporally connected observations across environmental domains (e.g. Digital Rivers), will benefit the British Geological Survey (BGS), CEH, the UK Met Office, the Environment Agency (EA), the Scottish Environment Protection Agency (SEPA), Defra, water companies (and other utilities such as the power grid), agri-business, insurers, and public health. Other NERC and EPSRC funded programmes (e.g. ASSIST, Natural Flood Management, HydroJULES, Internet of Food Things) and Defra Air Quality Monitoring Networks will all benefit through higher data quality, more complete data and efficiency gains in analysing data across sensor networks, and by using developed data structures in other environmental monitoring and food chain domains. The spatially connected integrated data visualisations will be demonstrated at both academic and industry events and conferences, where practitioners can benefit from, for example, improved water quality alerts (e.g. algal blooms, nutrient levels), and crucially see the connected drivers of those trends and get decision support information. Similarly, this has the potential to provide interconnected, cross-discipline environmental management information that can give government the evidence chain to implement, monitor and evaluate policy.

We will monitor and evaluate the success of our impacts by recording attendances at events and the number of website visits, and by using Twitter to promote activities while recording the number of followers and retweets. We will also log email enquiries and views of webinars and web videos.

Related Projects

Project Reference | Relationship | Related To   | Start      | End        | Award Value
NE/S016244/1      |              |              | 31/03/2019 | 30/11/2019 | £251,614
NE/S016244/2      | Transfer     | NE/S016244/1 | 01/12/2019 | 29/06/2020 | £62,903
Description Machine learning tools can be very effective for improving the quality of environmental measurements, allowing prediction of measurements using independent data that can be used to understand whether there are likely to be issues with any individual data value or series of values. Machine learning can also be used to infill data series to provide more complete datasets, even for complicated variables such as rainfall.

A number of new data streams can be produced through use of proxy measurements. These include snow water information from soil moisture measurements, and measures of field greenness from camera images. These new data streams can contribute to advances in understanding of the environment.

Environmental measurements from different monitoring networks can, despite significant differences between the data being measured, be harmonised through the use of data standards. These can capture detailed information about each measurement without reducing the readability and usability of the dataset.

Monitoring data across the freshwater environment can be linked up using digital representations of the river / lake network and new approaches to storing this data can enable rapid analysis of data up and downstream.
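
As an illustration of how a digital river network can support analysis up and downstream, the following is a minimal sketch that assumes each river reach records the single reach it flows into. The reach names and the `DOWNSTREAM_OF` mapping are hypothetical and are not taken from the project's data model.

```python
from collections import defaultdict, deque

# Hypothetical river reaches: each reach ID maps to the reach it flows into.
# None marks the outlet of the network.
DOWNSTREAM_OF = {
    "headwater_a": "confluence",
    "headwater_b": "confluence",
    "confluence": "lower_river",
    "lower_river": "estuary",
    "estuary": None,
}


def build_upstream_index(downstream_of):
    """Invert the flow links so each reach lists the reaches draining into it."""
    upstream = defaultdict(list)
    for reach, target in downstream_of.items():
        if target is not None:
            upstream[target].append(reach)
    return upstream


def reaches_downstream(reach, downstream_of):
    """Follow the flow direction from a reach to the outlet."""
    path = []
    current = downstream_of.get(reach)
    while current is not None:
        path.append(current)
        current = downstream_of.get(current)
    return path


def reaches_upstream(reach, upstream):
    """Breadth-first search for every reach that drains into the given reach."""
    found = []
    queue = deque([reach])
    while queue:
        for parent in upstream.get(queue.popleft(), []):
            found.append(parent)
            queue.append(parent)
    return found


upstream_index = build_upstream_index(DOWNSTREAM_OF)
print(reaches_downstream("headwater_a", DOWNSTREAM_OF))
print(reaches_upstream("lower_river", upstream_index))
```

With sensors attached to reach IDs, the same two lookups would give the monitoring sites above or below any point of interest.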
Exploitation Route Machine learning QA tools could be used in any monitoring project. They are generic and can be automated in order to build the best models for assessing and infilling a given variable. This could lead to significant improvements in the quality and completeness of environmental datasets. This could be undertaken by any monitoring group or research data centre.

To do this would require access to wider networks of environmental data. This can be achieved through the use of sensor data standards. The standard developed within this project could be applied to integrate sensor data streams from a number of networks to provide an effective network of networks.

Use of river networks to link freshwater monitoring and inputs could lead to more effective exploitation of data and application of statistical and machine learning models. An approach such as the US Internet of Water would enable this to be undertaken on a consistent basis, shared with government, research, and commercial organisations to improve understanding of the status of freshwaters.
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Environment

Description Natural Capital Ecosystem Assessment
Amount £30,000 (GBP)
Organisation Department For Environment, Food And Rural Affairs (DEFRA) 
Sector Public
Country United Kingdom
Start 12/2021 
End 03/2022
Description Natural Capital Ecosystem Assessment pilot
Amount £80,000 (GBP)
Organisation Department For Environment, Food And Rural Affairs (DEFRA) 
Sector Public
Country United Kingdom
Start 09/2020 
End 03/2021
Title Automated machine-learning based algorithms for infilling of sensor network data 
Description This is a suite of code that runs within the operational COSMOS-UK data management system and improves the quality of the data available to end users by automatically infilling missing periods of data using an appropriate algorithm. A number of models exist for each sensor time series (i.e. each variable at each site), which use either linear interpolation, simple statistical infilling, or a machine learning model (using XGBoost). The statistical and machine learning models use inputs of other variables at the same site or variables from other sites. A number of models are pre-trained for each site with a different set of input features, to give robustness in case any inputs are missing. Each model, and the interpolation methods, have a set of uncertainties pre-defined from the model training / testing. The system automatically chooses the method with the lowest uncertainty. The infilling method used is identified for every infilled value, and the uncertainties are also stored within the data system, providing a full audit trail and allowing users to select which infilled data (or none) to use.
Type Of Material Computer model/algorithm 
Year Produced 2021 
Provided To Others? No  
Impact The COSMOS-UK dataset now has more complete time series, meaning it is simpler and quicker for other researchers to use in, e.g. modelling applications, without having to assess quality control and undertake their own infilling. The infilled data is already in use, e.g. for running the JULES model at COSMOS-UK sites. 
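
The selection step described above (choose, among the methods whose inputs are available, the one with the lowest pre-defined uncertainty) can be sketched as follows. This is an illustrative sketch only, not the COSMOS-UK code: the model names, input names and uncertainty values are hypothetical, and no model is actually trained or run.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class InfillModel:
    name: str                   # hypothetical method label, stored for the audit trail
    required_inputs: List[str]  # variables the method needs at the missing timestamp
    uncertainty: float          # assumed to be pre-defined from training / testing


def choose_model(models: List[InfillModel],
                 available: Dict[str, Optional[float]]) -> Optional[InfillModel]:
    """Return the usable method with the lowest uncertainty, or None if none is usable."""
    usable = [
        m for m in models
        if all(available.get(name) is not None for name in m.required_inputs)
    ]
    if not usable:
        return None
    return min(usable, key=lambda m: m.uncertainty)


# Hypothetical candidates for one variable at one site.
candidates = [
    InfillModel("linear_interpolation", [], 0.08),
    InfillModel("ml_same_site", ["air_temperature", "precipitation"], 0.03),
    InfillModel("ml_neighbour_site", ["neighbour_vwc"], 0.05),
]

# Precipitation is missing, so the same-site model cannot be used.
inputs = {"air_temperature": 12.1, "precipitation": None, "neighbour_vwc": 0.31}
chosen = choose_model(candidates, inputs)
print(chosen.name, chosen.uncertainty)
```

Storing `chosen.name` and `chosen.uncertainty` alongside each infilled value is one way to keep the kind of audit trail the description mentions.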
Title Daily and sub-daily hydrometeorological and soil data (2013-2019) [COSMOS-UK] 
Description This dataset contains daily and sub-daily hydrometeorological and soil observations from the COSMOS-UK (cosmic-ray soil moisture) monitoring network from October 2013 to the end of 2019. These data are from 51 sites across the UK recording a range of hydrometeorological and soil variables. Each site in the network records the following hydrometeorological and soil data at 30 minute resolution: radiation (short wave, long wave and net), precipitation, atmospheric pressure, air temperature, wind speed and direction, humidity, soil heat flux, and soil temperature and volumetric water content (VWC), measured by point sensors at various depths. Each site hosts a cosmic-ray sensing probe, a novel sensor technology which counts fast neutrons in the surrounding atmosphere. In combination with the recorded hydrometeorological data, neutron counts are used to derive VWC over a field scale (COSMOS VWC), at two temporal resolutions (hourly and daily). The presence of snow leads to erroneously high measurements of COSMOS VWC due to all the extra water in the surrounding area. Included in the daily data are indications of snow days, on which the COSMOS VWC is adjusted and the snow water equivalent (SWE) is given. The potential evapotranspiration (PE), derived from recorded hydrometeorological and soil data, is also included at daily resolution. Two levels of quality control are carried out: first, data are run through a series of automated checks, such as range tests and spike tests; then all data are manually inspected each week, when any other faults are picked up, including sensor faults or connection issues. Quality control flags are provided for all recorded (30 minute) data, indicating the reason for any missing data.
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact The dataset is widely used as the largest network of field scale soil moisture measurements in the UK by researchers interested in soil moisture, comparison with satellite data products, hydrological modelling, etc. The ENTRAIN project improved the dataset through addition of new Snow Water Equivalent measurements. 
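
The automated range and spike tests mentioned in the dataset description can be illustrated with a minimal sketch. The thresholds and the spike definition used here (a value that differs from both neighbours by more than a fixed step, in the same direction) are assumptions for illustration; the checks actually applied to COSMOS-UK data are not specified in this record.

```python
def range_flags(values, lower, upper):
    """Flag values outside [lower, upper]; missing values (None) are not flagged."""
    return [v is not None and not (lower <= v <= upper) for v in values]


def spike_flags(values, max_step):
    """Flag a value that jumps away from both neighbours by more than max_step."""
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if prev is None or cur is None or nxt is None:
            continue
        d_prev, d_next = cur - prev, cur - nxt
        same_direction = (d_prev > 0) == (d_next > 0)
        if abs(d_prev) > max_step and abs(d_next) > max_step and same_direction:
            flags[i] = True
    return flags


# Hypothetical 30 minute air temperature series in degrees Celsius.
air_temperature = [10.0, 10.5, 25.0, 10.8, 11.0, None, 60.0]

print(range_flags(air_temperature, lower=-40.0, upper=50.0))
print(spike_flags(air_temperature, max_step=5.0))
```

Checks like these look at one series at a time, which is why the project description contrasts them with approaches that look across multiple sensors.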
Title Greenness index from phenocams for 50 COSMOS-UK sites 
Description Field greenness metrics (as well as individual RGB time series) from the images from phenocams established at 50 COSMOS-UK sites. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? No  
Impact The dataset is being used to understand the use of the method in phenological research applications. 
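
The record does not state which greenness metrics were derived. As one hedged example, the green chromatic coordinate, G / (R + G + B), is a widely used index for phenocam images; the sketch below assumes that metric and a hypothetical list of RGB pixel values taken from a region of interest in one image.

```python
def mean_rgb(pixels):
    """Average the red, green and blue channels over a list of (R, G, B) pixels."""
    if not pixels:
        raise ValueError("no pixels in region of interest")
    n = len(pixels)
    r = sum(p[0] for p in pixels) / n
    g = sum(p[1] for p in pixels) / n
    b = sum(p[2] for p in pixels) / n
    return r, g, b


def green_chromatic_coordinate(r, g, b):
    """Return G / (R + G + B), or None for an all-black region."""
    total = r + g + b
    if total == 0:
        return None
    return g / total


# Hypothetical region of interest from a single phenocam image.
region = [(50, 100, 50), (40, 110, 50), (60, 90, 50)]
r, g, b = mean_rgb(region)
print(green_chromatic_coordinate(r, g, b))
```

Repeating this for each image at a site would give a greenness time series alongside the individual RGB series.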
Description Digital Environment Expert Network 
Organisation Natural Environment Research Council
Country United Kingdom 
Sector Public 
PI Contribution Attending regular meetings of the NERC Digital Environment Expert Network, contributing to development of a horizon scanning activity and report, and other programme idea generation sessions.
Collaborator Contribution Travel and subsistence budgets were available but remained unused as all meetings were online. Communications and coordinating support was provided by the Digital Environment Programme.
Impact None as yet
Start Year 2020
Description UKCEH - Alan Turing Institute environment and sustainability project 
Organisation Alan Turing Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution Matt Fry (PI) has a part-time secondment to the Turing Institute to further collaborations around environment and sustainability, and a PGR data scientist is employed full time for 3 months on this project, furthering machine learning applications.
Collaborator Contribution Turing Institute have funded the secondment and PGR work, and provided networking opportunities to develop new data science applications.
Impact None as yet.
Start Year 2022
Description Presentation to Catchment Based Approach Catchment Data Evidence Forum 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Third sector organisations
Results and Impact Presentation and Q&A on new approaches to accessing data for UK freshwaters, for organisations involved in local catchment partnerships within the Catchment Based Approach (CaBA), as part of the CaBA Catchment Data and Evidence Forum. Title was "Webinar: Making catchment data more accessible to end users". The workshop was held remotely and was recorded, with the presentations held on YouTube.
Year(s) Of Engagement Activity 2020