Engineering Transformation for the Integration of Sensor Networks: A Feasibility Study - 'ENTRAIN'

Lead Research Organisation: University of Lincoln
Department Name: School of Computer Science

Abstract

There is a need to make use of new digital data analysis techniques to improve our understanding of the environment. Data from a new generation of environmental sensors, combined with analyses based on Artificial Intelligence, have the potential to help us understand how human influences and long-term change are affecting the environment around us. Artificial Intelligence approaches enable computers to identify trends and relationships across different streams of data, often picking out patterns that would be too difficult or time-consuming for humans to identify manually.

To realise these benefits, data from diverse sensor networks must be combined and analysed together. Currently many sensor networks are operated individually, and data are not readily combined due to differences in the way measurements are made (e.g. between weekly river samples and sub-second measurements of gases in the atmosphere). In addition, combining these data automatically, without human intervention, requires much finer and more consistent descriptions of the contents of data streams, so that machines can interpret them reliably. Links between sensors in space are also important, and machines will need an understanding of these links, not just in the sense of coordinates, but also, for example, of how sensors are linked along rivers. We can construct a digital representation of rivers in order to enable this.
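
As an illustration of what a machine-readable link "along rivers" could look like, the following is a minimal sketch, not the project's data model. It assumes a river network stored as a table of which reach flows into which; the reach names, sensor identifiers and function names are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical river reaches: each maps to the reach it flows into (None = outlet).
FLOWS_INTO = {
    "headwater_a": "confluence",
    "headwater_b": "confluence",
    "confluence": "lower_reach",
    "lower_reach": None,
}

# Hypothetical sensor locations: sensor id -> reach on which it sits.
SENSOR_REACH = {
    "wq_01": "headwater_a",
    "flow_02": "headwater_b",
    "wq_03": "lower_reach",
}


def upstream_reaches(reach, flows_into):
    """Return every reach that drains into `reach`, excluding `reach` itself."""
    # Invert the downstream links so the network can be walked against the flow.
    feeders = defaultdict(list)
    for src, dst in flows_into.items():
        if dst is not None:
            feeders[dst].append(src)
    found, queue = set(), deque([reach])
    while queue:
        for src in feeders[queue.popleft()]:
            if src not in found:
                found.add(src)
                queue.append(src)
    return found


def upstream_sensors(sensor_id, sensor_reach, flows_into):
    """Return the sorted ids of sensors on reaches upstream of `sensor_id`."""
    reaches = upstream_reaches(sensor_reach[sensor_id], flows_into)
    return sorted(s for s, r in sensor_reach.items() if r in reaches)


print(upstream_sensors("wq_03", SENSOR_REACH, FLOWS_INTO))
```

With a structure of this kind, an analysis at a downstream monitoring site can look up which other sensors lie upstream of it, something that coordinates alone do not reveal.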
We will describe the various elements of a future environmental analysis system that will be required in order to achieve these benefits, and address some of the currently missing components. We will look at technologies, from databases to data transfer mechanisms, to understand how a system could be built.

We will use data from 3 NERC sensor networks measuring environmental variables from the atmosphere to river water quality, and show how this data can be automatically integrated in such a way that machines would be able to analyse it automatically.
A significant issue when monitoring with high-resolution sensors is how to handle problems in the data, which could include missing data, and erroneous values due to sensor failure. There is too much data for humans to manually view and check, and so automated approaches are needed. Currently these are often simple checks of individual data values against expected ranges, but again there are opportunities for artificial intelligence to improve this. AI approaches can look across multiple sensors, identify relationships, and find subtle changes in data signals, and this can be used to both identify data problems and to fix them through infilling. We will enhance the 3 NERC networks by testing and applying such approaches to data quality control.
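
The contrast between a per-value range check and a check that looks across sensors can be sketched as follows. This is a deliberately simple stand-in, assuming two nearby sensors that differ by a roughly constant offset; it is not the machine learning approach the project proposes, and the function names, thresholds and readings are hypothetical.

```python
def range_check(values, low, high):
    """Replace readings outside [low, high] with None (treated as missing)."""
    return [v if v is not None and low <= v <= high else None for v in values]


def infill_from_neighbour(target, neighbour):
    """Fill gaps in `target` using `neighbour` plus the mean offset between them.

    The offset is estimated only from time steps where both sensors report.
    """
    pairs = [(t, n) for t, n in zip(target, neighbour)
             if t is not None and n is not None]
    if not pairs:
        return list(target)  # nothing to learn the relationship from
    offset = sum(t - n for t, n in pairs) / len(pairs)
    return [
        t if t is not None else (n + offset if n is not None else None)
        for t, n in zip(target, neighbour)
    ]


# Hypothetical air temperatures (deg C) from two nearby sensors.
site_a = [10.0, 11.0, 99.0, None, 12.0]  # 99.0 represents a sensor fault
site_b = [9.0, 10.0, 10.5, 10.75, 11.0]

checked = range_check(site_a, low=-40.0, high=50.0)
filled = infill_from_neighbour(checked, site_b)
print(checked)
print(filled)
```

The range check only removes the implausible value; it is the relationship with the second sensor that allows the gaps to be filled. A learned model would replace the constant offset with a relationship estimated across many sensors and variables.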

We will investigate some fundamental limitations of high-resolution monitoring: the transfer of large amounts of data from the field site to the data centre, the security of such systems, and whether more processing could be done on the instruments themselves to reduce data transfer volumes.

We will meet with the public, policy-makers, industry and researchers to discuss where the most is to be gained from developing AI approaches to analysing environmental sensor data. We will develop ideas for future work to realise these gains, and will promote the benefits of an integrated system for environmental monitoring. These stakeholders are likely to include the Environment Agency, SEPA, Natural Resources Wales, Defra, water companies, sensor network developers, and public organisations with an interest in the environment, including the National Trust, the Rivers Trusts, and local community groups.

Planned Impact

The Digital Environment programme will benefit from ENTRAIN's foundation work, which will provide requirements, methods, best practice advice and recommendations for integration and data modelling across multiple sensor networks and other datasets, such as EO. This information and these techniques will benefit other areas of science and industry, as well as public engagement.

Environmental practitioners, regulators, government, consultants, the water industry, agribusiness, insurance, and many others can benefit substantially from the joined-up evidence that ENTRAIN will start to generate.

The public, schools and colleges will benefit from access to meaningful data in a spatially aware context. There is strong public interest in environmental issues, and yet beyond weather forecasts, weather data and perhaps more recently air quality readings, there are few accessible data which have tangible meaning to the layperson.

The Earth Observation (EO) community will benefit from better access to connected in situ data for retrieval algorithm validation and development (e.g. for flooding extent, soil moisture and land cover products), and from new automated Phenocam greenness products output by ENTRAIN. Other Big Data projects (e.g. Data Labs) will benefit from a greater and easier connection to datasets, with common spatio-temporal linking requirements and proper metadata description already done.

Data modellers and the environmental informatics community will benefit from improved sensor metadata schemes, sensor registers/catalogues, and data structuring for interoperability of a network of networks. We will also disseminate the results of the ENTRAIN feasibility study through an environmental informatics paper, describing the proposed methods and advances made in data modelling and data provenance, to ensure wide impact and uptake of these methods.

Observational scientists and regulatory observers will benefit from our website case studies, webinars, training workshops and online video tutorials, which will disseminate best practice and training in real-time data collection, cyber security, data vocabularies, metadata schemes and new deep learning QC techniques. These studies will benefit the global environmental, and wider, data communities such as the Committee on Data for Science and Technology (CODATA). Harmonisation of data networks to increase or facilitate interoperability, and production of spatio-temporally connected observations across environmental domains (e.g. Digital Rivers), will benefit the British Geological Survey (BGS), CEH, the UK Met Office, the Environment Agency (EA), the Scottish Environment Protection Agency (SEPA), Defra, water companies (and other utilities such as the power grid), agri-business, insurers, and public health. Other NERC and EPSRC funded programmes (e.g. ASSIST, Natural Flood Management, HydroJULES, Internet of Food Things) and Defra Air Quality Monitoring Networks will all benefit through higher data quality, more complete data and efficiency gains in analysing data across sensor networks, and by using developed data structures in other environmental monitoring and food chain domains. The spatially connected integrated data visualisations will be demonstrated at both academic and industry events and conferences, where practitioners can benefit from, for example, improved water quality alerts (e.g. algal blooms, nutrient levels), crucially see the connected drivers of those trends, and get decision support information. Similarly, this has the potential to provide interconnected, cross-discipline, environmental management information that can give government the evidence chain to implement, monitor and evaluate policy.

We will monitor and evaluate the success of our impacts by recording attendance at events and the number of website visits. We will use Twitter to promote activities, recording the number of followers and retweets. We will also log email enquiries and views of webinars and web videos.
 
Description New machine learning approaches and methods based on gradient boosted trees and neural networks were implemented and used for quality control and infilling of environmental sensor network time series data. The methods were developed, tested, and applied in practice to nationally important UK datasets in COSMOS-UK and the National River Flow Archive. This has produced data that are of higher quality and more widely usable, both for research into how the land reacts to rainfall and weather inputs, and in practice for weather forecasting and for flood and drought monitoring and prediction. It also saves time for those undertaking manual quality control. A new approach to integration of CEH sensor data was developed, which will enable multiple data streams to be joined up so that it is easier for anyone to find and use environmental data. This work has been discussed with other organisations running sensor networks, including the Environment Agency, the Met Office, the British Geological Survey and the National Oceanography Centre, so that a common way forward for making sensor data available can be developed in future. New measurements of snow water equivalent have been implemented within the COSMOS-UK network, meaning real-time snow data are available for 50 sites across the UK. River flow and water quality sites were linked to national scale digital river networks and the UK Lakes database, meaning data are more discoverable across rivers, and analysis at river monitoring sites can properly take account of upstream drivers and inputs.
Exploitation Route There is huge potential for an automated environmental sensor data store with in-built machine-learning quality control. This could significantly improve the quality of environmental datasets, particularly from research projects, and enable this data to be more readily accessible. There is potential for future collaboration around the production and management of vocabularies for description of environmental measurements (measured properties, units, instruments and their specifications, etc.). There is a need to link freshwater data via digital rivers, to enable increased opportunities for automated analysis of monitoring data to improve evidence and understanding.
Furthermore, techniques developed as part of this project were demonstrated on the COSMOS-UK network, but can be used across other sectors, such as agriculture and energy. At the University of Lincoln we focused on developing techniques to leverage time-series data. Time-series data with missing values occur in many domains, so the proposed approaches can be tested and evaluated in other sectors as well.
Sectors Agriculture, Food and Drink; Energy; Environment

 
Title Infilled COSMOS-UK site rainfall data 
Description As part of machine learning QC / infilling research within the project, an interpolation method was applied to rainfall data at all COSMOS-UK sites. This produced an infilled dataset for COSMOS-UK sites, enabling improved modelling, in particular for modelling land-atmosphere water and energy fluxes. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact Data from COSMOS-UK sites, including these infilled rainfall series, were used within the Hydro-JULES project for improvement of soil pedo-transfer functions at COSMOS-UK sites (and transferred more widely for these soil types). This will lead to improvements in the representation of soils and the ability of the JULES land surface model (itself run as part of the Met Office Unified Model) to represent soil water fluxes and land-atmosphere feedbacks. 
 
Title Machine Learning model for data imputation 
Description We have developed a new machine learning model for missing data imputation. It relies on gradient boosted trees and neural networks to conduct a two-step process. In the first stage, for each missing entry, a decision is made on whether rain was expected or not. In the second stage, we predict the amount of rainfall that was expected for the specific time stamp. Along with using data from the COSMOS-UK network, we also leveraged data from weather stations that are located near the COSMOS-UK sites. The model demonstrated very competitive performance, outperforming established methods. 
Type Of Material Computer model/algorithm 
Year Produced 2020 
Provided To Others? No  
Impact The underpinning research and the model developed were used by MSc students as a baseline for further improving the model, and have also been used in educational settings as part of an MSc in AI course. 
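
The two-stage structure described in the entry above (first decide whether rain occurred, then estimate how much) can be sketched as follows. This is only an outline of the control flow under stated assumptions: the project's model uses gradient boosted trees and neural networks, whereas both stages here are simple placeholder rules (a majority vote and a mean over nearby stations). The function name, the `wet_threshold` parameter and all values are hypothetical.

```python
def impute_rainfall(target, neighbours, wet_threshold=0.1):
    """Two-stage infilling of missing (None) rainfall values in `target`.

    `neighbours` is a list of series from nearby stations, aligned in time.
    Stage 1 decides whether rain occurred; stage 2 estimates how much.
    Both stages use simple placeholder rules, not trained models.
    """
    filled = []
    for i, value in enumerate(target):
        if value is not None:
            filled.append(value)  # observed value: keep as is
            continue
        nearby = [series[i] for series in neighbours if series[i] is not None]
        if not nearby:
            filled.append(None)  # no evidence available; leave the gap
            continue
        # Stage 1 (occurrence): strict majority of neighbours reporting rain.
        wet = [v for v in nearby if v >= wet_threshold]
        if len(wet) * 2 <= len(nearby):
            filled.append(0.0)
            continue
        # Stage 2 (amount): mean of the neighbours that reported rain.
        filled.append(sum(wet) / len(wet))
    return filled


# Hypothetical hourly rainfall (mm) at a site and two nearby weather stations.
site = [0.0, None, None, 1.0]
station_1 = [0.0, 0.0, 2.0, 1.0]
station_2 = [0.0, 0.0, 4.0, None]
print(impute_rainfall(site, [station_1, station_2]))
```

Separating occurrence from amount reflects the fact that rainfall series contain many exact zeros; a single regression tends to smear small positive values across dry periods, which the first stage avoids.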
 
Description Partnership with Environment Agency around improving analysis of water quality monitoring data 
Organisation Environment Agency
Country United Kingdom 
Sector Public 
PI Contribution We have engaged on a number of occasions with key staff in the Environment Agency leading a review of the monitoring networks (Sentinel EDM), initially focussing on water quality monitoring. We have met at CEH Wallingford to discuss our work on sensor data integration and analysis, furthered these discussions at workshops of the Constructing a Digital Environment programme, organised a workshop as part of the ENTRAIN project around AI for water quality attended by Defra / Environment Agency representatives (see Engagement Activities) and subsequently produced a proposal for the Constructing a Digital Environment programme demonstrators call (strongly supported by Defra / Environment Agency), and contributed to the Environment Agency Sentinel EDM pilot study on "strategic evidence linked to the occurrence & fate of phosphorous in river water", specifically providing information on how sensor data and other data sources could be better linked for analysis.
Collaborator Contribution Defra and Environment Agency staff provided input on the specific national scale needs for improved evidence regarding river water quality. This helped the ENTRAIN project to understand how the river water quality sensor network datasets being examined within the project need to be made available, and which metadata are essential for discovering, identifying, understanding and fully exploiting the sensor data in analysis for evidence. They also provided information on the scale of requirements, how they expect policy changes to drive data needs in future, and specific current priorities.
Impact Contribution to Environment Agency review of water quality monitoring network
Start Year 2019
 
Description Presentation at the Constructing a Digital Environment workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Add description later
Year(s) Of Engagement Activity 2019
URL https://nerc.ukri.org/innovation/activities/environmentaldata/digitalenv/news/digital-workshop/
 
Description Workshop on opportunities for improving understanding of UK water quality using AI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact A small, focussed workshop of 12 people, largely researchers and potential users representing Defra and the Environment Agency, discussed requirements for improved understanding of freshwater quality, specific issues needing more evidence / analysis, datasets available to inform this, and machine learning / AI approaches to address these issues. Ideas were taken forward both for a future funding application, supported by Defra and the EA, and within the Environment Agency's internal review of water quality monitoring data.
Year(s) Of Engagement Activity 2019