Engineering Transformation for the Integration of Sensor Networks: A Feasibility Study - 'ENTRAIN'

Lead Research Organisation: NERC CEH (Up to 30.11.2019)
Department Name: Water Resources (Wallingford)

Abstract

There is a need to make use of new digital data analysis techniques to improve our understanding of the environment. Data from a new generation of environmental sensors, combined with analyses based on Artificial Intelligence, have the potential to help us understand how human influences and long-term change are affecting the environment around us. Artificial Intelligence approaches enable computers to identify trends and relationships across different streams of data, often picking out patterns that would be too difficult or time-consuming for humans to identify manually.

To realise these benefits, data from diverse sensor networks must be combined and analysed together. Currently many sensor networks are operated individually, and data are not readily combined due to differences in the way measurements are made (e.g. between weekly river samples and sub-second measurements of gases in the atmosphere). In addition, combining these data automatically, without human intervention, requires much finer and more consistent descriptions of the contents of data streams, so that machines can understand the content sufficiently. Links between sensors in space are also important, and machines will need an understanding of these links, not just in the sense of coordinates, but also, for example, of how sensors are linked along rivers. We can construct a digital representation of rivers in order to enable this.
We will describe the various elements of a future environmental analysis system that will be required in order to achieve these benefits, and address some of the components that are currently missing. We will look at technologies, from databases to data transfer mechanisms, to understand how a system could be built.

We will use data from 3 NERC sensor networks measuring environmental variables from the atmosphere to river water quality, and show how this data can be automatically integrated in such a way that machines would be able to analyse it automatically.
A significant issue when monitoring with high-resolution sensors is how to handle problems in the data, which could include missing data, and erroneous values due to sensor failure. There is too much data for humans to manually view and check, and so automated approaches are needed. Currently these are often simple checks of individual data values against expected ranges, but again there are opportunities for artificial intelligence to improve this. AI approaches can look across multiple sensors, identify relationships, and find subtle changes in data signals, and this can be used to both identify data problems and to fix them through infilling. We will enhance the 3 NERC networks by testing and applying such approaches to data quality control.

We will investigate some fundamental limitations of high-resolution monitoring, the transfer of large amounts of data from the field site to the data centre, the security of such systems, and whether more processing could be done on the instruments themselves to reduce data transfer volumes.

We will meet with the public, with policy-makers, with industry and with researchers to discuss where most is to be gained from the development of AI approaches to analysing environmental sensor data. We will develop ideas for future work to realise these gains, and will promote the benefits of an integrated system for environmental monitoring. These stakeholders are likely to include the Environment Agency, SEPA, Natural Resources Wales, Defra, water companies, sensor network developers, and public organisations with an interest in the environment, including the National Trust, the Rivers Trusts, and local community groups.

Planned Impact

The Digital Environment programme will benefit from ENTRAIN's foundation work, which will provide requirements, methods, best practice advice and recommendations for integration and data modelling across multiple sensor networks and other datasets, such as Earth Observation (EO) data. This information and these techniques will benefit other areas of science and industry, as well as public engagement.

Environmental practitioners, regulators, government, consultants, the water industry, agribusiness, insurance, and many others can benefit substantially from the joined-up evidence that ENTRAIN will start to generate.

The public, schools and colleges will benefit from access to meaningful data in a spatially aware context. There is strong public interest in environmental issues, and yet beyond weather forecasts, weather data and perhaps more recently air quality readings, there are few accessible data which have tangible meaning to the layperson.

The Earth Observation (EO) community will benefit from better access to connected in situ data for retrieval algorithm validation and development (e.g. for flooding extent, soil moisture and land cover products), and from new automated Phenocam greenness products output by ENTRAIN. Other Big Data projects (e.g. Data Labs) will benefit from a greater and easier connection to datasets, with common spatio-temporal linking requirements and proper metadata description already done.

Data modellers and the environmental informatics community will benefit from improved sensor metadata schemes, sensor registers/catalogues, and data structuring for interoperability of a network of networks. We will also disseminate the results of the ENTRAIN feasibility study through an environmental informatics paper describing the proposed methods and the advances made in data modelling and data provenance, to ensure wide impact and uptake of these methods.

Observational scientists and regulatory observers will benefit from our website case studies, webinars, training workshops and online video tutorials, which will disseminate best practice and training in real-time data collection, cyber security, data vocabularies, metadata schemes and new deep learning QC techniques. These studies will benefit the global environmental, and wider, data communities such as the Committee on Data for Science and Technology (CODATA).

Harmonisation of data networks to increase or facilitate interoperability, and production of spatio-temporally connected observations across environmental domains (e.g. Digital Rivers), will benefit the British Geological Survey (BGS), CEH, the UK Met Office, the Environment Agency (EA), the Scottish Environment Protection Agency (SEPA), Defra, water companies (and other utilities such as the power grid), agri-business, insurers, and public health. Other NERC and EPSRC funded programmes (e.g. ASSIST, Natural Flood Management, HydroJULES, Internet of Food Things) and Defra Air Quality Monitoring Networks will all benefit through higher data quality, more complete data and efficiency gains in analysing data across sensor networks, and by using the developed data structures in other environmental monitoring and food chain domains.

The spatially connected integrated data visualisations will be demonstrated at both academic and industry events and conferences, where practitioners can benefit from, for example, improved water quality alerts (e.g. algal blooms, nutrient levels), and crucially can see the connected drivers of those trends and get decision support information. Similarly, this has the potential to provide interconnected, cross-discipline environmental management information that can give government the evidence chain to implement, monitor and evaluate policy.

We will monitor and evaluate the success of our impacts by recording attendance at events and the number of website visits. We will use Twitter to promote activities, recording the number of followers and retweets, and will also log email enquiries and webinar and web video views.

Publications

 
Description New methods for using machine learning for quality control and infilling of environmental sensor network time series data were developed, tested, and applied in practice to nationally important UK datasets in COSMOS-UK and the National River Flow Archive. This has produced data of higher quality, and more widely usable for research into how the land reacts to rainfall and weather inputs, and in practice for weather forecasting, flood and drought monitoring and prediction. It saves time for those undertaking manual quality control.
A new approach to integration of CEH sensor data was developed, which will enable multiple data streams to be joined up so that it is easier for anyone to find and use environmental data. This work has been discussed with other organisations running sensor networks, including the Environment Agency, the Met Office, the British Geological Survey and the National Oceanography Centre, so that a common way forward for making sensor data available can be developed in future.
New measurements of snow water equivalent have been implemented within the COSMOS-UK network, meaning real-time snow data is available for 50 sites across the UK.
River flow and water quality sites were linked to national scale digital river networks and the UK Lakes database, meaning data is more discoverable across rivers, and analysis at river monitoring sites can properly take account of upstream drivers and inputs.
Exploitation Route There is huge potential for an automated environmental sensor data store with in-built machine-learning quality control. This could significantly improve the quality of environmental datasets, particularly from research projects, and enable this data to be more readily accessible.
There is potential for future collaboration around the production and management of vocabularies for description of environmental measurements (measured properties, units, instruments and their specifications, etc.).
There is a need to link freshwater data via digital rivers, to enable increased opportunities for automated analysis of monitoring data to improve evidence and understanding.
Sectors Environment

 
Description Input to Environment Agency Sentinel EDM review pilot joining up strategic evidence linked to the occurrence & fate of phosphorous in river water
Geographic Reach National 
Policy Influence Type Contribution to a national consultation/review
 
Title Dataset of water quality monitoring sites linked to digital river network and UK Lakes database 
Description All river and lake water quality monitoring sites from the Environment Agency Water Quality Archive were processed and linked to a digital 1:50k river network, with lake sites linked to lakes within the UK Lakes Database. Sites were matched based on proximity and river name, with automated approaches to resolving issues due to locations, and uncertainty metrics in the matching maintained to enable future filtering. This enables sites to be linked to upstream and downstream drivers and other monitoring datasets (e.g. river flow sites to enable load calculations). This work is essential to the integration of sensor data for rivers. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact The UK Lakes data portal now allows real time water quality monitoring time series data to be accessed from the Environment Agency API and visualised on the lake pages, e.g. https://eip.ceh.ac.uk/apps/lakes/detail.html#wbid=28847. This enhancement to the portal has been discussed with users of the UK Lakes portal (Environment Agency, Natural England, Rivers Trusts, etc.) and promoted at subsequent events. 
URL https://eip.ceh.ac.uk/apps/lakes
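
The matching described for this dataset (proximity plus river name, with uncertainty metrics retained for later filtering) can be illustrated with a minimal Python sketch. It is not the project's code: the `Reach` and `Match` types, the `match_site` function, the 250 m search radius and the vertex-based distance are all assumptions made for illustration, and coordinates are assumed to be in a projected system in metres.

```python
import math
from dataclasses import dataclass


@dataclass
class Reach:
    """A river stretch (hypothetical structure): id, name and vertex coordinates."""
    reach_id: str
    name: str
    vertices: list  # [(easting_m, northing_m), ...]


@dataclass
class Match:
    """A site-to-reach match with the metrics needed for later filtering."""
    reach_id: str
    distance_m: float
    name_match: bool


def _name_key(name):
    # Compare names case-insensitively and ignore the generic word "river".
    return [token for token in name.lower().split() if token != "river"]


def distance_to_reach(point, reach):
    # Simplification: distance to the nearest vertex, not to the line itself.
    return min(math.dist(point, vertex) for vertex in reach.vertices)


def match_site(point, site_river_name, reaches, max_distance_m=250.0):
    """Return the best Match within max_distance_m, or None if there is none."""
    site_key = _name_key(site_river_name)
    candidates = []
    for reach in reaches:
        distance = distance_to_reach(point, reach)
        if distance <= max_distance_m:
            same_name = bool(site_key) and site_key == _name_key(reach.name)
            candidates.append(Match(reach.reach_id, distance, same_name))
    if not candidates:
        return None
    # Prefer a name match first, then the smallest distance.
    return min(candidates, key=lambda m: (not m.name_match, m.distance_m))


# Illustrative data only.
reaches = [
    Reach("r1", "River Thames", [(0.0, 0.0), (100.0, 0.0)]),
    Reach("r2", "Ock", [(50.0, 10.0)]),
]
# r1 is chosen because the name matches, even though r2 is nearer.
best = match_site((50.0, 20.0), "THAMES", reaches)
```

Keeping `distance_m` and `name_match` on every result means a later analysis can apply a stricter filter (for example, name matches only) without rerunning the matching.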
 
Title Infilled COSMOS-UK site rainfall data 
Description As part of machine learning QC / infilling research within the project, an interpolation method was applied to rainfall data at all COSMOS-UK sites. This produced an infilled dataset for COSMOS-UK sites, enabling improved modelling, in particular for modelling land-atmosphere water and energy fluxes. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact COSMOS-UK sites, including these infilled rainfall series, were used within the Hydro-JULES project for improvement of soil pedo-transfer functions at COSMOS-UK sites (and transferred more widely for these soil types). This will lead to improvements in the representation of soils and the ability of the JULES land surface model (itself run as part of the Met Office Unified Model) to represent soil water fluxes and land-atmosphere feedbacks. 
 
Title Machine learning model for QC of river flow time series data 
Description An automated approach to creating and applying machine learning models for quality control of river flow data. The code uses a boosted regression tree approach to rapidly identify the best inputs for each river flow site from real-time data streams from river flow and rainfall sensor networks, optimising against length of record and number of inputs. An ensemble of neural network models is then developed using these inputs. The ensemble provides an indication of the "certainty" of each prediction, so a measured value that falls outside the prediction uncertainty can be identified as possibly incorrect. The method flags instances where measured data are likely to be incorrect, with a measure of the likelihood.
Type Of Material Computer model/algorithm 
Year Produced 2020 
Provided To Others? No  
Impact Improved river flow data within the National River Flow Archive, openly available to the research community. Subsequent improvements to uses of the data, including improved estimates of water resources / flood risk, and improved modelling of river flows. Data issues fed back to the Environment Agency for improvement of national hydrometric datasets.
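
The flagging step described for this model (comparing a measured value with the spread of an ensemble of predictions) can be sketched as follows. This is a simplified illustration, not the project's code: the ensemble members are represented only by their predicted values, and the function name `qc_flags`, the threshold `k = 3.0` and the use of a z-score as a stand-in for the "measure of the likelihood" are all assumptions.

```python
from statistics import mean, stdev


def qc_flags(observed, ensemble_predictions, k=3.0, min_spread=1e-6):
    """Flag observations that fall outside the ensemble's prediction spread.

    observed: one measured value per time step.
    ensemble_predictions: for each time step, a list with one predicted
        value per ensemble member (at least two members are required).
    k: assumed threshold, in ensemble standard deviations.
    """
    results = []
    for obs, predictions in zip(observed, ensemble_predictions):
        centre = mean(predictions)
        # Guard against a zero spread when all members agree exactly.
        spread = max(stdev(predictions), min_spread)
        z_score = abs(obs - centre) / spread
        results.append(
            {
                "observed": obs,
                "predicted": centre,
                "z_score": z_score,      # larger means less plausible
                "suspect": z_score > k,  # outside the prediction uncertainty
            }
        )
    return results


# Illustrative values only: three time steps, three ensemble members.
observed_flow = [10.0, 10.5, 25.0]
member_predictions = [
    [9.8, 10.1, 10.3],
    [10.2, 10.6, 10.4],
    [10.0, 10.4, 10.2],
]
flags = qc_flags(observed_flow, member_predictions)
```

In this example only the third observation is marked as suspect, because it lies far outside the range on which the ensemble members agree. The ensemble mean in `predicted` is also a natural candidate value for infilling a flagged or missing observation.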
 
Title New snow water equivalent measurement method implemented within the COSMOS-UK network 
Description The COSMOS-UK network comprises ~50 sites measuring a full range of meteorological variables as well as soil moisture using a cosmic neutron monitor. This work implemented a new method for calculation of snow water equivalent from the COSMOS probe data, incorporating albedo information from radiometers to identify the presence of snow. The ENTRAIN project enhanced the COSMOS-UK network by automating the application of this method, producing snow water equivalent data in real time, on a daily basis, within the core COSMOS database.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact Snow information from 50 sites is now available in real time. Wider impacts will occur in future as data is taken up by end users. 
URL https://cosmos.ceh.ac.uk/data
 
Title Tools for rapid analysis of river network data 
Description A digital 1:50k river network has been converted to a graph network, using the Python NetworkX package. A package has been developed to enable simple use of this network for identifying upstream and downstream river stretches (and associated information such as river lengths / distances) for any given point, and for identifying upstream and downstream monitoring sites. The code is very fast compared to GIS techniques, and thus enables analysis for thousands of sites of interest.
Type Of Material Data analysis technique 
Year Produced 2020 
Provided To Others? No  
Impact A dataset of monitoring data for key nutrients was extracted for sites upstream and downstream of lakes in England, using the dataset of linked water quality monitoring sites and the Environment Agency live water quality data API. This is being analysed within the NERC Hydroscape project. National River Flow Archive gauging stations were linked to all upstream and downstream stations, including in-river distances, thus enabling comparability of information across sites and targeting of machine learning quality control model inputs. This output is now maintained as part of the National River Flow Archive dataset. River water quality sites were linked to nearest river flow gauging stations in order to calculate water quality loads within the automated load apportionment modelling.
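
The idea behind these tools (treating the river network as a directed graph and tracing upstream or downstream from a point) can be shown with a small, dependency-free Python sketch. It is an illustration under stated assumptions rather than the project's package: the `RiverNetwork` class, its method names, the node labels and the reach lengths are all hypothetical.

```python
from collections import defaultdict, deque


class RiverNetwork:
    """Directed graph of river nodes; each reach points downstream."""

    def __init__(self):
        self._down = defaultdict(dict)  # node -> {downstream node: length_m}
        self._up = defaultdict(dict)    # node -> {upstream node: length_m}

    def add_reach(self, from_node, to_node, length_m):
        self._down[from_node][to_node] = length_m
        self._up[to_node][from_node] = length_m

    def _trace(self, start, links):
        # Walk the network from `start`, keeping the shortest in-river
        # distance to every node reached.
        distances = {}
        queue = deque([(start, 0.0)])
        while queue:
            node, travelled = queue.popleft()
            for neighbour, length in links.get(node, {}).items():
                total = travelled + length
                if neighbour not in distances or total < distances[neighbour]:
                    distances[neighbour] = total
                    queue.append((neighbour, total))
        return distances

    def upstream(self, node):
        """Map every upstream node to its in-river distance in metres."""
        return self._trace(node, self._up)

    def downstream(self, node):
        """Map every downstream node to its in-river distance in metres."""
        return self._trace(node, self._down)


# Illustrative network: two tributaries (A, B) join at C, then flow to D and E.
net = RiverNetwork()
net.add_reach("A", "C", 1000)
net.add_reach("B", "C", 2000)
net.add_reach("C", "D", 1500)
net.add_reach("D", "E", 500)

above_d = net.upstream("D")
below_a = net.downstream("A")
```

Monitoring sites attached to nodes can then be found by looking up the returned nodes in a site table. With NetworkX, the same upstream and downstream sets are available from `networkx.ancestors` and `networkx.descendants` on a `DiGraph` whose edges point downstream.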
 
Description Industry-Research collaboration around sensor metadata standards 
Organisation Epimorphics
Country United Kingdom 
Sector Private 
PI Contribution The collaboration was born of a subcontract within the project to Epimorphics, a company with extensive expertise in linked data, including in the area of UK environmental data. We provided detailed information regarding our use of sensors, how we collect and process data, how we store data and the requirements we have for managing information around sensor networks and sensor measurements (e.g. instruments used, processing used, etc.). A main focus was the wider need for integration of sensor metadata and therefore standardisation of data and metadata between networks. The ENTRAIN project subsequently contributed use cases to the Research Data Alliance iADOPT (Interoperable Descriptions of Observable Properties) working group, aiming to standardise the way that environmental measurements are described.
Collaborator Contribution Epimorphics joined a number of calls with other organisations working on research sensor data networks, including the British Geological Survey, National Oceanography Centre, Met Office, and Marine Ireland. The aim of these was to develop some common threads of understanding around sensor data management, how others are managing large sensor networks and what their requirements are. This has resulted in a loose network of research sensor metadata operators, and ongoing discussions and collaborations around vocabularies and data management approaches for sensor network data and metadata. The specific output of the Epimorphics work was a review of existing sensor metadata standards and development of a sensor metadata standard to meet CEH, and wider, requirements. A significant part of this was discussion around the definition of complex observed properties, and how work by NOC (through a collaboration within the Research Data Alliance iADOPT Working Group) should be taken forward to improve the representation of environmental measurements and increase interoperability.
Impact Sensor metadata review (see url).
Start Year 2019
 
Description Partnership with Environment Agency around improving analysis of water quality monitoring data 
Organisation Environment Agency
Country United Kingdom 
Sector Public 
PI Contribution We have engaged on a number of occasions with key staff in the Environment Agency who are leading a review of the monitoring networks (Sentinel EDM), initially focussing on water quality monitoring. We met at CEH Wallingford to discuss our work on sensor data integration and analysis, and furthered these discussions at workshops of the Constructing a Digital Environment programme. As part of the ENTRAIN project we organised a workshop on AI for water quality, attended by Defra / Environment Agency representatives (see Engagement Activities), and subsequently produced a proposal for the Constructing a Digital Environment programme demonstrators call (strongly supported by Defra / Environment Agency). We also contributed to the Environment Agency Sentinel EDM pilot study on "strategic evidence linked to the occurrence & fate of phosphorous in river water", specifically providing information on how sensor data and other data sources could be better linked for analysis.
Collaborator Contribution Defra and Environment Agency staff provided input on the specific national-scale needs for improved evidence regarding river water quality. This helped the ENTRAIN project to understand how the river water quality sensor network datasets examined within the project need to be made available, and which metadata are essential for discovering, identifying, understanding and fully exploiting the sensor data in analysis for evidence. They also provided information on the scale of requirements, how they expect policy changes to drive data needs in future, and their specific current priorities.
Impact Contribution to Environment Agency review of water quality monitoring network
Start Year 2019
 
Description Presentation and hosting of discussion session around sensor data APIs at UKSCAPE engagement event 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact 42 people attended a one-day meeting in Manchester to help shape the development of the UKSCAPE Data Science Framework. Attendees included university researchers and representatives of Government bodies, Government agencies and non-Governmental public bodies. One of four areas of focus and engagement within the day's agenda was API access to environmental data. Matt Fry presented work on a number of APIs, including work within ENTRAIN on standardisation of network metadata to enable interoperability, and subsequently ran one of the workshop activities on the subject of APIs. This enabled discussion with stakeholders on the future of integration of environmental data through APIs and standards, and delivered understanding of user needs to CEH.
Year(s) Of Engagement Activity 2019
URL https://www.ceh.ac.uk/get-involved/events/shaping-development-ceh-uk-scape-data-science-framework
 
Description Presentation at the Constructing a Digital Environment workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Add description later
Year(s) Of Engagement Activity 2019
URL https://nerc.ukri.org/innovation/activities/environmentaldata/digitalenv/news/digital-workshop/
 
Description Workshop on opportunities for improving understanding of UK water quality using AI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact A small, focussed workshop of 12 people, largely researchers and potential users representing Defra and the Environment Agency, discussed requirements for improved understanding of freshwater quality, specific issues needing more evidence / analysis, datasets available to inform this, and machine learning / AI approaches to address these issues. Ideas were taken forward both for a future funding application, supported by Defra and the EA, and within the Environment Agency's internal review of water quality monitoring data.
Year(s) Of Engagement Activity 2019