ISCF WAVE 1 AGRI TECH Agronomic Big Data Analytics for improved crop management

Lead Research Organisation: University of Nottingham
Department Name: School of Biosciences

Abstract

Agricultural systems are complex and must be managed if we are to achieve food security and maintain environmental quality. In industry and commerce, the management of complex systems is being improved by the collection, processing and analysis of "big data" sets. For some years farmers have been able to collect big data sets on their crops and soils using GPS-driven monitors on the combine or tractor, satellite-borne sensors, and direct sampling and analysis of soils. This raises the question of whether agriculture can enter the big data era and solve management problems more quickly and robustly than through the conventional approach of field trials at a limited number of experimental sites. We contend that this is possible, but only if the data are analysed with methods that are biologically meaningful rather than blindly mined for correlations. This is a feasibility study to test two tailored big-data analytical methods on a large data set from arable fields across the U.K.

Two general approaches will be used. Both have already been developed, published in the peer-reviewed literature and used as research tools. The first, boundary line analysis, identifies the maximum yield that a crop can achieve as a function of a soil or crop property representing a factor (such as nutrient supply or canopy development) that may limit the potential yield. Boundary line analysis requires big data sets, but it can give greater biological insight into the crop system than the relatively crude tools used in much data mining, and it can support management decisions to remove limiting effects.
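How boundary lines can help diagnose which factor limits yield can be sketched in a few lines of Python. This is a minimal illustration, not the project's method: it assumes each boundary has a linear-plateau form, that factors combine by taking the minimum of their boundary yields (Liebig's law of the minimum, a common but not the only choice), and it uses hypothetical parameter values and names (`LinearPlateau`, `attainable_yield`, `soil_P_mg_per_l`, `soil_K_mg_per_l`) that do not come from the project data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearPlateau:
    """Hypothetical boundary: yield rises linearly with x, then levels off."""
    intercept: float  # boundary yield at x = 0 (t/ha)
    slope: float      # increase in boundary yield per unit of x
    plateau: float    # maximum attainable yield when x is not limiting (t/ha)

    def __call__(self, x: float) -> float:
        return min(self.intercept + self.slope * x, self.plateau)


def attainable_yield(boundaries: dict, soil: dict) -> tuple:
    """Smallest boundary yield over all factors (law of the minimum, assumed)
    and the name of the factor that sets it."""
    limits = {name: f(soil[name]) for name, f in boundaries.items()}
    limiting = min(limits, key=limits.get)
    return limits[limiting], limiting


# Hypothetical boundary parameters, for illustration only (not project results).
boundaries = {
    "soil_P_mg_per_l": LinearPlateau(intercept=6.0, slope=0.25, plateau=11.0),
    "soil_K_mg_per_l": LinearPlateau(intercept=7.0, slope=0.04, plateau=11.0),
}

observed_yield = 8.0  # hypothetical observed yield at one location (t/ha)
soil = {"soil_P_mg_per_l": 12.0, "soil_K_mg_per_l": 150.0}
limit, factor = attainable_yield(boundaries, soil)
print(f"{factor} limits yield to {limit:.1f} t/ha; gap = {limit - observed_yield:.1f} t/ha")
# prints "soil_P_mg_per_l limits yield to 9.0 t/ha; gap = 1.0 t/ha"
```

With these made-up numbers the P boundary (9.0 t/ha) lies below the K plateau (11.0 t/ha), so P is diagnosed as the limiting factor, and the difference between the boundary yield and the observed yield is the kind of yield gap that boundary line analysis makes explicit.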

The second approach is focussed on the analysis of yield maps produced by GPS-equipped yield monitors on combine harvesters. These maps show complex patterns of spatial variation, which are often hard to interpret usefully, and when maps for two or more seasons are overlaid the variability is more complex still. In past research we have shown that a pattern-recognition method called k-means clustering can be used to subdivide a field into regions within which the season-to-season fluctuations in yield are more or less uniform. One region may show consistently high yields and another consistently low yields, while others fluctuate between seasons. Such regions are likely to represent parts of the field where the crop is subject to similar limitations. For example, where the available water content of the soil is relatively small, yields may drop in drier years, and a region with an emerging nutrient deficiency may show a steady decline in yield over a series of seasons. By relating the regionalization of the field, and each region's characteristic yield variation over time, to soil and other environmental information, we hope to identify the key limiting factors at subfield scales. By carrying out these analyses on big data sets, we also expect farm- and regional-scale patterns to emerge.
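The k-means step can be sketched by treating each grid cell of a field as a vector of yields, one per season, and grouping the cells with Lloyd's algorithm. This is a hedged, standard-library Python illustration rather than the procedure used in the project: the deterministic farthest-first seeding, the function names and the six-cell data set are assumptions made for the example, and real analyses would normally pre-process yields season by season (for example by standardizing them) before clustering, a step omitted here.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from its nearest existing centroid."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on per-cell yield vectors (one value per season)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each cell joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no cell changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean yield vector of its cells.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = tuple(sum(season) / len(members) for season in zip(*members))
    return labels, centroids


# Hypothetical yields (t/ha) for six grid cells over three seasons.
cells = [
    (10.0, 10.5, 9.8), (10.2, 10.4, 9.9),  # consistently high
    (6.0, 6.2, 5.8), (6.1, 6.0, 5.9),      # consistently low
    (10.1, 6.1, 9.9), (9.9, 6.0, 10.0),    # high except in season 2
]
labels, centroids = kmeans(cells, k=3)
print(labels)  # prints [0, 0, 1, 1, 2, 2]
```

The three clusters recover the three behaviours described above: consistently high, consistently low, and high except in the second season. Each cluster centroid is the region's characteristic yield sequence, which is what would then be related to soil and environmental information.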

Within this project we shall show how a big agronomic data set can be analysed most effectively, so that the agronomy company which holds it can best advise its customers and obtain maximum value from the data it collects. This will help to support improved management at farm scale, possibly including the use of precision agriculture methods to respond to within-field variation.

Technical Summary

Large data sets on crop yield and soil properties (as opposed to experimental data) rarely yield simple answers to conventional statistical analysis. It is unusual, for example, to find a strong relationship between crop yield and a soil property that can be expressed by a regression model. Similarly, the spatial variation of yield in a single field can differ markedly between successive seasons, with small correlations between yields in any two seasons. This means that one cannot, in general, use yield maps to segment fields into zones with consistent yield performance. For this reason we propose that, to be agronomically informative, the analysis of big data sets requires hypothesis-driven models. We will propose and test two such models, and then work with our commercial partner to integrate them into a data interpretation service.

First, we shall use boundary line analysis to model the limiting effects of environmental factors on crop yield. The approach was first proposed by Webb (1972), who suggested that the limiting response of a biological variable such as yield (y) to an environmental variable such as nutrient concentration (x) is seen in the upper boundary of a scatter plot of y against x. A statistical formulation of this model, which can be fitted and tested by rigorous methods, has only recently become available through research by the PI and colleagues. It will be applied in the current project to examine the potentially limiting effects of a range of physical and chemical soil properties on yield.
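One common way to write Webb's idea down, assumed here for illustration because this record does not give the project's statistical formulation, is to require every observed yield to lie on or below a boundary function of x, taken here to be linear-plateau, and to derive from it the critical value of x:

```latex
y_i \le f(x_i), \qquad
f(x) = \min\{\beta_0 + \beta_1 x,\; \beta_{\max}\}, \qquad
x^{*} = \frac{\beta_{\max} - \beta_0}{\beta_1} \quad (\beta_1 > 0)
```

Here β_max is the maximum attainable yield and x* is the value of the soil property above which it no longer limits the boundary yield (it follows from setting β_0 + β_1 x* = β_max). A critical value of this kind is what could be compared with an index threshold in a fertilizer recommendation system.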

We shall address the problem of the complex spatio-temporal variation of crop yield using k-means cluster analysis of yield map sequences. The method identifies subregions of a field within which crop yields over several seasons are more uniform than within the field or farm as a whole. We shall test the hypothesis that such subregions express underlying soil variation at farm scale in a way consistent with boundary line models.

Planned Impact

This project will demonstrate the potential of focussed statistical analysis of big data sets from the agricultural sector, using analytical models that embody clear agronomic concepts and allow limiting factors on crop production to be identified at scales from within-field to regional. This could improve the management of farm land, whether by identifying regional trends in limiting factors that reflect climatic and geological constraints, or by identifying factors that apply to just part of a single field (e.g. where a potentially high-yielding subregion has developed a nutrient deficiency due to sustained high offtake rates of nutrients in crop products and residues). As such, the methodology could support the implementation of regional strategies by advisors and agribusinesses at one spatial scale, and the use of precision farming technology for improved profitability and reduced environmental impact at another.

To have this impact, the methodology must be applicable to a substantial volume of data representing a significant proportion of agricultural land, and must be integrated with existing workflows for data collection and for advice and support to growers. This must be achieved within a sustainable commercial framework. Because this is a catalyst project, there is considerable potential to achieve this through the enhanced opportunity for the commercial partner, AgSpace, to offer big-data based services to its customers. AgSpace has existing commercial relationships with large agribusinesses such as BASF, Syngenta, Farmcare, IPF and Agrii. The project will therefore have immediate reach, via land managers, advisors and commercial organizations that directly manage or influence the management of a significant proportion of agricultural land in the UK and further afield.

These immediate impacts would be seen in land management. In addition, these big data approaches have considerable potential value for agricultural research. The agricultural research market that this project wishes to access is very large in financial terms, and there are two key sectors we wish to explore. First, the commercial sector is increasing its R&D budgets annually to stay ahead of the competition: for example, Syngenta invested over $1.4 billion in 2014, BASF crop production invests £215 million annually and Agrii invests over £1m annually in agricultural research and development (R&D). These are a few examples of existing AgSpace customers that offer a clear route to market. Second, these methods have considerable potential value for publicly funded agricultural research. One particular opportunity in the public sector is the imminent revision of Defra's RB209 recommendations on soil nutrient and pH management.

Publications

Description In this project we have re-evaluated the guidelines that farmers in the UK use to decide whether their soils have sufficient reserves of key nutrients, namely potassium (K), phosphorus (P) and magnesium (Mg), or whether these require "top-up" with fertilizers. If farmers get this wrong they may lose financially, either by applying unnecessary fertilizer (which also has an environmental impact) or by losing yield if the nutrients are deficient (which also has an environmental impact, through less efficient use of nitrogen by the crop). The key questions we wanted to address were whether the guidelines are robust, and whether they hold at within-field scales, allowing farmers to use fertilizer more efficiently by varying application rates across the field with GPS-driven technology (precision agriculture). The commercial partner in this project, AgSpace UK, has large data sets on yield and soil nutrient analyses from zones within its customers' fields across much of Britain, which can be used for research under licence terms. Such "big data" sets have considerable potential to improve the management of systems such as soils and crops, but our view is that this requires "hypothesis-driven" science, not the blind application of algorithms. We used a "boundary line model", developed as a robust statistical tool by the University of Nottingham PI, to fit models that represent the "limiting" response of wheat yield to soil nutrients (i.e. the response when other factors are not limiting). These showed that the field-scale recommendations for P and K status in soil are robust and hold at sub-field scales. However, (i) there appears to be potential to improve the efficiency with which P is managed in soils by responding to variations in soil depth and acidity, and (ii) there is evidence that the guideline levels for soil Mg status are too low.

Note that the work on cluster analysis of yield maps, explained in the lay summary, is reported separately by NERC (British Geological Survey) as part of its work on this project.
Exploitation Route Our findings are to form the basis of revised recommendations to farmers on fertilizer management by our commercial partner (AgSpace UK), and AgSpace is already developing its new basis for advice. As the findings are shortly to be published, they could also be used by other companies in the same sector.

In addition to this commercial route to impact, through the revision and updating of commercial advisory systems, our findings will feed into current discussions on the revision of the official fertilizer recommendation system (known as "RB209"), which is managed by the Agriculture and Horticulture Development Board. One project partner is directly involved in those discussions, and the project findings, particularly on soil P status, will be of immediate relevance to them.
Sectors Agriculture, Food and Drink, Environment

 
Description The commercial partner, AgSpace UK, is using the findings from this research directly in revising and refining its recommendations to customers on nutrient management. The findings will underpin an advanced-level recommendation system available to its customers. This will allow farmers to manage the use of phosphate fertilizers with greater confidence that crop performance is not limited by P deficiency, while avoiding excess applications of fertilizer, which can have negative environmental impacts. In certain conditions this improved efficiency could entail variable-rate application within a field. The methodology developed in this project is now being developed further by a PhD student at the University of Nottingham, in collaboration with Rothamsted Research, Wageningen University and several centres of the CGIAR (Consultative Group for International Agricultural Research). This work examines the process of yield gap analysis (YGA), and how formal statistical models of the boundary line can improve the reproducibility, rigour and operational value of YGA.
First Year Of Impact 2021
Sector Agriculture, Food and Drink
Impact Types Economic

 
Description Changes to commercial practice on fertilizer recommendations: research findings influence recommendations in the Contour system
Geographic Reach National 
Policy Influence Type Influenced training of practitioners or researchers
Impact The findings of this project have shown how the phosphorus (P) requirements of wheat crops may vary with soil properties. Excess P applications can have harmful environmental impacts, but on soils where P is not readily available to the crop, insufficient P supply can severely limit yields. The Contour system operated by AgSpace Ltd within Origin Enterprises Ltd provides a framework for growers to implement P recommendations. Revisions to this system delivered in 2020 took into account findings from this project, as described at the linked URL. This will enable farmers to maintain crop yields while reducing the environmental impacts of P fertilizers, by better targeting application rates to crop requirements.
URL https://ag-space.com/using-big-data-to-improve-phosphate-fertiliser-applications/
 
Title R code for boundary line analysis 
Description R code has been written to estimate the parameters of boundary line models by maximum likelihood and to compare them against alternative models on the basis of Akaike weights. In addition, bootstrapping methods have been developed to quantify the uncertainty in practically relevant estimates derived from the model parameters (i.e. inflexion points in the response functions of crop yield to soil nutrients).
Type Of Technology Software 
Year Produced 2018 
Impact As described in another section, this code has enabled us to evaluate the evidence for boundary responses, and hence for their use in defining index levels that diagnose potential nutrient deficiency in soils. In particular, assessing the uncertainty in these index values helps agronomists to make robust recommendations.
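The two statistical steps described for the R code, model comparison by Akaike weights and bootstrap quantification of uncertainty, can be sketched generically in Python. This is not the project's R code: the AIC values and the sample are hypothetical, the model names are placeholders, and the sample median stands in for what, in a real analysis, would be a refit of the boundary line model to each resample followed by recalculation of the critical value.

```python
import math
import random
import statistics


def akaike_weights(aic):
    """w_i = exp(-d_i / 2) / sum_j exp(-d_j / 2), where d_i = AIC_i - min(AIC)."""
    best = min(aic.values())
    rel = {name: math.exp(-(a - best) / 2) for name, a in aic.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


def bootstrap_interval(data, estimator, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap: resample with replacement, re-estimate, take quantiles."""
    rng = random.Random(seed)
    stats = sorted(estimator(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lo = stats[round((1 - level) / 2 * (n_boot - 1))]
    hi = stats[round((1 + level) / 2 * (n_boot - 1))]
    return lo, hi


# Hypothetical AIC values for a boundary model and a no-boundary alternative.
weights = akaike_weights({"boundary": 1000.0, "no_boundary": 1004.0})
print({name: round(w, 3) for name, w in weights.items()})
# prints {'boundary': 0.881, 'no_boundary': 0.119}

# Hypothetical sample; the median stands in for a refitted critical value.
sample = [14.2, 15.1, 16.8, 13.9, 15.6, 17.2, 14.8, 16.1, 15.3, 14.5, 16.4, 15.9]
print(bootstrap_interval(sample, statistics.median))
```

An AIC difference of 4 gives the better-supported model a weight of about 0.88, so the weights express relative evidence rather than a yes/no test; the width of the bootstrap interval is the kind of uncertainty that would be reported alongside an index value.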