Data CAMPP (Innovative Training in Data Capture, Analysis and Management for Plant Phenotyping)

Lead Research Organisation: University of Nottingham
Department Name: Sch of Biosciences


Artificial Intelligence (AI) is revolutionising agriculture and agronomy. As an example, John Deere is a nearly 200-year-old agriculture company which has recently transformed its business, capitalising on automation and AI [1]. So great is its capability to collect and manage huge quantities of data that the firm now considers itself a software company [2]. The ability to use sensors for collecting data in the field, glasshouse and/or polytunnel, and to act on that data via automated analysis, shows huge potential. However, taking advantage of these capabilities requires technical prowess that is currently lacking in the majority of UK bioscientists. The widespread ability to use and, indeed, develop AI systems exhibiting these functionalities, deployed for practical use in day-to-day bioscience settings, is sadly absent from both academia and industry.

Yet there is a compelling imperative nationally to provide bioscientists with the skills that enable them to realise this exciting potential. In last year's UK AI Sector Deal, agriculture and life sciences was identified as a key investment area where AI can boost productivity in the UK economy. But without access to a knowledgeable and skilled workforce, this initiative is doomed to fail; and without access to appropriate training, bioscientists will be unable to lead the global agriculture and life science revolution toward new AI-driven solutions.

Images are ubiquitous in the biosciences and are a key source of objective, quantitative data. Recent developments in AI, combined with robot-assisted image and other data capture and the availability of small-footprint, relatively low-cost computing devices, enable high-throughput acquisition and analysis of data in real-world settings, beyond academic research labs. While the technical facilities exist, the practical knowledge to design and implement them is also required. This is particularly relevant for bioscientists, who must answer key questions in order to select and implement effective solutions: How are AI-driven methods designed? How can they be adapted to new domains in the biosciences? How can we utilise them in our lab or field research? What consideration should be given to the resulting datasets? Without appropriate training and skills, bioscientists are ill-equipped to address these questions.

The Data CAMPP project, therefore, provides an innovative training course with flexible, hands-on learning opportunities spanning key aspects of an automated data gathering pipeline for the critical bioscience setting. "Data CAMPP" refers to the automated Capture, Analysis and Management of data. The course will deliver units covering fundamental and advanced aspects of image analysis, machine learning and data handling applied to Plant Phenotyping. Training units are accompanied by downloadable software tools, exercises and datasets, and novel "lab-by-post" project kits (physical hardware and plants) to enable hands-on learning experiences via remote participation. The course will also offer complementary in-person activities. This unique mode of mixed delivery promotes accessibility for a broad cohort, to support participants from a range of education backgrounds and skill sets, at diverse career stages, and with varied personal constraints that might limit travel and/or regular daytime attendance.

The overarching goal of Data CAMPP is to create a unique and timely learning experience for the bioscience community, covering topics from development and placement of robotics in the field, through to management of phenotyping image sets, and good experimental practices for, and ethics of, machine learning. Data CAMPP will prepare today's bioscientists to lead tomorrow's AI-driven innovations.


Technical Summary

Data CAMPP provides an innovative training course with 12 learning units, listed below. Each is led by two investigators. Delivery is online (OL), blended (BL) or face-to-face (F2F). Learning is enhanced by software exercises (SW), hardware labs (HW) or group discussion (GD). HW labs can be sent as lab-by-post kits for remote participants.

Plant Phenotyping
- Intro to Plant Phenotyping Technologies (RG/JA; OL, SW): Current tools and methods that facilitate collecting data about plants.
- Affordable Phenotyping (RG/TP; OL, GD): Data collection in Lower-Middle Income Countries, e.g. Cassava phenotyping using low-cost sensors and ML.
- Case Studies Workshop (TP/SP; F2F, GD): Discussion around attendees' workplace challenges.

Image Analysis and Machine Learning
- Intro to Image Analysis (TP/AF; OL, SW): Computer vision techniques used in the biosciences.
- ML for Image Data (AF/SP; BL, SW): Machine learning methods for bioscience; identification and ethical use of appropriate approaches.
- Deep Learning Internals (MP/TP; OL, SW): Deep learning is no longer a black box; understanding its components and developing new approaches.
- Experiment Design for ML (MP/ES; OL, SW): Data requirements, strategies for training and testing algorithms.

Data Capture and Management
- Intro to Robotics for Bioscientists (ES/SP; BL, HW): Robotic components and control.
- Coding for Robotics & Data Capture (SP/AF; OL, SW): Industry-standard programming languages and tools (e.g. Python, LabVIEW, MATLAB).
- Data Management (ES/SP; OL, SW): Formats and standards, storage and backup, software tools, hardware terms, sharing mechanisms, licensing, GDPR.
- Data Capture in the Lab (DW/JA; BL, HW): Sensors for gathering data in controlled environments, lab-by-post with low-cost and 3D printed components.
- Data Capture in the Field (ES/RG; F2F, HW): Mechanisms for intelligent data collection in uncontrolled environments, field robotics, data transfer, reliability, robustness.

