HOW MEGA-DIVERSE ARE ASIAN RAINFORESTS? DEVELOPMENT OF INNOVATIVE AI MODELS TO UNDERSTAND TROPICAL PLANT BIODIVERSITY

Lead Research Organisation: Loughborough University
Department Name: School of Science

Abstract

The United Nations Sustainable Development Goal 15 'Life on Land' (UN, 2022) strives to protect terrestrial ecosystems and halt biodiversity loss. Documenting the world's plant diversity through taxonomic publications is key to its conservation: without understanding which species are present, it is not possible to protect them or to evaluate their potential significance (Cheek et al., 2020). Across the worldwide network of over 3000 herbaria, millions of herbarium specimens serve as preserved records of plant diversity (Heberling et al., 2019). It is these specimens, already in herbaria, that represent more than 50% of yet-to-be-described species (Bebber et al., 2010). However, identifying these specimens accurately is time-consuming because of the large volume of specimens, the challenges posed by taxonomically difficult groups and a decreasing number of experts. Increasing the speed at which taxonomic outputs are produced is key to acting against the threats to the world's habitats.
By developing artificial intelligence methods to automatically identify specimens, this project aims to accelerate taxonomic efforts in the genus Cyrtandra (African violet family), a mega-diverse, poorly known group that is common in Southeast Asian rainforests (Atkins et al., 2021). More specifically, it will develop a hierarchical framework that consists of a cascade network to address classification at different taxonomic levels and a meta-learning strategy to handle the extreme case where only one sample is available. The cascade networks can make use of knowledge of the species, genus, family and higher taxonomic levels to improve identification accuracy. Figure 1 shows an example of a herbarium specimen to be classified. Specimen data at the species level is often imbalanced, i.e. one class label might have just one observation while another has a very large number of observations. Directly training deep learning models on such datasets will result in overfitting. To overcome this challenge, the meta-learning strategy (Snell et al., 2017) will be explored to improve the accuracy of species-level recognition.
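The one-sample case can be illustrated with a prototype-based classifier in the spirit of Snell et al. (2017). The sketch below is a minimal illustration under stated assumptions, not the project's implementation: the species names, the two-dimensional embeddings and the helper names `prototypes` and `classify` are hypothetical, and in practice the embeddings would come from a trained image encoder. Each class is represented by the mean of its support embeddings, so a species with a single specimen still receives a usable prototype.

```python
import math

def prototypes(support):
    """Return the mean embedding per class.

    A class with a single specimen simply uses that specimen as its prototype.
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos

def classify(query, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    logits = {
        label: -sum((q - p) ** 2 for q, p in zip(query, proto))
        for label, proto in protos.items()
    }
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(z - top) for label, z in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical embeddings: species_a has three specimens, species_b only one.
support = {
    "species_a": [[0.9, 0.1], [1.1, -0.1], [1.0, 0.0]],
    "species_b": [[-1.0, 1.0]],
}
protos = prototypes(support)
probs = classify([-0.8, 0.9], protos)
predicted = max(probs, key=probs.get)
print(predicted, round(probs[predicted], 3))
```

Because the classifier has no per-class weights to fit, adding a newly collected species only requires computing one more prototype, which is why this family of methods is often considered for heavily imbalanced data.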
This project has the following specific objectives:
1. Develop a cascade multi-label deep architecture that takes the prior knowledge of a given class hierarchy and key information of herbaria into account when performing identification at different taxonomic levels.
2. Explore effective meta-learning and fine-tuning methods to improve the identification performance on taxonomically imbalanced datasets.
3. Build an easy-to-use software system to classify herbarium images into different labels with confidence values and speed up the discovery of new species.
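
One way to picture how a class hierarchy and confidence values might fit together is to multiply per-level conditional probabilities along the taxonomy. The sketch below is an assumption-laden illustration rather than the architecture the project will build: the taxonomy entries, the probability values, the 0.5 review threshold and the function names `cascade_confidence` and `predict` are all hypothetical, and the per-level probabilities are assumed to come from separate classifier stages.

```python
# Hypothetical taxonomy: species -> (family, genus).
TAXONOMY = {
    "Cyrtandra sp. A": ("Gesneriaceae", "Cyrtandra"),
    "Cyrtandra sp. B": ("Gesneriaceae", "Cyrtandra"),
    "Aeschynanthus sp. C": ("Gesneriaceae", "Aeschynanthus"),
}

def cascade_confidence(p_family, p_genus, p_species):
    """Combine per-level conditional probabilities into one score per species.

    p_family[f]     : P(family = f | image)
    p_genus[f][g]   : P(genus = g | family = f, image)
    p_species[g][s] : P(species = s | genus = g, image)
    """
    scores = {}
    for species, (family, genus) in TAXONOMY.items():
        scores[species] = (
            p_family[family] * p_genus[family][genus] * p_species[genus][species]
        )
    return scores

def predict(scores, threshold=0.5):
    """Return the best label, its confidence, and a flag for expert review."""
    best = max(scores, key=scores.get)
    return best, scores[best], scores[best] < threshold

# Hypothetical outputs of the three classifier stages for one specimen image.
p_family = {"Gesneriaceae": 0.9}
p_genus = {"Gesneriaceae": {"Cyrtandra": 0.8, "Aeschynanthus": 0.2}}
p_species = {
    "Cyrtandra": {"Cyrtandra sp. A": 0.7, "Cyrtandra sp. B": 0.3},
    "Aeschynanthus": {"Aeschynanthus sp. C": 1.0},
}

scores = cascade_confidence(p_family, p_genus, p_species)
label, confidence, needs_review = predict(scores)
print(label, round(confidence, 3), needs_review)
```

Under this scheme a specimen whose best score falls below the threshold would be routed to a taxonomist, which is one plausible way for confidence values to help surface candidate new species.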

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
NE/S007350/1                                   01/10/2019  30/09/2027
2887670            Studentship   NE/S007350/1  01/10/2023  31/03/2027  Yuyue Guo