Building a Bioinformatics Ecosystem for Agri-Ecologists

Lead Research Organisation: University College London
Department Name: Genetics Evolution and Environment

Abstract

The omics era heralds excitement for the biosciences. Omics approaches allow the phenotype to be interrogated at the molecular level: the genome, transcriptome, epigenome and proteome. Such insights are revolutionising all aspects of the biosciences, including agriculture and food production. For example, genes and their regulatory machinery can be identified, edited and engineered to select productive, disease-resistant phenotypes for farming and to control pest populations. But we cannot transform agri-food systems to be environmentally sustainable and productive enough to feed a growing population on a warming planet without understanding the fundamental biology that surrounds, supports and challenges agricultural systems. For example, we need effective methods to monitor and detect pathogens, and to assess the effects of intensive agriculture on the health of pollinators and natural biocontrol agents. The BBSRC recognises this, and the critical role that agri-ecologists play in addressing these challenges. Yet this community risks falling through the omics gap, its potential curtailed by a mismatch between the data agri-ecologists now generate and the bioinformatics skills and resources they need to analyse it. Our project provides the resources needed to support this community, so that UK agri-ecologists can directly address the BBSRC's strategic goal of delivering sustainable agriculture and food security. We will implement user-friendly bioinformatics pipelines within the world-leading open-access bioinformatics workflow manager, Nextflow, which provides intuitive, seamless 'plug-and-play' modular platforms. These pipelines take data from raw sequences to results without users needing to write or troubleshoot code or resolve incompatibilities between software packages.
Workflows consist of modules that can easily be updated, switched, added and removed, making organic, responsive pipelines that grow with the user community. This organic, future-proof design is made uniquely possible by the skills of an open-source community of over 10,000 members ('nf-core'), who collectively test, correct, update and refine pipelines. This community-led approach means that the pipelines remain state-of-the-art, reproducible and endorsed by bioinformaticians drawn from academia and industry. Because the pipelines are open source and widely used, these resources are future-proofed beyond the life of the grant. We will achieve this through three Objectives: (1) Develop a community of agri-ecologists as Ambassadors for omics; (2) Deliver the bioinformatics pipelines the ecological community needs; (3) Disseminate the pipelines widely to end-users, providing training and stimulating innovation.

Technical Summary

We will implement user-friendly bioinformatics pipelines within the world-leading open-access bioinformatics workflow manager, Nextflow, which provides intuitive, seamless 'plug-and-play', reproducible, scalable and modular bioinformatics pipelines for a broad range of research topics in agri-ecology. Each work package/pipeline will be stored in a GitHub repository containing the Nextflow workflow code, which defines how data flow through multiple programs (run in containers from BioContainers/Docker Hub) to produce usable output files for the analysis of complex omics data. Nextflow is a domain-specific language (DSL) that provides a well-documented framework for structuring modular workflows. We will produce gold-standard nf-core-style pipelines, hosted on GitHub for easy access by the community. The nf-core community sets high standards for the readability, flexibility and reusability of Nextflow code, and we will adhere to these standards at all stages. Where possible, code will be contributed to the nf-core GitHub repositories. Finally, Nextflow separates the workflow code from the runtime environment, allowing the pipelines to scale and run across a wide range of HPC and cloud platforms and making them accessible to a diverse community of end-users.
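As a rough illustration of what the separation between workflow code and runtime means for an end-user, the Python sketch below assembles a `nextflow run` command for a published nf-core pipeline. The helper name `build_nextflow_command`, the pipeline `nf-core/rnaseq` (an existing nf-core pipeline chosen only as an example, not a deliverable of this project), the samplesheet path and the output directory are all assumptions for illustration. `-profile`, `-r`, `--input` and `--outdir` are standard Nextflow/nf-core options. Switching between a laptop with Docker and an HPC cluster with Singularity changes only the profile; the workflow code stays the same.

```python
"""Hypothetical helper: build `nextflow run` commands for an nf-core pipeline.

Assumptions (illustrative only): the pipeline name, samplesheet path and
output directory used below are examples, not outputs of this project.
"""
import shlex
from typing import List, Optional


def build_nextflow_command(
    pipeline: str,
    profiles: List[str],
    input_csv: str,
    outdir: str,
    revision: Optional[str] = None,
) -> List[str]:
    """Return the argument list for `nextflow run` without executing it."""
    if not profiles:
        raise ValueError("at least one profile is required")
    cmd = ["nextflow", "run", pipeline]
    if revision is not None:
        # Pin a released version of the pipeline for reproducibility.
        cmd += ["-r", revision]
    # Nextflow's own options take a single dash; profiles are comma-separated.
    cmd += ["-profile", ",".join(profiles)]
    # Pipeline parameters take a double dash.
    cmd += ["--input", input_csv, "--outdir", outdir]
    return cmd


if __name__ == "__main__":
    # Same workflow code, two runtimes: only the profile differs.
    for profile in (["docker"], ["singularity"]):
        cmd = build_nextflow_command(
            "nf-core/rnaseq", profile, "samplesheet.csv", "results"
        )
        print(shlex.join(cmd))
```

On a machine with Nextflow installed, the returned list could be passed to `subprocess.run(cmd, check=True)`. Building the command separately from running it keeps the sketch testable without Nextflow or a container engine.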
