BBSRC Institute Strategic Programme: Decoding Biodiversity (DECODE) - Partner Grant

Lead Research Organisation: Royal Botanic Gardens
Department Name: Directorate

Abstract

The start of the 21st Century saw the landmark publication of the human genome, changing the way we do biology and having a huge impact on medicine. This heralded a new era of genomics that was initially dominated by the generation and analysis of genomes of model organisms and economically important species. Concurrently, genome technologies have enabled advances in microbiology, such as disentangling complex communities or, as has been seen in the pandemic, identifying new emerging variants of SARS-CoV-2. These rapid advances have been driven by innovation in high-throughput sequencing technologies and in software to assemble and analyse genomes. Recent step changes in these areas have enabled the generation of high-quality genomes at scale, making ambitious projects such as the Earth Biogenome Project, which aims to generate genomes for all eukaryotic life, feasible. Furthermore, rather than being limited to a single genome for a species, it is now possible to generate multiple genomes, helping to capture the diversity within the species. However, the scale and complexity of these genomic data present an analytical challenge, and there is a pressing need across the public and private sectors (our stakeholders) for tools, expertise and capacity to translate genomes and long-read technologies into discoveries. The outputs of the Decoding Biodiversity (DECODE) research programme will address this need, and will contribute to the BBSRC Transformative Technologies theme and to the government's prioritisation of investment and innovation in genomics and bioinformatics (UK Innovation Strategy).

DECODE brings together expertise in computational biology, mathematics and genomics. It builds on innovations from our previous core strategic programme "Genomics for Food Security", the cross-Institute Strategic Programme (ISP) "Designing Future Wheat", and the Quadram ISP "Gut Microbes and Health". In addition, it draws on the experience and networks gained through the research capacity-building programme "Grow Colombia" and through our role as a partner in the Darwin Tree of Life (DToL) consortium. DECODE is delivered through three interconnected work packages:
Work Package 1 will develop tools and techniques to investigate biodiversity. Specifically, this includes developing methods for: comparing multiple genomes within and across species to identify structural changes; using multiple genomes to improve annotation of coding and regulatory regions in the genome; resolving the complexity of bacterial communities and the biological roles within those communities; and deploying sequencers as real-time sensors of environmental communities. With our partners IBM and Eagle Genomics, we will ensure that the software and workflows developed are robust, deployable and scalable.
Work Package 2 will use the tools developed in WP1 to investigate biodiversity in publicly available genomes. We will use multiple analytical approaches to: assign function to genomic "dark matter" (genes of currently unknown function); investigate mechanisms underpinning chemical diversity in plants; and identify mechanisms driving genetic diversity in key agricultural crops and aquaculture species.
Work Package 3 will use long-read sequencing technologies and the tools developed in WP1 to uncover and explore biodiversity. Specifically, it will investigate how community structures change over time in increasingly complex systems (the gut, anaerobic digesters and soil). Furthermore, by quantifying changes in gene content, WP3 will aim to identify how biological functions change in a community and link these changes to community health.

To deliver this programme, we have established four key strategic partnerships: RBG Kew will provide expertise in plant metabolism, pangenomics and crop wild relatives; IBERS brings expertise in UK orphan crops; the UK Centre for Ecology & Hydrology will provide valuable soil samples and access to datasets; and IBM Research will support the deployment and scalability of tools.

Technical Summary

This project represents RBG Kew's contribution to the delivery of the following Institute Strategic Programme Grant: Decoding Biodiversity (DECODE), BB/X011089/1.
RBG Kew will contribute to four of the programme's deliverables. Firstly, we will work on the representation and analysis of genome sequences from multiple closely related individuals (typically individuals from a single taxon, e.g. a species). Sequencing multiple individuals allows the exploration of genomic variation among them and the linkage of this variation to organismal traits. However, the dominant paradigm for representing these data handles large-scale structural variation poorly and relies on a single reference individual, which may lack sequences that are present in other individuals. Kew has been developing a new browser for exploring such variation in a reference-free manner, and we will extend its use to important crop genomes with complex genomic architectures. We will also explore sequence repeats and their role in genome expansion in these species.
Secondly, we will use targeted and untargeted metabolomics to identify tissues of UK Asteraceae species that are rich in bioactive metabolites, with a focus on terpenoids, facilitated by analysis of specimens in Kew's extensive Asteraceae collections (Herbarium, Millennium Seed Bank, Living Collections). Many terpenoids have substantial pharmacological bioactivity, and we will integrate these novel data with transcriptomic data produced at the Earlham Institute to reconstruct biosynthetic pathways of potential medical interest.
Finally, we will draw on this work (and other work conducted in the programme) to explore the functional consequences of allelic and gene content variation, with an initial focus on rice.
