Ensembl plant populations: integrating trait analyses and population-based sequence variants into a browsable genomic context

Lead Research Organisation: European Bioinformatics Institute
Department Name: Genome Assembly and Annotation

Abstract

The plant R&D community has generated high-quality annotated reference genome assemblies for numerous model and crop species, available for web-based investigation via our existing Ensembl Plants platform. Similarly, numerous plant genetic resources and populations have been generated using different approaches to capture and exploit genetic diversity. The most important of these serve as focal community resources for plant R&D, and many now come with associated parental genome assemblies and extensive variant data on the offspring. However, using these genetic resources and analysing the results in the context of the genes, genetic variants and appropriate reference genomes remains a disjointed workflow that demands significant bioinformatic, genetic, statistical and technical expertise from users. Coordinated integration of the results of genetic analyses with variant datasets against one or more reference genomes within the familiar Ensembl Plants environment would enable a broad range of UK users to rapidly access and use these complementary datasets for multiple plant species.

We will establish the 'Ensembl Plant Populations' platform - a web tool containing existing population-based sequence and variant data, allowing users to easily run statistically sound genetic analyses using key plant populations. We will focus on seven plant/crop species of high relevance to UK researchers: wheat, barley, rice, brassica, arabidopsis, tomato and oat. These species have been selected based on current Ensembl Plants UK user access statistics, and on the importance of the species to UK agriculture and research. The Ensembl Plant Populations tool will provide users with an integrated pipeline to undertake genetic analyses from start to finish, including: (i) upfront investigation of the predicted power of the selected population to detect genetic loci, (ii) inclusion of pre-prepared statistics to support users, e.g. to account for varying levels of relatedness between genotypes, (iii) an interactive genome-wide view of the results, allowing users to move to identified genomic locations of interest in Ensembl Plants, (iv) presentation of useful information linked to genes and variants within those identified regions to help users identify candidate genes for further study.

We will work with the UK plant research community to select appropriate populations for inclusion, a process which has already started, and attend community meetings throughout the project to raise awareness and gather feedback, including holding a dedicated stakeholder workshop. By adding targeted value to our current Ensembl-based tools, resources, and user-base, and tailoring these to plant species of high importance to UK research and agriculture, we aim to maximise the impact of the bioinformatic resources generated here. Collectively, these activities will further support the use and exploitation of the powerful biological resources the wider community has generated and genotyped.

Technical Summary

Ensembl Plants contains community-generated high-quality annotated reference genome assemblies for >100 model and crop species. Numerous plant genetic resources and populations have been generated to capture and exploit genetic diversity, e.g. association mapping/diversity panels and experimental populations, many of which now come with founder genome assemblies and extensive variant data on the progeny. However, users currently need significant bioinformatic, genetic, statistical and technical expertise to use these genetic resources and analyse the results in the context of the genes, genetic variants and appropriate reference genomes.

We will establish the 'Ensembl Plant Populations' platform - a web tool containing existing population-based sequence and variant data, supporting users to run statistically sound genetic analyses. We will focus on seven plant/crop species of high relevance to UK researchers: wheat, barley, rice, brassica, arabidopsis, tomato and oat. The tool will provide users with an integrated pipeline to undertake genetic analyses from start to finish, including: (i) upfront investigation of the predicted power of the selected population to detect genetic loci, (ii) adjustable forward genetic analysis settings, including selection of co-factors, significance threshold type/level, and appropriate corrections for varying levels of relatedness between genotypes, (iii) an interactive genome-wide view of the results, allowing users to move to identified genomic locations of interest in Ensembl Plants, (iv) presentation of useful information linked to genes and variants within those identified regions to help users identify candidate genes for further study.
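
As an illustration of two of the settings named in (ii), the sketch below computes a genome-wide significance threshold and a relatedness (kinship) matrix from marker genotypes. It is only a minimal sketch under stated assumptions: the summary does not say which methods the platform will use, so the Bonferroni correction and the VanRaden-style genomic relationship matrix are common choices swapped in here for illustration, and the function names and toy genotype data are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_markers):
    """Per-marker p-value threshold under a Bonferroni correction.

    Assumption: Bonferroni is one possible "significance threshold type";
    the project summary does not name a specific method.
    """
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return alpha / n_markers


def vanraden_kinship(genotypes):
    """Genomic relationship matrix in the style of VanRaden (2008).

    genotypes: list of rows, one per individual, each a list of allele
    counts (0, 1 or 2) per marker. Returns an n x n list of lists.
    Assumption: this is one common relatedness estimate, not necessarily
    the one the platform will use.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Centre each genotype by twice the allele frequency.
    centred = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centred[i], centred[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: four inbred lines scored at two markers.
toy_genotypes = [[0, 2], [2, 0], [0, 2], [2, 0]]
kinship = vanraden_kinship(toy_genotypes)
threshold = bonferroni_threshold(0.05, n_markers=1000)
# The threshold on the -log10(p) scale used in genome-wide plots.
log_threshold = -math.log10(threshold)
```

In a mixed-model association analysis, a matrix of this kind would typically enter as the covariance structure of a random effect, so that closely related genotypes are not treated as independent observations.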

We will work with the UK plant research community to select appropriate populations for inclusion, a process which has already started, and attend community meetings throughout the project to raise awareness and gather feedback, including hosting a dedicated stakeholder workshop.
