OctoSEQ- Sequencing the octoploid strawberry

Lead Research Organisation: National Institute of Agricultural Botany
Department Name: Centre for Research

Abstract

This proposal assembles a multinational academic and industry partnership to generate a reference octoploid genome sequence using a set of innovative experimental and computational approaches. This team includes industry and academic partners from the UK, Netherlands, Spain, Italy and Norway.

Recent advances in strawberry genotyping technologies, for example the development of the Axiom IStraw90k SNP genotyping platform through the US-led Rosbreed programme (only possible due to the earlier part-BBSRC funded sequencing of the diploid strawberry genome), have led to the creation of multiple linkage maps, which highly saturate some areas of the genetic map for octoploid strawberry. However, the shortfall of having only one of the four 'diploid' ancestral subgenomes sequenced is now apparent, as coverage of the 'non-vesca'-like subgenomes is comparatively poor.

Using some of the latest advances in bioinformatics and sequencing, combined with a technique termed massively parallel BAC sequencing, the proposed project team will first assemble a haploidised version of the octoploid strawberry genome. This will then be resolved into separate parental genomes using a sequencing approach that combines information from BAC sequences with single-molecule optical mapping. Further anchoring of scaffolds will be used to assemble the genome into whole chromosomes. This approach has never been tried before; it has only become possible in the last six months, owing to a number of recent innovations in genome sequencing and visualisation, and is at the cutting edge of genome technology. It will resolve the genome into two 'haplotypes', one from each parent of the sequenced cultivar, allowing inheritance to be tracked, which is an important innovation.

Strawberry production is one of UK horticulture's greatest success stories and domestic output continues to expand, leading to over 80% self-sufficiency when in season. The value of the crop to the UK recently exceeded £500m per annum, making it the highest value fruit crop in the UK. Globally, the primary problems of production remain the threats of oomycete and fungal diseases, which are now being addressed in the UK through a comprehensive research programme funded by UK industry, BBSRC and Innovate UK. The industry are supporting this proposal through the IPA scheme, as they recognise the need for an octoploid genome sequence for marker assisted breeding (MAB) and other breeding techniques.

MAB is a technique that uses the approximate location of important genes to improve the efficiency of selection in breeding programmes. It is actively deployed in a number of strawberry breeding programmes around the world, in both the public and private sectors. However, due to the lack of an octoploid strawberry genome, progress in identifying the causative genes underpinning important disease resistance and fruit quality traits is slow. Identification and characterisation of gene function is important, not only to enable use of the latest generation of tools in cisgenic and targeted mutagenesis approaches in research and breeding, but also to facilitate the study of questions of fundamental scientific interest about trait evolution in polyploids, which is of great importance to both crop scientists and fundamental researchers. Further research on the evolution and structure of the genome and gene ohnologs will be of broad scientific interest.

The impact of this research will be large, as the open and collaborative approach that is being taken in this project engages both industry and national research leaders, allowing rapid adoption of the results arising from this project. Similarly, the value for money of the project is high, as it leverages the currently high level of informatics capability available at EMR and sequencing and informatics capabilities at TGAC and provides a springboard to pan-European projects and industry-academia partnerships.

Technical Summary

The octoploid strawberry genome is compact (800Mb) yet complex, owing to its high levels of heterozygosity (and associated inbreeding depression) and its allo-octoploid nature. It behaves disomically and the latest evidence suggests that it arose from the fusion of two allo-tetraploids, which themselves had an (A-B) (B' B'') genome structure. Second generation sequencing approaches using paired-end and mate jumping libraries have largely failed to resolve biologically meaningful contig lengths, and it is clear that an alternative approach is required.

Using long-read paired-end data (450bp HiSeq-sequenced Illumina libraries), an assembly will be carried out using Discovar and a novel haplotype selection procedure (harnessing the heterozygosity). Following this, a step integrating PCR bias free mate jump libraries and local BAC sequences will be used to phase and extend the assembly (the hypothesis is that a minimum tiling path of BACs is not required due to the downstream use of other technologies). The massively parallel BAC sequencing approach allows the resolution of extremely long haplotypes, so that the subgenome and heterozygosity assembly problems are alleviated during the assembly step. This approach is possible due to the highly heterozygous and compact nature of the genome and is a novel approach to complex genome assembly. Further scaffolding and haplotyping are then accomplished by the use of a multi-user generated consensus SNP linkage map. This map is built with a novel mapping method, recently developed by a collaborator, Dr Eric van de Weg, and Dr Rob Vickerstaff (EMR PI's group), which allows an extremely accurate reconstruction of marker orders by integrating data from multiple biparental mapping populations.
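The idea of integrating marker orders from several biparental populations into one consensus order can be illustrated with a deliberately naive sketch. This is not the mapping method referred to above, whose details are not given here; it simply rescales each population's map to the range 0 to 1 and orders markers by their mean relative position. The function name `consensus_marker_order`, the marker names and the centimorgan positions are all hypothetical.

```python
from collections import defaultdict


def consensus_marker_order(maps):
    """Return a naive consensus marker order for one linkage group.

    `maps` is a list of dicts, one per mapping population, each mapping a
    marker name to its position in centimorgans (cM). Illustrative only:
    this is an assumption-laden toy, not the project's mapping method.
    """
    relative = defaultdict(list)
    for population_map in maps:
        if not population_map:
            continue  # skip populations with no markers on this group
        lowest = min(population_map.values())
        highest = max(population_map.values())
        span = (highest - lowest) or 1.0  # avoid dividing by zero
        for marker, position in population_map.items():
            # Rescale so that maps of different lengths are comparable.
            relative[marker].append((position - lowest) / span)

    mean_position = {m: sum(v) / len(v) for m, v in relative.items()}
    # Sort by mean relative position; ties are broken by marker name.
    return sorted(mean_position, key=lambda m: (mean_position[m], m))


# Hypothetical data: three populations, each segregating for a
# different subset of four markers on the same linkage group.
pop_a = {"m1": 0.0, "m2": 10.0, "m4": 30.0}
pop_b = {"m1": 0.0, "m3": 12.0, "m4": 20.0}
pop_c = {"m2": 5.0, "m3": 9.0, "m4": 25.0}

order = consensus_marker_order([pop_a, pop_b, pop_c])
print(order)
```

No single population places all four markers, yet the combined data yield one order for the whole set. Rescaling by map length is a crude choice that is biased when populations cover different stretches of the chromosome, which is one reason a purpose-built method is needed for real data.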

Downstream feature calling and annotation will be generated along with a Web Apollo server instance for further collaboration. Resequencing of key germplasm will be contributed by partners for reference guided assembly.

Planned Impact

Impact Summary

This grant will have a global impact, both on the research field internationally and on the international industry, especially the UK industry. Through full engagement with industry stakeholders, via workshops and the involvement of 5 industrial partners in the project (contributing £125k cash), maximum translation of this research will be ensured, driving forward the UK plant breeding industry in a globally competitive market. The availability of an octoploid genome sequence (integrated into existing resources; see LOS Davey and Main) far outstrips the utility of the diploid strawberry genome, as it allows subgenome-level resolution of gene families, which is essential for both basic research and molecular breeding.

Who will benefit and how?

Direct beneficiaries:

1. Commercial private sector
The UK and international plant breeding sector will benefit enormously from this endeavour, as the generation of a gold-standard reference genome will allow these industries first to develop better markers for QTL (most traits in strawberry are quantitative and multiple QTL underpin these) and then to move from marker level associations to candidate gene associations. This is important for next-generation genome editing approaches and functional validation of candidate genes. Furthermore, it facilitates cheap genotyping through reference-guided GbS and allows a variety of other 'next-gen' technologies, currently impossible in strawberry, to be applied. This moves the industry very quickly to a point where pedigree-based selection and genome-wide selection are affordable and tractable options for crop improvement. Placing this in the hands of the UK partners will give UK business a significant competitive edge. Note that some UK breeding firms are ineligible to contribute to this proposal due to BBSRC rules, but fully lend their support. (Impact 12-36 months)

2. Fruit growing sector in the UK
UK industry will benefit as they will be able to access a resource that is beyond their means to create. Longer term it is anticipated that the UK levy body, the HDC, will make significant use of this resource, and knowledge generated from this pre-competitive work can, for some levy payers, lead to competitive work funded by other research bodies (e.g. Innovate UK) to benefit the UK economy. Advancing genomic resources in horticultural crops is a key aim of the HDC and is evidenced by their support for this proposal (Benefit within 5-10 years).

3. Public and retail sector
Several UK retailers aim to double sales of UK-produced fruit by 2020 (i.e. to £440m farm gate); this project will assist that aim and improve UK productivity and competitiveness. Downstream science conducted utilising this resource will lead to more reliable production methods and potentially reduce wastage in the supply chain (through reduced inputs and better variety development) (Benefit within 5-10 years).

Indirect beneficiaries
The wider strawberry growing industry
As a result of the genome sequence, the rate of change of varietal development will increase, leading to greater benefits to downstream growers, packers and producers. (Benefits 3-5 years)

Government, public and policy benefits
The public will benefit, not only from the improved position of UK agri-business (and access of breeders to novel technologies), but also through the long term improvement in supply chain resilience through improved cultivar development. In the longer term the public will benefit through increased food security and sustainability, as a result of scientific improvements on horticultural crops. This feeds in to many UK Government and EU policy agendas including: health (improving produce quality), pesticides (reducing residues through improved resistance), water (ability to grow nearer water courses), climate (growing crops perennially will improve carbon sequestration) and environment (reduced carbon and pesticides) (Benefit from 3-15 years).

Publications

 
Description Update for 2020: We have had zero positive contact or progress from EI this year; the industry partners are not happy and there appears to be no way to seek output from the PI at EI. It is most disappointing. I am going to try in 2021 to find resource from other projects to get these data into the public domain and a publication prepared. This has been reputationally damaging for NIAB and at a personal level. There appear to be wider structural issues, as I have had similar reports from other NIAB group leaders working with the same group, who appear to have massively overstated their capabilities across the board.

Update for 2019: NIAB have continued (without funding) to deliver resource into this project to support the Earlham Institute partners in their delivery of this work. So far this has not led to outputs from the EI team at the level promised within the grant. However, the work continues at EI and we will continue to support them for as long as we possibly can to help them deliver their side of the project.

We continue, where we can, to derive value from the data that has been generated, using third party resources to take analysis forward. We are still hopeful of a high-impact publication, but this is entirely contingent on delivery from our partners.

The work from EI that is still progressing:

1. We have generated a fully haplotype resolved genome sequence for the octoploid strawberry for a key set of cultivars that are important for research
2. For a further 16 lines we have generated complete genome graphs for pangenome analysis
3. For 200+ cultivars we have resequenced to ~30x
4. For 20 individuals from a mapping population we have generated a haplotype validation set
5. For Redgauntlet we have generated matched isoseq and illumina PE RNAseq data for six bulk tissues for annotation
6. We have generated a combined genetic map for 5+ mapping populations
7. As part of this project EI have developed the sequence scaffolding / assembly pipeline.
Exploitation Route This is an invaluable resource for breeding and molecular biology alike and will be a gold standard genome reference set for future functional analysis and breeding. Industry partners are taking forward their own analysis using the resources that have been generated.
Sectors Agriculture, Food and Drink

 
Description This project is delivering genomes into the hands of breeders for enhancement of their breeding programmes. NIAB EMR have made all data available to the breeders so far. A final genome release is still pending from the Earlham Institute, at which point we anticipate significant impact, as the genome releases to date have been significantly better than other published work. This will allow breeders to accurately develop markers and develop functional trait characterisation pipelines. A fuller update will be provided once the work is released. Update 2019: This status remains unchanged. NIAB have continued to assist the industry by making available what has emerged from EI and have tried to supplement this with our own analysis for breeders. We will continue to do so. Update 2020-21: Our collaborators have still not delivered anything that is of any use from this project. I have ensured that all companies have full access to the available data, though it is clear that a major opportunity has been missed due to non-delivery of assembled genomes. I am now making efforts to secure other funds in order to move this project forward without other collaborators, but it is a struggle to obtain funding for this work.
First Year Of Impact 2021
Sector Agriculture, Food and Drink
Impact Types Economic

 
Title Genomic prediction for day neutrality 
Description This innovation has led to genomic prediction ability for day neutrality at a high level of accuracy. This was underpinned by a number of linked developments in genome sequencing, GWAS population development, phenotyping and statistical genetics. 
IP Reference  
Protection Protection not required
Year Protection Granted
Licensed Commercial In Confidence
Impact Day neutrality is a key trait for breeding, which was resolved through analysis of the octoploid genome. This is now being used in rapid combining of resistance and quality traits in breeding programmes, delivering enhanced varieties to market.
 
Title Molecular markers/ genomic prediction for low-input, fruit uniformity and other quality traits 
Description Molecular markers for key fruit quality traits, along with low input traits (low P and AMF colonisation), were developed as a result of the synthesis of work from multiple projects. 
IP Reference  
Protection Protection not required
Year Protection Granted
Licensed Commercial In Confidence
Impact These markers are used in genomic selection pipelines for delivery of enhanced varieties of strawberry to the market, with improvements in fruit quality.
 
Title Crosslink genetic mapping programme 
Description This is a piece of software for genetic mapping in complex outcrossing polyploids 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Assembled the first multi-population linkage map for octoploid strawberry