OctoSEQ - Sequencing the octoploid strawberry

Lead Research Organisation: Earlham Institute
Department Name: Research Faculty


This proposal assembles a multinational academic and industry partnership to generate a reference octoploid genome sequence using a set of innovative experimental and computational approaches. This team includes industry and academic partners from the UK, Netherlands, Spain, Italy and Norway.

Recent advances in strawberry genotyping technologies, for example the development of the Axiom IStraw90k SNP genotyping platform through the US-led Rosbreed programme (only possible due to the earlier part-BBSRC funded sequencing of the diploid strawberry genome), have led to the creation of multiple linkage maps, which highly saturate some areas of the genetic map for octoploid strawberry. However, the shortfall of having only one of the four 'diploid' ancestral subgenomes sequenced is now apparent, as coverage of the 'non-vesca'-like subgenomes is comparatively poor.

Using some of the latest advances in bioinformatics and sequencing, combined with a technique termed massively parallel BAC sequencing, the proposed project team will first assemble a haploidised version of the octoploid strawberry genome. This will then be separated into its parental genomes using an approach that combines information from BAC sequences with single-molecule optical mapping. Further anchoring of scaffolds will be deployed to assemble the genome into whole chromosomes. This approach has never been tried before; it has only become possible in the last six months owing to a number of recent innovations in genome sequencing and visualisation, and it is at the cutting edge of genome technology. It will resolve the genome into two 'haplotypes', one from each parent of the sequenced cultivar, allowing inheritance to be tracked, which is an important innovation.

Strawberry production is one of UK horticulture's greatest success stories and domestic output continues to expand, leading to over 80% self-sufficiency when in season. The value of the crop to the UK recently exceeded £500m per annum, making it the highest value fruit crop in the UK. Globally, the primary problems of production remain the threats of oomycete and fungal diseases, which are now being addressed in the UK through a comprehensive research programme funded by UK industry, BBSRC and Innovate UK. Industry is supporting this proposal through the IPA scheme, as it recognises the need for an octoploid genome sequence for marker assisted breeding (MAB) and other breeding techniques.

MAB is a technique that uses the approximate location of important genes to improve the efficiency of selection in breeding programmes, and it is actively deployed in a number of strawberry breeding programmes around the world, in both the public and private sectors. However, due to the lack of an octoploid strawberry genome, progress in identifying the causative genes underpinning important disease resistance and fruit quality traits is slow. Identification and characterisation of gene function is important, not only because it enables use of the latest generation of cisgenic and targeted mutagenesis tools in research and breeding, but also because it facilitates the study of questions of fundamental scientific interest about trait evolution in polyploids, which is of great importance to both crop scientists and fundamental researchers. Further research on the evolution and structure of the genome and gene ohnologs will be of broad scientific interest.

The impact of this research will be large, as the open and collaborative approach being taken in this project engages both industry and national research leaders, allowing rapid adoption of the results arising from it. Similarly, the value for money of the project is high, as it leverages the currently high level of informatics capability available at EMR and the sequencing and informatics capabilities at TGAC, and it provides a springboard to pan-European projects and industry-academia partnerships.

Technical Summary

The octoploid strawberry genome is compact (800Mb) yet complex, owing to its high levels of heterozygosity (and associated inbreeding depression) and its allo-octoploid nature. It behaves disomically, and the latest evidence suggests that it arose through the fusion of two allo-tetraploids, which themselves had a (A-B) (B' B'') genome structure. Second-generation sequencing approaches using paired-end and mate jumping libraries have largely failed to resolve biologically meaningful contig lengths, and it is clear that an alternative approach is required.

Using long-read paired-end data (450bp HiSeq-sequenced Illumina libraries), an assembly will be carried out using Discovar and a novel haplotype selection procedure (harnessing the heterozygosity). Following this, a step integrating PCR bias-free mate jump libraries and local BAC sequences will be used to phase and extend the assembly (the hypothesis is that a minimum tiling path of BACs is not required due to the downstream use of other technologies). The massively parallel BAC sequencing approach allows the resolution of extremely long haplotypes, so that the subgenome and heterozygosity assembly problems are alleviated during the assembly step. This approach is possible due to the highly heterozygous and compact nature of the genome and is a novel approach to complex genome assembly. Further scaffolding and haplotyping will then be accomplished using a multi-user generated consensus SNP linkage map, generated by a novel mapping method recently developed by a collaborator, Dr Eric van de Weg, and Dr Rob Vickerstaff (EMR PI's group), which allows an extremely accurate reconstruction of marker orders by integrating data from multiple biparental mapping populations.

Downstream feature calling and annotation will be generated along with a Web Apollo server instance for further collaboration. Resequencing of key germplasm will be contributed by partners for reference guided assembly.

Planned Impact

Impact Summary

This grant will have a global impact, both on the research field internationally and on the international industry, especially the UK industry. Through full engagement with industry stakeholders, via workshops and the involvement of 5 industrial partners in the project (contributing £125k cash), maximum translation of this research will be ensured, driving forward the UK plant breeding industry in a globally competitive market. The utility of an octoploid genome sequence (integrated into existing resources; see LOS Davey and Main) far outstrips that of the diploid strawberry genome, as it allows subgenome-level resolution of gene families, which is essential for both basic research and molecular breeding.

Who will benefit and how?

Direct beneficiaries:

1. Commercial private sector
The UK and international plant breeding sector will benefit enormously from this endeavour, as the generation of a gold-standard reference genome will allow these industries first to develop better markers for QTL (most traits in strawberry are quantitative and multiple QTL underpin these) and then to move from marker-level associations to candidate gene associations. This is important for next-generation genome editing approaches and functional validation of candidate genes. Furthermore, it facilitates cheap genotyping through reference-guided GbS and allows a variety of other 'next-gen' technologies that are currently impossible to be applied to strawberry. This moves the industry very quickly to a point where pedigree-based selection and genome-wide selection are affordable and tractable options for crop improvement. Placing this in the hands of the UK partners will give UK businesses a significant competitive edge. Note that some UK breeding firms are ineligible to contribute to this proposal due to BBSRC rules, but fully lend their support. (Impact 12-36 months)

2. Fruit growing sector in the UK
UK industry will benefit, as it will be able to access a resource that is beyond its means to create. In the longer term it is anticipated that the UK levy body, the HDC, will make significant use of this resource, and that knowledge generated from this pre-competitive work can, for some levy payers, lead to competitive work funded by other research bodies (e.g. Innovate UK) to benefit the UK economy. Advancing genomic resources in horticultural crops is a key aim of the HDC, as evidenced by its support for this proposal (Benefit within 5-10 years).

3. Public and retail sector
Several UK retailers aim to double sales of UK-produced fruit by 2020 (i.e. to £440m farm gate); this project will assist that aim and improve UK productivity and competitiveness. Downstream science conducted utilising this resource will lead to more reliable production methods and potentially reduce wastage in the supply chain (through reduced inputs and better variety development) (Benefit within 5-10 years).

Indirect beneficiaries
The wider strawberry growing industry
As a result of the genome sequence, the rate of change of varietal development will increase, leading to greater benefits to downstream growers, packers and producers. (Benefits 3-5 years)

Government, public and policy benefits
The public will benefit, not only from the improved position of UK agri-business (and access of breeders to novel technologies), but also through the long-term improvement in supply chain resilience through improved cultivar development. In the longer term the public will benefit through increased food security and sustainability, as a result of scientific improvements in horticultural crops. This feeds into many UK Government and EU policy agendas, including: health (improving produce quality), pesticides (reducing residues through improved resistance), water (ability to grow nearer water courses), climate (growing crops perennially will improve carbon sequestration) and environment (reduced carbon and pesticides) (Benefit from 3-15 years).
Description This project is now reaching its final objective of providing a genomic overview of octoploid strawberry. We have generated draft genome assemblies for 16 varieties of strawberry, and haplotype-specific backbone sequences for 2 varieties, which we have shown to be structurally correct. These are a valuable resource for breeding.
As part of the work on this grant, we have developed methods for haplotype-specific assemblies of complex species. These enable analyses that take into account the whole content of the multiple subgenomes of polyploid crops, whereas previous techniques produced a "consensus version" that mixed up genomic content the assemblies could not distinguish.
We are also in the process of generating higher quality, haplotype-specific genome assemblies for 6 varieties, which will have greater haplotypic definition and completeness than our previous assemblies.
These assemblies are being shared among the grant members first and will soon be used to enable better breeding. Initial results already show that features of the genome that were impossible to find in the existing assemblies are being found in these new sequences.
Exploitation Route Strawberry genomes for agriculturally significant cultivars can (and will) be used to enable better breeding.
Sectors Agriculture, Food and Drink

Title SDG 
Description SDG is a framework to analyse sequence graphs such as those generated by various genome assemblers. It provides a workspace that can contain a graph and datastores for paired, linked and long reads. These reads can be mapped to the graph, and can be used to untangle or scaffold the graph. A SWIG API enables SDG to be used as a Python module, and there is experimental Julia and R support. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact We are currently producing genome assemblies of: multiple wheat cultivars, multiple strawberry cultivars, and more. 
URL https://f1000research.com/articles/8-1490
Title SKM-tools 
Description These are a series of tools to compare skip-mer (cyclic spaced-seed) spectra between different datasets. They can be used to study conservation of sequence across evolutionarily distant organisms. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact We are using skm-tools to study conservation in the context of EI's CSP and BBSRC's DFW projects. 
URL https://github.com/bioinfologics/skm-tools
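The skip-mer idea mentioned in the SKM-tools description can be illustrated with a minimal Python sketch. It does not use or reproduce the skm-tools code. It assumes the common (m, n, k) definition of a skip-mer, in which m bases are taken from each cycle of n bases until k bases have been collected. The function names (skipmers, spectrum, shared_fraction) and the toy sequences are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter


def skipmers(seq, m, n, k):
    """Yield every skip-mer of shape (m, n, k) in seq.

    Assumed definition: take m bases from each cycle of n bases
    until k bases are collected. With m == n this is a plain k-mer.
    """
    if not (1 <= m <= n) or k % m != 0:
        raise ValueError("need 1 <= m <= n and k a multiple of m")
    cycles = k // m
    span = n * (cycles - 1) + m  # bases covered by one skip-mer
    # Positions, relative to the start, of the bases that are kept.
    offsets = [c * n + i for c in range(cycles) for i in range(m)]
    for start in range(len(seq) - span + 1):
        yield "".join(seq[start + o] for o in offsets)


def spectrum(seq, m, n, k):
    """Count the occurrences of each skip-mer in seq."""
    return Counter(skipmers(seq, m, n, k))


def shared_fraction(a, b):
    """Fraction of distinct skip-mers in spectrum a also present in b."""
    if not a:
        return 0.0
    return len(set(a) & set(b)) / len(set(a))


# Toy sequences (illustrative only) that differ at every third base.
ref = "ACGACGACGACG"
alt = "ACTACTACTACT"

# Plain 4-mers (m == n == 1) versus skip-mers that ignore every third base.
kmer_shared = shared_fraction(spectrum(ref, 1, 1, 4), spectrum(alt, 1, 1, 4))
skip_shared = shared_fraction(spectrum(ref, 2, 3, 4), spectrum(alt, 2, 3, 4))
```

In this toy case the plain 4-mer spectra share nothing, while the (2, 3, 4) skip-mer spectra still share the skip-mers that fall in phase with the unchanged positions, which is the kind of tolerance to periodic substitutions that makes spaced seeds useful for comparing diverged sequences.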
Title w2rap 
Description w2rap is a genome assembly pipeline for complex genomes from short reads. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact W2rap has enabled wheat genomics to jump into a new era of high-quality genomes from short reads. While there are some alternative tools from private companies, w2rap remains the standard for quality reconstruction across the genome. W2rap has already been used to assemble 5 wheat genomes in the public domain, putting the UK at the forefront of wheat genomics. With tens of genomes being assembled now, new modules being developed for new data types, and 5 wheat lines assembled in a £1M private project, w2rap is one of the flagship projects for the Earlham Institute. 
URL https://github.com/bioinfologics/w2rap/
Description BBSRC Plant Breeding workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation on haplotype-specific genome assembly
Year(s) Of Engagement Activity 2018
Description Keynote lecture: Assembling complex crop genomes for comparative analyses 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Bioinformatics for Plant Biology - EBI Cambridge, 6-9 November
Year(s) Of Engagement Activity 2018
URL https://www.ebi.ac.uk/training/events/2018/bioinformatics-plant-biology
Description Octoseq Workshop 18-19 December 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Project leaders and researchers engaged in the grant came together to report on the development of the project and took part in a short training activity relating to the grant output, a browser developed to explore the data. 14 people attended.
Year(s) Of Engagement Activity 2018