OctoSEQ: Sequencing the octoploid strawberry

Lead Research Organisation: East Malling Research (United Kingdom)
Department Name: Science

Abstract

This proposal assembles a multinational academic and industry partnership to generate a reference octoploid genome sequence using a set of innovative experimental and computational approaches. This team includes industry and academic partners from the UK, Netherlands, Spain, Italy and Norway.

Recent advances in strawberry genotyping technologies, for example the development of the Axiom IStraw90k SNP genotyping platform through the US-led Rosbreed programme (only possible because of the earlier, partly BBSRC-funded sequencing of the diploid strawberry genome), have led to the creation of multiple linkage maps that densely saturate some regions of the octoploid strawberry genetic map. However, the shortcomings of having sequenced only one of the four 'diploid' ancestral subgenomes are now apparent, as coverage of the 'non-vesca'-like subgenomes is comparatively poor.

Using some of the latest advances in bioinformatics and sequencing, combined with a technique termed massively parallel BAC sequencing, the project team will first assemble a haploidised version of the octoploid strawberry genome. This assembly will then be separated into its two parental genomes using an approach that combines information from BAC sequences with single-molecule optical mapping. Scaffolds will then be further anchored to assemble the genome into whole chromosomes. This approach has never been tried before; it has only become possible in the last six months owing to a number of recent innovations in genome sequencing and visualisation, and it is at the cutting edge of genome technology. It will resolve the genome into two 'haplotypes', one from each parent of the sequenced cultivar, allowing inheritance to be tracked, which is an important innovation.

Strawberry production is one of UK horticulture's greatest success stories, and domestic output continues to expand, giving over 80% self-sufficiency when in season. The value of the crop to the UK recently exceeded £500m per annum, making it the highest-value fruit crop in the UK. Globally, the primary problems of production remain the threats of oomycete and fungal diseases, which are now being addressed in the UK through a comprehensive research programme funded by UK industry, BBSRC and Innovate UK. Industry is supporting this proposal through the IPA scheme, as it recognises the need for an octoploid genome sequence for marker-assisted breeding (MAB) and other breeding techniques.

MAB is a technique that uses the approximate location of important genes to improve the efficiency of selection in breeding programmes; it is actively deployed in a number of strawberry breeding programmes around the world, in both the public and private sectors. However, owing to the lack of an octoploid strawberry genome, progress in identifying the causative genes underpinning important disease resistance and fruit quality traits is slow. Identifying and characterising gene function is important not only because it enables use of the latest generation of cisgenic and targeted mutagenesis tools in research and breeding, but also because it facilitates the study of fundamental questions about trait evolution in polyploids, which are of great importance to both crop scientists and fundamental researchers. Further research on the evolution and structure of the genome and on gene ohnologs will also be of broad scientific interest.
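The arithmetic that makes MAB efficient can be illustrated with a small sketch. The Python below is a hypothetical illustration, not code from this project: it assumes that the favourable marker allele is in coupling with a resistance allele, testcross-type segregation and no crossover interference, and it uses made-up marker and seedling names (`mkA`, `mkB`, `S1` to `S3`). With a single marker at recombination fraction r, the resistance allele is retained with probability 1 - r; flanking markers do better because only a double crossover separates the gene from both.

```python
# Hypothetical MAB illustration; not code from this project.
# Assumptions: favourable marker allele(s) in coupling with the resistance
# allele, testcross-type segregation, no crossover interference.


def _check_rf(r: float) -> None:
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")


def single_marker_accuracy(r: float) -> float:
    """P(resistance allele | favourable allele at one linked marker)."""
    _check_rf(r)
    return 1.0 - r  # only a crossover between marker and gene breaks the link


def flanking_marker_accuracy(r1: float, r2: float) -> float:
    """P(resistance allele | favourable alleles at both flanking markers)."""
    _check_rf(r1)
    _check_rf(r2)
    parental = (1.0 - r1) * (1.0 - r2)  # no crossover in either interval
    double = r1 * r2  # double crossover: both markers kept, gene lost
    return parental / (parental + double)


def select_seedlings(
    genotypes: dict[str, dict[str, str]], favourable: dict[str, str]
) -> list[str]:
    """Keep seedlings carrying the favourable allele at every selection marker."""
    return [
        seedling
        for seedling, calls in genotypes.items()
        if all(calls.get(marker) == allele for marker, allele in favourable.items())
    ]


# Made-up data: two hypothetical markers flanking a resistance locus.
genotypes = {
    "S1": {"mkA": "A", "mkB": "B"},
    "S2": {"mkA": "A", "mkB": "b"},
    "S3": {"mkA": "a", "mkB": "B"},
}
print(select_seedlings(genotypes, {"mkA": "A", "mkB": "B"}))  # ['S1']
print(single_marker_accuracy(0.05), flanking_marker_accuracy(0.05, 0.05))
```

With markers 5% recombinant on either side, flanking selection keeps the resistance allele in more than 99% of selected seedlings under these assumptions, compared with 95% for a single marker.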

The impact of this research will be large, as the open and collaborative approach taken in this project engages both industry and national research leaders, allowing rapid adoption of its results. The project also offers high value for money, as it leverages the strong informatics capability at EMR and the sequencing and informatics capabilities at TGAC, and provides a springboard to pan-European projects and industry-academia partnerships.

Technical Summary

The octoploid strawberry genome is compact (800 Mb) yet complex, owing to its high levels of heterozygosity (and associated inbreeding depression) and its allo-octoploid nature. It behaves disomically, and the latest evidence suggests that it arose from the fusion of two allo-tetraploids, which themselves had an (A-B) (B' B'') genome structure. Second-generation sequencing approaches using paired-end and mate-pair (jumping) libraries have largely failed to resolve biologically meaningful contig lengths, and it is clear that an alternative approach is required.

Using long-read paired-end data (450 bp Illumina libraries sequenced on the HiSeq), an assembly will be produced with Discovar and a novel haplotype selection procedure that harnesses the heterozygosity. Following this, a step integrating PCR-bias-free mate-pair (jumping) libraries and local BAC sequences will be used to phase and extend the assembly (the hypothesis is that a minimum tiling path of BACs is not required because of the downstream use of other technologies). The massively parallel BAC sequencing approach allows extremely long haplotypes to be resolved, alleviating the subgenome and heterozygosity problems during the assembly step. This is possible because of the highly heterozygous and compact nature of the genome, and is a novel approach to complex genome assembly. Further scaffolding and haplotyping are then accomplished using a consensus SNP linkage map built from data generated by multiple users. The map is produced by a novel mapping method recently developed by collaborator Dr Eric van de Weg and Dr Rob Vickerstaff (EMR PI's group), which reconstructs marker orders extremely accurately by integrating data from multiple biparental mapping populations.
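A consensus linkage map rests on pairwise linkage estimates between SNP markers. As a minimal sketch only (it is not the van de Weg/Vickerstaff method, whose details are not given here), the Python below estimates the recombination fraction and LOD score for two markers that both segregate 1:1 from the same heterozygous parent, as in a pseudo-testcross. It assumes a hypothetical coding of progeny calls as 0/1 for which parental homologue was inherited, with `None` for missing data. Because linkage phase is unknown in an outcross, a recombinant count above half the scored progeny is taken to indicate repulsion phase and is flipped.

```python
import math
from typing import Optional, Sequence

# Assumed (hypothetical) coding: 0/1 = which homologue of the heterozygous
# parent a progeny inherited at this marker; None = missing call.
Call = Optional[int]


def pairwise_linkage(m1: Sequence[Call], m2: Sequence[Call]) -> tuple[float, float, str]:
    """Return (recombination fraction, LOD for linkage vs r = 0.5, phase)."""
    scored = [(a, b) for a, b in zip(m1, m2) if a is not None and b is not None]
    n = len(scored)
    if n == 0:
        raise ValueError("no progeny scored at both markers")
    recombinants = sum(a != b for a, b in scored)
    phase = "coupling"
    if recombinants > n / 2:  # phase unknown in an outcross: flip to repulsion
        recombinants = n - recombinants
        phase = "repulsion"
    r = recombinants / n  # maximum-likelihood estimate of r
    # LOD = log10[ r^R * (1 - r)^(n - R) / 0.5^n ]
    lod = n * math.log10(2) + (n - recombinants) * math.log10(1 - r)
    if recombinants:  # avoid log10(0) when no recombinants are observed
        lod += recombinants * math.log10(r)
    return r, lod, phase


# Made-up calls for ten progeny; one recombinant between the two markers.
m1 = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
m2 = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
print(pairwise_linkage(m1, m2))
print(pairwise_linkage(m1, [1 - c for c in m2]))  # same linkage, repulsion phase
```

Real mapping data also contain markers segregating from both parents and genotyping errors, which this sketch deliberately ignores.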

Downstream feature calling and annotation will be carried out, and a Web Apollo server instance will be set up to support further collaboration. Partners will contribute resequencing of key germplasm for reference-guided assembly.

Planned Impact

Impact Summary

This grant will have a global impact, both on the international research field and on the international industry, especially the UK industry. Through full engagement with industry stakeholders, via workshops and the involvement of 5 industrial partners in the project (contributing £125k cash), maximum translation of this research will be ensured, driving forward the UK plant breeding industry in a globally competitive market. The availability of an octoploid genome sequence (integrated into existing resources; see letters of support from Davey and Main) will far outstrip the utility of the diploid strawberry genome, as it allows subgenome-level resolution of gene families, which is essential for both basic research and molecular breeding.

Who will benefit and how?

Direct beneficiaries:

1. Commercial private sector
The UK and international plant breeding sector will benefit enormously from this endeavour, as a gold-standard reference genome will allow these industries first to develop better markers for QTL (most traits in strawberry are quantitative and underpinned by multiple QTL) and then to move from marker-level associations to candidate gene associations. This is important for next-generation genome editing approaches and for functional validation of candidate genes. Furthermore, it will allow cheap genotyping through reference-guided GbS and a variety of other 'next-gen' technologies, currently impossible in strawberry, to be applied. This will move the industry very quickly to a point where pedigree-based selection and genome-wide selection are affordable and tractable options for crop improvement. Placing this in the hands of the UK partners will give UK businesses a significant competitive edge. Note that some UK breeding firms are ineligible to contribute to this proposal under BBSRC rules, but fully lend their support. (Impact 12-36 months)

2. Fruit growing sector in the UK
UK industry will benefit by gaining access to a resource that is beyond its means to create. In the longer term, it is anticipated that the UK levy body, the HDC, will make significant use of this resource, and that knowledge generated from this pre-competitive work can, for some levy payers, lead to competitive work funded by other research bodies (e.g. Innovate UK) to benefit the UK economy. Advancing genomic resources in horticultural crops is a key aim of the HDC, as evidenced by its support for this proposal (Benefit within 5-10 years).

3. Public and retail sector
Several UK retailers aim to double sales of UK-produced fruit by 2020 (i.e. to £440m farm gate); this project will assist that aim and improve UK productivity and competitiveness. Downstream science using this resource will lead to more reliable production methods and could reduce wastage in the supply chain (through reduced inputs and better variety development) (Benefit within 5-10 years).

Indirect beneficiaries
The wider strawberry growing industry
As a result of the genome sequence, the pace of varietal development will increase, bringing greater benefits to downstream growers, packers and producers. (Benefits 3-5 years)

Government, public and policy benefits
The public will benefit not only from the improved position of UK agri-business (and breeders' access to novel technologies), but also from a long-term improvement in supply chain resilience through improved cultivar development. In the longer term, the public will benefit through increased food security and sustainability, as a result of scientific improvements in horticultural crops. This feeds into many UK Government and EU policy agendas, including health (improving produce quality), pesticides (reducing residues through improved resistance), water (ability to grow nearer water courses), climate (growing crops perennially will improve carbon sequestration) and environment (reduced carbon and pesticides) (Benefit from 3-15 years).
 
Description 'Please refer to BB/N006682/2 for details of this award's outcomes'.
Exploitation Route NA
Sectors Agriculture, Food and Drink

 
Title Crosslink genetic mapping software program 
Description Crosslink is a software program that creates genetic maps from genotype data collected from the progeny of a cross between two individuals. The program is suitable for use with an "outcross", where the two parents do not need to be genetically inbred, and is therefore applicable to a wide range of plants where inbreeding cannot be used. The program is designed to scale efficiently to the large numbers of genetic markers generated by modern and emerging genotyping technologies. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact This tool has allowed us to automate the creation of genetic maps using a larger number of markers, and across multiple mapping families, which would otherwise have been extremely time consuming. Researchers at IBERS, Aberystwyth University, and Earlham Institute have also begun using the tool. Our maps will be used as the basis for constructing the cultivated strawberry genome sequence. 
URL https://github.com/eastmallingresearch/crosslink
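Once pairwise recombination fractions are available, a genetic map is built by ordering markers so that adjacent markers are as tightly linked as possible and then converting recombination fractions into map distances. The Python below is a deliberately simple, hypothetical sketch of that idea (greedy nearest-neighbour ordering tried from every start marker, plus Haldane's mapping function) using made-up markers `A` to `D`; it is not Crosslink's algorithm, which is built to handle far larger marker sets across multiple mapping families.

```python
import math

# Hypothetical sketch only; not Crosslink's algorithm.


def haldane_cm(r: float) -> float:
    """Map distance in cM from a recombination fraction (Haldane's function)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def path_cost(order: list[str], rf: dict[str, dict[str, float]]) -> float:
    """Sum of recombination fractions between adjacent markers."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))


def greedy_order(rf: dict[str, dict[str, float]]) -> list[str]:
    """Build a nearest-neighbour chain from each start marker; keep the cheapest."""
    markers = sorted(rf)
    best: list[str] = []
    best_cost = math.inf
    for start in markers:
        order = [start]
        remaining = set(markers) - {start}
        while remaining:
            row = rf[order[-1]]  # recombination fractions from the chain's end
            nxt = min(sorted(remaining), key=row.__getitem__)
            order.append(nxt)
            remaining.remove(nxt)
        cost = path_cost(order, rf)
        if cost < best_cost:
            best, best_cost = order, cost
    return best


def symmetric(pairs: dict[tuple[str, str], float]) -> dict[str, dict[str, float]]:
    """Build a two-way lookup from recombination fractions for marker pairs."""
    rf: dict[str, dict[str, float]] = {}
    for (a, b), r in pairs.items():
        rf.setdefault(a, {})[b] = r
        rf.setdefault(b, {})[a] = r
    return rf


# Made-up recombination fractions for four hypothetical markers.
rf = symmetric({
    ("A", "B"): 0.05, ("B", "C"): 0.10, ("C", "D"): 0.05,
    ("A", "C"): 0.14, ("B", "D"): 0.14, ("A", "D"): 0.18,
})
order = greedy_order(rf)
length = sum(haldane_cm(rf[a][b]) for a, b in zip(order, order[1:]))
print(order, round(length, 1))
```

A greedy chain can settle on a poor local order when markers are dense or data are noisy, which is why practical mapping tools use more thorough search strategies.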
 
Description 8th International Rosaceae Genomics Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Attended the eighth International Rosaceae Genomics Conference and presented a poster on progress in creating genetic maps and software tools for studying disease resistance in strawberry.
Year(s) Of Engagement Activity 2016
URL https://colloque.inra.fr/rgc8/
 
Description The Third International Horticulture Research Conference (Nanjing, China) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I presented a poster about my work at the conference.
Year(s) Of Engagement Activity 2016
URL http://www.hortres-conference.org/uploadfiles/The%20Third%20International%20Horticulture%20Research%...