OctoSEQ- Sequencing the octoploid strawberry

Lead Research Organisation: Earlham Institute
Department Name: Research Faculty

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Planned Impact

Impact Summary

This grant will have a global impact, both on the research field internationally and on the international industry, especially the UK industry. Through full engagement with industry stakeholders, via workshops and the involvement of 5 industrial partners in the project (contributing £125k cash), maximum translation of this research will be ensured, driving forward the UK plant breeding industry in a globally competitive market. The availability of an octoploid genome sequence (integrated into existing resources; see LOS Davey and Main) far outstrips the utility of the diploid strawberry genome, as it allows subgenome-level resolution of gene families, which is essential for both basic research and molecular breeding.

Who will benefit and how?

Direct beneficiaries:

1. Commercial private sector
The UK and international plant breeding sector will benefit enormously from this endeavour, as the generation of a gold-standard reference genome will allow these industries first to develop better markers for QTL (most traits in strawberry are quantitative and multiple QTL underpin these) and then to move from marker-level associations to candidate gene associations. This is important for next-generation genome editing approaches and functional validation of candidate genes. Furthermore, it facilitates cheap genotyping through reference-guided GbS and allows a variety of other 'next-gen' technologies, currently impossible, to be applied to strawberry. This moves the industry very quickly to a point where pedigree-based selection and genome-wide selection are affordable and tractable options for crop improvement. Placing this in the hands of the UK partners will give UK business a significant competitive edge. Note that some UK breeding firms are ineligible to contribute to this proposal due to BBSRC rules, but fully lend their support. (Impact 12-36 months)

2. Fruit growing sector in the UK
UK industry will benefit as it will be able to access a resource that is beyond its means to create. In the longer term it is anticipated that the UK levy body, the HDC, will make significant use of this resource, and that knowledge generated from this pre-competitive work can, for some levy payers, lead to competitive work funded by other research bodies (e.g. Innovate UK) to benefit the UK economy. Advancing genomic resources in horticultural crops is a key aim of the HDC, as evidenced by their support for this proposal (Benefit within 5-10 years).

3. Public and retail sector
Several UK retailers aim to double sales of UK-produced fruit by 2020 (i.e. to £440m farm gate); this project will assist that aim and improve UK productivity and competitiveness. Downstream science conducted utilising this resource will lead to more reliable production methods and potentially reduce wastage in the supply chain (through reduced inputs and better variety development) (Benefit within 5-10 years).

Indirect beneficiaries
The wider strawberry growing industry
As a result of the genome sequence, the rate of change of varietal development will increase, leading to greater benefits to downstream growers, packers and producers. (Benefits 3-5 years)

Government, public and policy benefits
The public will benefit, not only from the improved position of UK agri-business (and access of breeders to novel technologies), but also through the long-term improvement in supply chain resilience through improved cultivar development. In the longer term the public will benefit through increased food security and sustainability, as a result of scientific improvements in horticultural crops. This feeds into many UK Government and EU policy agendas including: health (improving produce quality), pesticides (reducing residues through improved resistance), water (ability to grow nearer water courses), climate (growing crops perennially will improve carbon sequestration) and environment (reduced carbon and pesticides) (Benefit from 3-15 years).
 
Description We have generated draft genome assemblies for 16 varieties of strawberry and haplotype-specific backbone sequences for 2 varieties, which we have shown to be structurally correct, as opposed to those generated with other techniques. These are a valuable resource for breeding.
As part of the work on this grant, we have developed methods for haplotype-specific assemblies of complex species. These enable analyses that take into account the whole content of the multiple subgenomes of polyploid crops, whereas previous techniques produced a "consensus version" that mixed up genomic content the assemblies did not distinguish.
We are also in the process of generating higher-quality, haplotype-specific genome assemblies for 6 varieties, which will have greater haplotypic definition and completeness than our previous assemblies.
These assemblies are being shared among the grant members first and will soon be used to enable better breeding. Initial results already show that features of the genome that were impossible to find in the existing assemblies are being found in these new sequences.
Exploitation Route Strawberry genomes and variation data for agriculturally significant cultivars are being used to enable better breeding.
Sectors Agriculture, Food and Drink

 
Description Variation data, mapped to draft references, has been provided to partners, who are using it in their strawberry breeding programmes.
First Year Of Impact 2020
Sector Agriculture, Food and Drink
Impact Types Economic

 
Title SDG 
Description SDG is a framework to analyse sequence graphs such as those generated by various genome assemblers. It provides a workspace that can contain a graph and datastores for paired, linked and long reads. These reads can be mapped to the graph, and can be used to untangle or scaffold the graph. A SWIG API enables SDG to be used as a Python module, and there is experimental Julia and R support. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact We are currently producing genome assemblies of: multiple wheat cultivars, multiple strawberry cultivars, and more. 
URL https://f1000research.com/articles/8-1490
 
Title SKM-tools 
Description These are a series of tools to compare skip-mer (cyclic spaced-seed) spectra between different datasets. They can be used to study conservation of sequence across evolutionarily distant organisms. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact We are using skm-tools to study conservation in the context of EI's CSP and BBSRC's DFW projects. 
URL https://github.com/bioinfologics/skm-tools
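The skip-mer idea mentioned in this entry can be illustrated with a minimal Python sketch. It assumes the usual SkipMer(m, n, k) definition (keep the first m bases of every cycle of n bases until k bases are collected), ignores reverse complements, and compares two sequences by the Jaccard index of their skip-mer sets. The function names below are hypothetical and are not the skm-tools interface.

```python
def skipmer_offsets(m, n, k):
    """Offsets of the k kept bases: the first m bases of each cycle of n."""
    if not 1 <= m <= n or k < 1:
        raise ValueError("need 1 <= m <= n and k >= 1")
    return [(i // m) * n + (i % m) for i in range(k)]


def skipmers(seq, m, n, k):
    """Return the skip-mer starting at every valid position of seq."""
    offsets = skipmer_offsets(m, n, k)
    span = offsets[-1] + 1  # number of bases covered by one skip-mer
    return [
        "".join(seq[start + o] for o in offsets)
        for start in range(len(seq) - span + 1)
    ]


def shared_fraction(seq_a, seq_b, m, n, k):
    """Jaccard index of the two skip-mer sets (0.0 if both are empty)."""
    a = set(skipmers(seq_a, m, n, k))
    b = set(skipmers(seq_b, m, n, k))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Two toy sequences that differ only at every third base.
    a = "ACGTTGCAAGCT"
    b = "ACATTCCATGCG"
    print(shared_fraction(a, b, 1, 1, 6))  # contiguous 6-mers
    print(shared_fraction(a, b, 2, 3, 6))  # SkipMer(2, 3, 6)
```

In this toy case the two sequences share no contiguous 6-mers, while the SkipMer(2, 3, 6) words that skip the changed positions are still shared. This is the kind of signal that skip-mer spectra are meant to expose between diverged sequences.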
 
Title w2rap 
Description w2rap is a genome assembly pipeline for complex genomes from short reads. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact W2rap has enabled wheat genomics to jump into a new era of high-quality genomes from short reads. While there are some alternative tools from private companies, w2rap remains the standard for quality reconstruction across the genome. W2rap has already been used to assemble 5 wheat genomes in the public domain, putting the UK at the forefront of wheat genomics. With tens of genomes being assembled now, new modules being developed for new data types, and 5 wheat lines assembled in a £1M private project, w2rap is one of the flagship projects for Earlham Institute. 
URL https://github.com/bioinfologics/w2rap/
 
Description BBSRC Plant Breeding workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation on haplotype-specific genome assembly
Year(s) Of Engagement Activity 2018
 
Description Keynote lecture: Assembling complex crop genomes for comparative analyses 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Bioinformatics for Plant Biology - EBI Cambridge, 6-9 November
Year(s) Of Engagement Activity 2018
URL https://www.ebi.ac.uk/training/events/2018/bioinformatics-plant-biology
 
Description Octoseq Workshop 18-19 December 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Project leaders and researchers engaged in the grant came together to report on the progress of the project and to take part in a short training activity on one of the grant outputs, a browser developed to explore the data. 14 people attended.
Year(s) Of Engagement Activity 2018