Comparative population genomics of red clover domestication and improvement

Lead Research Organisation: Earlham Institute
Department Name: Research Faculty


We propose to use red clover, a largely undomesticated forage crop, as a model for unravelling a key domestication trait. Forage legumes have superior feeding value for ruminant animals, and their nitrogen-fixing capability provides useful ecosystem services by improving soil fertility. Despite these properties, their use in livestock agriculture declined, particularly in Europe in the 1970s and 1980s, chiefly because of the availability of cheap, chemically produced nitrogen fertilizer. The drive towards more sustainable agriculture, particularly reduced use of fertilizer manufactured from fossil fuels, has halted the decline, and there is increasing interest in these legume crops, particularly in mixtures with forage grasses. There is thus an urgent need to accelerate their genetic improvement, which has stalled in recent years due to lack of investment. This proposal aims to use recently developed genomics resources for red clover, second only to alfalfa in importance in temperate agriculture, to assess the genetic and phenotypic diversity of a Europe-wide collection of germplasm. One of the most fundamental requirements for a genetic improvement programme is access to genetic variation within its germplasm. There are suggestions that the most recent European breeding populations have a relatively narrow genetic base. With its very recent history of breeding, the largely undomesticated red clover crop is an ideal model for a comprehensive assessment of how domestication changes the genome landscape during a crop improvement programme.
We will use a collection of populations from a range of habitats throughout Europe, together with elite breeding material. We will use this diversity panel to assess genome-wide nucleotide diversity and to identify which regions of the genome have been subject to selective pressure, either as a result of breeding or of environmental adaptation. The focus will be on a key domestication trait, namely prostrate versus erect growth habit, which has a profound effect on grazing tolerance and persistency in forage crops. Plants with a more prostrate growth habit are likely to be more tolerant to grazing and more persistent. On the other hand, there is a yield penalty associated with prostrateness. Unravelling the genetic architecture of this trait is thus of major importance for genetic improvement, and will also give us novel insight into this fundamental trait in plants.
We will use two types of plant material for this. Firstly, we will compare a diversity panel consisting of ecotypes and natural populations with varying degrees of prostrate growth habit against elite breeding populations, all of which are erect. Secondly, we will generate populations segregating for this trait by crossing an erect female parent from elite material with five pollen donors taken from the prostrate natural populations. Phenotypic analysis of agronomic and growth traits in these populations will be accompanied by chemical analysis of various forage quality traits and by genome-wide SNP polymorphism data. The SNP data will be obtained by restriction-site associated DNA (RAD) marker analysis in both the mapping populations and the diversity panel. In combination with the improved genome sequence assembly, this will enable us to identify and map genomic regions under selection, and to identify some of the genes governing this trait. This will provide novel insight into the architecture of domestication traits. The partnership with Germinal Holdings Ltd gives us a pipeline into the breeding programme, which will ensure that the genomic data and knowledge we obtain will benefit genetic improvement of red clover.

Technical Summary

The aim of this project is to characterise the genetic diversity in natural and breeding populations in order to identify genome-wide changes during breeding of red clover. We wish to investigate to what extent artificial selection has affected the genome, including genic and non-genic regions, and whether this has resulted in a reduction in diversity and an increase in rare alleles. We propose the following programme to test this. Firstly, we will use RAD marker polymorphisms from an existing mapping family to improve alignment of the red clover genome assembly to a genetic map. Secondly, we will use a panel of 600 genotypes, comprising ecotypes with mostly prostrate growth habit together with erect elite breeding material, to analyse genetic diversity by RAD marker sequencing. Phenotypic and chemical analyses, together with ecogeographical information on the diversity panel, will show how haplotype structure correlates with phenotype and with environmental gradients likely to drive environmental adaptation. Nucleotide diversity and locus-by-locus genetic differentiation will reveal genomic regions under selection. Thirdly, we will generate mapping families segregating for growth habit, enabling us to associate and map more accurately the target traits relating to the prostrate and erect phenotypes. The goal is to identify the genes responsible for most of the genetic variation in this key domestication trait. We will also re-sequence the five pollen donors in the new mapping family. Together with the RAD marker data and the reference assembly, this will give us valuable information about genome-wide linkage disequilibrium, levels of heterozygosity, SNP density and patterns of polymorphism in coding and non-coding sequence. This project will generate information on the genetic basis of a fundamental trait, provide insight into selection during recent domestication and inform the forage legume breeding programme.
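
As an illustration of the per-locus statistics named in this summary, the following minimal Python sketch computes a per-site diversity value and a locus-by-locus differentiation value between two groups (for example ecotypes and elite material). The summary does not state which estimators the project uses, so the choice of an unbiased expected heterozygosity and a Hudson-style FST estimator is an assumption, as are all function names and the toy input.

```python
from typing import Iterable, List, Tuple


def site_diversity(p: float, n: int) -> float:
    """Unbiased expected heterozygosity at one biallelic SNP.

    p is the alternate-allele frequency and n the number of sampled
    chromosomes (assumed n >= 2). Hypothetical helper for illustration.
    """
    return 2.0 * p * (1.0 - p) * n / (n - 1)


def hudson_fst(p1: float, n1: int, p2: float, n2: int) -> float:
    """Hudson-style FST estimate for one biallelic SNP in two groups."""
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1.0 - p1) / (n1 - 1)
        - p2 * (1.0 - p2) / (n2 - 1)
    )
    denominator = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    # A site that is fixed for the same allele in both groups is uninformative.
    return numerator / denominator if denominator > 0 else float("nan")


def per_locus_scan(
    loci: Iterable[Tuple[str, float, int, float, int]]
) -> List[Tuple[str, float, float, float]]:
    """Return (locus_id, diversity_group1, diversity_group2, fst) per locus.

    Each input item is (locus_id, p_group1, n_group1, p_group2, n_group2).
    """
    results = []
    for locus_id, p1, n1, p2, n2 in loci:
        results.append(
            (locus_id, site_diversity(p1, n1), site_diversity(p2, n2),
             hudson_fst(p1, n1, p2, n2))
        )
    return results


if __name__ == "__main__":
    # Toy allele frequencies, invented purely for demonstration.
    toy_loci = [("snp_a", 0.9, 40, 0.1, 40), ("snp_b", 0.5, 40, 0.5, 40)]
    for row in per_locus_scan(toy_loci):
        print(row)
```

In a scan of this kind, loci with low diversity in the elite group and high differentiation from the ecotypes would be candidates for regions affected by selection; real analyses would normally average over windows and account for population structure.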

Planned Impact

The delivery of the proposed objectives is focused on current needs for which suitable tools and skilled researchers are lacking or not fully developed. The objectives will also directly contribute solutions in areas of research relevant to the BBSRC's scientific priorities in food security and integrate established methods in genetics with emerging multidisciplinary research areas such as genomics and bioinformatics.
This programme will generate new opportunities for collaborative work between TGAC and IBERS and will also extend to other R&D groups in industry and academic institutions. We will communicate our results on fundamental aspects of domestication traits in forage crops through peer-reviewed publications and at national and international conferences. This work will significantly improve the draft red clover genome, which will be one of the first for a temperate forage crop. It will thus be of general interest beyond the immediate circle of colleagues interested in red clover. We will disseminate the data using TGAC computing resources and by depositing the raw sequences and assemblies in long-term repositories established at the EMBL-EBI. One of the main stakeholders is Germinal Holdings Ltd, which funds a significant part of the IBERS breeding programme. This work is aligned closely with the initial stages of the new red clover breeding programme. The generation of large numbers of SNP marker polymorphisms will bring with it the potential to use genomics-based prediction of selection candidates. Accelerated breeding cycles lead to a faster route from breeding material to new varieties. The current programme of genetic improvement in red clover has the potential to lead to significant increases in seed sales, of which Germinal Holdings currently has an 18% share in the UK. New varieties with better persistence and disease resistance also open up significant opportunities for export to the European market, as well as to North America and New Zealand.

The proposed work will directly impact the local community through the generation of new jobs and potential funding opportunities. The collaboration between IBERS and TGAC has strengthened the position of these Institutes in hosting specific expertise in advanced breeding for forage legumes. This will create new opportunities around the development of applications in genomics and bioinformatics, which will translate into job opportunities. It is widely recognised that the shortage of expertise and skills in biomathematics and informatics in the UK and across the world is a major risk for future development in the life sciences. In this context, this proposal will attract talented staff to work with IBERS and TGAC. Finally, the ecotype collection we use here consists partly of natural populations that were collected from locations in Eastern Europe between the early 1990s and the early 2000s. Many of these habitats are now under severe threat, which means that some of the accessions we work with may no longer exist naturally. Our work thus represents the best way of utilising this unique resource and preserving the diversity.

This collaboration will help reinforce the UK's leadership in translating the development of genetic and genomic resources from fundamental science to applications with a potential impact on the local and national economy. The development of a more sustainable agriculture is a key aspect of the UK strategy, and this is aligned with the objectives in the recently published Agri-tech Strategy. A consequence of implementing this proposal will be to position IBERS and TGAC as international leaders in biotechnology, specifically in the area of forage legumes. This will deliver impact to a broad range of stakeholders, emphasising the key role that the Institutes will play in enabling researchers to develop cutting-edge science in the coming years, and will ensure that the genomics resources are translated into research and breeding programmes.
Description Red clover is one of the most important forage legume crops in temperate agriculture, and a key component of sustainable intensification of livestock farming systems. Enhancing its role further in sustainable agriculture requires genetic improvement of persistency, disease resistance, and tolerance to grazing. To help address these challenges, we have assembled a chromosome-scale reference genome for red clover and made it available to researchers and breeders through ensEMBL, Phytozome and LegumeBase. We have used this reference ourselves to analyse three red clover populations, with the ultimate goal of bringing novel donor accessions into the breeding programme. Furthermore, our analysis showed that red clover diverged recently from the model legume Medicago truncatula, so most findings in that species could be applied efficiently to red clover. During the last part of the project, we integrated the genome assembly with a biparental genetic map segregating for growth habit (erect or prostrate), which is a predictive trait for persistency in several temperate forages.
Exploitation Route Our reference is publicly available in three genome databases (ensEMBL, Phytozome and LegumeBase) and has been used in 16 studies to date, e.g. mining for markers for breeding, transcriptomics (doi:10.3390/agronomy7010016) and metagenomics. Our collaborators at IBERS lead the temperate forages improvement programme and have used this reference to analyse a collection of accessions from the genebank and to incorporate new material into the breeding programme. Specifically, we have identified markers and donor accessions segregating for growth habit, which is a predictive trait for persistency. Donor accessions with prostrate growth have since been incorporated into new crosses by the breeders at IBERS, aiming for improvements in persistency.
Sectors Agriculture, Food and Drink

Description Darwin Tree of Life
Amount £9,360,421 (GBP)
Funding ID 218328/Z/19/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 11/2019 
End 05/2022
Title PRJEB34003 - Red clover Trifolium pratense high-quality genome 
Description Red clover Trifolium pratense high-quality genome assembly, produced using our pipeline developed in WP1 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact Used as the reference for a new dataset for a GCRF funded project. 
Title Trifolium pratense reference 
Description We assembled 309 Mb of the red clover genome in 39,904 scaffolds. Half of the assembly was contained in 353 scaffolds (N50 = 223 Kb), while 1,054 scaffolds longer than 50 Kb contained another 25%. We annotated 40,868 genes and 42,223 transcripts. Of those, 22,042 genes were anchored onto the seven chromosomes. The reference is available through ensEMBL, Phytozome (!info?alias=Org_Tpratense) and LegumeBase. 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact Enhancing red clover's role in sustainable agriculture requires genetic improvement of persistency, disease resistance, and tolerance to grazing. To help address these challenges, we assembled a chromosome-scale reference genome for red clover. We observed large blocks of conserved synteny with the model legume Medicago truncatula and estimated that the two species diverged ~23 million years ago. Among the 40,868 annotated genes in red clover, we identified gene clusters involved in biochemical pathways of importance for forage quality and livestock nutrition. 
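
For readers unfamiliar with the contiguity figures quoted in the Trifolium pratense reference record above (half of the assembly in 353 scaffolds, N50 = 223 Kb), the following minimal Python sketch shows how N50 and the matching scaffold count (often called L50) are conventionally derived from a list of scaffold lengths. The function name and the toy lengths are illustrative assumptions and are not taken from the project's assembly pipeline.

```python
from typing import Iterable, Tuple


def n50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total, taken
    from longest to shortest, first reaches half of the assembly size.
    L50 is the number of scaffolds needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no scaffold lengths given")


if __name__ == "__main__":
    # Toy scaffold lengths in base pairs, invented for demonstration only.
    toy_lengths = [10, 8, 6, 4, 2]
    print(n50(toy_lengths))
```

Applied to the real scaffold lengths, the first value would correspond to the reported N50 and the second to the reported count of 353 scaffolds.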
Description Aberystwyth University - IBERS, in temperate and tropical forages 
Organisation University of Wales
Department Institute of Biological Environmental and Rural Sciences (IBERS)
Country United Kingdom 
Sector Academic/University 
PI Contribution Genetic map integration with genome assemblies. SNP and diversity calling in natural and induced populations of forages.
Collaborator Contribution Genetic map assembly and validation. Genotyping-by-sequencing (GBS) library preparation and sequencing for diversity analysis. Gene and transcriptomic analysis.
Impact Chromosome-scale genome references for Trifolium pratense, Lolium perenne and Brachiaria ruziziensis.
Start Year 2014
Description Agrosavia-Earlham Institute MoU 
Organisation Colombian Agricultural Research Corporation
Country Colombia 
Sector Charity/Non Profit 
PI Contribution Genomic approaches to access the crop diversity at the Colombian National Germplasm collection hosted by Agrosavia/Corpoica
Collaborator Contribution Making available genetic resources and evaluation trials
Impact Two pilot projects on Musa accessions and legume forages to identify genome-wide trait-SNP associations for molecular breeding.
Start Year 2019
Description Bioinformatics for Breeding 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The practical course featured a collection of methods and bioinformatics tools fundamental for modern breeding, especially for crops. Next generation sequencing (NGS) has made large collections of open-source diversity genomic data possible, such as SNPs, that can be used as molecular markers for breeding. Combined with phenotypes, genome-wide association studies provide breeders with an understanding of the molecular basis of complex traits. Content: SNP calling/discovery and SNPs effects and context; NGS techniques for genotyping; Genetic markers, linkage analysis, and genetic maps; High-throughput phenotyping and image analysis; Association mapping (GWAS); Genome-wide predictions, modelling and simulations; Genomic selection.
Year(s) Of Engagement Activity 2017