Leveraging the genome sequence of two Arabidopsis relatives for evolutionary and ecological genomics

Lead Research Organisation: University of Glasgow
Department Name: Environmental and Evolutionary Biology

Abstract

The question of how changes in DNA sequence result in novel adaptations and in the formation of new species is at the heart of evolutionary biology, and approaches developed from natural changes are also important in studying evolutionary changes during crop domestication. Comparing the genomic DNA of related species does not identify which sequence differences were selected. Statistical population genetic approaches using sequence differences between and within species can pinpoint regions of the DNA that may underlie adaptive changes. However, these techniques are only effective if the genome sequences to be compared are neither too similar nor too dissimilar, and few pairs of genome sequences suitable for the analyses are yet available in the animal or plant kingdoms, or even in fungi. The impending completion of the genome sequences of the two Brassicaceae Arabidopsis lyrata and Capsella rubella, together with the available genome sequence of A. thaliana, offers opportunities to study such questions in plant species at the intermediate evolutionary distances that are ideal for computational studies of evolutionary processes; these three species are also suitable for functional studies. We will exploit this resource by studying sequence evolution on a genome-wide scale and by studying the molecular basis of evolution in two well-characterized and ecologically relevant traits, flowering time and self-incompatibility (SI). We will first generate sequence alignments of all three species and compile all sets of orthologous genes, i.e. descended from the same gene in the species' common ancestor. As the genetic maps of the three species are known to be very similar, large orthologous stretches of genome can be identified. The alignments will immediately allow us to detect the especially interesting category of genes that are present/absent in individual species, allowing study of genome evolution. 
We will next estimate rates of synonymous sequence changes (not changing the amino-acid sequence of the proteins encoded) and non-synonymous changes between pairs of genes present in each species pair. Comparing these rates across all genes can answer several important questions, including whether rates of non-synonymous substitutions are similar between genes or regions, and if not, whether variation is systematic across large genomic regions. Candidates for having evolved adaptively and/or contributed to speciation will be genes with unusually high rates of non-synonymous substitutions, relative to polymorphism levels within populations (which we shall estimate from a large set of loci to serve as 'controls'). Other interesting candidate genes can be identified from unusually high or low differentiation between natural populations. The sequence analyses will provide a foundation for functional studies of two adaptive traits, flowering time and SI. Genes affecting flowering time will be identified with two complementary approaches. First, variation in flowering time in naturally occurring A. lyrata populations will be correlated with sequence changes in orthologues of known A. thaliana flowering time regulators. Second, we will identify A. lyrata genomic regions with large effects on flowering time by genetic mapping, and then study candidate genes in these regions by manipulating their activity. We will use the self-incompatible species A. lyrata to study the transition to self-compatibility (SC) in some natural populations, and will do similar studies in C. rubella (SC) and its self-incompatible sister species C. grandiflora, including establishing an immortalized mapping population from a cross of the species to map genes associated with SC/SI, and other traits of evolutionary significance, such as flower size, etc. 
Together, our studies are expected to answer several interesting evolutionary and genome evolution questions, and should also advance breeding programmes in crops.

Technical Summary

The molecular bases of adaptation and species formation are fundamental questions in evolutionary biology, with relevance also to crop biology. Specifically, tools and approaches developed for natural populations are valuable for studying crop domestication. The impending completion of the genome sequences of the two Brassicaceae Arabidopsis lyrata and Capsella rubella, together with the available sequence of A. thaliana, offers opportunities to study adaptation using molecular evolutionary approaches complemented by functional analyses. Our consortium will exploit this resource to study variation underlying differences, including two ecologically important traits, flowering time and self-incompatibility (SI). We will generate genomic sequence alignments of the three species and compile sets of orthologous genes. Non-synonymous substitution estimates per site (Ka) across the orthologous gene set and across syntenic genome regions, together with information on within-species variation, will identify genomic regions with unusually high Ka relative to synonymous divergence (Ks), and rapidly evolving sequences that may have undergone directional selection. Genes lost or gained are also interesting, and will be further analyzed. Genes contributing to flowering time variation will be isolated by two complementary approaches. First, natural phenotypic variation amongst A. lyrata populations will be compared with allele frequencies at orthologues of known A. thaliana flowering time regulators. Second, we will map QTL in A. lyrata and Capsella populations and analyze candidate genes. We will also use self-compatible populations of the usually self-incompatible A. lyrata, and of C. rubella with its self-incompatible, but inter-fertile sister C. grandiflora, to identify and characterize genes associated with the transition from SI to self-compatibility. Together, these studies will answer important evolutionary questions that are of strategic relevance to crop breeding.
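As a rough illustration of the divergence screen described in this summary, the sketch below flags genes whose Ka/Ks ratio is unusually high relative to the rest of an orthologue set. The gene names, the Ka and Ks values, and the fold cutoff are all hypothetical and chosen only for illustration; in practice Ka and Ks would be estimated from codon alignments with dedicated software, and candidates would also be compared against within-species polymorphism before any inference about selection.

```python
from statistics import median


def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks <= 0:
        return None
    return ka / ks


def flag_fast_evolving(estimates, fold=3.0):
    """Return genes whose Ka/Ks exceeds `fold` times the median ratio.

    `estimates` maps a gene name to a (Ka, Ks) tuple. The fold cutoff is
    an arbitrary illustrative choice, not a value taken from the project.
    """
    ratios = {}
    for gene, (ka, ks) in estimates.items():
        ratio = ka_ks_ratio(ka, ks)
        if ratio is not None:  # skip genes with no synonymous divergence
            ratios[gene] = ratio
    if not ratios:
        return []
    cutoff = fold * median(ratios.values())
    return sorted(gene for gene, ratio in ratios.items() if ratio > cutoff)


# Hypothetical per-gene estimates for one species pair.
example = {
    "geneA": (0.020, 0.20),
    "geneB": (0.030, 0.25),
    "geneC": (0.150, 0.18),  # much higher Ka relative to Ks
    "geneD": (0.010, 0.00),  # Ks of zero, excluded from the screen
    "geneE": (0.025, 0.22),
}
candidates = flag_fast_evolving(example)
```

Using the median as the baseline keeps a handful of extreme genes from shifting the cutoff; a real analysis would more likely use a statistical test against polymorphism data than a fixed fold threshold.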
 
Description This grant was part of an ERA-PG consortium grant led by Detlef Weigel, at the Max Planck Institute in Tübingen, Germany. Our part of the project was to investigate the genetic causes and consequences of mating system variation within Arabidopsis lyrata.

This grant was closely associated with NE/D013461 and the postdoctoral research fellows worked closely together. Many of the main outcomes are thus shared with that grant.

However, the outcomes led by the postdocs employed on this grant (Annabelle Haudry and Hong-Guang Zha) were as follows:

Shifts in mating system in species that have genetically controlled self-incompatibility systems (such as A. lyrata) involve a two-step process: 1) disruption of the system controlling self-incompatibility (SI); 2) transition to a predominance of inbreeding within populations. This research emphasised that the selection pressures for these two steps are different and that loss of SI can occur without a shift in mating system. Specifically, plants that are capable of selfing in populations that are predominantly outcrossing still reproduce predominantly by outcrossing, whereas in populations where most individuals are capable of selfing, high levels of inbreeding occur.

This research investigated the causes of these two shifts. No ecological factors (e.g. current population density or differences in habitat) were identified as causes, and no mutations were identified in the genes directly involved in the SI reaction. Comparison of variation in genes flanking these genes (part of the S-locus) also suggested that loss of SI was not due to increased recombination in the region, which would disrupt the required pairing between genes controlling recognition in male and female components.

A next generation sequencing approach (Illumina sequencing of self-incompatible, SI, and self-compatible, SC, pools of crosses between SI and SC individuals) was used to investigate the genomic basis of loss of SI. These results were delayed beyond the time that the postdocs left and so the informatics component has taken a number of years to complete. The results have recently been submitted for publication in a special issue of Heredity. Combined with the sequencing of the female recognition gene and its flanking gene region and classical genetic analyses, the results suggest that the loss of SI is due to an unlinked recessive factor, rather than changes in the recognition genes themselves. This is important because most other studies have examined differences between species that vary in mating system and have suggested that mutations that currently appear in the recognition genes caused the original loss of SI. Our results suggest that these changes might have happened after SI was lost, once selection pressures were relaxed. We also found evidence for a bottleneck in SI alleles in inbreeding populations. An important finding from these analyses was that highly divergent genes such as SI genes are at risk of not being assembled in resequencing studies. Our original analyses were confounded because genes at the S-locus (along with other highly polymorphic regions) were found in the unaligned reads, which normally would not have been analysed.


We also used a next generation sequencing approach (tagged amplicon sequencing) to investigate whether the shift to inbreeding in only some populations might have been due to a bottleneck in the number of S-alleles (which would reduce the number of effective mating partners if populations remained outcrossing). This research was also delayed due to technological issues with the approach, because the target gene was larger than the approaches available at the time could handle. Working with the Liverpool node of the NERC Biomolecular Analysis Facility, we were able to pilot an alternative that did produce useful results; these have been added to the paper discussed above, which focuses on describing mechanisms of loss of SI.
Exploitation Route Most of what is known about shifts in mating system comes from comparisons between species, where additional changes after speciation can obscure the original causes of loss of SI. Such comparisons also confound loss of SI with a shift to inbreeding. Our results have important implications for considering these as separate processes. This is important for understanding the effects of crop breeding programmes.

Our results suggesting difficulties in assembly of highly polymorphic genes in genome resequencing studies will have important implications for a wide range of studies. Highly polymorphic gene families are known to be critical for adaptation to changing environments but they may not be included in reassemblies.
Sectors Agriculture, Food and Drink, Environment

URL http://www.gla.ac.uk/researchinstitutes/bahcm/staff/barbaramable/barbaramable/
 
Title SRK and flanking gene genotypes for Arabidopsis lyrata 
Description Genotypes at the S-related kinase (SRK) gene and associated flanking genes (B70, B80, B120, B160) of Arabidopsis lyrata, comparing patterns of diversity in self-compatible and self-incompatible populations from the Great Lakes region of Eastern North America. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact This is the only multi-population survey of genotypes at an important locus under balancing selection, the SRK gene that controls self-incompatibility in Brassicaceae and its associated flanking genes. Importantly, this documents a clear bottleneck in self-compatible populations and provides new insights into the evolutionary processes associated with shifts in mating system. 
URL http://dx.doi.org/10.5061/dryad.832t8
 
Title Tagged amplicon sequences for the S-locus related kinase gene for Arabidopsis lyrata 
Description These data have been deposited to the short read archive (bioproject ID: PRJNA339675). These are Illumina sequences generated using a novel shearing approach to sequence a 900 bp fragment of the S-related kinase gene (SRK) from Arabidopsis lyrata samples from the Great Lakes region of eastern North America. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact This is a new methodology that we think will be useful to others. It was developed by the NERC Biomolecular Analysis Facility team at the University of Liverpool, to enable amplicon sequencing of large gene fragments (the previous limit was 450 bp). 
 
Title Whole genome sequence pools from self-compatible Arabidopsis lyrata 
Description Illumina sequences have been deposited to the short read archive (Bioproject ID: PRJNA339675) from a pool of 10 individuals with a self-compatible phenotype (Arabidopsis lyrata). 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact These data will be useful for other researchers investigating the genetic basis for loss of self-incompatibility, particularly in plants with a sporophytic system.