Assembling and recombining the Arabidopsis centromeres

Lead Research Organisation: University of Cambridge
Department Name: Plant Sciences

Abstract

When cells divide, the chromosomes must copy themselves and segregate to opposite cell poles. It is critical that each daughter cell inherits a balanced number of chromosomes. This is achieved by the chromosomes binding to spindle microtubules. Chromosomes attach to the spindle via specialised regions called centromeres. A failure to attach to the spindle correctly can cause defects in chromosome segregation and is associated with cancer and infertility.

The role of the centromeres in ensuring chromosome segregation is an ancient and deeply conserved function in cells. Typically, when such highly conserved processes are studied, the mechanisms involved are very similar between different species. Surprisingly, the opposite is true for the centromeres: the DNA sequences and proteins associated with them are some of the fastest changing in the genome. This phenomenon is termed the 'centromere paradox'.

One challenge in studying the centromeres is that the associated DNA sequences are highly repetitive. For example, in many species the centromeres consist of short (~170-180 base pair) sequences repeated hundreds to thousands of times in a tandem head-to-tail orientation. These are known as satellite arrays, and it is within these sequences that the microtubules bind to the chromosome. Centromere satellite arrays are also known to change rapidly and extensively between species, yet how they evolve is poorly understood.
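To make the idea of a satellite array concrete, the following is a minimal illustrative sketch in Python and is not part of the project's methods. It builds a toy tandem array from a random monomer and recovers the repeat length by comparing the sequence with shifted copies of itself. The monomer length of 178 bp, the copy number, the divergence rate and both function names are assumptions chosen purely for illustration.

```python
import random


def make_satellite_array(monomer_len=178, copies=50, divergence=0.02, seed=0):
    """Build a toy head-to-tail tandem array (all parameters are illustrative assumptions)."""
    rng = random.Random(seed)
    bases = "ACGT"
    monomer = [rng.choice(bases) for _ in range(monomer_len)]
    array = []
    for _ in range(copies):
        # Each copy is the monomer with a few positions redrawn at random.
        copy = [rng.choice(bases) if rng.random() < divergence else b for b in monomer]
        array.extend(copy)
    return "".join(array)


def estimate_period(seq, min_period=50, max_period=300):
    """Return the shift (in bp) at which the sequence best matches itself.

    max_period is kept below twice the expected monomer length so that
    multiples of the true period are not reported instead.
    """
    best_period, best_score = None, -1.0
    for period in range(min_period, max_period + 1):
        compared = len(seq) - period
        matches = sum(1 for i in range(compared) if seq[i] == seq[i + period])
        score = matches / compared
        if score > best_score:
            best_period, best_score = period, score
    return best_period, best_score


if __name__ == "__main__":
    seq = make_satellite_array()
    period, score = estimate_period(seq)
    print(f"estimated repeat length: {period} bp (self-identity {score:.2f})")
```

The same near-identity between copies that makes the period easy to detect here is what makes real arrays hard to assemble: a read shorter than the array can align equally well at many positions.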

The very high degree of repetition has made the centromeres essentially impossible to study with the previous generation of short-read DNA sequencing technologies. However, new opportunities are arising with the advent of long-read DNA sequencing technologies, including Oxford Nanopore. In this proposal we will harness long-read sequencing to assemble the centromeres of the model plant species Arabidopsis for the first time. We will use these maps of the centromeres to investigate how recombination processes occurring in the germline might contribute to the fast evolution of the centromeres. A major output from this work will be completion of the centromere gaps in the Arabidopsis genome, which we will release to the community.

In addition to specific repeat DNA sequences, the centromeres are known to require epigenetic marks for their function during cell division. For example, a special histone protein called CENH3 binds to the centromeres and is critical for chromosomes to attach to the spindle microtubules. Additionally, centromeres are often highly modified by DNA methylation, although the function of this epigenetic mark in the centromeres is unknown. Therefore, in the final objective we will use long-read sequencing to investigate the centromeres and recombination in Arabidopsis mutants that lack DNA methylation.

Together, our work will reveal new insights into how the centromeres are structured and how they evolve so quickly. Importantly, in many crop species recombination is also suppressed in the regions surrounding the centromeres, which can limit strain improvement during breeding. The knowledge we generate in this proposal may therefore provide ways to unlock recombination close to the centromeres in order to accelerate crop breeding.

Technical Summary

Centromeres play a conserved and essential role during cell division, where they attach chromosomes to spindle microtubules during segregation. Despite this deeply conserved function, the DNA sequences underlying centromeres are both extremely fast evolving and highly repetitive. Recent advances in long-read DNA sequencing technology are providing new opportunities to map and understand the centromeres.

In this project we will harness Oxford Nanopore sequencing and optical mapping techniques to generate gold-standard assemblies of the Arabidopsis centromeres in key ecotypes (Col, Ler and Cvi). These maps will serve as a foundation to investigate mechanisms of recombination that mediate change to the centromeric repeat arrays. Specifically, we will sequence and assemble the centromeres from F2 individuals generated from Col x Ler and Col x Cvi crosses, in order to test for the signatures of meiotic recombination (crossover and gene conversion) and transposition. We hypothesise that meiotic recombination plays a central role in the evolution of centromere repeat sequences, with the potential to both increase and decrease satellite numbers.
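As a rough illustration of what recombination signatures in an F2 individual could look like, the Python sketch below scans ordered marker genotypes along one chromosome and labels genotype switches. It is a simplified assumption, not the project's analysis pipeline: the genotype labels ("Col", "Het", "Ler"), the function names and the 2,000 bp tract threshold are all hypothetical. A short tract flanked on both sides by the same genotype is flagged as a gene conversion candidate, and any other switch as a crossover candidate.

```python
from itertools import groupby


def genotype_blocks(markers):
    """Collapse position-sorted (position_bp, genotype) markers into runs of identical genotype."""
    blocks = []
    for genotype, group in groupby(markers, key=lambda m: m[1]):
        group = list(group)
        blocks.append((genotype, group[0][0], group[-1][0]))
    return blocks


def classify_events(markers, max_conversion_bp=2000):
    """Label genotype switches as crossover or gene conversion candidates.

    max_conversion_bp is an illustrative threshold, not a value from the project.
    """
    blocks = genotype_blocks(markers)
    events = []
    i = 0
    while i < len(blocks) - 1:
        genotype, _, end = blocks[i]
        _, next_start, next_end = blocks[i + 1]
        # A short tract that returns to the flanking genotype looks like a conversion.
        is_short_tract = (
            i + 2 < len(blocks)
            and blocks[i + 2][0] == genotype
            and next_end - next_start <= max_conversion_bp
        )
        if is_short_tract:
            events.append(("gene_conversion_candidate", next_start, next_end))
            i += 2  # skip the tract and resume from the flanking block
        else:
            events.append(("crossover_candidate", end, next_start))
            i += 1
    return events


if __name__ == "__main__":
    # Hypothetical marker calls along one F2 chromosome.
    markers = [
        (1000, "Col"), (5000, "Col"), (9000, "Col"),
        (9500, "Het"), (9800, "Het"),
        (12000, "Col"), (20000, "Col"),
        (30000, "Het"), (40000, "Het"),
    ]
    for event in classify_events(markers):
        print(event)
```

Within satellite arrays, informative markers are sparse because repeat copies are nearly identical, so in practice any such scan would depend on the quality of the assembled parental centromeres.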

Centromeres are also typified by specialised chromatin states, which are thought to contribute to their function. For example, centromere repeats are known to be densely DNA methylated. In the final objective we will use Arabidopsis DNA methylation mutants to test for changes to recombination within the centromeres. Specifically, we will generate F2 Col x Ler populations from DNA methylation mutants and sequence and assemble the centromeres. We will also use the nanopore reads to directly test for DNA methylation within the centromeres.

Together, this work has relevance for precision agriculture and for unlocking recombination in centromere-proximal regions of crop chromosomes. Understanding how to increase meiotic recombination close to the centromeres has the potential to accelerate strain improvement and reduce linkage drag.
