Long-read sequencing the HTT CAG repeat

Lead Research Organisation: CARDIFF UNIVERSITY
Department Name: School of Medicine

Abstract

Huntington's disease (HD) is an autosomal dominant neurodegenerative disorder caused by a CAG repeat expansion in the HTT gene. In HD, the greater the repeat number, the earlier the onset of disease; however, repeat number explains only approximately 50% of the variance in symptom onset. One important cis factor seems to be the sequence of the CAG repeat: pure CAG tracts are associated with earlier disease onset, whereas interruption of the CAG tract by other codons is associated with later disease onset (Massey, McAllister, Jones unpublished data). Furthermore, allelic phasing is important when trying to untangle the role of genetic variants in disease, but we are not able to phase HTT alleles reliably in our Illumina-based exome sequencing because of the short read lengths. We are collaborating with Prof Monckton in Glasgow to use Illumina MiSeq technology, which gives reads of 300 bp across the HTT repeat, but this will not work for all HTT alleles and is not long enough to read our cell and animal model alleles, which contain >120 tandem repeats. Hence there is a pressing need to establish other methods that can reliably read through expanded HTT alleles and other long repetitive alleles. We propose to use long-read sequencing on the state-of-the-art PacBio Sequel machine to establish the size and distribution of CAG repeats and the phase of disease-modifying CAG repeat interruptions in HTT alleles from HD patients.

In addition, little is known about the role of epigenetic variation at the HTT locus in HD pathogenesis. However, because epigenetic variation plays a critical role in regulating gene expression, epigenetic changes correlated with repeat length may well be relevant to disease. Emerging evidence suggests that one such type of variation, DNA methylation, is associated with age of onset of HD and has important implications for transgenerational effects in HD. When paired with Cas9-based target capture technology, next-generation sequencing platforms can be used to detect the methylation status of a specific gene or gene panel without PCR. Furthermore, with the inclusion of barcoded adaptors, the throughput of this technique can be scaled up to hundreds of samples while providing thousands of reads per sample.

Hypothesis and aims

Repeat sequence, structure and epigenetic modification affect the somatic stability of the HTT CAG repeat and pathogenesis of HD

Aim: to characterise the expansion, phase and epigenetic status of the HTT CAG repeat in samples derived from HD patients

Publications

 
Title Method for generating libraries of HTT repeat amplicons for long read sequencing 
Description We have validated a method for amplifying and sequencing a large library of CAG repeat-containing HTT amplicons from patients with early- and late-onset HD. The method is based on nested PCR, barcoding and single-molecule long-read sequencing. 
Type Of Material Biological samples 
Year Produced 2019 
Provided To Others? Yes  
Impact This method will allow us to characterise the phase and sequence of the CAG repeat and interruptions in both wild-type and expanded alleles in HD patients with early and late onset. This information will afford insight into the dynamics of repeat expansion and will inform the direction of further research. In addition, the method is adaptable to other repeats and therefore other repeat disorders. Furthermore, it is applicable to any cellular model and as such will be widely used to investigate repeat disease pathogenesis throughout the centre. 
 
Title Long-read sequencing data set of HTT amplicons from early and late onset HD patients 
Description We plan to long-read sequence HTT amplicons of samples from a cohort of 500 HD patients with late or early onset. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact We plan to use these data to phase the WT and expanded alleles, characterise interruption sequences, count CAGs and determine somatic repeat instability in this cohort and in various iPSC and animal models of HD. Together, this will help us build a picture of repeat expansion dynamics in HD and will inform the direction of further research into disease-modifying drug targets.
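
As an illustration of what counting CAGs and characterising interruption sequences could look like for a single read, the following is a minimal sketch and not the project's actual pipeline. It assumes the read is already oriented on the coding strand, treats CAA as the only interrupting codon, and takes the longest run of CAG/CAA codons as the repeat tract; the function name `summarise_cag_tract` and the returned field names are hypothetical.

```python
import re

# Assumption: the repeat tract consists of CAG codons, and CAA is the only
# interrupting codon considered in this sketch.
TRACT_PATTERN = re.compile(r"(?:CAG|CAA)+")


def summarise_cag_tract(read: str) -> dict:
    """Summarise the longest CAG/CAA tract found in one read."""
    matches = list(TRACT_PATTERN.finditer(read.upper()))
    if not matches:
        return {"start": None, "cag_count": 0,
                "longest_pure_run": 0, "interruptions": []}

    # Take the longest candidate tract in the read.
    best = max(matches, key=lambda m: m.end() - m.start())
    tract = best.group()

    # Split the tract into codons and record every non-CAG codon
    # together with its position (0-based) within the tract.
    codons = [tract[i:i + 3] for i in range(0, len(tract), 3)]
    interruptions = [(i, c) for i, c in enumerate(codons) if c != "CAG"]

    # Longest uninterrupted stretch of CAG codons within the tract.
    pure_runs = re.findall(r"(?:CAG)+", tract)
    longest_pure_run = max((len(run) // 3 for run in pure_runs), default=0)

    return {
        "start": best.start(),
        "cag_count": codons.count("CAG"),
        "longest_pure_run": longest_pure_run,
        "interruptions": interruptions,
    }


if __name__ == "__main__":
    # Toy read: 5 CAG, one CAA interruption, 3 CAG, then flanking sequence.
    toy_read = "TTC" + "CAG" * 5 + "CAA" + "CAG" * 3 + "CCGCCA"
    print(summarise_cag_tract(toy_read))
```

Applied per read and then aggregated per barcode, a summary of this kind would give a distribution of repeat lengths for each sample. Real long reads contain sequencing errors, so an actual analysis would need error-tolerant matching or alignment rather than the exact matching used here.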