Identifying emerging subclones in individual colorectal cancer patients by mapping the spatio-temporal evolution of the tumour phenotype

Lead Research Organisation: Institute of Cancer Research
Department Name: Division of Molecular Pathology

Abstract

The changes in the DNA that give rise to a tumour are not the same changes that make the tumour progress further and become metastatic. This is because a cancer evolves, and hence continuously changes over time, shaped by selective forces that favour more and more aggressive cancer cells. Late but clinically relevant changes may not be detectable everywhere in the cancer because of so-called intra-tumour heterogeneity, the variation among cells within the same malignancy. This is critical because those late changes may be the most clinically relevant ones for the progression of the disease, and therefore focusing on them would maximise the benefit to the patient.

Current genomic studies have limited power to detect late but clinically relevant genomic alterations, either because they are based on a single sample per patient (and so do not inform on intra-tumour heterogeneity) or because they include a small number of patients and therefore have poor statistical power. A major problem is that each tumour is different, and the large variation between patients makes genomic analysis very challenging. To date, the most common strategy to handle this complexity has been to apply statistics to large groups of patients, thus losing the personalised focus and incurring huge costs. I argue that a radically different strategy, based on a new type of sampling and analysis, is necessary if we want to reliably identify those newly emerging subpopulations in cancers that need prioritised treatment. Moreover, we need to do this patient by patient to maximise clinical impact.

Every cancer is the result of a unique and extremely complex evolutionary process. This is the crucial yet overlooked reason why the inter-patient variation in cancer is so large and, consequently, why the statistical power of cancer genomic studies is often poor. Whereas a few "usual suspect" driver alterations have been identified, the long tail of many rare putative drivers is a major obstacle to personalised medicine.

Here I propose to refocus the analysis on the individual evolutionary process in each patient. Although this seems extremely challenging, I argue that it can be achieved using the paradigm of cancer evolution. Indeed, in evolutionary biology too we have only a single instance of the evolutionary process: the evolution of life on earth happened only once. Other scientific fields, such as cosmology, are also based on observations of unique processes (our universe, the only one we can observe), yet despite this limitation they can attain extraordinary predictive power. This is achieved by integrating data and theory to obtain a mechanistic understanding of a phenomenon, rather than, for example, by measuring statistical correlations in a group, which does not necessarily imply understanding of the system.

Here I propose a novel approach that integrates a new strategy for collecting samples from human tumours with novel and powerful analysis methods based on the physics and mathematics of how tumours grow. Together, this multi-disciplinary approach makes it possible to identify and characterise newly emerging and potentially aggressive subpopulations in an individual human tumour, one patient at a time. This personalises the analysis of patient data and allows treatment to be tailored not only to a specific patient, but also to a specific cancer cell subpopulation, the most clinically relevant one.

Technical Summary

Aim 1: We will isolate 300 glands as previously described (Sottoriva et al., 2015; 2013) from 4 regions of 10 stage IV carcinomas, 10 stage II/III carcinomas and 10 adenomas, and perform whole-genome sequencing at 30x on each gland and matched normal. We will use Platypus (Rimmer et al., 2014) to call somatic single-nucleotide variants (SNVs) and reconstruct copy number profiles using VarScan2 (Anderson et al., 2012). We will perform ATAC-seq on each sample and call chromatin states with MACS (Zhang et al., 2008). We will reconstruct the phylogenies using maximum likelihood (Guindon et al., 2010). We will check for mutations in mismatch repair genes and analyse the mutational signatures in each branch (Alexandrov et al., 2013).

Aim 2: We will develop phylogenomic and population genetics methods to perform rigorous measurements on the phylogeny and identify distinct subclones with Poisson changepoint analysis (Killick et al., 2012), tree balancing algorithms (Heard, 1992) and dN/dS ratio analysis. We will use SNVs as a molecular clock to time copy number alterations (CNAs) (Durinck et al., 2011). We will also develop a statistical model to time chromatin changes based on the observation that mutations accumulate differently in areas of open versus closed chromatin (Polak et al., 2015). We will reveal recent clonal expansions in the tree by analysing the time to the most recent common ancestor (TMRCA) of each subclone (Donnelly and Tavaré, 2003).
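
The idea of using SNVs as a molecular clock to time a copy number gain can be illustrated with a minimal sketch. It assumes the simplest case only: a single gain of one allele (two copies of one allele and one of the other), a constant per-copy mutation rate, and SNV multiplicities that have already been assigned. The function name and its inputs are hypothetical and are not taken from the cited methods.

```python
def gain_timing_fraction(n_mult1: int, n_mult2: int) -> float:
    """Estimate when a single-allele gain (2+1 state) occurred.

    Returns a fraction of molecular time between 0 (start of the lineage)
    and 1 (time of sampling).

    Illustrative assumptions: constant per-copy mutation rate mu, total
    time T, gain at time t, and correctly assigned SNV multiplicities.
      - SNVs present on two copies arose on the gained allele before the gain:
            n_mult2 = mu * t
      - SNVs present on one copy arose either before the gain on the other
        allele (mu * t) or after the gain on any of the three copies:
            n_mult1 = mu * t + 3 * mu * (T - t)
    Solving for t / T gives 3 * n_mult2 / (n_mult1 + 2 * n_mult2).
    """
    if n_mult1 < 0 or n_mult2 < 0:
        raise ValueError("SNV counts must be non-negative")
    denominator = n_mult1 + 2 * n_mult2
    if denominator == 0:
        raise ValueError("at least one SNV is needed to time the gain")
    # Sampling noise can push the raw estimate above 1; clip it.
    return min(1.0, 3 * n_mult2 / denominator)


if __name__ == "__main__":
    # Hypothetical counts: 30 single-copy SNVs and 10 two-copy SNVs.
    print(gain_timing_fraction(n_mult1=30, n_mult2=10))
```

A value close to 0 would suggest an early gain and a value close to 1 a recent one. Real data would also require correction for purity, subclonality and sequencing noise, which this sketch ignores.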

Aim 3: We will apply the methods from Aim 2 to the ITH data in Aim 1. We will detect subclones and time their alterations to identify subclonal drivers. We will validate them at the transcriptomic level using qPCR or IHC when possible. Using the results from Aim 2, we will characterise the dynamic properties of each subclone:
- How fast is a subclone growing?
- How long has the emerging subclone been there?
- Has the evolution of the subclone been punctuated or gradual?
- How fast is the clone mutating?
- How big is the clone in the tumour?

My preliminary data show the feasibility of these methods in my hands.

Planned Impact

The users of the results that will be generated through this proposal are multiple and from radically different disciplines:

1. This study will lead to the identification of new potentially targetable driver alterations that remain consistently subclonal in CRC because they occur late or because they may be forced to remain non-dominant by the evolutionary dynamics. As the effect of these drivers is directly measured in the tumour, I argue these are more solid candidates for drug discovery. The Institute of Cancer Research has an established track record in the development of targeted drugs, especially small molecules, and the results of this study could give rise to new lines of research focussing on new genes that drive late-emerging clones in CRC.

2. Scientists working on animal and tissue culture models will use the list of new subclonal drivers to explore new avenues and potentially reconsider genes that, although they may have emerged in models, had not been picked up by current genomic studies.

3. This study will impact the way cancer genomic scientists interpret the data in light of personalised therapy. The current rationale behind the identification of cancer driving alterations is that those appear more often than expected by chance in a large cohort. The results in this proposal will offer a different method based on the actual behaviour of cancer clones. The evolutionary methods I propose will identify a driver alteration, even a subclonal one, based on the observation that the clone with that alteration is growing more aggressively than the rest of the tumour. Hence, the results in this proposal would radically impact the way we analyse existing cancer genomic data, as well as the way we design sampling methods in cancer. Cancer genomics scientists may be able to extract more information from their previously generated datasets.

4. The methods we propose here not only allow the data to be analysed with proper statistical rigour, but also allow new power calculations, based on this initial dataset, of the number of samples per patient required to detect emerging subclones efficiently. As the collection of multiple biopsies per patient poses serious ethical and medical concerns, the approach we propose could help to minimise the impact of biopsies while maximising the detection power. Those who design sample collection, especially within clinical trials, would benefit when deciding how many biopsies are necessary for a subsequent evolutionary analysis.

5. The new measurements we propose in this study, especially on the characterisation of the emerging subclones (Aim 3), represent new candidate measurements for the development of predictive and prognostic biomarkers in cancer. The biomarker community will benefit from new tools to test the predictive power of these new measurements, some of which, such as the copy number timing analysis, are applicable even to single samples (e.g. is punctuated versus gradual accumulation of copy number or chromatin changes prognostic or predictive?).
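
The kind of power calculation mentioned in point 4 can be sketched in its simplest form. The example below is an illustration under strong assumptions, not the method proposed here: each sample is treated as an independent, uniformly random draw from the tumour, and a subclone counts as detected if at least one sample falls inside it. Spatial structure, which the proposal treats as essential, is ignored, and all function names are hypothetical.

```python
def detection_probability(subclone_fraction: float, n_samples: int) -> float:
    """Probability that at least one of n independent samples hits the subclone."""
    return 1.0 - (1.0 - subclone_fraction) ** n_samples


def min_samples(subclone_fraction: float, power: float = 0.8,
                max_samples: int = 10_000) -> int:
    """Smallest number of samples giving at least the requested detection power."""
    if not 0.0 < subclone_fraction <= 1.0:
        raise ValueError("subclone_fraction must be in (0, 1]")
    if not 0.0 < power < 1.0:
        raise ValueError("power must be in (0, 1)")
    # A plain loop avoids rounding surprises from a closed-form logarithm.
    for n in range(1, max_samples + 1):
        if detection_probability(subclone_fraction, n) >= power:
            return n
    raise ValueError("requested power not reachable within max_samples")


if __name__ == "__main__":
    # Hypothetical scenario: a subclone occupying 10% of the tumour.
    print(min_samples(subclone_fraction=0.1, power=0.8))
```

Under these assumptions the required number of samples rises steeply as the subclone becomes smaller, which is why the trade-off with the burden of additional biopsies matters.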
 
Description Wellcome Trust Investigator Award
Amount £1,176,028 (GBP)
Funding ID 202778/B/16/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 02/2017 
End 02/2022
 
Title CHESS: Cancer HEterogeneity with Spatial Simulations 
Description This is a spatial stochastic cellular automaton model of tumour growth that accounts for somatic mutations, selection, drift and spatial constraints, to simulate multi-region sequencing data derived from spatial sampling of a tumour. It also includes a statistical inference framework that considers the spatial effects of a growing tumour and allows the evolutionary dynamics to be inferred from patient genomic data. 
Type Of Material Computer model/algorithm 
Year Produced 2019 
Provided To Others? Yes  
Impact The method has just been made available to the public, so no notable impact yet. 
URL https://github.com/kchkhaidze/CHESS.cpp
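
To give a feel for what a spatial cellular automaton of tumour growth with spatial sampling involves, a toy sketch follows. It is not CHESS and does not reproduce its algorithm: it assumes a 2D square lattice, neutral growth without selection or cell death, and exactly one new mutation per division, and it reports cell fractions rather than sequencing allele frequencies. All names and parameters are hypothetical.

```python
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def grow_tumour(grid_size=31, n_cells=200, seed=1):
    """Grow a toy 2D tumour; returns {(x, y): tuple of mutation ids}."""
    if n_cells > grid_size * grid_size:
        raise ValueError("n_cells exceeds lattice capacity")
    rng = random.Random(seed)
    centre = (grid_size // 2, grid_size // 2)
    cells = {centre: ()}  # the founder cell carries no mutations
    next_mutation = 0
    while len(cells) < n_cells:
        x, y = rng.choice(list(cells))
        free = [(x + dx, y + dy) for dx, dy in NEIGHBOURS
                if 0 <= x + dx < grid_size and 0 <= y + dy < grid_size
                and (x + dx, y + dy) not in cells]
        if not free:
            continue  # spatial constraint: no empty neighbour, no division
        # The daughter inherits the parent's mutations plus one new mutation.
        cells[rng.choice(free)] = cells[(x, y)] + (next_mutation,)
        next_mutation += 1
    return cells


def region_frequencies(cells, x_range, y_range):
    """Fraction of cells in a rectangular region that carry each mutation."""
    region = [muts for (x, y), muts in cells.items()
              if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]]
    counts = {}
    for muts in region:
        for mutation in muts:
            counts[mutation] = counts.get(mutation, 0) + 1
    return {m: c / len(region) for m, c in counts.items()} if region else {}


if __name__ == "__main__":
    tumour = grow_tumour()
    left = region_frequencies(tumour, (0, 15), (0, 31))
    right = region_frequencies(tumour, (16, 31), (0, 31))
    print(len(left), len(right))
```

Comparing the frequencies obtained from different regions shows how spatial sampling alone can make the same tumour look different, even in the absence of selection.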
 
Title MOBSTER (MOdel Based cluSTering in cancER) 
Description In recent work we developed a new model-based approach to carry out subclonal deconvolution from the site frequency spectrum of somatic mutations. This new method integrates, for the first time, an explicit model for the neutral evolutionary forces that participate in clonal expansions; in that work we also showed that our method substantially improves on competing data-driven methods. In this Software paper we present mobster, an open source R package built around our new deconvolution approach, which provides several functions to plot data and fit models, assess their confidence and compute further evolutionary analyses that relate to subclonal deconvolution. 
Type Of Material Data analysis technique 
Year Produced 2020 
Provided To Others? Yes  
Impact MOBSTER is now being used by us and others in several large-scale genomics projects as a central tool for subclonal reconstruction. 
URL https://github.com/sottorivalab/MOBSTER
 
Title The co-evolution of the genome, epigenome and transcriptome in colorectal cancer 
Description We collected 1,373 samples from 30 primary cancers and 9 concomitant adenomas and generated 1,212 chromatin accessibility profiles, 527 whole-genomes and 297 whole-transcriptomes. This dataset provides a comprehensive map of genetic, epigenetic and transcriptomic heterogeneity in colon cancer, with fundamental implications for our understanding of disease biology. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact We are finalising the initial publications based on this dataset. Meanwhile we have released the calls and upon publication we will release the raw data as well. 
URL https://data.mendeley.com/datasets/dvv6kf856g/2
 
Description Genomics England - Colorectal Cancer GeCIP 
Organisation Genomics England
Country United Kingdom 
Sector Public 
PI Contribution My lab is one of the leading groups in the Genomics England colorectal cancer working group. We apply our computational and mathematical methods to measure tumour evolution in large scale whole-genome datasets.
Collaborator Contribution The other labs in the working group work collaboratively as part of the consortium.
Impact We are still working on the data analysis. This is a highly multidisciplinary collaboration, involving genetics, bioinformatics, mathematics, statistics, machine learning and molecular biology.
Start Year 2017
 
Description Luca Magnani 
Organisation Imperial College London
Country United Kingdom 
Sector Academic/University 
PI Contribution We provide cancer evolution and bioinformatics expertise.
Collaborator Contribution Partner has expertise in cancer epigenetics data generation and interpretation.
Impact Partnership allowed the successful generation of the first large batch of ATAC-seq data from single colorectal cancer glands.
Start Year 2016
 
Description Trevor Graham 
Organisation Queen Mary University of London
Department Barts Cancer Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution We provide data generation, analysis and modelling of colorectal cancer samples. We interpret the data in light of tumour evolution to construct genotype-phenotype maps in colorectal cancer.
Collaborator Contribution Partners provide pathologically annotated samples, as well as expertise in cancer evolution.
Impact We now have a large set of single-gland samples from colorectal cancer resections for which we are performing whole-genome sequencing and ATAC-seq.
Start Year 2016
 
Description Trevor Graham 
Organisation University College Hospital
Country United Kingdom 
Sector Hospitals 
PI Contribution We provide data generation, analysis and modelling of colorectal cancer samples. We interpret the data in light of tumour evolution to construct genotype-phenotype maps in colorectal cancer.
Collaborator Contribution Partners provide pathologically annotated samples, as well as expertise in cancer evolution.
Impact We now have a large set of single-gland samples from colorectal cancer resections for which we are performing whole-genome sequencing and ATAC-seq.
Start Year 2016