Identifying emerging subclones in individual colorectal cancer patients by mapping the spatio-temporal evolution of the tumour phenotype
Lead Research Organisation:
Institute of Cancer Research
Department Name: Division of Molecular Pathology
Abstract
The changes in the DNA that give rise to a tumour are not the same changes that make the tumour progress further and become metastatic. This is because a cancer evolves, and hence continuously changes over time, shaped by selective forces that favour more and more aggressive cancer cells. Late but clinically relevant changes may not be detectable everywhere in the cancer because of so-called intra-tumour heterogeneity, the variation of cells within the same malignancy. This is critical because those late changes may be the most clinically relevant ones for the progression of the disease, and therefore focusing on them would maximise the benefit to the patient.
Current genomic studies have limited power to detect late but clinically relevant genomic alterations because they are either based on a single sample per patient (and so do not inform on intra-tumour heterogeneity) or include only a small number of patients, and therefore have low statistical power. A major problem is that each tumour is different, and the large variation between patients makes the genomic analysis very challenging. To date, the most common strategy to handle this complexity is to apply statistics to large groups of patients, thus losing the personalised focus and incurring huge costs. I argue that a radically different strategy, based on a new type of sampling and analysis, is necessary if we want to reliably identify those newly emerging subpopulations in cancers that need prioritised treatment. Moreover, we need to do this patient by patient to maximise clinical impact.
Every cancer is the result of a unique and extremely complex evolutionary process. This is the crucial yet overlooked reason why the inter-patient variation in cancer is so large and, consequently, why the statistical power of cancer genomic studies is often low. Whereas a few "usual suspect" driver alterations have been identified, the long tail of many rare putative drivers is a major obstacle to personalised medicine.
Here I propose to refocus the analysis on the individual evolutionary process in each patient. Although this seems extremely challenging, I argue that it can be achieved using the paradigm of cancer evolution. Indeed, evolutionary biology also has only a single instance of the evolutionary process to study: the evolution of life on earth happened only once. Other scientific fields, such as cosmology, are based on observations of unique processes (our universe, the only one we can observe), yet despite this limitation they can attain extraordinary predictive power. This is achieved by integrating data and theory to obtain a mechanistic understanding of a phenomenon, rather than, for example, measuring statistical correlations in a group, which does not necessarily imply understanding of the system.
Here I propose a novel approach based on integrating a new strategy to collect samples from human tumours, with novel and powerful analysis methods that are based on the physics and mathematics of how tumours grow. Together, this multi-disciplinary approach allows identifying and characterising newly emerging and potentially aggressive subpopulations in an individual human tumour, one patient at a time. This personalises the analysis of patient data and allows tailoring the treatment not only to a specific patient, but also to a specific cancer cell subpopulation, the most clinically relevant.
Technical Summary
Aim 1: We will isolate 300 glands as previously described (Sottoriva et al., 2015; 2013) from 4 regions of 10 stage IV, 10 stage II/III carcinomas and 10 adenomas, and perform whole-genome sequencing at 30x in each gland and matched normal. We will use Platypus (Rimmer et al., 2014) to call somatic nucleotide variants (SNVs) and reconstruct copy number profiles using VarScan2 (Anderson et al., 2012). We will perform ATAC-seq in each sample and call chromatin states with MACS (Zhang et al., 2008). We will reconstruct the phylogenies using maximum likelihood (Guindon et al., 2010). We will check for mutations in mismatch repair genes and analyse the mutational signatures in each branch (Alexandrov et al., 2013).
Aim 2: We will develop phylogenomic and population genetics methods to perform rigorous measurements in the phylogeny and identify distinct subclones with Poisson changepoint analysis (Killick et al., 2012), tree balancing algorithms (Heard, 1992) and dN/dS ratio analysis. We will use SNVs as a molecular clock to time CNAs (Durinck et al., 2011). We will also develop a statistical model to time chromatin changes based on the observation that mutations accumulate differently in areas of open versus closed chromatin (Polak et al., 2015). We will reveal recent clonal expansions in the tree by TMRCA analysis in each subclone (Donnelly and Tavaré, 2003).
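As an illustration of the Poisson changepoint idea mentioned in Aim 2, the following is a minimal sketch, not the method of Killick et al. (2012), which handles multiple changepoints. It assumes a hypothetical ordered series of mutation counts (for example, counts in consecutive segments along a lineage) and finds the single split that best separates two Poisson rates. The function names and the example data are assumptions made for illustration only.

```python
import math


def poisson_segment_loglik(total, n):
    """Maximised Poisson log-likelihood of a segment of n counts summing to
    `total`, dropping the factorial terms that do not depend on the rate."""
    if total == 0:
        return 0.0
    rate = total / n  # maximum-likelihood rate for the segment
    return total * math.log(rate) - n * rate


def best_single_changepoint(counts):
    """Return (k, statistic): the split index k (first k counts form the left
    segment) that maximises the two-rate likelihood, and the likelihood-ratio
    statistic against a single constant rate."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    total = sum(counts)
    null_ll = poisson_segment_loglik(total, n)

    best_k, best_ll = None, -math.inf
    left = 0
    for k in range(1, n):
        left += counts[k - 1]
        ll = (poisson_segment_loglik(left, k)
              + poisson_segment_loglik(total - left, n - k))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k, 2.0 * (best_ll - null_ll)


# Hypothetical counts: a low mutation rate followed by a higher one.
example_counts = [2, 3, 2, 3, 2, 10, 12, 11, 9, 10]
split, statistic = best_single_changepoint(example_counts)
print(split, round(statistic, 2))
```

A large statistic suggests that the rate changed at the reported split. Deciding whether it is significant would need a calibrated threshold or penalty, which this sketch does not provide.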
Aim 3: We will apply the methods from Aim 2 to the ITH data in Aim 1. We will detect subclones and time their alterations to identify subclonal drivers. We will validate them at the transcriptomic level using qPCR or IHC when possible. Using the results from Aim 2, we will characterise the dynamic properties of each subclone:
- How fast is a subclone growing?
- How long has the emerging subclone been there?
- Has the subclone's evolution been punctuated or gradual?
- How fast is the clone mutating?
- How big is the clone in the tumour?
My preliminary data show the feasibility of these methods in my hands.
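The use of SNVs as a molecular clock to time copy-number gains (Aim 2) can be illustrated with a simplified worked calculation. This sketch is not taken from Durinck et al. (2011). It assumes a constant mutation rate per chromosome copy, clonal SNVs with known multiplicity, and a single gain of one allele. Under those assumptions, SNVs present on two copies arose on the duplicated allele before the gain, and the rest arose on either allele before the gain or on any copy after it. The function name and inputs are hypothetical.

```python
def gain_timing_fraction(n_two_copies, n_one_copy, configuration="2:1"):
    """Estimate when a single copy-number gain occurred, as a fraction of the
    mutational time between 0 (start of the lineage) and 1 (sampling).

    n_two_copies: SNVs carried on two copies (acquired before the gain).
    n_one_copy:   SNVs carried on one copy.
    configuration: "2:1" for a gain with the other allele retained,
                   "2:0" for copy-neutral loss of heterozygosity.
    """
    if n_two_copies < 0 or n_one_copy < 0:
        raise ValueError("mutation counts must be non-negative")
    if n_two_copies + n_one_copy == 0:
        raise ValueError("no mutations to time the gain with")

    if configuration == "2:1":
        # n2 = m*T1 and n1 = m*T1 + 3*m*T2, so T1/(T1+T2) = 3*n2/(2*n2 + n1)
        fraction = 3 * n_two_copies / (2 * n_two_copies + n_one_copy)
    elif configuration == "2:0":
        # n2 = m*T1 and n1 = 2*m*T2, so T1/(T1+T2) = 2*n2/(2*n2 + n1)
        fraction = 2 * n_two_copies / (2 * n_two_copies + n_one_copy)
    else:
        raise ValueError("unsupported configuration")

    # Sampling noise can push the estimate above 1; clip to the valid range.
    return min(fraction, 1.0)


# Hypothetical counts: 30 SNVs on two copies, 60 on one copy, 2:1 gain.
print(gain_timing_fraction(30, 60, "2:1"))  # 0.75
```

An estimate close to 1 would point to a recent gain, and one close to 0 to an early gain. Real data would also need correction for tumour purity and uncertainty in multiplicity calls.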
Planned Impact
The users of the results that will be generated through this proposal are multiple and from radically different disciplines:
1. This study will lead to the identification of new potentially targetable driver alterations that remain consistently subclonal in CRC because they occur late or because they may be forced to remain non-dominant by the evolutionary dynamics. As the effect of these drivers is directly measured in the tumour, I argue these are more solid candidates for drug discovery. The Institute of Cancer Research has an established track record on the development of targeted drugs, especially small molecules and the results of this study could give rise to new lines of research focussing on new genes that drive late-emerging clones in CRC.
2. Scientists working on animal and tissue culture models will use the list of new subclonal drivers to explore new avenues and potentially reconsider genes that, although they may have emerged in models, had not been picked up by current genomic studies.
3. This study will impact the way cancer genomic scientists interpret the data in light of personalised therapy. The current rationale behind the identification of cancer driving alterations is that those appear more often than by chance in a large cohort. The results in this proposal will offer a different method based on the actual behaviour of cancer clones. The evolutionary methods I propose will identify a driver alteration, even subclonal, based on the observation that the clone with that alteration is growing more aggressively than the rest of the tumour. Hence, the results in this proposal would radically impact the way we analyse existing cancer genomic data, as well as the way we design sampling methods in cancer. Cancer genomics scientists may be able to extrapolate more information from their previously generated datasets.
4. The methods we propose here not only allow the data to be analysed with proper statistical rigour, but also allow new power calculations, based on this initial dataset, on the number of samples per patient required to detect emerging subclones efficiently. As the collection of multiple biopsies per patient poses serious ethical and medical concerns, the approach we propose could help minimise the impact of biopsies while maximising detection power. Those who design sample collection, especially within clinical trials, would benefit when deciding how many biopsies are necessary for an eventual evolutionary analysis.
5. The new measurements we propose in this study, especially on the characterisation of the emerging subclones (Aim 3), represent new candidate measurements for the development of predictive and prognostic biomarkers in cancer. The biomarker community will benefit from new tools to test the predictive power of these new measurements, even applicable to single samples such as the copy number timing analysis (e.g. is punctuated versus gradual accumulation of copy number or chromatin changes prognostic or predictive?).
Publications
Caravagna G
(2020)
The MOBSTER R package for tumour subclonal deconvolution from bulk DNA whole-genome sequencing data.
in BMC bioinformatics
Nawaz S
(2019)
Analysis of tumour ecological balance reveals resource-dependent adaptive strategies of ovarian cancer.
in EBioMedicine
Heide T
(2022)
The co-evolution of the genome and epigenome in colorectal cancer.
in Nature
Househam J
(2022)
Phenotypic plasticity and genetic control in colorectal cancer evolution.
in Nature
Chen B
(2023)
Contribution of pks+ E. coli mutations to colorectal carcinogenesis
in Nature Communications
Caravagna G
(2020)
Subclonal reconstruction of tumors by using machine learning and population genetics.
in Nature genetics
Chkhaidze K
(2019)
Spatially constrained tumour growth affects the patterns of clonal selection and neutral drift in cancer genomic data.
in PLoS computational biology
Househam J
(2021)
Phenotypic plasticity and genetic control in colorectal cancer evolution
Caravagna G
(2019)
Model-based tumor subclonal reconstruction
Description | Wellcome Trust Investigator Award |
Amount | £1,176,028 (GBP) |
Funding ID | 202778/B/16/Z |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 02/2017 |
End | 02/2022 |
Title | CHESS: Cancer HEterogeneity with Spatial Simulations |
Description | This is a spatial stochastic cellular automaton model of tumour growth that accounts for somatic mutations, selection, drift and spatial constraints, to simulate multi-region sequencing data derived from spatial sampling of a tumour. It also includes a statistical inference framework that considers the spatial effects of a growing tumour and allows the evolutionary dynamics to be inferred from patient genomic data. |
Type Of Material | Computer model/algorithm |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | The method has just been made available to the public, so no notable impact yet. |
URL | https://github.com/kchkhaidze/CHESS.cpp |
Title | MOBSTER (MOdel Based cluSTering in cancER) |
Description | In a recent work we have developed a new model-based approach to carry out subclonal deconvolution from the site frequency spectrum of somatic mutations. This new method integrates, for the first time, an explicit model for neutral evolutionary forces that participate in clonal expansions; in that work we have also shown that our method improves largely over competing data-driven methods. In this Software paper we present mobster, an open source R package built around our new deconvolution approach, which provides several functions to plot data and fit models, assess their confidence and compute further evolutionary analyses that relate to subclonal deconvolution. |
Type Of Material | Data analysis technique |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | MOBSTER is now being used by us and others in several large-scale genomics projects as a central tool for subclonal reconstruction. |
URL | https://github.com/sottorivalab/MOBSTER |
Title | The co-evolution of the genome, epigenome and transcriptome in colorectal cancer |
Description | We collected 1,373 samples from 30 primary cancers and 9 concomitant adenomas and generated 1,212 chromatin accessibility profiles, 527 whole-genomes and 297 whole-transcriptomes. This dataset provides a comprehensive map of genetic, epigenetic and transcriptomic heterogeneity in colon cancer, with fundamental implications for our understanding of disease biology. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | We are finalising the initial publications based on this dataset. Meanwhile we have released the calls and upon publication we will release the raw data as well. |
URL | https://data.mendeley.com/datasets/dvv6kf856g/2 |
Description | Genomics England - Colorectal Cancer GeCIP |
Organisation | Genomics England |
Country | United Kingdom |
Sector | Public |
PI Contribution | My lab is one of the leading groups in the Genomics England colorectal cancer working group. We apply our computational and mathematical methods to measure tumour evolution in large scale whole-genome datasets. |
Collaborator Contribution | The other labs in the working group work collaboratively as part of the consortium. |
Impact | We are still working on the data analysis. This is a highly multidisciplinary collaboration, involving genetics, bioinformatics, mathematics, statistics, machine learning, molecular biology. |
Start Year | 2017 |
Description | Luca Magnani |
Organisation | Imperial College London |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We provide cancer evolution and bioinformatics expertise. |
Collaborator Contribution | Partner has expertise in cancer epigenetics data generation and interpretation. |
Impact | Partnership allowed the successful generation of the first large batch of ATAC-seq data from single colorectal cancer glands. |
Start Year | 2016 |
Description | Trevor Graham |
Organisation | Queen Mary University of London |
Department | Barts Cancer Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We provide data generation, analysis and modelling of colorectal cancer samples. We interpret the data in light of tumour evolution to construct genotype-phenotype maps in colorectal cancer. |
Collaborator Contribution | Partners provide pathologically annotated samples, as well as expertise in cancer evolution. |
Impact | We now have a large set of single-gland samples from colorectal cancer resections for which we are performing whole-genome sequencing and ATAC-seq. |
Start Year | 2016 |
Description | Trevor Graham |
Organisation | University College Hospital |
Country | United Kingdom |
Sector | Hospitals |
PI Contribution | We provide data generation, analysis and modelling of colorectal cancer samples. We interpret the data in light of tumour evolution to construct genotype-phenotype maps in colorectal cancer. |
Collaborator Contribution | Partners provide pathologically annotated samples, as well as expertise in cancer evolution. |
Impact | We now have a large set of single-gland samples from colorectal cancer resections for which we are performing whole-genome sequencing and ATAC-seq. |
Start Year | 2016 |