Novel statistical approaches for the characterization of genomic structural rearrangements in cancers from high-throughput genome sequencing data
Lead Research Organisation:
University of Oxford
Department Name: Wellcome Trust Centre for Human Genetics
Abstract
Structural alterations are an important class of genetic mutations that arise from the loss, gain or rearrangement of DNA segments. In cancer, they can alter or disrupt the normal function of important genes and the biological systems in which those genes participate. This may contribute to disease initiation and progression, later confer resistance to therapy, and contribute to the spread of the disease to other systems. Structural alterations can be detected genome-wide using high-throughput genome sequencing, at a theoretical resolution of a single DNA nucleotide.
Various international collaborative efforts have enabled extensive catalogues of structural alterations to be built for various cancers, so that the effect of these structural variations can now be investigated alongside other genetic abnormalities in studies of disease etiology, prognosis and therapeutic effectiveness. Yet although many cancers have been analysed, the structural analysis of a new cancer specimen is conducted no differently from that of the very first cancer that was studied. Existing knowledge is not utilised, even though many structural variants are recurrent in cancer.
The aim of this project is to construct libraries of cancer-related structural alterations based upon existing data resources and to develop advanced statistical algorithms that utilise these libraries to identify and classify structural variants from genome sequencing data of new patients. These algorithms will allow us to identify recurring patterns of structural abnormalities that are shared by patients, to determine whether these relate to particular genetic and clinical features, and to track disease progression where multiple specimens are obtained from the same patient. These algorithms may enable the identification of patients who may be suitable for novel targeted therapies or treatment regimes.
Technical Summary
The objective of this project is to develop advanced statistical algorithms to address limitations in existing methods for analysing cancer genome sequencing data. The aim is to build a set of related, robust statistical tools for genomic copy number profiling of cancer that can operate using a variety of data platforms (array, whole genome or targeted sequencing) and in a range of experimental scenarios (single or multiple specimens, drawn from a cohort or from the same patient, single cell, etc.). These tools will use multi-scale techniques to decompose genomic profiles into different scales of interpretation, and data augmentation techniques to integrate data from previous cancer studies through a series of cancer karyotype libraries that will be constructed as part of this project. These reference libraries will contain genomic copy number profiles and clinical information for patients that have previously been examined and will act as "prior information" in the classification of new patient specimens. The algorithms will attempt to relate copy number changes in the unclassified patient to previously observed events in the reference libraries, allowing patients to be clustered and common clinicopathological and other genomic features to be investigated. These investigations may lead to plausible patient stratification groups that could be the basis of targeted therapies and clinical management strategies.
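The idea of a reference library acting as prior information can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the project's method: the library entries, their labels and counts, the five-bin profiles, the Gaussian noise model and the function name posterior_over_library are all hypothetical. Each library profile is weighted by how often it was seen before (the prior) and by how well it matches the new specimen (the likelihood).

```python
import math

# Hypothetical toy library: each entry is a copy-number profile over the same
# five genomic bins, plus the number of previous patients in which it was seen.
# Labels, profiles and counts are invented for illustration only.
LIBRARY = {
    "gain_1q": {"profile": [3, 3, 2, 2, 2], "count": 30},
    "loss_17p": {"profile": [2, 2, 2, 1, 1], "count": 50},
    "diploid": {"profile": [2, 2, 2, 2, 2], "count": 20},
}


def posterior_over_library(observed, library, noise_sd=0.5):
    """Return the posterior probability of each library entry given a profile.

    Prior: relative frequency of the entry in the library.
    Likelihood: independent Gaussian noise per bin (an assumed noise model).
    """
    total = sum(entry["count"] for entry in library.values())
    log_post = {}
    for name, entry in library.items():
        log_prior = math.log(entry["count"] / total)
        log_lik = sum(
            -0.5 * ((o - p) / noise_sd) ** 2
            for o, p in zip(observed, entry["profile"])
        )
        log_post[name] = log_prior + log_lik
    # Normalise in log space to avoid underflow.
    top = max(log_post.values())
    weights = {name: math.exp(v - top) for name, v in log_post.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


# A noisy copy-number estimate for a new, unclassified specimen (made up).
observed = [2.9, 3.2, 2.1, 1.9, 2.0]
post = posterior_over_library(observed, LIBRARY)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

In this toy setting the most frequent library entry does not win automatically: the likelihood term outweighs the prior when the observed profile clearly matches a rarer event. A realistic method would also need segmentation, tumour purity and ploidy handling, none of which is modelled here.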
Planned Impact
The main beneficiaries of this research will be:
1. Cancer researchers using genome sequencing as part of their research,
2. Sequencing services (including MRC Hubs and commercial organisations) offering cancer sequencing services,
3. Bioinformatics software developers who wish to incorporate the ideas into their commercial platforms.
Publications
Campbell K
(2016)
A descriptive marker gene approach to single-cell pseudotime inference
Campbell KR
(2019)
A descriptive marker gene approach to single-cell pseudotime inference.
in Bioinformatics (Oxford, England)
Hu Z
(2017)
A pan-cancer genome-wide analysis reveals tumour dependencies by induction of nonsense-mediated decay.
in Nature communications
Hu Z
(2021)
CIDER: an interpretable meta-clustering framework for single-cell RNA-seq data integration and evaluation.
in Genome biology
Taylor JC
(2015)
Factors influencing success of clinical genome sequencing across a broad spectrum of disorders.
in Nature genetics
Campbell KR
(2016)
Order Under Uncertainty: Robust Differential Expression Analysis Using Probabilistic Models for Pseudotime Inference.
in PLoS computational biology
Mourikis TP
(2019)
Patient-specific cancer genes contribute to recurrently perturbed pathways and establish therapeutic vulnerabilities in esophageal adenocarcinoma.
in Nature communications
Zurauskiene J
(2015)
pcaReduce: Hierarchical Clustering of Single Cell Transcriptional Profiles
Žurauskiene J
(2016)
pcaReduce: hierarchical clustering of single cell transcriptional profiles.
in BMC bioinformatics
Hellner K
(2016)
Premalignant SOX2 overexpression in the fallopian tubes of ovarian cancer patients: Discovery and validation studies.
in EBioMedicine
Campbell KR
(2017)
Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers.
in Wellcome open research
Miranda F
(2016)
Salt-Inducible Kinase 2 Couples Ovarian Cancer Cell Metabolism with Survival at the Adipocyte-Rich Metastatic Niche.
in Cancer cell
Titsias MK
(2016)
Statistical Inference in Hidden Markov Models Using k-Segment Constraints.
in Journal of the American Statistical Association
Campbell KR
(2017)
switchde: inference of switch-like differential expression along single-cell trajectories.
in Bioinformatics (Oxford, England)
Titsias M
(2015)
The Hamming Ball Sampler
Titsias M
(2020)
The Hamming Ball Sampler
Titsias MK
(2017)
The Hamming Ball Sampler.
in Journal of the American Statistical Association
Campbell KR
(2018)
Uncovering pseudotemporal trajectories with covariates from single cell and bulk expression data.
in Nature communications
Cazier JB
(2014)
Whole-genome sequencing of bladder cancers reveals somatic CDKN1A mutations and clinicopathological associations with mutation burden.
in Nature communications
Pierson E
(2015)
ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis.
in Genome biology
Description | Exploiting ovarian cancer genetic evolution for personalisation of therapy |
Organisation | University of Oxford |
Department | Wellcome Trust Centre for Human Genetics |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | Computational analysis of whole genome sequencing data and the development of new experimental methodologies for sequencing microscopic residual disease samples in cancer. |
Collaborator Contribution | Whole genome sequencing of ovarian cancer patients and shared postdoctoral and graduate researchers. |
Impact | 10.1016/j.ebiom.2016.06.048 10.1016/j.ccell.2016.06.020 |
Start Year | 2014 |