Novel statistical approaches for the characterization of genomic structural rearrangements in cancers from high-throughput genome sequencing data

Lead Research Organisation: University of Oxford

Department Name: Wellcome Trust Centre for Human Genetics

Abstract

Structural alterations are an important class of genetic mutations that arise from the loss, gain or rearrangement of DNA segments. In cancer, these alterations can change or disrupt the normal function of important genes and of the biological systems in which they participate. This may contribute to disease initiation and progression, and may later confer resistance to therapy and contribute to the spread of the disease to other systems. Structural alterations can be detected on a genome-wide scale using high-throughput genome sequencing, at a theoretical resolution of a single DNA nucleotide.

Various international collaborative efforts have enabled extensive catalogues of structural alterations to be built for various cancers, so that the effect of these structural variations can now be investigated alongside other genetic abnormalities in studies of disease etiology, prognosis and therapeutic effectiveness. Yet although many cancers have been analysed, the structural analysis of a new cancer specimen is conducted no differently from that of the very first cancer studied. Existing knowledge is not utilised, even though many structural variants are recurrent in cancer.

The aim of this project is to construct libraries of cancer-related structural alterations based upon existing data resources and to develop advanced statistical algorithms that utilise these libraries to identify and classify structural variants from genome sequencing data of new patients. These algorithms will allow us to identify recurring patterns of structural abnormalities that are shared by patients, to determine whether these relate to particular genetic and clinical features, and to track disease progression where multiple specimens are obtained from the same patient. They may also enable the identification of patients who may be suitable for novel targeted therapies or treatment regimes.

Technical Summary

The objective of this project is to develop advanced statistical algorithms to address limitations in existing methods for analysing cancer genome sequencing data. The aim is to build a set of related robust statistical tools for genomic copy number profiling of cancer that can operate using a variety of data platforms (array, whole genome or targeted sequencing) and in a range of experimental scenarios (single or multiple specimens, from a cohort or from the same patient, single cell, etc). These tools will use multi-scale techniques to decompose genomic profiles into different scales of interpretation, and data augmentation techniques to integrate data from previous cancer studies through a series of cancer karyotype libraries that will be constructed as part of this project. These reference libraries will contain genomic copy number profiles and clinical information for patients that have previously been examined and will act as "prior information" in the classification of new patient specimens. The algorithms will attempt to relate copy number changes in the unclassified patient to previously observed events in the reference libraries, allowing patients to be clustered and common clinicopathological and other genomic features to be investigated. These investigations may lead to plausible patient stratification groups that could be the basis of targeted therapies and clinical management strategies.
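
As a purely illustrative sketch of relating a new copy number profile to a reference library, the example below matches a new profile to the most similar library entry by Pearson correlation. This is not the project's method: the function name `nearest_reference`, the library entries, the binning and the similarity measure are all assumptions made for illustration, and the values are invented toy data rather than real patient profiles.

```python
import numpy as np

def nearest_reference(profile, library):
    """Return (name, correlation) of the library profile most correlated with `profile`.

    Hypothetical helper: `library` maps a label to a copy number profile
    defined over the same genomic bins as `profile`.
    """
    profile = np.asarray(profile, dtype=float)
    best_name, best_corr = None, -np.inf
    for name, ref in library.items():
        # Pearson correlation between the new profile and this reference profile
        corr = np.corrcoef(profile, np.asarray(ref, dtype=float))[0, 1]
        if corr > best_corr:
            best_name, best_corr = name, corr
    return best_name, best_corr

# Toy library (invented): copy number per genomic bin, where 2 is normal diploid.
library = {
    "ref_gain_early_bins": [3, 3, 3, 2, 2, 2, 2, 2],
    "ref_loss_late_bins": [2, 2, 2, 2, 2, 1, 1, 1],
}

# Toy noisy profile for an unclassified specimen (invented values).
new_patient = [3.1, 2.9, 3.0, 2.1, 2.0, 1.9, 2.0, 2.1]

match, score = nearest_reference(new_patient, library)
print(match, round(score, 3))
```

A simple correlation match like this ignores tumour purity, ploidy and segment boundaries, and it is undefined for a completely flat reference profile. A statistical model that treats the library as prior information, as the summary describes, would need to account for these.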

Planned Impact

The main beneficiaries of this research will be:

1. Cancer researchers using genome sequencing as part of their research,

2. Sequencing services (including MRC Hubs and commercial organisations) offering cancer sequencing services,

3. Bioinformatics software developers who wish to incorporate the ideas into their commercial platforms.

Funded Value:

£345,267

Funded Period:

Feb 14 - Feb 18

Funder:

MRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

MR/L001411/1

Principal Investigator:

Christopher Yau

Health Category:

Unclassified

Organisations

People	ORCID iD
Christopher Yau (Principal Investigator)	http://orcid.org/0000-0001-7615-8523

Publications

Campbell K (2016) A descriptive marker gene approach to single-cell pseudotime inference

Campbell KR (2019) A descriptive marker gene approach to single-cell pseudotime inference. in Bioinformatics (Oxford, England)

Hu Z (2017) A pan-cancer genome-wide analysis reveals tumour dependencies by induction of nonsense-mediated decay. in Nature communications

Žurauskienė J (2016) Additional file 1 of pcaReduce: hierarchical clustering of single cell transcriptional profiles

Pierson E (2015) Additional file 1 of ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis

Campbell K (2015) Bayesian Gaussian Process Latent Variable Models for pseudotime inference in single-cell RNA-seq data

Hu Z (2021) CIDER: an interpretable meta-clustering framework for single-cell RNA-seq data integration and evaluation. in Genome biology

Taylor JC (2015) Factors influencing success of clinical genome sequencing across a broad spectrum of disorders. in Nature genetics

Chedom-Fotso D (2016) OncoPhase: Quantification of somatic mutation cellular prevalence using phase information

Campbell K (2016) Order under uncertainty: robust differential expression analysis using probabilistic models for pseudotime inference

Campbell KR (2016) Order Under Uncertainty: Robust Differential Expression Analysis Using Probabilistic Models for Pseudotime Inference. in PLoS computational biology

Mourikis TP (2019) Patient-specific cancer genes contribute to recurrently perturbed pathways and establish therapeutic vulnerabilities in esophageal adenocarcinoma. in Nature communications

Mourikis T (2018) Patient-specific detection of cancer genes reveals recurrently perturbed processes in esophageal adenocarcinoma

Zurauskiene J (2015) pcaReduce: Hierarchical Clustering of Single Cell Transcriptional Profiles

Žurauskiene J (2016) pcaReduce: hierarchical clustering of single cell transcriptional profiles. in BMC bioinformatics

Hellner K (2016) Premalignant SOX2 overexpression in the fallopian tubes of ovarian cancer patients: Discovery and validation studies. in EBioMedicine

Campbell K (2016) Probabilistic inference of bifurcations in single-cell data using a hierarchical mixture of factor analysers

Campbell KR (2017) Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers. in Wellcome open research

Miranda F (2016) Salt-Inducible Kinase 2 Couples Ovarian Cancer Cell Metabolism with Survival at the Adipocyte-Rich Metastatic Niche. in Cancer cell

Titsias MK (2016) Statistical Inference in Hidden Markov Models Using k-Segment Constraints. in Journal of the American Statistical Association

Campbell KR (2017) switchde: inference of switch-like differential expression along single-cell trajectories. in Bioinformatics (Oxford, England)

Titsias M (2015) The Hamming Ball Sampler

Titsias M (2020) The Hamming Ball Sampler

Titsias MK (2017) The Hamming Ball Sampler. in Journal of the American Statistical Association

Knight S (2014) The Identification of Further Minimal Regions of Overlap in Chronic Lymphocytic Leukemia Using High-Resolution SNP Arrays in Blood

Hu Z (2020) The Repertoire of Serous Ovarian Cancer Non-genetic Heterogeneity Revealed by Single-Cell Sequencing of Normal Fallopian Tube Epithelial Cells. in Cancer cell

Campbell KR (2018) Uncovering pseudotemporal trajectories with covariates from single cell and bulk expression data. in Nature communications

Cazier JB (2014) Whole-genome sequencing of bladder cancers reveals somatic CDKN1A mutations and clinicopathological associations with mutation burden. in Nature communications

Yau C (2015) ZIFA: Dimensionality reduction for zero-inflated single cell gene expression analysis

Pierson E (2015) ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis. in Genome biology

Collaboration


Description	Exploiting ovarian cancer genetic evolution for personalisation of therapy
Organisation	University of Oxford
Department	Wellcome Trust Centre for Human Genetics
Country	United Kingdom
Sector	Charity/Non Profit
PI Contribution	Computational analysis of whole genome sequencing data and the development of new experimental methodologies for sequencing microscopic residual disease samples in cancer.
Collaborator Contribution	Whole genome sequencing of ovarian cancer patients and shared postdoctoral and graduate researchers.
Impact	10.1016/j.ebiom.2016.06.048 10.1016/j.ccell.2016.06.020
Start Year	2014