Regulation of transcript isoforms

Lead Research Organisation: University of Cambridge
Department Name: Genetics


Theme: World-Class Underpinning Bioscience

1. Benchmark the performance of isoform quantification tools run on single-cell RNA-seq data.
2. Characterise the impact of technical noise on isoform quantification in single-cell RNA-seq data.
3. Determine how the patterns of isoform expression seen in populations of cells are generated by individual cells.
4. Consider how patterns of isoform expression are regulated.

A brief explanation of these aims is given as follows. Use of single-cell RNA-seq is becoming increasingly popular despite a lack of knowledge regarding best bioinformatic practices for analysing the data. Single-cell RNA-seq could be particularly informative regarding alternative splicing. However, before this can be fully addressed, data processing issues need to be explored. One aim of this PhD project is to benchmark the performance of existing isoform quantification tools when run on single-cell data. This will establish whether existing tools can quantify isoforms accurately when applied to single-cell data, and which tool performs best. In addition, the impact of technical noise on isoform quantification must be considered.

Drop-outs are more common for lowly expressed genes (Kharchenko et al., 2014); consequently, lowly expressed genes and isoforms are less likely to be detected in single cells. It will be important to determine whether there is a threshold level of expression below which performing isoform quantification and splicing analysis is not meaningful for single-cell RNA-seq data, and if so where that threshold lies.

With this information, it will be more meaningful to use single-cell RNA-seq data to address biological questions, as the limitations of current single-cell RNA-seq technologies will be more fully understood. The next aim will be to determine how the patterns of isoform expression seen in populations of cells are generated by individual cells: whether each cell expresses all the isoforms seen at the population level, or whether individual cells express only a subset of them. A comprehensive understanding of drop-outs will be important when answering this question, as a high rate of drop-outs could give the impression that fewer isoforms are expressed in individual cells than is truly the case.
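The confounding effect of drop-outs on isoform counts can be illustrated with a small sketch. It is not part of the project's methods: it assumes, purely for illustration, that every expressed isoform drops out independently with the same probability, whereas in practice the drop-out rate depends on expression level. The function names and the example numbers are hypothetical.

```python
def expected_detected_isoforms(n_expressed, dropout_rate):
    """Expected number of isoforms detected in one cell.

    Assumes (hypothetically) that each of the n_expressed isoforms
    drops out independently with probability dropout_rate.
    """
    if not 0.0 <= dropout_rate <= 1.0:
        raise ValueError("dropout_rate must be between 0 and 1")
    return n_expressed * (1.0 - dropout_rate)


def prob_all_detected(n_expressed, dropout_rate):
    """Probability that every expressed isoform is detected in a given cell."""
    if not 0.0 <= dropout_rate <= 1.0:
        raise ValueError("dropout_rate must be between 0 and 1")
    return (1.0 - dropout_rate) ** n_expressed


# Illustrative numbers only: a cell that truly expresses 4 isoforms of a gene,
# with a 50% chance of missing each one.
print(expected_detected_isoforms(4, 0.5))
print(prob_all_detected(4, 0.5))
```

Under these assumptions, a cell that truly expresses all four isoforms would usually appear to express only a subset, which is why apparent heterogeneity between cells cannot be interpreted without first characterising drop-outs.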

It is plausible based on current evidence that there is some degree of heterogeneity in the number of isoforms expressed in the individual cells making up a population (Marinov et al., 2014; Shalek et al., 2013; Zhao et al., 2016). Assuming this is the case, the next aim will be to determine how the number of isoforms expressed in a single cell is regulated. An obvious candidate for how this regulation could be effected is epigenetic marks. Machine learning approaches could be used to investigate this possibility.

It is possible that the number of isoforms expressed in each cell in a population is largely homogeneous between cells, or that it is not possible to distinguish whether the heterogeneity present is biological or technical in origin. If this proves to be the case, the regulation of the number of isoforms expressed in populations of cells could instead be considered. Again, epigenetic marks are a clear candidate for how the number of expressed isoforms might be regulated.



Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
BB/M011194/1                                   01/10/2015  31/03/2024
1804962            Studentship   BB/M011194/1  01/10/2016  10/01/2020  Jennifer Westoby
Description: My research is on developing and understanding existing methods for analysing sequencing data. My first project, which I have now finished, has helped us understand which of the existing methods for analysing a type of sequencing data called single-cell RNA-seq (scRNA-seq) perform better. My second project was on determining the extent to which it is currently possible to analyse alternative splicing using scRNA-seq data. I found that there is currently substantial technical and possibly biological confounding.
Exploitation Route: My findings are relevant to other bioinformaticians in industry and academia.
Sectors: Agriculture, Food and Drink; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology

Description: I have published a benchmark of isoform quantification software. This benchmark is likely to be used in academic and industrial biotechnology environments to select the best performing isoform quantification tool for single-cell RNA-seq. I also released a preprint on bioRxiv on the current limitations of studying alternative splicing using scRNA-seq. My findings are relevant to scientists studying scRNA-seq in academic and industry contexts.
First Year Of Impact: 2018
Sector: Healthcare; Pharmaceuticals and Medical Biotechnology
Impact Types: Economic