Learning signalling pathways from single-cell RNA profiles of CRISPR perturbations

Lead Research Organisation: University of Cambridge
Department Name: Cancer Research UK Cambridge Institute

Abstract

How is information flow in the cell organised? How are outside signals transferred to the cell nucleus to turn transcriptional programs on or off? The proposed research addresses these questions by combining data from novel experimental techniques, which have only been published in the last few months, with an established computational approach pioneered by the applicant.

The novel experimental techniques use a gene editing method called CRISPR to perturb genes in a cell and then measure the resulting gene expression response using single-cell RNA sequencing. By using many perturbations in many cells, the data give a comprehensive picture of the effects of gene perturbations and thus of what function the genes have in the cell. The data are well suited to a computational method the applicant has developed to infer gene interactions and pathways from the expression effects of gene perturbations. The method is called Nested Effects Models (NEMs). Over the last 12 years the method has been developed extensively, and many key ideas have been introduced in different applications (where genes were perturbed differently or effects were measured differently). These ideas can now be translated to the new data from single-cell RNA-seq CRISPR screens.

The goals of the project are, first, to understand the features of the new type of data better and make sure that perturbation effects can be estimated robustly. Second, to tailor NEMs to the specifics of these new data. Third, to understand what effect different experimental parameters have and thus be able to design better experiments in the future. And finally, in collaboration with leading experimental scientists, to use the methodological advances to gain new insights into biology. The two case studies will address regulatory networks in T helper cells and how the JAK-STAT pathway shapes epigenetic landscapes.

Technical Summary

Several recent high-impact papers introduced experimental techniques for single-cell based genetic screens to understand gene function and cellular signalling pathways. These techniques combine single-cell RNA sequencing (scRNA-seq) and clustered regularly interspaced short palindromic repeats (CRISPR)-based perturbations to massively scale up the resolution and scope of previous genetic screening technologies. In a first step, CRISPR vectors deliver guide RNAs (gRNAs) targeted at particular genes to a pool of cells or to one well of an array. In a second step, the cells carrying the different perturbations are then RNA sequenced to measure transcriptional effects of the perturbations, which provides information on gene function and pathway activity. The technology is flexible and will most likely soon be used very widely across molecular biology. As these technologies are brand-new, tailored computational analysis of these data is lagging behind experimental advances.
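As a rough illustration of the second step, the transcriptional effect of a perturbation can be summarised by comparing cells carrying a given gRNA with control cells. The sketch below is a simplification under stated assumptions, not the analysis proposed here: the function name `mean_effects`, the `"control"` label, the gRNA and gene names, and the toy values are all hypothetical, and expression values are assumed to be already normalised and log-transformed.

```python
from collections import defaultdict
from statistics import mean


def mean_effects(cells, control="control"):
    """Mean expression difference of each gRNA group versus control cells.

    cells: list of (guide_label, {gene: log_expression}) pairs, one per cell.
    Returns {guide_label: {gene: mean difference vs control}}.
    Genes missing from a cell's profile are treated as 0.0 (an assumption).
    Raises statistics.StatisticsError if there are no control cells.
    """
    by_guide = defaultdict(list)
    for guide, profile in cells:
        by_guide[guide].append(profile)

    genes = sorted({g for _, profile in cells for g in profile})
    ctrl = {g: mean(p.get(g, 0.0) for p in by_guide[control]) for g in genes}

    return {
        guide: {g: mean(p.get(g, 0.0) for p in profiles) - ctrl[g] for g in genes}
        for guide, profiles in by_guide.items()
        if guide != control
    }


# Hypothetical toy data: two control cells and two cells carrying gRNA_1.
cells = [
    ("control", {"x": 2.0, "y": 1.0}),
    ("control", {"x": 2.0, "y": 3.0}),
    ("gRNA_1", {"x": 0.0, "y": 2.0}),
    ("gRNA_1", {"x": 1.0, "y": 2.0}),
]
effects = mean_effects(cells)
```

In this toy example gene `x` is lower in the perturbed cells while gene `y` is unchanged. Real screens would additionally need to handle dropout, uneven guide representation, and statistical testing, none of which is modelled here.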

Here I propose a machine learning approach to efficiently analyse scRNA-seq CRISPR screens and infer gene interaction networks and pathways of information flow in the cell. Our approach is based on an established machine learning method called Nested Effects Models (NEMs), which has been pioneered by the applicant. NEMs are built on inferring subset relations and thus are complementary to other graphical models like Bayesian networks and Gaussian graphical models. Over the last twelve years NEMs have been refined, extended, and applied by a world-wide community of independent groups, and there now exists a substantial body of methodological developments and experience in applications, which we propose to leverage for the analysis of scRNA-seq CRISPR screens. Working with leading developers of scRNA-seq CRISPR screens, we will use our methodological advances to optimise the study design of future screens and showcase the power of our approach in collaborative case studies.
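The idea of subset relations can be made concrete with a small, noise-free sketch. It assumes the usual reading that perturbing an upstream gene affects everything a downstream gene affects, so the downstream gene's effect set is nested inside the upstream one. This is only an illustration: the function `subset_edges`, the gene names, and the effect labels are hypothetical, and actual NEMs score noisy data probabilistically rather than testing exact set inclusion.

```python
from typing import Dict, List, Set, Tuple


def subset_edges(effects: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return edges (a, b) where the effects of perturbing b are a
    proper subset of the effects of perturbing a, i.e. a is placed
    upstream of b. Exact set inclusion is used, so no noise is tolerated.
    """
    edges = []
    for a, effects_a in effects.items():
        for b, effects_b in effects.items():
            if a != b and effects_b < effects_a:  # proper subset test
                edges.append((a, b))
    return sorted(edges)


# Hypothetical toy data: which downstream reporter genes respond
# when each gene is perturbed.
observed = {
    "geneA": {"e1", "e2", "e3", "e4"},
    "geneB": {"e2", "e3"},
    "geneC": {"e3"},
}
edges = subset_edges(observed)
```

Here the nesting of effect sets orders the three genes as a chain from `geneA` through `geneB` to `geneC`, including the transitive edge from `geneA` to `geneC`.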

Planned Impact

This proposal integrates cutting-edge experimental techniques (single cell genomics, CRISPR) with a powerful machine learning approach (Nested Effects Models). It uses novel experimental techniques and innovative computational analyses in an inter-disciplinary approach. Thus, a key group of beneficiaries are academic researchers and this proposal contributes to worldwide academic enhancement.

The proposed research combines machine learning and computational biology with applications in immunology and epigenomics. Another group of beneficiaries are early career researchers just after their PhD, because this proposed project helps to train them to become highly skilled researchers. The skills they will learn are highly sought after both in academia and in industry. The proposed project thus enhances the knowledge economy of the UK.

The research outcomes of the project are a better understanding of regulatory networks and signalling pathways on a single-cell level, which can translate into improving health and well-being. Other potential beneficiaries are thus the biomedical community and patients, because the improved understanding of basic biological mechanisms provided by the research proposed here might translate into new drugs against these mechanisms in disease.

Other beneficiaries come from the general public and we will work on increasing public engagement with research by giving talks to lay people, for example charity supporters who visit our institute regularly.

Publications

Hosseini SR (2019) Estimating the predictability of cancer evolution. Bioinformatics (Oxford, England)

 
Description: This grant funded our work on developing novel methods to analyse a particular type of data that show what effect gene perturbations have on other genes. Perturbations were done by CRISPR and effects were measured by single-cell RNA-seq. The methods we developed integrate the different perturbation effects into a functional network that can tell us how the different parts of the perturbed biological process work together to achieve the observed outcome. Our first main achievement is a comprehensive benchmarking of methods to explore the power and limitations of existing approaches and to define where new developments are needed. We are currently writing this up and will submit soon. Based on this benchmarking, our second achievement is a new network inference method.
Exploitation Route: We will apply our methods to experimental data generated in my own lab and by collaborators. We make all our code available, so every researcher in this field can apply our approaches to their own perturbation data.
Sectors: Pharmaceuticals and Medical Biotechnology