Linking recombinant gene sequence to protein product manufacturability using CHO cell genomic resources

Lead Research Organisation: University of Sheffield

Department Name: Chemical & Biological Engineering

Abstract

Biopharmaceutical companies producing the new generation of recombinant DNA derived therapeutic proteins (e.g. cancer medicines such as Herceptin and Avastin) often use mammalian cells grown in culture to make the protein product. All production processes are based, fundamentally, upon the ability of the host mammalian cell factory to use a synthetic DNA genetic "code" to manufacture the complex protein product. This is a cornerstone of modern biotechnology. However, because protein synthesis is so complex, involving many cellular resources and machines, it is extremely difficult for genetic engineers to design a DNA code that will best enable the mammalian cell factory to operate most efficiently. Moreover, as individual mammalian cell factories can be very variable, they may differ substantially in their relative ability to make the product. As a consequence, a lot of time and money has to be spent by companies on the initial phases of the biopharmaceutical development process conducting intensive screening operations to find the best cell factory (out of a large population) able to use the genetic code it has been given. For a different protein product it is necessary to start the whole development process again.
In this project we will utilise recently available high information content molecular analysis technologies and computational tools to "de-convolute" the complexity of protein synthesis in mammalian cell factories. Effectively, we know that the mammalian cell factory uses its own genetic code to make thousands of its own proteins (machines) that together perform a variety of functions that enable the cell to grow and divide. The rate at which these proteins are made varies hugely, over 1000-fold, so that the cell can make each bit of protein machinery in the right quantity to do its job. We will measure how efficiently each cellular protein is made. Then, using advanced biological information analysis (bioinformatics) and mathematics, we will determine how the cell uses pieces of information embedded in each of its genes to vary the rate at which a specific protein is made.
This will enable us to create, for the first time, a usable set of "design rules" (computer programmes) that genetic engineers and cell factory developers can employ to (i) reliably design the best genetic code for any given protein product and (ii) accurately predict how much of the protein product the mammalian cell factory can make. This is important as it means that biopharmaceutical companies can design a predictable production system from scratch, enabling a more rapid transition through lengthy cell factory development processes towards (pre-)clinical trials.

Technical Summary

For any engineered production process it is highly desirable to perform as much process or component design in silico as possible. This minimises trial and error testing of component interactions in the laboratory/factory. Underpinning in silico design are computational tools that can confidently be employed to predict the functional consequences of parameter change.

Our previous first-round BBSRC BRIC funded grant clearly identified the importance of recombinant mRNA dynamics in controlling recombinant protein production by CHO cells. Consistent with this, very recent genome-scale studies have highlighted the pre-eminence of mRNA (synthesis/stability and, primarily, translational efficiency) in controlling the relative abundance of proteins in mammalian cells generally. This project is therefore concerned with the development and application of a computational design platform, necessarily derived from a combination of genome-scale datastreams, that can be reliably employed to speed the development of mammalian cell factories through the optimal design of synthetic genes with predictable in vivo performance during whole production processes.

This project will also provide important tools that can be employed for a variety of genome-scale applications. By confident prediction of mRNA dynamics at the genome scale we will be able to re-create whole CHO cell proteomes in silico from high-throughput RNA sequencing data. This computational "bridge" between layers of cellular functional organisation will greatly facilitate the in silico design of synthetic genetic systems with a desired proportion of functional components, and the prediction of the relative abundance of protein components of complex cellular networks for fundamental studies of CHO cell function in the engineered environment. All proteomic and transcriptomic databases and associated computational resources will be available to the BRIC community.
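
As an illustration only of the mRNA-to-protein "bridge" described above, the sketch below uses a simple textbook steady-state model. This model is an assumption made here, not the project's actual method: protein abundance is taken as mRNA abundance multiplied by a per-mRNA translation rate, divided by the combined rate of protein degradation and dilution by cell growth. The function name, parameter names and example values are all hypothetical.

```python
import math


def steady_state_protein(mrna_abundance, translation_rate,
                         protein_half_life_h, doubling_time_h=None):
    """Steady-state protein copies per cell (illustrative model only).

    Assumes dP/dt = k_tl * m - (k_deg + mu) * P, so that at steady state
    P = k_tl * m / (k_deg + mu).

    mrna_abundance      -- mRNA copies per cell (m)
    translation_rate    -- proteins made per mRNA per hour (k_tl)
    protein_half_life_h -- protein half-life in hours (gives k_deg)
    doubling_time_h     -- optional cell doubling time in hours (gives mu)
    """
    if protein_half_life_h <= 0:
        raise ValueError("protein half-life must be positive")
    # First-order degradation rate constant from the half-life.
    k_deg = math.log(2) / protein_half_life_h
    # Dilution by growth; ignored when no doubling time is supplied.
    mu = math.log(2) / doubling_time_h if doubling_time_h else 0.0
    return translation_rate * mrna_abundance / (k_deg + mu)


if __name__ == "__main__":
    # Hypothetical genes: (mRNA copies/cell, proteins/mRNA/h, half-life in h)
    genes = {
        "gene_A": (20.0, 150.0, 40.0),
        "gene_B": (5.0, 30.0, 8.0),
    }
    for name, (m, k_tl, t_half) in genes.items():
        copies = steady_state_protein(m, k_tl, t_half, doubling_time_h=24.0)
        print(f"{name}: {copies:.0f} predicted protein copies per cell")
```

In this simplified picture, two genes with the same mRNA abundance can differ widely in predicted protein abundance purely through their translation rate and half-life, which is why per-gene estimates of both quantities are needed before transcriptomic data can be converted into a predicted proteome.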

Planned Impact

This research project derives from (i) underpinning BRIC 1/1 research in DCJ's lab, which generated a fundamental understanding of the control of recombinant protein synthesis by CHO cells during production processes, and (ii) a BRIC 2 Enabling Grant, which was used to sequence the CHO cell genome. Based on this pre-competitive knowledge (bioscience underpinning bioprocessing), the proposed research is focused on the creation of new tools and resources that would benefit a number of clearly defined user-groups:

1. UK bioindustry. This project will support UK companies developing biological medicines produced by mammalian cells in culture. We will provide our industrial partners with a data-rich resource as well as new, validated computational and informatic methods that can be implemented immediately to reduce time and costs spent in the creation of biomanufacturing systems - this represents a clear economic benefit and increased capability and competitiveness for UK bioindustry. All data and tools will be made available to BRIC partners as soon as they are generated.
2. BRIC/Bioprocessing researchers. We will produce large reference datasets and computational modelling resources (people and tools) dedicated to biomanufacturing systems. These represent a significant resource not just for industry but for any researcher engaged in pre-competitive research on CHO cell based manufacturing systems. We anticipate that adaptations of our modelling approaches could be applied to other cell factories (e.g. yeast, E. coli) or to other mammalian cell culture systems (e.g. human cell therapies). Development of the UK's ability to productively utilise genome-scale datasets to improve biomanufacturing systems is absolutely necessary.
3. Other researchers. This project directly addresses the BBSRC's 10-year vision "towards predictive biology", concentrating on a core problem for functional genomics: how to reliably predict cellular protein abundances from measured mRNA abundances. We anticipate that our research and development would be relevant to many projects utilising genome-scale transcriptomic data.

Funded Value:

£654,948

Funded Period:

Aug 13 - May 17

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/K011197/1

Principal Investigator:

David James

Research Subject:

Bioengineering (40%)

Biomolecules & biochemistry (20%)

Omic sciences & technologies (40%)

Research Topic:

Biochemical engineering (20%)

Bioreactors (20%)

Functional genomics (20%)

Protein expression (20%)

Proteomics (20%)

Organisations

People	ORCID iD
David James (Principal Investigator)
Paul Dobson (Co-Investigator)
Neil Lawrence (Co-Investigator)	http://orcid.org/0000-0001-9258-1030
Josselin Noirel (Co-Investigator)
Mark Dickman (Co-Investigator)

Publications

Brown A (2017) In silico design of context-responsive mammalian promoters with user-defined functionality in Nucleic Acids Research

Brown AJ (2018) Transcriptome-Based Identification of the Optimal Reference CHO Genes for Normalisation of qPCR Data in Biotechnology Journal

Brown AJ (2019) Whole synthetic pathway engineering of recombinant protein production in Biotechnology and Bioengineering

Cartwright J (2018) Highly sensitive detection of mutations in CHO cell recombinant DNA using multi-parallel single molecule real-time DNA sequencing in Biotechnology and Bioengineering

Dai Z (2015) Variational Auto-encoded Deep Gaussian Processes

Dai Z. (2016) Variational auto-encoded deep Gaussian processes in 4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings

González J (2015) Bayesian Optimization for Synthetic Gene Design

González J. (2016) Batch Bayesian optimization via local penalization in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016

González J. (2016) GLASSES: Relieving the myopia of Bayesian optimisation in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016

Key Findings
Impact Summary
Further Funding
Research Databases and Models
Research Tools and Methods
Collaboration


Description	Processes and resources for the design of synthetic genetic elements that contributed to several new bioindustrial collaborations. Directly contributed to the development of a new commercial entity deriving from University of Sheffield research and development in mammalian synthetic biology for bioindustrial applications. Related publications based on knowledge generated. Several presentations/disseminations at leading bioindustrial companies and technology development conferences.
Exploitation Route	Development of a new spin-out company focussing on mammalian synthetic biology to occur Q2 2021. Improved biomanufacturing processes at industrial collaborator sites. More informed bioindustrial research and development.
Sectors	Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology


Description	Contribution to establishment of a new spin-out company (SynGenSys Ltd) from July 2021.
First Year Of Impact	2021
Sector	Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
Impact Types	Economic


Description	BBSRC CTP BB/P011608/1
Amount	£1,200,000 (GBP)
Funding ID	BB/P011608/1
Organisation	Biotechnology and Biological Sciences Research Council (BBSRC)
Sector	Public
Country	United Kingdom
Start	09/2016
End	10/2020


Description	Direct funding from Biogen Idec
Amount	£250,000 (GBP)
Organisation	Biogen Idec
Sector	Private
Country	United States
Start	06/2015
End	07/2017


Description	Direct funding from Lonza
Amount	£217,000 (GBP)
Organisation	Lonza Group
Department	Lonza Biologics
Sector	Private
Country	United States
Start	03/2017
End	03/2019


Description	Direct funding from MedImmune
Amount	£1,100,000 (GBP)
Organisation	AstraZeneca
Department	MedImmune
Sector	Private
Country	United Kingdom
Start	05/2015
End	06/2020


Description	Direct funding from Regenxbio
Amount	£270,000 (GBP)
Organisation	Regenxbio Inc
Sector	Private
Country	United States
Start	03/2017
End	04/2019


Title	Synthetic gene design based on multi-omic modelling of mRNA translation efficiency in CHO cells
Description	A synthetic gene design process that yields thousands of synthetic sequences varying in predicted translational efficiency and stability. This represents a disruptive improvement over currently available commercial systems (e.g. Geneart, DNA 2.0), which offer only single "optimized" sequences.
Type Of Material	Model of mechanisms or symptoms - in vitro
Provided To Others?	No
Impact	University of Sheffield business development and research innovation managers are currently engaged in analysis of the potential for commercialisation of our synthetic gene design technology.


Title	CHO Cell Proteome Browser
Description	Empirically derived tool reporting the half-life and mRNA translation efficiency of CHO cell proteins
Type Of Material	Database/Collection of data
Year Produced	2015
Provided To Others?	Yes
Impact	Research still ongoing. Used as a fundamental tool for synthetic gene design
URL	http://sheffield-abc.shef.ac.uk:6166/Protein_report_app_2015_v6/


Description	Strategic partnership with Biogen
Organisation	Biogen Idec
Country	United States
Sector	Private
PI Contribution	CHO cell engineering technology
Collaborator Contribution	Project management, research materials, datasets
Impact	Johari Y, Estes S, Alves C, James DC. (2015) Integrated cell and process engineering strategies for improved production of a difficult-to-express fusion protein by CHO cells. Biotechnology and Bioengineering. In press.
Start Year	2010


Description	Strategic partnership with Lonza Biologics
Organisation	Lonza Group
Department	Lonza Biologics
Country	United States
Sector	Private
PI Contribution	Genetic vector and cell engineering technology development
Collaborator Contribution	Project management, laboratory facilities, research materials.
Impact	Grainger RG, James DC (2013). Cell line specific control and prediction of recombinant monoclonal antibody glycosylation. Biotechnology and Bioengineering. 110: 2970-2983. Davies SL, Lovelady CS, Grainger RK, Racher AJ, Young RJ, James DC. (2013) Functional heterogeneity and heritability in CHO cell populations. Biotechnology and Bioengineering 110: 260-274. Highlighted "Spotlight" paper. McLeod J, O'Callaghan PM, Pybus LP, Wilkinson SJ, Root T, Racher AJ, James DC (2011) An empirical modeling platform to evaluate the relative control discrete CHO cell synthetic processes exert over recombinant monoclonal antibody production process titer. Biotechnology and Bioengineering. 108: 2193-2204. Davies SL, McLeod J, O'Callaghan PM, Pybus LP, Sung YH, Wilkinson SJ, Rance J, Racher AJ, Young RJ, James DC. (2011) Impact of gene vector design on the control of recombinant monoclonal antibody production by CHO cells. Biotechnology Progress 27: 1689-1699. O'Callaghan PM, MacLeod J, Pybus L, Lovelady CS, Wilkinson S, Racher AJ, Porter A, James DC. (2010) Cell line specific control of recombinant monoclonal antibody production by CHO cells. Biotechnology and Bioengineering. 106: 937-951.
Start Year	2006


Description	Strategic partnership with MedImmune
Organisation	AstraZeneca
Department	MedImmune
Country	United Kingdom
Sector	Private
PI Contribution	Development of novel cell engineering technology
Collaborator Contribution	Project management, laboratory facilities, research reagents and model systems
Impact	Pybus LP, Dean G, Slidel T, Hardman C, Smith A, Daramola O, Field R, James DC (2014) Predicting the expression of recombinant monoclonal antibodies in Chinese hamster ovary cells based on sequence features of the CDR3 domain. Biotechnology Progress 30: 188-197. Pybus LP, Dean G, West NR, Smith A, Daramola O, Field R, Wilkinson SJ, James DC (2014) Model-directed engineering of "difficult-to-express" monoclonal antibody production by Chinese hamster ovary cells. Biotechnology and Bioengineering 111: 372-385. Highlighted "Spotlight" paper. Thompson BC, Segarra CRJ, Mozley O, Daramola O, Field R, Levison PL, James DC. (2012) Cell line specific control of PEI-mediated transient transfection optimised with 'Design of Experiments' methodology. Biotechnology Progress. 28: 179-187.
Start Year	2007