Linking recombinant gene sequence to protein product manufacturability using CHO cell genomic resources
Lead Research Organisation:
University of Sheffield
Department Name: Chemical & Biological Engineering
Abstract
Biopharmaceutical companies producing the new generation of recombinant DNA derived therapeutic proteins (e.g. cancer medicines such as Herceptin and Avastin) often use mammalian cells grown in culture to make the protein product. All production processes are based, fundamentally, upon the ability of the host mammalian cell factory to use a synthetic DNA genetic "code" to manufacture the complex protein product. This is a cornerstone of modern biotechnology. However, because protein synthesis is so complex, involving many cellular resources and machines, it is extremely difficult for genetic engineers to design a DNA code that will best enable the mammalian cell factory to operate most efficiently. Moreover, as individual mammalian cell factories can be very variable, they may differ substantially in their relative ability to make the product. As a consequence, a lot of time and money has to be spent by companies on the initial phases of the biopharmaceutical development process conducting intensive screening operations to find the best cell factory (out of a large population) able to use the genetic code it has been given. For a different protein product it is necessary to start the whole development process again.
In this project we will utilise recently available high information content molecular analysis technologies and computational tools to "de-convolute" the complexity of protein synthesis in mammalian cell factories. We know that the mammalian cell factory uses its own genetic code to make thousands of its own proteins (machines) that together perform a variety of functions that enable the cell to grow and divide. The rate at which these proteins are made varies hugely, over 1000-fold, so that the cell can make each bit of protein machinery in the right quantity to do its job. We will measure how efficiently each cellular protein is made. Then, using advanced biological information analysis (bioinformatics) and mathematics, we will determine how the cell uses pieces of information embedded in each of its genes to vary the rate at which a specific protein is made.
This will enable us to create, for the first time, a usable set of "design rules" (computer programmes) that genetic engineers and cell factory developers can employ to (i) reliably design the best genetic code for any given protein product and (ii) accurately predict how much of the protein product the mammalian cell factory can make. This is important as it means that biopharmaceutical companies can design a predictable production system from scratch, enabling a more rapid transition through lengthy cell factory development processes towards (pre-)clinical trials.
Technical Summary
For any engineered production process it is highly desirable to perform as much process or component design in silico as possible. This minimises trial and error testing of component interactions in the laboratory/factory. Underpinning in silico design are computational tools that can confidently be employed to predict the functional consequences of parameter change.
Our previous first-round BBSRC BRIC funded grant clearly identified the importance of recombinant mRNA dynamics in controlling recombinant protein production by CHO cells. Consistent with this, very recent genome-scale studies have highlighted the pre-eminence of mRNA (synthesis, stability and, primarily, translational efficiency) in controlling the relative abundance of proteins in mammalian cells generally. This project is therefore concerned with the development and application of a computational design platform, necessarily derived from a combination of genome-scale datastreams, that can be reliably employed to speed the development of mammalian cell factories through the optimal design of synthetic genes with predictable in vivo performance during whole production processes.
This project will also provide important tools that can be employed for a variety of genome-scale applications. By confident prediction of mRNA dynamics at the genome scale we will be able to re-create whole CHO cell proteomes in silico from high-throughput RNA sequencing data. This computational "bridge" between layers of cellular functional organisation will greatly facilitate the in silico design of synthetic genetic systems with a desired proportion of functional components and predict the relative abundance of protein components of complex cellular networks for fundamental studies of CHO cell function in the engineered environment. All proteomic and transcriptomic databases and associated computational resources will be available to the BRIC community.
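As a rough illustration of the mRNA-to-protein "bridge" described above, the sketch below applies a simple first-order steady-state model in which protein abundance equals mRNA abundance multiplied by a translation rate and divided by a degradation constant derived from the protein half-life. This is a textbook simplification chosen for illustration, not the project's actual method: it ignores dilution by cell growth, and every function name, parameter and input value here is hypothetical.

```python
import math


def degradation_rate(half_life_h: float) -> float:
    """First-order degradation constant (per hour) from a protein half-life in hours."""
    if half_life_h <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_h


def steady_state_protein(mrna: float, translation_rate: float, half_life_h: float) -> float:
    """Steady-state protein copies per cell: P = k_tl * m / k_deg.

    mrna             -- mRNA copies per cell (e.g. estimated from RNA-seq)
    translation_rate -- proteins made per mRNA per hour (assumed known)
    half_life_h      -- protein half-life in hours (assumed known)
    """
    return translation_rate * mrna / degradation_rate(half_life_h)


def predict_relative_proteome(genes: dict) -> dict:
    """Map {gene: (mrna, translation_rate, half_life_h)} to relative protein abundances."""
    absolute = {name: steady_state_protein(*params) for name, params in genes.items()}
    total = sum(absolute.values())
    return {name: value / total for name, value in absolute.items()}


if __name__ == "__main__":
    # Hypothetical inputs: two genes with equal mRNA levels but different
    # translation rates and protein half-lives.
    example = {
        "gene_a": (10.0, 100.0, 20.0),
        "gene_b": (10.0, 10.0, 5.0),
    }
    for gene, fraction in predict_relative_proteome(example).items():
        print(f"{gene}: {fraction:.3f}")
```

Under this simplified model, two transcripts with identical mRNA abundance can differ widely in predicted protein level, which is why per-gene translation efficiency and half-life measurements are needed in addition to RNA sequencing data.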
Planned Impact
This research project derives from (i) underpinning BRIC 1/1 research in DCJ's lab, which generated a fundamental understanding of the control of recombinant protein synthesis by CHO cells during production processes, and (ii) a BRIC 2 Enabling Grant, which was used to sequence the CHO cell genome. Based on this pre-competitive knowledge (bioscience underpinning bioprocessing), the proposed research is focused on the creation of new tools and resources that would benefit a number of clearly defined user groups:
1. UK bioindustry. This project will support UK companies developing biological medicines produced by mammalian cells in culture. We will provide our industrial partners with a data-rich resource as well as new, validated computational and informatic methods that can be implemented immediately to reduce time and costs spent in the creation of biomanufacturing systems - this represents a clear economic benefit and increased capability and competitiveness for UK bioindustry. All data and tools will be made available to BRIC partners as soon as they are generated.
2. BRIC/Bioprocessing researchers. We will produce large reference datasets and computational modelling resources (people and tools) dedicated to biomanufacturing systems. These represent a significant resource not just for industry but for any researcher engaged in pre-competitive research on CHO cell based manufacturing systems. We anticipate that adaptations of our modelling approaches could be applied to other cell factories (e.g. yeast, E. coli) or to other mammalian cell culture systems (e.g. human cell therapies). Development of the UK's ability to productively utilise genome-scale datasets to improve biomanufacturing systems is absolutely necessary.
3. Other researchers. This project directly addresses the BBSRC's 10-year vision "towards predictive biology", concentrating on a core problem for functional genomics: how to reliably predict cellular protein abundances from measured mRNA abundances. We anticipate that our research and development would be relevant to many projects utilising genome-scale transcriptomic data.
Publications

Brown AJ
(2018)
Transcriptome-Based Identification of the Optimal Reference CHO Genes for Normalisation of qPCR Data.
in Biotechnology journal

Brown AJ
(2017)
In silico design of context-responsive mammalian promoters with user-defined functionality.
in Nucleic acids research

Brown AJ
(2019)
Whole synthetic pathway engineering of recombinant protein production.
in Biotechnology and bioengineering

Cartwright JF
(2018)
Highly sensitive detection of mutations in CHO cell recombinant DNA using multi-parallel single molecule real-time DNA sequencing.
in Biotechnology and bioengineering

Dai Z
(2015)
Variational Auto-encoded Deep Gaussian Processes

Dai Z.
(2016)
Variational auto-encoded deep Gaussian processes
in 4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings

González J
(2015)
Bayesian Optimization for Synthetic Gene Design
Description | Processes and resources for the design of synthetic genetic elements that contributed to several new bioindustrial collaborations. Directly contributed to the development of a new commercial entity deriving from University of Sheffield research and development in mammalian synthetic biology for bioindustrial applications. Related publications based on knowledge generated. Several presentations/disseminations at leading bioindustrial companies and technology development conferences. |
Exploitation Route | Development of a new spin-out company focussing on mammalian synthetic biology to occur Q2 2021. Improved biomanufacturing processes at industrial collaborator sites. More informed bioindustrial research and development. |
Sectors | Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
Description | Contribution to establishment of a new spin-out company (SynGenSys Ltd) from July 2021. |
First Year Of Impact | 2021 |
Sector | Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
Impact Types | Economic |
Description | Direct funding from Biogen Idec |
Amount | £250,000 (GBP) |
Organisation | Biogen Idec |
Sector | Private |
Country | United States |
Start | 06/2015 |
End | 07/2017 |
Description | Direct funding from Lonza |
Amount | £217,000 (GBP) |
Organisation | Lonza Group |
Department | Lonza Biologics |
Sector | Private |
Country | United States |
Start | 03/2017 |
End | 03/2019 |
Description | Direct funding from MedImmune |
Amount | £1,100,000 (GBP) |
Organisation | AstraZeneca |
Department | MedImmune |
Sector | Private |
Country | United Kingdom |
Start | 05/2015 |
End | 06/2020 |
Description | Direct funding from Regenxbio |
Amount | £270,000 (GBP) |
Organisation | Regenxbio Inc |
Sector | Private |
Country | United States |
Start | 03/2017 |
End | 04/2019 |
Title | Synthetic gene design based on multi-omic based modelling of mRNA translation efficiency in CHO cells |
Description | A synthetic gene design process that yields thousands of synthetic sequences varying in predicted translational efficiency and stability. It represents a disruptive improvement over currently available commercial systems (e.g. Geneart, DNA 2.0), which offer only single "optimized" sequences. |
Type Of Material | Model of mechanisms or symptoms - in vitro |
Provided To Others? | No |
Impact | University of Sheffield business development and research innovation managers are currently engaged in analysis of the potential for commercialisation of our synthetic gene design technology. |
Title | CHO Cell Proteome Browser |
Description | Empirically derived tool reporting the half-life and mRNA translation efficiency of CHO cell proteins |
Type Of Material | Database/Collection of data |
Year Produced | 2015 |
Provided To Others? | Yes |
Impact | Research still ongoing. Used as a fundamental tool for synthetic gene design |
URL | http://sheffield-abc.shef.ac.uk:6166/Protein_report_app_2015_v6/ |
Description | Strategic partnership with Biogen |
Organisation | Biogen Idec |
Country | United States |
Sector | Private |
PI Contribution | CHO cell engineering technology |
Collaborator Contribution | Project management, research materials, datasets |
Impact | Johari Y, Estes S, Alves C, James DC. (2015) Integrated cell and process engineering strategies for improved production of a difficult-to-express fusion protein by CHO cells. Biotechnology and Bioengineering. In press. |
Start Year | 2010 |
Description | Strategic partnership with Lonza Biologics |
Organisation | Lonza Group |
Department | Lonza Biologics |
Country | United States |
Sector | Private |
PI Contribution | Genetic vector and cell engineering technology development |
Collaborator Contribution | Project management, laboratory facilities, research materials. |
Impact | Grainger RG, James DC (2013). Cell line specific control and prediction of recombinant monoclonal antibody glycosylation. Biotechnology and Bioengineering. 110: 2970-2983. Davies SL, Lovelady CS, Grainger RK, Racher AJ, Young RJ, James DC. (2013) Functional heterogeneity and heritability in CHO cell populations. Biotechnology and Bioengineering 110: 260-274. Highlighted "Spotlight" paper. McLeod J, O'Callaghan PM, Pybus LP, Wilkinson SJ, Root T, Racher AJ, James DC (2011) An empirical modeling platform to evaluate the relative control discrete CHO cell synthetic processes exert over recombinant monoclonal antibody production process titer. Biotechnology and Bioengineering. 108: 2193-2204. Davies SL, McLeod J, O'Callaghan PM, Pybus LP, Sung YH, Wilkinson SJ, Rance J, Racher AJ, Young RJ, James DC. (2011) Impact of gene vector design on the control of recombinant monoclonal antibody production by CHO cells. Biotechnology Progress 27: 1689-1699. O'Callaghan PM, MacLeod J, Pybus L, Lovelady CS, Wilkinson S, Racher AJ, Porter A, James DC. (2010) Cell line specific control of recombinant monoclonal antibody production by CHO cells. Biotechnology and Bioengineering. 106: 937-951. |
Start Year | 2006 |
Description | Strategic partnership with MedImmune |
Organisation | AstraZeneca |
Department | MedImmune |
Country | United Kingdom |
Sector | Private |
PI Contribution | Development of novel cell engineering technology |
Collaborator Contribution | Project management, laboratory facilities, research reagents and model systems |
Impact | Pybus LP, Dean G, Slidel T, Hardman C, Smith A, Daramola O, Field R, James DC (2014) Predicting the expression of recombinant monoclonal antibodies in Chinese hamster ovary cells based on sequence features of the CDR3 domain. Biotechnology Progress 30: 188-197. Pybus LP, Dean G, West NR, Smith A, Daramola O, Field R, Wilkinson SJ, James DC (2014) Model-directed engineering of "difficult-to-express" monoclonal antibody production by Chinese hamster ovary cells. Biotechnology and Bioengineering 111: 372-385. Highlighted "Spotlight" paper. Thompson BC, Segarra CRJ, Mozley O, Daramola O, Field R, Levison PL, James DC. (2012) Cell line specific control of PEI-mediated transient transfection optimised with 'Design of Experiments' methodology. Biotechnology Progress. 28: 179-187. |
Start Year | 2007 |