Linking recombinant gene sequence to protein product manufacturability using CHO cell genomic resources

Lead Research Organisation: University of Sheffield
Department Name: Chemical & Biological Engineering


Biopharmaceutical companies producing the new generation of recombinant DNA-derived therapeutic proteins (e.g. cancer medicines such as Herceptin and Avastin) often use mammalian cells grown in culture to make the protein product. All such production processes rely fundamentally on the ability of the host mammalian cell factory to use a synthetic DNA genetic "code" to manufacture the complex protein product. This is a cornerstone of modern biotechnology. However, because protein synthesis is so complex, involving many cellular resources and machines, it is extremely difficult for genetic engineers to design a DNA code that enables the mammalian cell factory to operate most efficiently. Moreover, individual mammalian cell factories can be highly variable and may differ substantially in their ability to make the product. As a consequence, companies spend considerable time and money in the initial phases of biopharmaceutical development on intensive screening to find the cell factory (out of a large population) best able to use the genetic code it has been given. For each new protein product, the whole development process must start again.
In this project we will use recently available high-information-content molecular analysis technologies and computational tools to "de-convolute" the complexity of protein synthesis in mammalian cell factories. We know that the mammalian cell factory uses its own genetic code to make thousands of its own proteins (machines) that together perform the functions that enable the cell to grow and divide. The rate at which these proteins are made varies enormously, over 1000-fold, so that the cell can make each piece of protein machinery in the right quantity to do its job. We will measure how efficiently each cellular protein is made and then, using advanced biological information analysis (bioinformatics) and mathematics, determine how the cell uses information embedded in each of its genes to vary the rate at which a specific protein is made.
This will enable us to create, for the first time, a usable set of "design rules" (computer programmes) that genetic engineers and cell factory developers can employ to (i) reliably design the best genetic code for any given protein product and (ii) accurately predict how much of the protein product the mammalian cell factory can make. This is important as it means that biopharmaceutical companies can design a predictable production system from scratch, enabling a more rapid transition through lengthy cell factory development processes towards (pre-)clinical trials.

Technical Summary

For any engineered production process it is highly desirable to perform as much process or component design in silico as possible. This minimises trial-and-error testing of component interactions in the laboratory or factory. Underpinning in silico design are computational tools that can confidently be employed to predict the functional consequences of parameter changes.

Our previous first-round BBSRC BRIC-funded grant clearly identified the importance of recombinant mRNA dynamics in controlling recombinant protein production by CHO cells. Consistent with this, very recent genome-scale studies have highlighted the pre-eminence of mRNA (synthesis and stability and, primarily, translational efficiency) in controlling the relative abundance of proteins in mammalian cells generally. This project is therefore concerned with developing and applying a computational design platform, necessarily derived from a combination of genome-scale data streams, that can reliably speed the development of mammalian cell factories through the optimal design of synthetic genes with predictable in vivo performance throughout production processes.

This project will also provide important tools for a variety of genome-scale applications. By confidently predicting mRNA dynamics at the genome scale, we will be able to re-create whole CHO cell proteomes in silico from high-throughput RNA sequencing data. This computational "bridge" between layers of cellular functional organisation will greatly facilitate the in silico design of synthetic genetic systems with a desired proportion of functional components, and will allow the relative abundance of protein components of complex cellular networks to be predicted for fundamental studies of CHO cell function in the engineered environment. All proteomic and transcriptomic databases and associated computational resources will be available to the BRIC community.
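The mRNA-to-protein "bridge" described above can be illustrated with the textbook first-order model of protein turnover, in which each mRNA is translated at a constant rate and protein is lost by first-order degradation (and, optionally, dilution by cell growth). This is a minimal sketch under that assumption, not the project's own model: the functions `steady_state_protein` and `predict_proteome`, the translation rate expressed as proteins per mRNA per hour, and the optional growth term are hypothetical illustrations. Setting dP/dt = k_s·M − (k_d + μ)·P to zero gives P = k_s·M / (k_d + μ), with k_d = ln 2 / t½.

```python
import math
from typing import Dict


def steady_state_protein(mrna_copies: float,
                         translation_rate_per_h: float,
                         protein_half_life_h: float,
                         growth_rate_per_h: float = 0.0) -> float:
    """Protein copies per cell at steady state of a first-order turnover model.

    dP/dt = k_s * M - (k_d + mu) * P  =>  P_ss = k_s * M / (k_d + mu),
    where k_s is proteins made per mRNA per hour, k_d = ln 2 / half-life,
    and mu is an optional (hypothetical) dilution rate due to cell growth.
    """
    if protein_half_life_h <= 0:
        raise ValueError("protein half-life must be positive")
    k_d = math.log(2) / protein_half_life_h
    return translation_rate_per_h * mrna_copies / (k_d + growth_rate_per_h)


def predict_proteome(mrna: Dict[str, float],
                     translation_rate_per_h: Dict[str, float],
                     half_life_h: Dict[str, float]) -> Dict[str, float]:
    """Predict protein abundance for every gene that has all three inputs.

    Genes missing a translation rate or half-life are skipped, not imputed.
    """
    return {
        gene: steady_state_protein(m, translation_rate_per_h[gene], half_life_h[gene])
        for gene, m in mrna.items()
        if gene in translation_rate_per_h and gene in half_life_h
    }
```

For example, a gene with 10 mRNA copies per cell, a translation rate of 100 proteins per mRNA per hour and a protein half-life of ln 2 hours (so k_d = 1 per hour) would be predicted to reach 1,000 protein copies per cell; doubling the half-life doubles the prediction.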

Planned Impact

This research project derives directly from (i) underpinning BRIC 1/1 research in DCJ's lab, which generated a fundamental understanding of the control of recombinant protein synthesis by CHO cells during production processes, and (ii) a BRIC 2 Enabling Grant, which was used to sequence the CHO cell genome. Building on this pre-competitive knowledge (bioscience underpinning bioprocessing), the proposed research is focused on creating new tools and resources that will benefit a number of clearly defined user groups:

1. UK bioindustry. This project will support UK companies developing biological medicines produced by mammalian cells in culture. We will provide our industrial partners with a data-rich resource as well as new, validated computational and informatic methods that can be implemented immediately to reduce the time and cost of creating biomanufacturing systems. This represents a clear economic benefit and increased capability and competitiveness for UK bioindustry. All data and tools will be made available to BRIC partners as soon as they are generated.
2. BRIC/Bioprocessing researchers. We will produce large reference datasets and computational modelling resources (people and tools) dedicated to biomanufacturing systems. These represent a significant resource not just for industry but for any researcher engaged in pre-competitive research on CHO cell-based manufacturing systems. We anticipate that adaptations of our modelling approaches could be applied to other cell factories (e.g. yeast, E. coli) or to other mammalian cell culture systems (e.g. human cell therapies). Developing the UK's ability to productively use genome-scale datasets to improve biomanufacturing systems is essential.
3. Other researchers. This project directly addresses the BBSRC's 10-year vision "towards predictive biology", concentrating on a core problem for functional genomics: how to reliably predict cellular protein abundances from measured mRNA abundances. We anticipate that our research and development will be relevant to many projects using genome-scale transcriptomic data.


Description Processes and resources for the design of synthetic genetic elements that contributed to several new bioindustrial collaborations.
Directly contributed to the development of a new commercial entity deriving from University of Sheffield research and development in mammalian synthetic biology for bioindustrial applications.
Related publications based on knowledge generated.
Several presentations/disseminations at leading bioindustrial companies and technology development conferences.
Exploitation Route Development of a new spin-out company focussing on mammalian synthetic biology, planned for Q2 2020.
Improved biomanufacturing processes at industrial collaborator sites.
More informed bioindustrial research and development.
Sectors Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology

Description Knowledge transfer and new collaborations
We have already begun to exploit this technology to support new collaborations with industry, e.g.:
1. University of Sheffield EPSRC KTA Innovation and Knowledge Exchange Fund and MedImmune Ltd. Development of synthetic gene expression technology. 2014-2015.
2. University of Sheffield and Biogen Idec. Design of synthetic parts for CHO cell engineering. 2015-2017.
3. University of Sheffield and MedImmune. Cell engineering platform development. 2015-2020.
4. University of Sheffield EPSRC KTA Innovation and Knowledge Exchange Fund and Lonza Biologics plc. Cell factory engineering. 2015-2016.
5. Funding (£201K) received from the Connecting Capabilities Fund (Northern Triangle Initiative) for a commercial proof-of-concept project (2018-2020) to exploit synthetic genetic elements for next-generation mammalian expression system design, with relevance to biopharmaceutical and virus vector manufacturing systems.
Dissemination to bioindustry
Presentations
• Predicting Translation: A Global Approach. Sheffield Advanced Biomanufacturing Centre Conference (Longworth).
• Linking Recombinant Gene Sequence to Protein Product Manufacturability Using CHO Cell Genomic Resources. BRIC Meeting, Newcastle, UK (Longworth, Gonzalez).
• Gaussian Processes for Global Optimization. Gaussian Process Summer School, Sheffield, UK (Gonzalez).
• Bayesian Optimization for Synthetic Gene Design. ICML Workshop on Constructive Learning, Lille, France (Gonzalez).
• Batch Bayesian Optimization via Local Penalization. Seminar at the Centre for Computational Statistics and Machine Learning, UCL, London, UK (Gonzalez).
• Bayesian Optimization for Synthetic Gene Design. 25th Annual MASAMB Workshop, Helsinki, Finland (Gonzalez).
• Linking Recombinant Gene Sequence to Protein Products. Sheffield Institute for Translational Neuroscience, The University of Sheffield, Sheffield, UK (Gonzalez).
Poster Presentations
• Predicting Translation: A Global Approach. Advanced Biomanufacturing Centre Conference (Longworth).
Publications
• Gonzalez J, Longworth J, James DC, Lawrence ND (2015) Bayesian optimization for synthetic gene design. arXiv:1505.01627 [stat.ML]. NIPS 2014 Workshop on Bayesian Optimization (Gonzalez).
First Year Of Impact 2014
Sector Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
Impact Types Economic

Description BBSRC CTP BB/P011608/1
Amount £1,200,000 (GBP)
Funding ID BB/P011608/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 10/2016 
End 10/2020
Description Direct funding from Biogen Idec
Amount £250,000 (GBP)
Organisation Biogen Idec 
Sector Private
Country United States
Start 07/2015 
End 07/2017
Description Direct funding from Lonza
Amount £217,000 (GBP)
Organisation Lonza Group 
Department Lonza Biologics
Sector Private
Country United States
Start 03/2017 
End 03/2019
Description Direct funding from MedImmune
Amount £1,100,000 (GBP)
Organisation MedImmune 
Department MedImmune Cambridge
Sector Private
Country United Kingdom
Start 06/2015 
End 06/2020
Description Direct funding from Regenxbio
Amount £270,000 (GBP)
Organisation Regenxbio Inc 
Sector Private
Country United States
Start 04/2017 
End 04/2019
Title Synthetic gene design based on multi-omic based modelling of mRNA translation efficiency in CHO cells 
Description A synthetic gene design process, which yields thousands of synthetic sequences varying in predicted translational efficiency and stability, represents a disruptive improvement over currently available commercial systems (e.g. GeneArt, DNA 2.0), which offer only a single "optimized" sequence.
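The record does not describe how this design process generates or scores its sequences. As a minimal illustration of the general idea, assuming only the standard genetic code, the sketch below samples synonymous coding sequences for a protein, checks that each translates back to the same protein, and ranks them with a pluggable scoring function. GC content is used purely as a placeholder score; it stands in for the project's multi-omic model of translational efficiency and stability, which is not given here. The names `design_variants`, `SYNONYMS` and `gc_content` are hypothetical, and plain random sampling is used in place of the Bayesian optimization explored in the project's publications.

```python
import random
from itertools import product
from typing import Callable, Dict, List, Tuple

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid -> list of synonymous codons.
SYNONYMS: Dict[str, List[str]] = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(cds: str) -> str:
    """Translate a coding sequence (length a multiple of 3) codon by codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def gc_content(seq: str) -> float:
    """Placeholder score only (hypothetical stand-in): fraction of G/C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def design_variants(protein: str, n_samples: int,
                    score: Callable[[str], float] = gc_content,
                    seed: int = 0) -> List[Tuple[float, str]]:
    """Sample up to n_samples distinct synonymous CDSs; return (score, cds), best first."""
    rng = random.Random(seed)
    variants = {
        "".join(rng.choice(SYNONYMS[aa]) for aa in protein)
        for _ in range(n_samples)
    }
    # Sanity check: every variant must encode exactly the same protein.
    if any(translate(v) != protein for v in variants):
        raise RuntimeError("variant does not encode the input protein")
    return sorted(((score(v), v) for v in variants), reverse=True)


if __name__ == "__main__":
    for s, cds in design_variants("MKWL*", n_samples=20)[:3]:
        print(f"{s:.2f} {cds}")
```

Replacing `score` with a trained predictor would turn the same loop into a model-guided screen. Because the number of synonymous sequences is the product of each residue's codon degeneracy, even short proteins admit an enormous sequence space, which is why sampling or optimization is used rather than exhaustive enumeration.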
Type Of Material Model of mechanisms or symptoms - in vitro 
Provided To Others? No  
Impact University of Sheffield business development and research innovation managers are currently engaged in analysis of the potential for commercialisation of our synthetic gene design technology. 
Title CHO Cell Proteome Browser 
Description Empirically derived tool reporting the half-life and mRNA translation efficiency of CHO cell proteins 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact Research still ongoing. Used as a fundamental tool for synthetic gene design.
Description Strategic partnership with Biogen 
Organisation Biogen Idec
Country United States 
Sector Private 
PI Contribution CHO cell engineering technology
Collaborator Contribution Project management, research materials, datasets
Impact Johari Y, Estes S, Alves C, James DC. (2015) Integrated cell and process engineering strategies for improved production of a difficult-to-express fusion protein by CHO cells. Biotechnology and Bioengineering. In press.
Start Year 2010
Description Strategic partnership with Lonza Biologics 
Organisation Lonza Group
Department Lonza Biologics
Country United States 
Sector Private 
PI Contribution Genetic vector and cell engineering technology development
Collaborator Contribution Project management, laboratory facilities, research materials.
Impact Grainger RG, James DC (2013). Cell line specific control and prediction of recombinant monoclonal antibody glycosylation. Biotechnology and Bioengineering. 110: 2970-2983.
Davies SL, Lovelady CS, Grainger RK, Racher AJ, Young RJ, James DC. (2013) Functional heterogeneity and heritability in CHO cell populations. Biotechnology and Bioengineering 110: 260-274. Highlighted "Spotlight" paper.
McLeod J, O'Callaghan PM, Pybus LP, Wilkinson SJ, Root T, Racher AJ, James DC (2011) An empirical modeling platform to evaluate the relative control discrete CHO cell synthetic processes exert over recombinant monoclonal antibody production process titer. Biotechnology and Bioengineering. 108: 2193-2204.
Davies SL, McLeod J, O'Callaghan PM, Pybus LP, Sung YH, Wilkinson SJ, Rance J, Racher AJ, Young RJ, James DC. (2011) Impact of gene vector design on the control of recombinant monoclonal antibody production by CHO cells. Biotechnology Progress 27: 1689-1699.
O'Callaghan PM, MacLeod J, Pybus L, Lovelady CS, Wilkinson S, Racher AJ, Porter A, James DC. (2010) Cell line specific control of recombinant monoclonal antibody production by CHO cells. Biotechnology and Bioengineering. 106: 937-951.
Start Year 2006
Description Strategic partnership with MedImmune 
Organisation MedImmune
Country United States 
Sector Private 
PI Contribution Development of novel cell engineering technology
Collaborator Contribution Project management, laboratory facilities, research reagents and model systems
Impact Pybus LP, Dean G, Slidel T, Hardman C, Smith A, Daramola O, Field R, James DC (2014) Predicting the expression of recombinant monoclonal antibodies in Chinese hamster ovary cells based on sequence features of the CDR3 domain. Biotechnology Progress 30: 188-197.
Pybus LP, Dean G, West NR, Smith A, Daramola O, Field R, Wilkinson SJ, James DC (2014) Model-directed engineering of "difficult-to-express" monoclonal antibody production by Chinese hamster ovary cells. Biotechnology and Bioengineering 111: 372-385. Highlighted "Spotlight" paper.
Thompson BC, Segarra CRJ, Mozley O, Daramola O, Field R, Levison PL, James DC. (2012) Cell line specific control of PEI-mediated transient transfection optimised with 'Design of Experiments' methodology. Biotechnology Progress. 28: 179-187.
Start Year 2007