# Development of an integrated continuous process for recombinant protein production using Pichia pastoris

Lead Research Organisation: University of Cambridge
Department Name: Biochemistry

### Abstract

This project will build on the successes of our two projects in phase 1 of the Bioprocessing Research Industry Club (BRIC1) to develop a system for the growth-associated production of commercially important protein products in a commercially-important production organism - the yeast Pichia pastoris.

Proteins, particularly antibodies, are now playing a major role in the treatment of human disease. They are usually produced using animal cells in culture, a procedure that is both costly and time-consuming. An attractive alternative is to engineer a yeast cell to produce human proteins of therapeutic value. Pichia pastoris is often used for this purpose since it may be grown at high cell densities and has an efficient secretion system to release the protein product from the cells. Current processes induce protein production by repeatedly adding methanol to the yeast cultures. Our studies in BRIC1 demonstrated that this is exactly the wrong approach, since the cells are repeatedly stressed and produce badly folded proteins that have low biological activity and are not secreted from the cells. What is required is a continuous process for protein production and recovery, and this is what our BRIC2 project aims to achieve.

The product proteins to be studied will be guided by our contacts within the BRIC member companies and include a number of proteins used in human therapies. Growth-associated production will enable continuous processes and thus increase commercial productivity. Computer models developed in BRIC1 will be improved, extended, and used to control the novel process. Moreover, the fast protein separation methods that we also developed in BRIC1 will permit the continuous retrieval of product, thus removing the major barrier to the commercial adoption of continuous systems - the mismatch between continuous production and batch-wise downstream processing.

### Technical Summary

We aim to improve both the quality and yield of proteins secreted from Pichia pastoris. This builds on the successes of our BRIC1 projects to develop a system for the growth-associated production of commercially important proteins. We have shown that the stresses of batch-wise production are sensed by the producer organism and inevitably reduce the quality and yield of secreted proteins. Growth-associated production will enable continuous processes that do not suffer these cell-associated stresses, thus increasing throughput. This will allow us to implement the modelling systems we developed in BRIC1 to effect process control. Moreover, the fast ion-exchange methods for protein separation that we also developed in BRIC1 will permit continuous retrieval of product, thus removing the major barrier to the adoption of continuous systems - the mismatch between continuous production and batch-wise downstream processing.

We showed in BRIC1 that, in P.pastoris, secreted yields of r-proteins depend upon their native-state stabilities. Thus, we introduced the concept of engineering proteins for secreted yield, rather than just functionality, a concept adopted successfully by industry. Understanding the dependency of product yield on stability affords a basis for predicting secreted yield in system-wide models.
The production of r-proteins by P.pastoris exploits the powerful AOX1 promoter with expression repeatedly induced by methanol addition in a fed-batch process. In BRIC1, we demonstrated that such processes are doomed never to produce optimal yields of r-proteins since the cells are repeatedly subjected to a stress-inducing transient which both reduces yields and promotes protein misfolding. Thus, in BRIC2, we will develop an integrated continuous process for r-protein production and recovery in which product formation is growth-linked and in which medium formulation and process control are determined by a constraint-based model of P.pastoris metabolism.

### Planned Impact

The commercial production of recombinant proteins (r-proteins) of therapeutic value would be greatly facilitated, with enormous cost and efficiency benefits, if microbial processes were to supplant those based on cultured mammalian cells. Among the microbial 'cell factories' for r-protein production, the most commercially attractive is the yeast Pichia pastoris, since it may be grown at high cell densities and has an efficient secretion system that delivers the r-protein to the culture broth. However, it is our contention (based on our data from BRIC1) that currently used regimes for the production of r-proteins by P.pastoris, which exploit the powerful AOX1 promoter with expression repeatedly induced by methanol addition in a fed-batch process, are completely wrong-headed. This is because such regimes repeatedly expose the producer organism to stress-inducing transients that both reduce yields and promote protein misfolding. Thus both the quality and quantity of the r-protein product are compromised.

We propose to completely revolutionise r-protein production by this organism by developing an integrated continuous process in which product formation is growth-linked and in which medium formulation and process control are determined by a constraint-based model of P. pastoris metabolism. We also intend to solve two problems that often militate against the adoption of continuous fermentation by the bioprocess industry - genetic instability of the producer organism and the mismatch between continuous production and batch-wise downstream processing. The first will be solved by the identification of medium constituents whose concentration may be varied without compromising product yield or quality - this removes the constant selection pressure that leads to genetic instability. The deployment of the fast ion-exchange methods for protein separation that we developed in BRIC1 will permit not only continuous retrieval of product, but also continuous verification of product quality - this will remove the second constraint on the commercial exploitation of continuous systems.

Although we intend to use Pichia pastoris as our chosen cell factory for this project, the lessons learned, and the approach to protein, organism, and process design that we shall develop, would be applicable to any eukaryotic expression system. Moreover, the systems that we shall put in place for the continuous retrieval of protein product from culture broth, and the design and operation of processes for continuous biomanufacture with continuous verification of product quality, will have wide applicability to the benefit of the Bioprocess Industry, particularly the members of BRIC.

The UK lags international competitors in the commercial supply of materials and equipment for biomanufacturing. The technologies described here are entirely novel and will be protected by patents; commercial manufacturing is to be tested and, upon achievement of the envisaged outcomes, lead times to launch of a continuous system could be short. BRIC is a good place to start with economic impact and to apply or test approaches with proteins of interest. Thus, we will engage with BRIC member companies through visits and workshops. Once we have patent protection, we can involve academics and companies beyond BRIC. Social impacts will accrue from our engagement with clinically informed companies within BRIC, which will lead to improved availability of pharmaceutical proteins at a lower price. Cambridge Enterprise (CE) will act, on behalf of both universities, to protect IP arising from the research. A Consortium Agreement will be put in place at the start of the Grant.

### ORCID iD

Stephen George Oliver (Principal Investigator)
Nigel Slater (Co-Investigator)

### Publications

Dikicioglu D (2015) Biomass composition: the "elephant in the room" of metabolic modelling. in Metabolomics : Official journal of the Metabolomic Society

Kell DB (2016) The metabolome 18 years on: a concept comes of age. in Metabolomics : Official journal of the Metabolomic Society

Description Development of an expression system for continuous protein production in Pichia pastoris, employing native promoters: The first objective of the project involved construction of the strains expressing the Fab3H6 fragments in Pichia pastoris under the control of three strong native promoters (Chr1-4_0586, FragB_0052 and Chr3_0030) identified in the BRIC1 project. In order to facilitate comparisons of promoter strength between these novel promoters and the commonly used GAP promoter, single-copy integrations at the promoter locus were required. As explained in our 1st Annual Report, despite intense efforts to obtain clones with single-copy integration at a specified locus, none was identified. Therefore, in order to prevent the integration of the multiple plasmid copies, the HIS4 locus was targeted. For this, the AOX promoter sequence of the pPICZaA vector (which carries the HIS4 gene) was excised and replaced with the constitutive promoter and the transgenes cloned in frame with the leader sequence of mating-factor alpha in order to effect excretion of the r-protein from the cell. A large number of clones expressing Fab3H6 were screened for single-copy genomic integration; however, in all cases, the clones were found to contain tandem repeats. The transformation protocol was amended in order to favour single-copy integration of the plasmid within the Pichia genome using zeocin-resistance as the selectable marker. The strain construction studies were conducted in parallel for both Fab3H6 and Human Lysozyme (HuLy). After several rounds of screening, single-copy integrants at the correct locus, and expressing Fab3H6 fragments (from the FragB_0052 promoter) and HuLy (from the Chr1-4_0586 promoter) have been obtained, as have reference clones expressing these coding sequences from the GAP promoter. These constructs will allow a comparison of the strength of our novel promoters with that of the reference GAP promoter. Thus, Milestone A has been reached.
Development of new expression constructs: The problems encountered during the strain construction process led us to work on alternative construction pipelines. It was observed that the widely used approach of linearizing the plasmid within the promoter sequence in order to target integration into the Pichia genome commonly resulted in mis-integration and the generation of tandem repeats. The necessarily limited size of the homologous promoter region is likely to militate against accurate homologous recombination. In order to facilitate the use of larger flanking regions, a 1.8 kb fragment from the SPO1 gene was inserted into the plasmid and integrated into the genome at the SPO1 gene locus. (SPO1 is involved in the sporulation pathway and is not required for mitotic cell division, so there should be no phenotypic consequences of its disruption by the integrating plasmid.) When this strategy was used to construct strains expressing HuLy under the GAP promoter, 16 clones out of 20 were found to carry single-copy integrations at the correct locus. A similar strategy will thus be used to construct clones expressing Fab3H6.
Development of a model-driven approach to design of an efficient Komagataella (Pichia) pastoris production process: In order to develop and verify a strategy for the optimization of environmental parameters using a Genetic Algorithm, a strain producing HuLy under the inducible AOX promoter was employed in a case study. In three generations, we attained an increase of more than 80% in the production of the active protein. Due to the convergence observed between the second and third generations, we optimized the procedure by fine-tuning the solution space after the third generation. This was achieved by profiling the population for each component over the three generations to determine the nature of parameter convergence. This analysis revealed that some parameters (calcium, glycerol, methanol, sorbitol and pH) converged around a single value, while others (ammonium, potassium, magnesium and iron) converged to two different levels, indicating that these needed fine-tuning. Based on the results obtained for the three sets of experiments, i.e. 48 different medium compositions, four candidate optimized medium compositions with different ammonium, potassium, magnesium, and iron concentrations were determined and tested experimentally. One of the candidates resulted in a more than 1.5-fold increase in protein activity and a more than two-fold increase in the productivity of the strain compared with what could be obtained using the published medium recipes for Pichia pastoris. Further scale-up studies were conducted to verify the optimized medium, and no significant change was observed in either the activity or the productivity. We are in the process of designing a user-friendly GUI to provide a computational tool that enables industrial scientists to exploit this strategy to optimize the medium and other environmental conditions for their processes.
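The generation-by-generation search described above can be illustrated with a minimal Genetic Algorithm sketch. This is not the project's implementation: the parameter ranges, the population size of 16 (assumed here only because three generations then give 48 compositions), the selection and mutation settings, and the `toy_activity` scoring function are all hypothetical placeholders. In a real campaign each score would come from a wet-lab measurement of protein activity.

```python
import random

# Hypothetical parameter ranges in arbitrary units; not taken from the project.
RANGES = {
    "ammonium": (0.0, 10.0),
    "potassium": (0.0, 10.0),
    "magnesium": (0.0, 10.0),
    "iron": (0.0, 1.0),
    "pH": (4.0, 7.0),
}


def random_medium(rng):
    """Draw one medium composition uniformly within the allowed ranges."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter is inherited from either parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in RANGES}


def mutate(medium, rng, rate=0.2, scale=0.1):
    """Perturb some parameters with Gaussian noise, clipped to their range."""
    out = dict(medium)
    for k, (lo, hi) in RANGES.items():
        if rng.random() < rate:
            step = rng.gauss(0.0, scale * (hi - lo))
            out[k] = min(hi, max(lo, out[k] + step))
    return out


def next_generation(scored, rng, size):
    """Keep the better half as parents and breed `size` children from them."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    parents = [m for m, _ in ranked[: max(2, len(ranked) // 2)]]
    children = []
    while len(children) < size:
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b, rng), rng))
    return children


def run_ga(measure, generations=3, size=16, seed=0):
    """Return, per generation, the list of (medium, score) pairs evaluated."""
    rng = random.Random(seed)
    population = [random_medium(rng) for _ in range(size)]
    history = []
    for _ in range(generations):
        scored = [(m, measure(m)) for m in population]
        history.append(scored)
        population = next_generation(scored, rng, size)
    return history


def parameter_profile(history, name):
    """Values of one parameter in each generation, for convergence profiling."""
    return [[m[name] for m, _ in generation] for generation in history]


def toy_activity(medium):
    """Stand-in for a measurement: score peaks at the middle of every range."""
    return -sum(
        ((medium[k] - (lo + hi) / 2) / (hi - lo)) ** 2
        for k, (lo, hi) in RANGES.items()
    )
```

Calling `run_ga(toy_activity)` evaluates 3 x 16 = 48 compositions, and `parameter_profile(history, "iron")` returns the per-generation values that one would inspect to see whether a component has converged on one level or several.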
Following the comparison of the three available genome-scale metabolic models (GEMs) for Pichia pastoris, a consensus network has been constructed from them using, whenever possible, supporting information from KEGG, MetaCyc, YMDB, and the literature. This consensus network is composed of 979 reactions and 1163 metabolites in 8 different compartments. The analysis of the dead-end metabolites within this network revealed that there were around 200 metabolites that either served as substrates but were not produced, or were produced but not consumed in any reaction. Manual curation was undertaken to resolve these discrepancies and to fill gaps. This process revealed that neither fatty acid nor unsaturated fatty acid metabolism was well represented in the consensus network, leaving many dead-end metabolites. Further refinement of the network reduced the number of dead-end metabolites to 90, resulting in a model containing 1257 reactions and 1151 metabolites in 8 different compartments. A COBRA-compliant SBML version of this model, which is the first consensus model of K. pastoris (Pp1.0), is being prepared using a structure very similar to that of the consensus Saccharomyces cerevisiae models, providing KEGG and ChEBI IDs for the metabolites and KEGG IDs for the reactions, whenever available, in order to facilitate further improvement of the model by the research community. We would also note that, in contrast to the published models, our consensus model is fully executable and can be used for both Flux Balance and Flux Variability analyses, making it a valuable tool for process and strain design.
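The dead-end check described above (metabolites that are consumed but never produced, or produced but never consumed) can be sketched as a purely topological scan of the reaction list. This is an illustrative sketch, not the procedure used to build the consensus model: the data layout, the function name, and the five-reaction toy network are assumptions made for the example.

```python
def dead_end_metabolites(reactions):
    """Find metabolites that only ever appear on one side of the network.

    `reactions` maps a reaction name to (stoichiometry, reversible), where
    stoichiometry maps metabolite -> coefficient (negative = consumed,
    positive = produced) and reversible is a bool. A reversible reaction
    is treated as able to both produce and consume all of its metabolites.

    Returns (consumed_only, produced_only) as two sets.
    """
    produced, consumed = set(), set()
    for stoichiometry, reversible in reactions.values():
        for metabolite, coefficient in stoichiometry.items():
            if coefficient == 0:
                continue
            if reversible:
                produced.add(metabolite)
                consumed.add(metabolite)
            elif coefficient > 0:
                produced.add(metabolite)
            else:
                consumed.add(metabolite)
    return consumed - produced, produced - consumed


# Invented toy network, for illustration only.
toy_network = {
    "uptake": ({"glc_e": 1}, False),                 # exchange: -> glc_e
    "transport": ({"glc_e": -1, "glc_c": 1}, False),
    "r1": ({"glc_c": -1, "g6p_c": 1}, False),
    "r2": ({"g6p_c": -1, "x_c": 1}, False),          # x_c is never consumed
    "r3": ({"y_c": -1, "g6p_c": 1}, False),          # y_c is never produced
}

consumed_only, produced_only = dead_end_metabolites(toy_network)
```

A scan like this only flags candidates for manual curation; a metabolite that passes it can still sit on a pathway that carries no flux, which requires a flux-based analysis to detect.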
Recently, we have analysed the impact that changes in the representation of the biomass have on the predictive power of the Saccharomyces cerevisiae metabolic model (Dikicioglu et al., 2015). Despite the fact that data are available on the biomass composition of P.pastoris under different aeration conditions, the available GEMs either accommodate only a single condition in their biomass representation or (perhaps worse) average the available data to provide their biomass equation. Considering the importance of the accurate representation of the biomass for the predictive accuracy of the model, in Pp1.0, the biomass formation reaction is represented in a context-dependent manner by making use of the available data in the literature.
Our work on environmental parameter optimization highlighted the importance of iron in the culture medium in improving productivity. Iron metabolism is poorly represented in all available metabolic models for yeast species, including both P.pastoris and S.cerevisiae, and this compromises their predictive power, particularly when energy-related pathways play an important role. In order to improve this situation, we have developed a methodology to represent iron metabolism (both transport and utilization and their regulation) in GEMs.
In addition to the optimization of the cultivation conditions, we have sought to improve the yield of biologically active r-proteins by redesigning them to enhance their stability. Using HuLy as a case study, a protocol has been developed to identify point mutations that should improve protein stability without compromising activity. For this purpose, three publicly accessible computational algorithms, FoldX, Rosetta Design, and CUPSAT, have been used. These three algorithms, with high predictive power for estimating the impact of point mutations on protein stability, use different features for estimating protein energetics and so yield different predictions. In our protocol, FoldX, which enables the calculation of the energies of all possible single point mutants in a single run, was used as the starting point, and then the Rosetta Design and CUPSAT algorithms were employed to predict the changes in protein stability associated with amino-acid changes at the residues identified by FoldX. Mutations in 7 amino acids at positions that were commonly found to be stabilizing according to the energy values predicted by Rosetta Design, and both stabilizing and favourable according to the energy values and torsion angles estimated by CUPSAT, were identified. These common mutations were regarded as the candidates for improved stability of HuLy, and we have embarked upon the gene syntheses and expression experiments necessary to determine the efficacy of our protocol.
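The consensus step of this protocol amounts to a filter over the predictions of the three tools, which the following sketch illustrates. It assumes each tool's output has already been reduced to a simple call per mutation; it does not invoke FoldX, Rosetta Design, or CUPSAT, and the `CupsatCall` type, the function name, and the mutation labels are hypothetical placeholders rather than real HuLy results.

```python
from typing import NamedTuple


class CupsatCall(NamedTuple):
    """Pre-digested CUPSAT result for one mutation (hypothetical layout)."""
    stabilising: bool
    torsion_favourable: bool


def consensus_mutations(foldx_candidates, rosetta_stabilising, cupsat_calls):
    """Keep FoldX candidates that the other two predictors also support.

    foldx_candidates:    set of mutation labels flagged by the FoldX scan
    rosetta_stabilising: set of labels predicted stabilising by Rosetta Design
    cupsat_calls:        dict mapping label -> CupsatCall
    """
    selected = []
    for mutation in sorted(foldx_candidates):
        call = cupsat_calls.get(mutation)
        if (
            mutation in rosetta_stabilising
            and call is not None
            and call.stabilising
            and call.torsion_favourable
        ):
            selected.append(mutation)
    return selected


# Invented labels and calls, for illustration only.
foldx = {"mut_a", "mut_b", "mut_c", "mut_d"}
rosetta = {"mut_a", "mut_b", "mut_d"}
cupsat = {
    "mut_a": CupsatCall(stabilising=True, torsion_favourable=True),
    "mut_b": CupsatCall(stabilising=True, torsion_favourable=False),
    "mut_c": CupsatCall(stabilising=True, torsion_favourable=True),
    "mut_d": CupsatCall(stabilising=True, torsion_favourable=True),
}

candidates = consensus_mutations(foldx, rosetta, cupsat)
```

Reducing each tool to a boolean call before filtering keeps the sketch independent of the tools' differing energy scales and sign conventions, which would need to be handled explicitly in any real pipeline.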
Genome-scale metabolic models are valuable tools for the design of novel strains of industrial microorganisms, such as Komagataella phaffii (syn. Pichia pastoris), in order to optimise them for particular biotechnological processes. However, as is the case for many industrial microbes, there is no executable metabolic model for K. phaffii that conforms to current standards for systems biology models, providing metabolite and reaction IDs to facilitate the reusability and extensibility of the model and gene-reaction associations to enable in silico target identification studies. In order to remedy this deficiency and enable the community of industrial and academic researchers working on K. phaffii to exploit metabolic modelling in strain design, we decided to reconstruct the genome-scale metabolic model of K. phaffii by reconciling the extant models and performing extensive manual curation in order to construct an executable model (Kp.1.0) that conforms to current standards and so is readily extendable and reusable. We then used this model to study the effect of biomass composition on the predictive success of the model. Twelve different biomass compositions, taken from published empirical data obtained under a range of growth conditions, were employed in this investigation. We found that the success of Kp1.0 in predicting both gene essentiality and growth characteristics was relatively unaffected by the representation of biomass composition. However, we found that biomass composition had a profound effect on the following processes (as defined by their GO terms): distribution of the fluxes involved in lipid and secondary alcohol biosynthetic processes, cellular response to DNA damage, and protein localisation. Having observed the effect of biomass content on flux distributions, we further investigated its effect on the identification of suitable target genes for strain development.
The analyses revealed that around 40% of the predictions of the effect of gene overexpression or deletion changed depending on the representation of biomass composition in the model. Considering the robustness of the in silico flux distributions to the changing biomass representations enables better interpretation of experimental results, reduces the risk of wrong target identification, and so both speeds and improves the process of directed strain development.
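The kind of comparison behind this figure can be sketched as follows: given the predicted effect of each gene perturbation under several biomass representations, count the genes whose prediction is not the same under all of them. This is an assumed, simplified formulation for illustration, not the analysis pipeline used in the project; the function name, the effect labels, and the toy data are invented.

```python
def fraction_sensitive(predictions):
    """Fraction of genes whose predicted effect depends on biomass composition.

    `predictions` maps a biomass-composition label to a dict of
    gene -> predicted effect label. Only genes present under every
    composition are compared.

    Returns (fraction, set_of_sensitive_genes).
    """
    if not predictions:
        raise ValueError("at least one biomass composition is required")
    per_composition = list(predictions.values())
    genes = set.intersection(*(set(p) for p in per_composition))
    if not genes:
        raise ValueError("no genes are shared by all compositions")
    sensitive = {
        gene
        for gene in genes
        if len({p[gene] for p in per_composition}) > 1
    }
    return len(sensitive) / len(genes), sensitive


# Invented toy predictions, for illustration only.
toy_predictions = {
    "biomass_1": {"g1": "up", "g2": "down", "g3": "none", "g4": "up"},
    "biomass_2": {"g1": "up", "g2": "down", "g3": "up", "g4": "up"},
    "biomass_3": {"g1": "up", "g2": "down", "g3": "none", "g4": "up"},
}

fraction, sensitive_genes = fraction_sensitive(toy_predictions)
```

Targets that fall outside the sensitive set are the ones whose predicted effect is robust to the choice of biomass representation, and so are the safer candidates to carry forward.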
Exploitation Route We have established a collaboration with MedImmune to apply our principles of continuous production and control to antibody production by CHO cells.
Sectors Pharmaceuticals and Medical Biotechnology

Description Tool made available that allows the research community (both academic and industrial) to perform functional annotation of the genome of the industrial yeast, Pichia pastoris. Multiple interacting factors affect the performance of engineered biological systems in synthetic biology projects. The complexity of these biological systems means that experimental design should often be treated as a multiparametric optimisation problem. However, the available methodologies are either impractical, due to a combinatorial explosion in the number of experiments to be performed, or are inaccessible to most experimentalists due to the lack of publicly available, user-friendly software. Although evolutionary algorithms may be employed as alternative approaches to optimize experimental design, the lack of simple-to-use software again restricts their use to specialist practitioners. In addition, the lack of subsidiary approaches to further investigate critical factors and their interactions prevents the full analysis and exploitation of the biotechnological system. We have addressed these problems and, here, provide a simple-to-use and freely available graphical user interface to empower a broad range of experimental biologists to employ complex evolutionary algorithms to optimise their experimental designs. Our approach exploits a Genetic Algorithm to discover the subspace containing the optimal combination of parameters, and Symbolic Regression to construct a model to evaluate the sensitivity of the experiment to each parameter under investigation. We demonstrated the utility of this method using an example in which the culture conditions for the microbial production of a bioactive human protein are optimised. We have made CamOptimus available to the academic and industrial research community through: http://dx.doi.org/10.17863/CAM.700.
First Year Of Impact 2016
Sector Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology
Impact Types Economic

Description Continuous protein production
Geographic Reach Multiple continents/international
Policy Influence Type Membership of a guideline committee
Impact Our work on the continuous production of recombinant proteins in the industrial yeast Pichia pastoris has completely changed the attitude of industry to the continuous production of therapeutic antibodies and platform chemicals.

Description Cambridge-MedImmune Beacon Projects
Amount £421,119 (GBP)
Organisation MedImmune
Department MedImmune Cambridge
Sector Private
Country United Kingdom
Start 10/2015
End 09/2018

Description IB Catalyst
Amount £525,090 (GBP)
Funding ID BB/N02348X/1
Organisation Biotechnology and Biological Sciences Research Council (BBSRC)
Sector Public
Country United Kingdom
Start 09/2016
End 08/2021

Description Medimmune Cambridge-Medimmune Beacon Extension
Amount £62,290 (GBP)
Organisation MedImmune Ltd
Sector Private
Country United Kingdom
Start 10/2015
End 09/2018

Title CamOptimus
Description CamOptimus is a multi parameter optimisation tool. It exploits a Genetic Algorithm to discover the subspace containing the optimal combination of parameters, and Symbolic Regression to construct a model to evaluate the sensitivity of the experiment to each parameter under investigation.
Type Of Material Computer model/algorithm
Year Produced 2017
Provided To Others? Yes
Impact A number of researchers have used CamOptimus both to optimise microbial growth media for the production of proteins or organelles and to optimise growth protocols for generating quantum dots.
URL https://github.com/DuyguD/CamOptimus.git

Title Kp1.0 : Genome-scale metabolic model for Komagataella phaffii
Description Kp.1.0 is also accessible through the BioModels database (MODEL1703150000).
Type Of Material Database/Collection of data
Year Produced 2017
Provided To Others? Yes

Title Research data supporting "A Tool for Exploiting Complex Adaptive Evolution to Optimise Protocols for Biological Experiments"
Description CamOptimus is a tool for applying a Genetic Algorithm (GA) to solve multi-parametric optimisation problems and Symbolic Regression (SR) to obtain models, using the data generated during the optimisation procedure, to investigate the effect of individual parameters on the system of interest. The source code for the compiled software and the Graphical User Interface (GUI) of the application are available under free licensing (GNU General Public License v3.0). The user manual is supplied in the compressed folder. Important information: access to the files for this software has been restricted as they are out of date. The software is available on GitHub (https://github.com/DuyguD), where updated documentation and new releases are available.
Type Of Material Database/Collection of data
Year Produced 2017
Provided To Others? Yes

Description A Self-Adapting Continuous Bioprocessing system for Recombinant Protein Production
Organisation MedImmune
Department MedImmune Cambridge
Country United Kingdom
Sector Private
PI Contribution Commercial in confidence.
Collaborator Contribution Commercial in confidence
Impact Commercial in confidence
Start Year 2015