Development of statistical machine learning algorithms for the manufacturing of mRNA vaccines

Lead Research Organisation: Durham University
Department Name: Engineering

Abstract

mRNA vaccines have been widely used to protect against COVID-19 since the outbreak of the pandemic, with a remarkable safety profile. A new manufacturing facility for a traditional vaccine can take years and a huge amount of money to become operational. In contrast, RNA vaccines are manufactured by a standardised procedure with minor adaptations to account for variations in RNA sequence length and composition. However, in recent years, companies such as Pfizer and Moderna have faced setbacks due to manufacturing disruptions. To address the challenges in vaccine manufacturing, the UK government has invested almost £215 million in the Vaccines Manufacturing and Innovation Centre since 2018, with further investment in the CPI's RNA Centre of Excellence.
The manufacturing of vaccines is a complex process with a plethora of inputs and quality measures. In mRNA vaccines, the starting materials are the plasmid, the host bacteria, and the master cell bank of the recombinant microbial cells. The vaccine undergoes quality controls such as integrity of the nucleic acids, content, potency, product and process-related impurities, sterility, endotoxin, and physicochemical tests like pH and osmolality. Due to its chemistry, RNA is generally unstable, and hence the manufacturing of mRNA vaccines involves stability-indicating parameters such as RNA integrity, content and potency, supplemented by pH, appearance, and microbiological status. Finally, the vaccine undergoes stress testing that can include temperature shifts, pH shifts, photostability, humidity, or numerous freeze-thaw cycles. We hypothesise that data science methods are currently overlooked in such a complex manufacturing process. In particular, we wish to focus on understanding the temporal dynamics of processes during the manufacturing of mRNA vaccines.
First, the programme concentrates on developing methods to integrate online and offline temporal data to predict process outcomes and shed light on process productivity. Methodologically, we will build on the toolbox of sparse regression to discover the critical process parameters (CPPs) and identify how those parameters influence critical quality attributes (CQAs). Our goal is to embed the mathematical modelling in iterative modelling/manufacturing cycles.
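As an illustration of how sparse regression can single out a few influential parameters, the following is a minimal sketch using the lasso, solved by iterative soft-thresholding (ISTA). It is one member of the sparse regression toolbox, not necessarily the method the programme will adopt. The data are synthetic, and the names `lasso_ista`, `soft_threshold`, the penalty `alpha` and the choice of ten candidate parameters are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1: shrink each entry towards zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def lasso_ista(X, y, alpha=0.1, n_iter=500):
    """Minimise (1/2n)||y - Xw||^2 + alpha * ||w||_1 by proximal gradient descent."""
    n, p = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * alpha)
    return w

# Synthetic stand-in for batch records (hypothetical, not project data):
# 200 batches, 10 candidate process parameters, one quality attribute
# that depends on only two of the parameters.
rng = np.random.default_rng(0)
n_batches, n_params = 200, 10
X = rng.standard_normal((n_batches, n_params))  # columns already on a common scale
true_w = np.zeros(n_params)
true_w[2], true_w[7] = 1.5, -2.0
y = X @ true_w + 0.1 * rng.standard_normal(n_batches)

w = lasso_ista(X, y, alpha=0.1)
selected = np.flatnonzero(w)  # indices of parameters kept by the penalty
print("selected parameter indices:", selected)
```

With real process data the columns would need standardising first, and the penalty would typically be chosen by cross-validation rather than fixed by hand.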
Second, the work programme builds on sparse methods developed by Steve Brunton and Nathan Kutz to identify ordinary and partial differential equations from data. Brunton and Kutz's algorithm, sparse identification of nonlinear dynamics (SINDy), aims to extract symbolic dynamical systems from a data stream using the toolbox of sparse regression. The method takes a time series of a predetermined set of state-space variables, assumed to describe an unknown dynamical system, and identifies the coefficients of the terms on the right-hand side of the ODEs or PDEs that describe the system. Those terms are selected from a given library of candidate basis functions. To do this, the authors combine numerical methods that estimate the derivatives from the time series with a sequentially thresholded least squares regression that performs variable selection on the library of basis functions. SINDy returns a numerical coefficient for each basis function. If the dataset can be represented as a sparse dynamical system in the library of candidate basis functions, most of the coefficients are zero, and SINDy identifies the non-zero ones. The outcome is a description of the input dataset as a symbolic ODE or PDE expressed in the chosen state-space variables, as a combination of the candidate library functions whose numerical coefficients are found by SINDy.
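The steps described above (estimate derivatives, build a candidate library, run sequentially thresholded least squares) can be sketched in a few lines of NumPy. This is a simplified illustration on a synthetic damped oscillator, not the authors' reference implementation. The helper names `build_library` and `stlsq`, the quadratic library, the threshold of 0.05 and the test system are all assumptions chosen for the example.

```python
import numpy as np

def build_library(X):
    """Candidate basis functions: constant, linear and quadratic monomials of the states."""
    n_samples, n_states = X.shape
    columns, names = [np.ones(n_samples)], ["1"]
    for i in range(n_states):
        columns.append(X[:, i])
        names.append(f"x{i}")
    for i in range(n_states):
        for j in range(i, n_states):
            columns.append(X[:, i] * X[:, j])
            names.append(f"x{i}*x{j}")
    return np.column_stack(columns), names

def stlsq(Theta, dXdt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small coefficients, refit."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dXdt.shape[1]):  # refit each state equation on its active terms
            active = ~small[:, k]
            if active.any():
                Xi[active, k] = np.linalg.lstsq(
                    Theta[:, active], dXdt[:, k], rcond=None
                )[0]
    return Xi

# Synthetic trajectory of dx0/dt = -0.1*x0 + 2*x1, dx1/dt = -2*x0 - 0.1*x1
# (closed-form solution for the initial condition x0 = 2, x1 = 0).
t = np.arange(0.0, 25.0, 0.01)
X = np.column_stack([
    2.0 * np.exp(-0.1 * t) * np.cos(2.0 * t),
    -2.0 * np.exp(-0.1 * t) * np.sin(2.0 * t),
])

dXdt = np.gradient(X, t, axis=0, edge_order=2)  # finite-difference derivative estimate
Theta, names = build_library(X)
Xi = stlsq(Theta, dXdt, threshold=0.05)

# Print the recovered symbolic right-hand side of each equation.
for k in range(X.shape[1]):
    terms = [f"{Xi[j, k]:+.3f}*{names[j]}" for j in range(len(names)) if Xi[j, k] != 0.0]
    print(f"dx{k}/dt =", " ".join(terms))
```

On noisy measurements the finite-difference step is the fragile part, so in practice the derivatives would usually be estimated with a smoothing or regularised differentiation scheme, and the threshold tuned to the scale of the coefficients.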
We will analyse datasets from bioreactors provided by the National Biologics Manufacturing Centre and its RNA Centre of Excellence, part of the Centre for Process Innovation (CPI). The work plan resides within the EPSRC Healthcare Technologies, Manufacturing the Future and Mathematical Sciences themes.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/W524426/1                                   30/09/2022  29/09/2028
2744980            Studentship   EP/W524426/1  30/09/2022  30/03/2026  Yuzheng Zhang