GREET: Generative Recombinant Enzyme Engineering for Therapeutics

Lead Research Organisation: University of Edinburgh
Department Name: School of Biological Sciences


Enzymes are proteins that catalyse almost all reactions required for cellular life and, when defective, can cause severe pathologies. For example, in humans, alpha-galactosidase (a-GAL) deficiency, known as Fabry's disease (FD), affects up to 1 in 3000 newborns and causes life-threatening damage to the heart and kidneys. Since these diseases are usually caused by inherited genomic mutations, they cannot be cured, but they can be treated using Enzyme Replacement Therapies (ERTs), which consist of injecting a recombinant version of the affected enzyme into patients.
Unfortunately, ERTs have limitations: recombinant enzymes have lower enzymatic activity than the human wild-type versions, are unstable in blood, are poorly absorbed by human cells, and often trigger an immune response. Moreover, manufacturing therapeutic enzymes is extremely expensive because standard mammalian cell-based expression systems have low yield.

Developing effective therapeutic enzymes requires design methods able to discover new amino acid sequences that can encode the same catalytic function, while optimising the therapeutic properties of the molecule. Then, these enzymes must be converted into highly optimised DNA triplets, called codons, to maximise expression and yield in host organisms that can grow in inexpensive media. With the increasing incidence of enzymatic deficiencies and current treatments costing up to £400K per year per patient, it is crucial to establish effective methods to perform these tasks and implement a platform for effective and sustainable production of therapeutic enzymes.
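
To make the codon step concrete, the sketch below shows a simple frequency-based baseline for back-translating a protein into DNA. It is an illustration only and not the deep generative approach proposed in this fellowship: it counts codon usage in a set of reference coding sequences from the host and picks the most frequent codon for each amino acid. The function names (`codon_usage`, `preferred_codons`, `back_translate`) and the toy reference sequence are assumptions made for this example; only the standard genetic code is taken as given.

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(reference_cds):
    """Count codons across in-frame reference coding sequences (hypothetical input)."""
    counts = Counter()
    for cds in reference_cds:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TO_AA:
                counts[codon] += 1
    return counts


def preferred_codons(counts):
    """Pick the most frequent codon per amino acid.

    Ties (including amino acids never seen in the references) fall back to
    the first codon in genetic-code table order.
    """
    best = {}
    for codon, aa in CODON_TO_AA.items():
        if aa == "*":
            continue  # stop codons are not used for back-translation
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def back_translate(protein, best):
    """Encode a protein sequence using the preferred codon of each residue."""
    return "".join(best[aa] for aa in protein.upper())


def translate(dna):
    """Translate an in-frame DNA sequence back to protein, as a sanity check."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    # Toy reference sequence, invented for illustration only.
    usage = codon_usage(["ATGGCTGCTGCCAAGTAA"])
    best = preferred_codons(usage)
    dna = back_translate("MAK", best)
    print(dna, translate(dna))
```

A baseline like this ignores codon context, mRNA structure and other factors that affect expression, which is part of the motivation for learning codon choice from data instead.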

Through the EPSRC fellowship, I will develop the computational and experimental methods required for engineering and manufacturing designer enzymes. I will use deep generative machine learning (ML) to design and codon optimise new enzymes, which will then be rapidly built and tested at scale using the lab automation platform available at the University of Edinburgh (UoE). As a proof of concept, I will build a library of designer human a-GAL enzymes using P. pastoris, a high-yield expression system used in the pharmaceutical industry.

To deliver this ambitious project, I have set four objectives over the 4 years of my fellowship:
1. Developing deep generative learning models for enzyme design.
2. Developing deep generative learning models for codon optimisation.
3. Building a library of designer human a-GAL enzymes in P. pastoris.
4. Developing computer-aided design (CAD) software for enzyme engineering.

Each objective addresses current limitations in enzyme engineering and manufacturing. ML avoids the need for accurate biophysical models by learning design rules directly from existing enzymes. Thus, by reverse engineering Nature's design principles, it will be possible to engineer functional designer enzymes at unprecedented scale. Coupling in-silico design with a robotic platform will allow building and testing thousands of different variants, thus minimising the time required for identifying a functional enzyme. Here I will test this new approach by engineering the human a-GAL enzyme, which is currently difficult to manufacture and optimise for therapeutic treatment; this effort will not only provide experimental evidence for the effectiveness of my platform but could also identify new potential treatments for FD.

The project is supported by a strong network of experts in synthetic biology and machine learning, in the UK and the US, industrial biopharmaceutical and biotechnology partners, such as Fujifilm Diosynth Biotechnologies UK (FDBK) and the Industrial Biotechnology Innovation Centre (IBioIC), and unique research facilities available at UoE, such as the Edinburgh Genome Foundry.

With this fellowship, I will lay the foundation for data-driven biological engineering and deliver enabling computational and experimental technologies to rapidly design, build and test new therapeutic molecules.
Title GREET: Generative Recombinant Enzyme Engineering For Therapeutics 
Description The animation provides an accessible introduction to lysosomal storage diseases (LSDs) and Fabry disease, and shows how we are using AI to engineer better therapies. Since LSDs mostly affect children and young adults, we thought that publishing a video on YouTube would be the best way to share our work. 
Type Of Art Film/Video/Animation 
Year Produced 2023 
Impact The video has been shared with the Edinburgh Kidney Research group, which includes Fabry patients, and has been featured in the newsletters and websites of the School of Biological Sciences, the Centre for Engineering Biology and the College of Science and Engineering at UoE. 
Description 21EBTA Engineering Biology for Cell and Gene Therapy Applications
Amount £1,518,259 (GBP)
Funding ID BB/W014610/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 01/2022 
End 01/2024
Description Edinburgh Kidney research initiative 
Organisation University of Edinburgh
Department Renal Medicine Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution I have been invited to join a network of researchers and clinicians at the University of Edinburgh who work on renal diseases, including Fabry disease. My role is to promote the use of AI and engineering biology in drug discovery and to engage with patients to make them aware of the progress enabled by these technologies. The collaboration is very productive and we are working on a joint UKRI proposal.
Collaborator Contribution The partner provides expertise on the clinical implications of my fellowship work, and has allowed me to connect my research with patients in the clinic.
Impact The collaboration is interdisciplinary since it involves work with clinicians.
Start Year 2023
Title PROTON: PROtein engineering by TempOral convolutional Networks 
Description PROtein engineering by TempOral convolutional Networks (PROTON) is deep learning software for designing protein libraries using sequence information from protein families. PROTON implements a generative model, called Temporal Dirichlet Variational Auto Encoder (TDVAE), which maps a protein family design space into a discrete mathematical space and uses temporal convolution to output new, unseen protein sequences. The software offers two design options: prior sampling design, which generates sequences using information learned from the entire protein family, and posterior sampling design, which generates variants of a user-defined protein. PROTON can perform biochemical characterisation of the designed sequences, and can rank and prioritise sequences for downstream experimental testing using two new analyses, namely coverage and confidence analysis: the former estimates the amount of data supporting the predicted amino acid, while the latter estimates how confident the model is about its prediction. PROTON can also optimise the training process by performing sequence clustering, and can similarly create highly diverse protein libraries by using sequence clustering methods such as MMseqs2, as already shown in our preprint; this step is completely optional and can be replaced by any other clustering software. PROTON is designed to work in high-performance computing environments and exploits parallelism to minimise the computational burden. PROTON is licensed through the TTO at the University of Edinburgh under the new technology disclosure "TEC1104509 - PROTON: PROtein engineering by TempOral convolutional Networks". 
Type Of Technology New/Improved Technique/Technology 
Year Produced 2023 
Impact PROTON is enabling the