Mapping the overlapping fitness landscapes of a superfamily of promiscuous enzymes: strategies for directed evolution?

Lead Research Organisation: University of Cambridge
Department Name: Biochemistry

Abstract

Proteins are Nature's all-purpose functional molecules that work with unsurpassed precision under mild conditions: their selectivity allows them to recognise one molecule out of thousands in a cell. Their efficacy - tight binding and efficient catalytic turnover - makes them reagents that can catch onto target molecules and neutralise, cleave or process them. Being able to emulate, in the laboratory, Nature's ability to create tailor-made molecules would bring transformational change to the way we live, e.g. via 'green' industrial production lines, resource-efficient bioprocessing or more selective therapeutic intervention.

However, understanding enzyme catalysis remains a daunting challenge, despite intense efforts in basic and applied research. Our understanding certainly fails the most severe test: that of making catalysts that match the efficiency of natural enzymes. Directed evolution is a new approach to this problem: we make collections of molecules and test each of them to see whether any one in the collection is the proverbial 'needle in a haystack'. The more tests we do, the better the chances of finding useful catalysts: this is how Nature has gradually evolved new molecules. We have developed a testing system that can do more tests than are normally carried out in a lab: in microfluidic devices we can test more than 10 million mutants in a day. This gives us a technological advantage, and we hope to be faster in directed evolution and to obtain better catalysts. But we also have to choose where in 'sequence space' (the space spanned by all possible amino acid randomisations of a protein) we can go. To probe this, we use a technology we have recently developed ('UMIC-seq': Nat Commun 2020, 11 (1), 6023) that allows us to obtain full-length sequences of > 10,000 variants per round of evolution (at a price of less than 1 penny per sequence). This kind of mapping will help us to see where we are going in sequence space and sets us up for computational help in understanding evolution (using correlation analysis and machine learning), in particular the cooperative interaction patterns that characterise intra-gene epistasis. Evolution will be carried out either slowly and steadily (via multiple rounds of error-prone PCR) or with disruptive yet functionally innovative insertion-deletion (InDel) libraries (made by our method TRIAD: Nat Commun 2020, 11 (1), 3469 & Proc Natl Acad Sci U S A 2020, 117 (44), 27307-27318) to probe the determinants of successful evolution of efficiency and specificity.
Specifically, we are interested in following the evolutionary trajectories of promiscuous enzymes (enzymes with multiple functions), because they are believed to be springboards of evolution, so tracking their emergence promises to yield particularly useful insights into how enzymes change their function in evolution. Beyond our basic interest in a mechanism fundamental to life, we hope to demonstrate that an understanding of evolution can inform protein engineering by directed evolution.

Technical Summary

Our project will create synergy between ultrahigh-throughput screening, large-scale sequencing and classical physical-organic, structural and mechanistic approaches, enabling cross-comparison to steer further development.

The novel technologies that form the basis of this proposal have been developed in our group and are combined here for the first time:

(1) UMIC-seq, a straightforward unique molecular identifier (UMI)-linked consensus sequencing workflow that simplifies mapping of evolutionary trajectories from full-length sequences. Attaching UMIs to gene variants allows accurate consensus generation for closely related genes from nanopore reads and makes sequencing easy and inexpensive (<1 p per sequence), so that thousands of genes (outputs of rounds of evolution) can be analysed. [Nat Commun 2020, 11 (1), 6023.]

(2) Ultrahigh-throughput screening of enzyme library members in monodisperse water-in-oil compartments ('microdroplets') that are generated at kHz frequencies in microfluidic devices, a technology that allows selection of libraries with > 10^7 members per day, with nM sensitivity and excellent precision, so that multiple rounds (>10) can be easily carried out. [Nat Commun 2015, 6, 10008.]

(3) A transposon-based library synthesis method that generates insertions and deletions (InDels) will be used to create innovative libraries that may reset the protein architecture by altering the amino acid backbone. [Nat Commun 2020, 11 (1), 3469 & Proc Natl Acad Sci U S A 2020, 117 (44), 27307-27318.]
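
To illustrate the consensus idea behind (1), the following is a minimal sketch, not the published UMIC-seq workflow. It assumes that reads have already been assigned to exact UMI strings and aligned to equal length; a real nanopore pipeline would need error-tolerant UMI clustering and a dedicated alignment or polishing step. The function name `umi_consensus` and the `min_reads` threshold are hypothetical, chosen for illustration only.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_reads=3):
    """Return {umi: consensus_sequence} by per-position majority vote.

    reads: iterable of (umi, sequence) pairs. Sequences sharing a UMI are
    assumed (for this sketch) to be pre-aligned and of equal length.
    min_reads: hypothetical cut-off; UMIs with fewer reads are discarded
    because a majority vote over very few noisy reads is unreliable.
    """
    # Group reads by their UMI (exact string match only in this sketch).
    clusters = defaultdict(list)
    for umi, seq in reads:
        clusters[umi].append(seq)

    consensus = {}
    for umi, seqs in clusters.items():
        if len(seqs) < min_reads:
            continue
        if len({len(s) for s in seqs}) != 1:
            raise ValueError(f"reads for UMI {umi!r} are not equal length")
        # zip(*seqs) yields one column of bases per position;
        # the most common base in each column becomes the consensus base.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


if __name__ == "__main__":
    example_reads = [
        ("AAT", "ACGT"), ("AAT", "ACGA"), ("AAT", "ACGT"),
        ("GGC", "TTGA"), ("GGC", "TTGA"), ("GGC", "TAGA"),
        ("CCA", "ACGT"),  # single read: dropped by min_reads
    ]
    print(umi_consensus(example_reads))
```

Because random read errors rarely recur at the same position in several reads of the same molecule, the vote recovers the underlying variant even when individual reads are noisy, which is what makes closely related genes distinguishable.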

Evolutionary trajectories will be analysed in terms of intra-gene epistasis, also using machine learning and correlation analysis. Mutants emerging from the trajectories will be analysed by kinetics (in particular linear free-energy relationships to pinpoint catalytic features) and X-ray structural analysis.
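
As an illustration of what intra-gene epistasis means quantitatively, the sketch below computes a commonly used pairwise epistasis term on a log scale: the effect of a double mutant minus the sum of the effects of the two single mutants, each relative to the starting enzyme. This is a generic textbook-style definition and is not necessarily the measure the project will use. The function name and the use of a rate-like activity (e.g. kcat/KM) as input are assumptions made for illustration.

```python
import math


def pairwise_epistasis(act_wt, act_a, act_b, act_ab):
    """Log10-scale pairwise epistasis between mutations A and B.

    epsilon = log10(act_ab/act_wt) - log10(act_a/act_wt) - log10(act_b/act_wt)

    All inputs are positive activities (hypothetically kcat/KM values).
    epsilon == 0: the mutations combine additively on the log scale;
    epsilon > 0:  positive (synergistic) epistasis;
    epsilon < 0:  negative (antagonistic) epistasis.
    """
    if min(act_wt, act_a, act_b, act_ab) <= 0:
        raise ValueError("activities must be positive")
    # Algebraically identical to the three-term expression in the docstring.
    return math.log10(act_ab * act_wt / (act_a * act_b))


if __name__ == "__main__":
    # Each single mutant is 10-fold better; the double mutant is 1000-fold
    # better, i.e. 10-fold more than the additive expectation of 100-fold.
    print(pairwise_epistasis(1.0, 10.0, 10.0, 1000.0))
```

On a log scale, independent mutations add, so any non-zero value indicates that the effect of one mutation depends on the presence of the other.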

Publications

Description: Exhibition at Love Nature / Christchurch Mansion, Ipswich
Form Of Engagement Activity: Participation in an activity, workshop or similar
Part Of Official Scheme?: No
Geographic Reach: Regional
Primary Audience: Public/other audiences
Results and Impact: Exhibition booth to demonstrate library screening with ultrahigh-throughput tools, with a focus on sustainable biocatalysis
Year(s) Of Engagement Activity: 2023
URL: https://hollfelder.bioc.cam.ac.uk/outreach