Machine Learning for Predictable Design of Yeast Promoters

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Engineering

Abstract

Promoter sequences sit upstream of genes and control under what conditions, at what time and in what quantity the protein encoded by a gene is expressed. An important step in the emergence of synthetic biology as a fully fledged engineering discipline is to provide yeast promoters that give well-characterised and predictable protein expression. Unfortunately, current efforts to predict patterns of protein expression from promoter DNA sequences do not yet yield sufficiently accurate results. We propose to apply machine learning techniques to uncover which features within promoter sequences contribute to particular expression patterns, based on a large dataset of activity assays provided by the Edinburgh Genome Foundry. The end goal is to reverse this process, allowing the predictable design of yeast promoters fine-tuned to the desired parameters for a variety of genetic engineering applications, from biofuels (Azhar et al. 2017) to the manufacturing of biopharmaceuticals (Nielsen et al. 2012).
This work will build on the findings of Cuperus et al. 2017, who used linear regression models and convolutional neural networks to predict the effect of any 5' untranslated region on protein expression in yeast. Similar work, again using convolutional neural networks, was carried out by Umarov et al. 2017 in humans, mice, Arabidopsis, E. coli and B. subtilis. In both studies the convolutional neural networks learn their features directly from the sequence data rather than relying on hand-picked ones, which allows the models to take into account previously unevaluated features. Convolutional neural networks do, however, have a drawback: they weight the presence of a feature far more heavily than its spatial relationship to other features. Recent developments in machine learning in the form of capsule networks (Sabour et al. 2017) may provide a solution through capsule structures, which learn to recognise a feature over a small range of conditions and deformations. Cuperus et al. 2017 indicate the importance of spatial information: their position-dependent linear regression models outperformed the position-independent implementations, reflecting the position dependence of many key features in yeast protein expression.
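The distinction between position-independent and position-dependent features can be made concrete with a minimal sketch. This is an illustration only, not the feature encoding used by Cuperus et al. 2017: the function names, the choice of 3-mers and the two toy sequences are all assumptions introduced here.

```python
from collections import Counter


def kmers(seq, k):
    """Return every overlapping k-mer of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def position_independent_features(seq, k=3):
    """Count each k-mer regardless of where it occurs (hypothetical helper)."""
    return Counter(kmers(seq, k))


def position_dependent_features(seq, k=3):
    """Count each (position, k-mer) pair, so location is part of the feature."""
    return Counter(enumerate(kmers(seq, k)))


# Two toy sequences carrying the same motif at opposite ends.
early = "ATGCCCCC"
late = "CCCCCATG"

# Position-independent view: the motif looks identical in both sequences.
same_count = (position_independent_features(early)["ATG"]
              == position_independent_features(late)["ATG"])

# Position-dependent view: the motif maps to two different features.
early_hit = position_dependent_features(early)[(0, "ATG")]
late_hit = position_dependent_features(late)[(5, "ATG")]
```

A linear regression fitted on the second kind of feature can assign a different weight to the same motif at each position, which is the property the position-dependent models exploit. The cost is a much larger feature space.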
In order to achieve the end goal of predictable promoter design, we will again expand on the work of Cuperus et al. 2017 on in silico evolution of 5' untranslated sequences. We will use a hybrid approach in which a genetic algorithm is paired with a similarity measure between the prediction of our promoter parameter model and the desired promoter properties, with this measure acting as a surrogate for the fitness function. A similar technique has been used by Dias et al. 2014 to improve clinical practice in the placement of beams in radiotherapy.
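The hybrid approach described above might be sketched as follows. This is a rough illustration under stated assumptions, not the project's method. `toy_predict` is a hypothetical stand-in for the trained promoter model (it simply returns GC fraction). The truncation selection, single-point crossover, elitism and all parameter values are likewise choices made here for brevity.

```python
import random

BASES = "ACGT"


def toy_predict(seq):
    """Hypothetical stand-in for a trained expression model: GC fraction."""
    return sum(base in "GC" for base in seq) / len(seq)


def fitness(seq, target, predict=toy_predict):
    """Surrogate fitness: closeness of the prediction to the desired value."""
    return -abs(predict(seq) - target)


def mutate(seq, rate, rng):
    """Resample each base with probability `rate`."""
    return "".join(rng.choice(BASES) if rng.random() < rate else base
                   for base in seq)


def crossover(a, b, rng):
    """Single-point crossover between two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target, length=50, pop_size=40, generations=30, rate=0.02, seed=0):
    """Evolve sequences whose predicted value approaches `target`."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(BASES) for _ in range(length))
           for _ in range(pop_size)]
    history = []  # best surrogate fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(s, target), reverse=True)
        history.append(fitness(pop[0], target))
        parents = pop[: pop_size // 2]  # truncation selection
        children = [pop[0]]             # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        pop = children
    best = max(pop, key=lambda s: fitness(s, target))
    return best, history


best, history = evolve(target=0.6)
```

Because the best individual is always carried forward, the surrogate fitness recorded in `history` cannot decrease between generations. In practice `toy_predict` would be replaced by the trained model, and the scalar target by a vector of desired promoter properties with a suitable similarity measure.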
By using machine learning to understand patterns in yeast promoter design, we hope to contribute to faster and less error-prone workflows in synthetic biology, facilitating a new generation of environmentally friendly, highly efficient manufacturing techniques.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513209/1                                   01/10/2018  30/09/2023
2274382            Studentship   EP/R513209/1  01/09/2019  28/02/2023  Frederick Starkey