AI-assisted DBTL cycle for synthetic yeast promoters

Lead Research Organisation: The University of Manchester
Department Name: Chemistry

Abstract

Biotechnology promises to harness biology to address societal challenges by enabling the production of medicines, food, and other products. Greater control over an organism's genes is one of the keys to unlocking this potential.

Within an organism's genome, promoters are the regions upstream of genes where transcription begins. Whether a gene produces any mRNA molecules, and how much, is strongly affected by its promoter. Downstream products of these mRNA molecules include proteins such as enzymes, which in turn have the power to control metabolic pathways. Understanding how promoters work would therefore be a big step towards greater control over an organism's genes.

Their importance has not escaped the attention of other researchers. In the past, researchers have either looked at the core elements within promoters to understand what properties they share, or examined the promoter sequences themselves, for example by using publicly available data. Some of this work has taken advantage of the growing interest in artificial intelligence and the doors that open when data sets are large enough for machine learning algorithms to learn from.

Like these earlier works, we intend to use machine learning approaches such as deep learning to better understand promoter sequences. However, we plan to link deep learning with synthetic biology in an iterative fashion - sequences predicted in silico to be promoters are synthesized and then validated on the benchtop. Those that are confirmed to be promoter sequences can be fed back in, so that the machine learning algorithm can be re-trained.
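As an illustration only, one pass of this loop can be sketched in Python. Everything named here is a hypothetical stand-in rather than the project's actual method: `dbtl_iteration`, the 0.5 score threshold, and `toy_train`, which replaces the deep learning model with a simple 4-mer lookup so that the sketch runs without any dependencies. The `validate` callable stands in for synthesis and benchtop testing.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, bool]          # (DNA sequence, confirmed-promoter label)
Model = Callable[[str], float]      # maps a sequence to a promoter score in [0, 1]


def dbtl_iteration(
    train: Callable[[List[Example]], Model],
    labelled: List[Example],
    candidates: List[str],
    validate: Callable[[str], bool],
    threshold: float = 0.5,         # assumed cut-off, not taken from the source
) -> Tuple[Model, List[Example]]:
    """Run one design-build-test-learn pass; return the re-trained model and data."""
    model = train(labelled)
    # Design: keep candidates the current model scores as likely promoters.
    predicted = [seq for seq in candidates if model(seq) >= threshold]
    # Build and test: `validate` stands in for synthesis and benchtop validation.
    confirmed = [(seq, True) for seq in predicted if validate(seq)]
    # Learn: only verified sequences are fed back before re-training.
    updated = labelled + confirmed
    return train(updated), updated


def toy_train(examples: List[Example]) -> Model:
    """Hypothetical stand-in for a deep learning model: remembers 4-mers from positives."""
    known = {
        seq[i:i + 4]
        for seq, is_promoter in examples
        if is_promoter
        for i in range(len(seq) - 3)
    }

    def score(seq: str) -> float:
        kmers = [seq[i:i + 4] for i in range(len(seq) - 3)]
        return sum(k in known for k in kmers) / len(kmers) if kmers else 0.0

    return score


if __name__ == "__main__":
    seed_data = [("TATAAAAG", True), ("GCGCGCGC", False)]
    pool = ["TATAAACC", "GGGGCCCC"]
    # The lambda is a placeholder for a wet-lab result, not a real assay.
    model, data = dbtl_iteration(toy_train, seed_data, pool, lambda s: "TATA" in s)
    print(len(data), "labelled sequences after one iteration")
```

Passing `train` and `validate` in as callables keeps the loop independent of any particular model or assay, so the same function could be called once per iteration with whatever model and validation record are actually used.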

Our approach builds on the fact that artificial intelligence needs not just a lot of data, but a lot of high-quality, annotated data. Such data has to be verified before being used to re-train any future model - if it is not, the performance of the re-trained model may degrade relative to the original model.

Within the timeframe of the project, we will demonstrate our methodology through one iteration as a proof of principle. Successful application of our method will not only benefit biotechnology in the long run but also, in the short term, demonstrate that our approach is sound and can be applied to future projects involving artificial intelligence and synthetic biology.