Bayesian machine learning and Self-Assembly

Lead Research Organisation: University of Oxford
Department Name: Oxford Chemistry

Abstract

This project falls within the Physical Sciences, Mathematical Sciences and Information and Communication Technologies EPSRC research themes, and is specifically relevant to the Artificial intelligence technologies, Computational and theoretical chemistry, and Biophysics and soft matter physics EPSRC research areas. The project will involve training deep neural networks (DNNs) to predict the outcome of self-assembly processes (e.g. polycubes [6] and DNA origami [1,2,3]) for which standard algorithms either do not exist or do not scale well. It will also investigate the source of generalisation in DNNs, continuing a line of work that has already produced several novel results (see e.g. [7]). This work justifies a Bayesian approach to DNNs, an approach that has already produced methods that lead to better generalisation (e.g. Multi-SWAG and ensemble methods in general). DNA origami has shown an impressive ability to assemble nanoscale structures and devices with high fidelity. It exploits the specificity of Watson-Crick base pairing to fold a long single-stranded DNA "scaffold" (typically the genome of the M13 virus) through the addition of hundreds of different short DNA "staple" strands, each of which binds to two or more specific scaffold domains. One limitation is that, because every origami type uses the same M13 scaffold, structures made from multiple types of origami require each type to be assembled separately first; only then can the assembled origamis be mixed and further assembled into the final structure. Although DNA origami is now a very widely used technique, a detailed understanding of the mechanisms of its self-assembly is still lacking, nor is it clear how to control or direct the assembly process. One obstacle to achieving this control is the large space spanned by the variables that could influence assembly: not only those associated with the conditions (e.g. annealing schedule, relative strand concentrations) but also those of the design space (e.g. staple binding patterns, sequence). Machine learning potentially provides a means to navigate such large spaces, learning the design rules that control assembly as it goes.
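To make the staple-scaffold picture concrete, the sketch below models Watson-Crick specificity as exact reverse-complement matching: because paired strands run antiparallel, a staple domain binds wherever the scaffold carries that domain's reverse complement. The sequences and the helper names `reverse_complement` and `binding_sites` are hypothetical, chosen for illustration only; a real scaffold such as the M13 genome is several thousand bases long, and real binding is governed by hybridisation thermodynamics rather than exact string matching.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence that would pair with `seq` (complemented, then reversed)."""
    return seq.translate(COMPLEMENT)[::-1]


def binding_sites(staple_domains, scaffold):
    """For each staple domain, list every scaffold index where it could bind.

    Toy model: a domain binds only where the scaffold contains its exact
    reverse complement (no mismatches, no energetics).
    """
    sites = {}
    for domain in staple_domains:
        target = reverse_complement(domain)
        hits = []
        start = scaffold.find(target)
        while start != -1:
            hits.append(start)
            start = scaffold.find(target, start + 1)  # allow overlapping hits
        sites[domain] = hits
    return sites


# Hypothetical toy scaffold and a two-domain staple.
scaffold = "GATTACAGGCCTTAGCATGC"
staple = ["TGTAATC", "GCATG"]
print(binding_sites(staple, scaffold))  # {'TGTAATC': [0], 'GCATG': [15]}
```

Here the two domains of one staple bind scaffold positions 0 and 15, so hybridising both would pull those distant regions of the scaffold together, which is the folding mechanism the paragraph describes.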
In this project we intend to combine a domain-level model of DNA origami assembly with machine learning to explore how to control origami assembly. A key target will be to learn how to direct assembly towards one of multiple origami products in a one-pot assembly system. The aim is to produce a DNN-based algorithm that can predict the annealing schedule, relative strand concentrations and settings of other relevant parameters that speed up the production of DNA origami without sacrificing yield. This is expected to be the first model of its kind and may significantly aid the practical production of DNA origami. Other self-assembling systems (e.g. polycubes [6]) are simpler and are likely to be useful environments for understanding precisely what a DNN has learned, interpretability being an important current question. Bayesian methods (e.g. deep ensemble learning) will allow us to see the confidence with which the model makes its predictions, and are likely to allow us to determine which classes of polyominoes the network struggles with. To better understand the predictions of various models, studying how they arrive at those predictions will be crucial. We will continue work from [5,7], in which the question of why DNNs generalise well is studied from a Bayesian perspective.
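As a minimal sketch of how an ensemble can expose its confidence, the code below assumes several independently trained networks have each produced class probabilities for the same input; training is omitted and the probabilities are made-up numbers. It averages the members into a predictive distribution and splits the predictive entropy into the members' average entropy plus a disagreement term (the mutual information between the prediction and the choice of member). A large disagreement term flags inputs the ensemble is unsure about. This entropy decomposition is one common choice, not necessarily the one the project will use, and `ensemble_uncertainty` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def ensemble_uncertainty(
    member_probs: List[Sequence[float]],
) -> Tuple[List[float], float, float]:
    """Combine per-member class probabilities for a single input.

    Returns (mean_probs, total_entropy, disagreement), where
    total_entropy is the entropy of the averaged prediction and
    disagreement = total_entropy - mean member entropy (mutual information).
    """
    n = len(member_probs)
    k = len(member_probs[0])
    # Predictive distribution: uniform average over ensemble members.
    mean_probs = [sum(p[c] for p in member_probs) / n for c in range(k)]
    total = entropy(mean_probs)
    # Expected entropy: the uncertainty each member reports on its own.
    expected = sum(entropy(p) for p in member_probs) / n
    return mean_probs, total, total - expected


# Hypothetical member outputs for a two-class question
# (e.g. "assembles correctly" vs "misassembles").
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9]]

for name, probs in [("agree", agree), ("disagree", disagree)]:
    _, total, disagreement = ensemble_uncertainty(probs)
    # "disagree" prints total 0.693 and disagreement 0.368;
    # "agree" gives a disagreement of essentially zero.
    print(name, round(total, 3), round(disagreement, 3))
```

Both inputs receive a fairly uncertain-looking averaged prediction, but only the second shows the members contradicting each other; that disagreement term is what would help single out the classes of structures the network struggles with.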

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S513842/1 | | | 01/10/2018 | 30/09/2024 |
2451633 | Studentship | EP/S513842/1 | 01/10/2020 | 30/09/2024 | Christopher Mingard