Study of Complex Sugars using Machine Learning Potentials

Lead Research Organisation: University of St Andrews
Department Name: Chemistry

Abstract

This project will involve the computational study of complex carbohydrates, beginning with the arabinoxylans found in Plantago ovata. These highly branched macromolecules consist of β-1,4 and β-1,3 linked xylosyl units decorated with arabinosyl- and xylosyl-containing side chains of 1-3 residues. The unique rheological and aqueous gel-forming properties of these and similar carbohydrates are exploited by the food industry. With this in mind, we seek detailed insight into the nature of the various interactions in aqueous solution, including the chain-chain, chain-water, and chain self-interactions (secondary structure). Water and hydrogen bonding are thought to play a very important role in mediating the interactions between chains. The decoration pattern and the nature of the side chains are also of great importance.

To interrogate these aspects, we will run large-scale Molecular Dynamics (MD) simulations involving thousands of atoms. These will cover a wide variety of chain branching and decoration patterns. From these simulations we hope to gain insight into the solvation structure and thermodynamics involved as a function of such variations, as well as into the effect of other species in solution which may modify the hydrogen bonding. The results of these studies will be used to shed light on the data obtained by experimental colleagues.

In the case of very large systems, one is typically limited to simple Force Field methods which do not capture the subtleties we may wish to observe. For this reason, we intend to apply Machine Learning Potentials (MLPs). These models are trained on reference data obtained from a high-level method (e.g. Density Functional Theory (DFT)) and seek to interpolate the potential energy surface between these reference points. Formally, their computational cost scales linearly with system size. The training of such a model will be a large part of the project, as it requires the (likely iterative) assembly of a large amount of reference data. The level of theory used for this purpose will be chosen carefully, weighing benchmark trials on solvated carbohydrate fragments against high-level ab initio methods (CCSD(T)) together with computational cost. The bulk of our reference data will be obtained from single-point energy evaluations of snapshots from initial Force Field MD simulations, but we will also seek to use any pre-existing data we can obtain.
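
The idea of interpolating a potential energy surface between reference points can be illustrated with a deliberately small toy example. The sketch below is not the project's model: it assumes a one-dimensional Morse potential as a stand-in for the expensive reference method, and uses Gaussian-kernel ridge regression as a stand-in for the MLP. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def morse_energy(r, depth=1.0, width=1.5, r_eq=1.0):
    """Stand-in 'reference method': Morse energy at separation r (arbitrary units)."""
    return depth * (1.0 - np.exp(-width * (r - r_eq))) ** 2


def gaussian_kernel(a, b, length_scale=0.3):
    """Gaussian kernel matrix between two 1D arrays of separations."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def fit_krr(r_train, e_train, length_scale=0.3, ridge=1e-8):
    """Solve (K + ridge * I) w = E for the kernel ridge regression weights."""
    k = gaussian_kernel(r_train, r_train, length_scale)
    return np.linalg.solve(k + ridge * np.eye(len(r_train)), e_train)


def predict_krr(r_query, r_train, weights, length_scale=0.3):
    """Predict energies at new separations from the fitted weights."""
    return gaussian_kernel(r_query, r_train, length_scale) @ weights


# "Reference data": energies evaluated at a handful of sampled geometries.
r_train = np.linspace(0.7, 2.5, 25)
e_train = morse_energy(r_train)
weights = fit_krr(r_train, e_train)

# Check how well the fitted model interpolates between the reference points.
r_test = np.linspace(0.75, 2.45, 200)
e_pred = predict_krr(r_test, r_train, weights)
max_error = np.max(np.abs(e_pred - morse_energy(r_test)))
print(f"max abs interpolation error: {max_error:.2e}")
```

In a realistic setting the single coordinate would be replaced by descriptors of each atom's local environment and the Morse function by DFT single-point energies, but the workflow has the same shape: sample geometries, evaluate the reference method, fit, then test on geometries that were not in the training set.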

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/R513337/1                                   01/10/2018  30/09/2023
2449090            Studentship   EP/R513337/1  01/09/2020  31/08/2024  Peter Starrs
EP/T518062/1                                   01/10/2020  30/09/2025
2449090            Studentship   EP/T518062/1  01/09/2020  31/08/2024  Peter Starrs