Study of Complex Sugars using Machine Learning Potentials
Lead Research Organisation:
University of St Andrews
Department Name: Chemistry
Abstract
This project will involve the computational study of complex carbohydrates, beginning with the arabinoxylans found in Plantago ovata. These highly branched macromolecules consist of β-1,4 and β-1,3 linked xylosyl units decorated with arabinosyl- and xylosyl-containing side chains of 1-3 residues. The unique rheological and aqueous gel-forming properties of these and similar carbohydrates are exploited by the food industry. With this in mind, we seek detailed insight into the various interactions in aqueous solution, including chain-chain, chain-water, and chain self-interactions (secondary structure). Water and hydrogen bonding are thought to play a very important role in mediating the interactions between chains. The decoration pattern and the nature of the side chains are also of great importance.
To interrogate these aspects, we will run large-scale Molecular Dynamics (MD) simulations involving thousands of atoms. These will cover a wide variety of chain branching and decoration patterns. With this we hope to gain insight into the solvation structure and thermodynamics involved as a function of such variations, as well as the effect of other species in solution which may modify the hydrogen bonding. The results of these studies will be used to shed light on the data obtained by experimental colleagues.
In the case of very large systems, one is typically limited to simple Force Field methods, which do not capture the subtleties we may wish to observe. For this reason, we intend to apply Machine Learning Potentials (MLPs). These models are trained on reference data obtained from a high-level method (e.g. Density Functional Theory (DFT)) and seek to interpolate the potential energy surface between these reference points. Their computational cost formally scales linearly with system size. The training of such a model will be a large part of the project, as it requires the (likely iterative) assembly of a large amount of reference data. The level of theory used for this purpose will be carefully chosen by weighing benchmark trials on solvated carbohydrate fragments against high-level ab initio methods (CCSD(T)), as well as computational cost. The bulk of our reference data will be obtained from single-point energy evaluations of snapshots from initial Force Field MD simulations, but we will also seek to use any pre-existing data we can obtain.
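As a purely illustrative sketch of what "interpolating the potential energy surface between reference points" means, the toy example below fits a Gaussian-kernel regression to energies sampled along a single bond-length coordinate. Everything in it is an assumption made for illustration: a Morse potential stands in for the high-level reference method, and the function names, kernel width, and sampling grid are hypothetical. It is not the model, descriptor, or training procedure planned for the project, and real MLPs work with many-atom descriptors rather than one coordinate.

```python
import math

def reference_energy(r, d_e=1.0, a=1.0, r_e=1.0):
    """Toy stand-in for an expensive reference calculation (Morse potential)."""
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2

def kernel(x, y, length=0.15):
    """Gaussian similarity between two geometries (here: two bond lengths)."""
    return math.exp(-((x - y) ** 2) / (2.0 * length ** 2))

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(train_r, train_e, reg=1e-10):
    """Kernel ridge regression: solve (K + reg * I) w = E for the weights w."""
    k = [[kernel(ri, rj) + (reg if i == j else 0.0)
          for j, rj in enumerate(train_r)]
         for i, ri in enumerate(train_r)]
    return solve(k, train_e)

def predict(r, train_r, weights):
    """Predicted energy: weighted sum of similarities to the training geometries."""
    return sum(w * kernel(r, ri) for w, ri in zip(weights, train_r))

# "Reference data": energies at 15 sampled geometries (0.7 to 2.1).
train_r = [0.7 + 0.1 * i for i in range(15)]
train_e = [reference_energy(r) for r in train_r]
weights = fit(train_r, train_e)

# Evaluate the fitted model between the reference points.
test_r = [0.75 + 0.1 * i for i in range(14)]
max_err = max(abs(predict(r, train_r, weights) - reference_energy(r))
              for r in test_r)
print(f"max interpolation error on held-out geometries: {max_err:.2e}")
```

The held-out geometries lie strictly between the training points, so the printed error measures interpolation only; outside the sampled range this kind of model decays towards zero and is not expected to be reliable, which is one reason reference data typically has to be assembled iteratively.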
People | ORCID iD |
---|---|
Tanja Van Mourik (Primary Supervisor) | |
Peter Starrs (Student) | |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/R513337/1 | | | 01/10/2018 | 30/09/2023 | |
2449090 | Studentship | EP/R513337/1 | 01/09/2020 | 31/08/2024 | Peter Starrs |
EP/T518062/1 | | | 01/10/2020 | 30/09/2025 | |
2449090 | Studentship | EP/T518062/1 | 01/09/2020 | 31/08/2024 | Peter Starrs |