Geometric and Physics-Informed Generative Modeling for Biomedicine and Biochemistry
Lead Research Organisation:
University of Oxford
Abstract
This project falls within the EPSRC Artificial Intelligence and Robotics research area.
Current generative models, while effective in many applications, struggle to represent the complex manifold structure and transient dynamics of biological and chemical systems. This is particularly evident in biomedical applications such as single-cell RNA sequencing. There, measurement destroys the cells, so only cross-sectional snapshots are available, and cell dynamics must be inferred from static, sparse data points [1]. Similarly, in biochemistry, simulating the transient states of chemical reactions requires sampling physically plausible paths rather than simple straight-line trajectories [2].
This project proposes novel generative modeling frameworks that integrate geometric learning and physical laws directly into the model architecture. The expected outcome is more effective generative models of dynamic systems in biomedical research and chemistry. These models should better reflect the underlying dynamics of such systems, which aids in understanding them, and should also speed up simulations.
Geometric Models
We propose Metric Flow Matching (MFM) [3], a novel generative framework for trajectory inference. MFM uses data-induced Riemannian metrics to learn approximate geodesics. Because it follows geodesic paths rather than straight lines, MFM adheres more closely to the underlying geometry of the data. It achieves state-of-the-art performance in predicting single-cell trajectories.
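The following sketch illustrates why a data-induced metric favors paths that stay close to the data. It evaluates the discretized energy of two fixed paths under a LAND-style diagonal metric. The metric form, the bandwidth `sigma`, the regularizer `eps`, and the helper names `land_metric_diag` and `path_energy` are assumptions made for this illustration. This is not MFM's training procedure from [3]; it only compares two hand-picked paths on a toy dataset.

```python
import numpy as np

def land_metric_diag(points, data, sigma=0.1, eps=1e-3):
    """Diagonal of a LAND-style metric G(x) = (diag(h(x)) + eps * I)^(-1), where
    h_a(x) = sum_i (x_i[a] - x[a])^2 * exp(-||x_i - x||^2 / (2 sigma^2)).
    Far from the data h(x) -> 0, so G(x) -> 1/eps and moving there is expensive."""
    diffs = data[None, :, :] - points[:, None, :]                    # (n, m, d)
    weights = np.exp(-np.sum(diffs**2, axis=-1) / (2.0 * sigma**2))  # (n, m)
    h = np.einsum("nm,nmd->nd", weights, diffs**2)                   # (n, d)
    return 1.0 / (h + eps)

def path_energy(path, data, **metric_kwargs):
    """Discretized energy of a path x(t), t in [0, 1]: sum_k dx_k^T G(x_mid) dx_k / dt."""
    dt = 1.0 / (len(path) - 1)
    steps = np.diff(path, axis=0)                 # (K, d) increments
    mids = 0.5 * (path[1:] + path[:-1])           # evaluate G at segment midpoints
    g = land_metric_diag(mids, data, **metric_kwargs)
    return float(np.sum(g * steps**2) / dt)

# Toy "data manifold": points on the upper half of the unit circle.
theta = np.linspace(0.0, np.pi, 200)
data = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Two paths between the same endpoints (1, 0) and (-1, 0).
t = np.linspace(0.0, 1.0, 201)
straight = np.stack([1.0 - 2.0 * t, np.zeros_like(t)], axis=1)   # cuts through the empty interior
arc = np.stack([np.cos(np.pi * t), np.sin(np.pi * t)], axis=1)   # stays on the data

e_straight = path_energy(straight, data)
e_arc = path_energy(arc, data)
print(f"straight: {e_straight:.1f}  arc: {e_arc:.1f}")
```

The straight line crosses the interior of the half circle, where there is no data and G is close to 1/eps. It therefore has a much larger energy than the arc, even though it is shorter in the Euclidean sense. This is the kind of bias that geodesic interpolants are meant to exploit.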
Further work could improve MFM's specificity and accuracy for single-cell trajectories in several ways. One is encoding task-specific biases [4]. Another is addressing data imbalance by incorporating optimal transport plans. A third is refining simulation-free couplings to respect the manifold assumption, for example by designing transport plans based on heat kernels.
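One possible way to make a minibatch coupling manifold-aware is sketched below under our own assumptions. It first builds a k-nearest-neighbour graph over two batches of samples. It then derives a squared diffusion distance from the graph heat kernel exp(-tL). Finally, it uses that distance as the cost of an entropic optimal transport problem solved with Sinkhorn iterations. The graph construction, the diffusion time `t`, the regularization `reg`, and all function names are hypothetical choices for illustration, not a design this project has committed to.

```python
import numpy as np

def knn_laplacian(points, k=8):
    """Unnormalized Laplacian L = D - W of a symmetrized, Gaussian-weighted kNN graph."""
    n = len(points)
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    order = np.argsort(d2, axis=1)
    nbrs = order[:, 1 : k + 1]                                # k nearest neighbours, excluding self
    bandwidth = np.median(d2[np.arange(n), order[:, k]])      # typical squared k-th neighbour distance
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    mask |= mask.T                                            # make the graph undirected
    w = np.where(mask, np.exp(-d2 / bandwidth), 0.0)
    return np.diag(w.sum(axis=1)) - w

def heat_diffusion_cost(laplacian, t):
    """Squared diffusion distance C_ij = H_ii + H_jj - 2 H_ij with heat kernel H = exp(-2 t L)."""
    evals, evecs = np.linalg.eigh(laplacian)                  # L is symmetric PSD
    h = (evecs * np.exp(-2.0 * t * evals)) @ evecs.T
    diag = np.diag(h)
    return np.maximum(diag[:, None] + diag[None, :] - 2.0 * h, 0.0)  # clip round-off negatives

def sinkhorn(cost, a, b, reg, n_iters=1000):
    """Entropic OT plan P = diag(u) K diag(v) with Gibbs kernel K = exp(-cost / reg)."""
    kernel = np.exp(-cost / reg)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]

# Toy example: an "early" and a "late" batch on a noisy arc.
rng = np.random.default_rng(0)
angles = np.concatenate([rng.uniform(0.0, 0.5 * np.pi, 20), rng.uniform(0.5 * np.pi, np.pi, 20)])
x = np.stack([np.cos(angles), np.sin(angles)], axis=1) + 0.01 * rng.normal(size=(40, 2))

cost = heat_diffusion_cost(knn_laplacian(x, k=8), t=1.0)[:20, 20:]   # early-by-late block
a = b = np.full(20, 1.0 / 20)
plan = sinkhorn(cost, a, b, reg=0.1 * cost.max())
```

In this setup, pairs would be drawn from `plan` instead of from independent or Euclidean-OT pairings when building flow-matching interpolants. The resulting coupling is only as manifold-aware as the kNN graph's approximation of the data manifold.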
Physics-Informed Models
Our next objective is to integrate physical constraints (e.g., Langevin dynamics [5]) to improve molecular transition path sampling [2], Molecular Dynamics [6], or docking [7]. These constraints could be incorporated into generative models in the same way as in MFM, with physical biases taking the place of MFM's geometric metrics. This is akin to approaches explored in Physics-Informed Neural Networks and Neural Operators.
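As a concrete reference for the physics such constraints encode, the sketch below integrates underdamped Langevin dynamics. The equation of motion is m dv = F(x) dt - gamma m v dt + sqrt(2 gamma m kT) dW. The integrator uses an OBABO splitting: exact Ornstein-Uhlenbeck half-steps for friction and noise, placed around a velocity-Verlet step. We chose this splitting as a simple, common option and do not claim it is the exact scheme of [5]. The function name `obabo_langevin` and its parameters are hypothetical.

```python
import numpy as np

def obabo_langevin(force, x0, v0, dt, n_steps, gamma=1.0, kT=1.0, mass=1.0, rng=None):
    """Underdamped Langevin dynamics with an OBABO splitting.

    O: exact Ornstein-Uhlenbeck half-step  v <- c v + sqrt((1 - c^2) kT / m) xi,  c = exp(-gamma dt / 2)
    B: half kick                           v <- v + (dt / 2) F(x) / m
    A: full drift                          x <- x + dt v
    """
    rng = np.random.default_rng() if rng is None else rng
    c = np.exp(-0.5 * gamma * dt)
    noise_scale = np.sqrt((1.0 - c**2) * kT / mass)
    x, v = np.array(x0, dtype=float), np.array(v0, dtype=float)
    f = force(x)
    for _ in range(n_steps):
        v = c * v + noise_scale * rng.standard_normal(v.shape)   # O
        v = v + 0.5 * dt * f / mass                               # B
        x = x + dt * v                                            # A
        f = force(x)
        v = v + 0.5 * dt * f / mass                               # B
        v = c * v + noise_scale * rng.standard_normal(v.shape)   # O
    return x, v

# Sanity check on a harmonic potential U(x) = x^2 / 2. At kT = 1 the Boltzmann
# distribution has <x^2> = <v^2> = 1. Many independent particles are simulated at once.
rng = np.random.default_rng(0)
x, v = obabo_langevin(lambda q: -q, np.zeros(20000), np.zeros(20000),
                      dt=0.01, n_steps=3000, rng=rng)
print(np.mean(x**2), np.mean(v**2))   # expected to be close to 1, up to sampling noise
```

The same function accepts any force, for example `lambda q: -4 * q * (q**2 - 1)` for a double-well potential. For rugged molecular energy landscapes, however, the waiting time between rare transitions is what makes plain simulation slow.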
For instance, in Molecular Dynamics, preserving the balance conditions characteristic of these dynamics (e.g., detailed balance), along with consistency with the corresponding Fokker-Planck equation, has the potential to significantly speed up chemical simulations, as initially demonstrated in [6]. Furthermore, with suitably designed biases, these models could also improve their coverage of complex distributions defined by Boltzmann-type energies [8].
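Detailed balance can be made concrete on a toy discrete state space. The sketch below builds the Metropolis-Hastings transition matrix for a Boltzmann target pi_i ∝ exp(-E_i / kT) and an arbitrary, asymmetric proposal. The acceptance step enforces pi_i P_ij = pi_j P_ji regardless of the proposal, so pi stays stationary. Timewarp [6] combines a learned proposal with a Metropolis-Hastings acceptance step in this spirit. The energies, the proposal, and the helper names here are illustrative assumptions. In this setting, a better (e.g., learned) proposal changes how quickly the chain covers the modes of the distribution, not which distribution it targets.

```python
import numpy as np

def boltzmann(energies, kT=1.0):
    """Normalized Boltzmann weights pi_i proportional to exp(-E_i / kT)."""
    logits = -np.asarray(energies, dtype=float) / kT
    p = np.exp(logits - logits.max())          # shift for numerical stability
    return p / p.sum()

def metropolis_hastings_matrix(proposal, target):
    """MH transition matrix: P_ij = Q_ij * min(1, pi_j Q_ji / (pi_i Q_ij)) for i != j,
    with all rejected probability mass kept on the diagonal."""
    n = len(target)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and proposal[i, j] > 0:
                ratio = target[j] * proposal[j, i] / (target[i] * proposal[i, j])
                P[i, j] = proposal[i, j] * min(1.0, ratio)
        P[i, i] = 1.0 - P[i].sum()             # probability of staying put
    return P

rng = np.random.default_rng(0)
energies = np.array([0.0, 2.0, 0.5, 3.0, 1.0])   # toy energy levels
pi = boltzmann(energies, kT=1.0)
Q = rng.uniform(0.1, 1.0, size=(5, 5))
Q /= Q.sum(axis=1, keepdims=True)                 # arbitrary asymmetric proposal

P = metropolis_hastings_matrix(Q, pi)
flux = pi[:, None] * P                            # probability flux pi_i P_ij
print(np.abs(flux - flux.T).max())                # ~0 up to round-off: detailed balance
print(np.abs(pi @ P - pi).max())                  # ~0 up to round-off: pi is stationary
```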
References
[1] Lavenant, Hugo, et al. "Towards a mathematical theory of trajectory inference." arXiv preprint arXiv:2102.09204 (2021).
[2] Holdijk, Lars, et al. "Stochastic optimal control for collective variable free sampling of molecular transition paths." Advances in Neural Information Processing Systems 36 (2024).
[3] Kapusniak, Kacper, et al. "Metric Flow Matching for Smooth Interpolations on the Data Manifold." arXiv preprint arXiv:2405.14780 (2024).
[4] Neklyudov, Kirill, et al. "A computational framework for solving Wasserstein Lagrangian flows." arXiv preprint arXiv:2310.10649 (2023).
[5] Bussi, Giovanni, and Michele Parrinello. "Accurate sampling using Langevin dynamics." Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 75.5 (2007): 056707.
[6] Klein, Leon, et al. "Timewarp: Transferable acceleration of molecular dynamics by learning time-coarsened dynamics." Advances in Neural Information Processing Systems 36 (2024).
[7] Corso, Gabriele, et al. "DiffDock: Diffusion steps, twists, and turns for molecular docking." arXiv preprint arXiv:2210.01776 (2022).
[8] Akhound-Sadegh, Tara, et al. "Iterated denoising energy matching for sampling from Boltzmann densities." arXiv preprint arXiv:2402.06121 (2024).
| People | ORCID iD |
|---|---|
| Kacper Kapusniak (Student) | |
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/S02428X/1 | | | 31/03/2019 | 29/09/2027 | |
| 2873903 | Studentship | EP/S02428X/1 | 30/09/2023 | 30/03/2028 | Kacper Kapusniak |