Geometric and Physics-Informed Generative Modeling for Biomedicine and Biochemistry

Lead Research Organisation: University of Oxford

Abstract

This project falls within the EPSRC Artificial Intelligence and Robotics research area.
Current generative models, while robust in various applications, struggle to represent the manifold complexities and transient dynamics of biological and chemical systems. This is particularly evident in biomedical applications such as single-cell RNA sequencing, where sampling is destructive, so cell dynamics must be inferred from static, sparse, cross-sectional data points [1]. Similarly, in biochemistry, simulating transient states in chemical reactions requires physically plausible paths that go beyond simple straight-line trajectories [2].
This project proposes the development of novel generative modeling frameworks that integrate geometric learning and physical laws directly into the model architecture. The expected outcome is more effective generative models for dynamic systems in biomedical research and chemistry: models that better reflect the underlying dynamics of these systems, and so aid in understanding them, while speeding up simulations.
Geometric Models
We propose a method called Metric Flow Matching (MFM) [3], a novel generative framework designed for trajectory inference. It uses data-induced Riemannian metrics to learn approximate geodesics. By following geodesic paths, MFM adheres more closely to the data's underlying geometry and demonstrates state-of-the-art performance in predicting single-cell trajectories.
Further improvements to MFM could enhance model specificity and accuracy for single-cell trajectories: encoding task-specific biases [4], addressing data imbalance by incorporating optimal transport plans, or refining simulation-free couplings to respect the manifold assumption by designing transport plans with heat kernels.
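To make the contrast between straight-line and curved interpolation concrete, the following is a minimal sketch and not the MFM implementation from [3]. It assumes a path of the form (1 - t) x0 + t x1 + t (1 - t) c(x0, x1, t), where the correction c vanishes from the path at both endpoints. The names interpolate, target_velocity and bend_up are hypothetical, and bend_up is a fixed stand-in for what would in practice be a learned correction shaped by a data-induced metric.

```python
from typing import Callable, List, Optional

Vector = List[float]
Correction = Callable[[Vector, Vector, float], Vector]


def interpolate(x0: Vector, x1: Vector, t: float,
                correction: Optional[Correction] = None) -> Vector:
    """Point at time t on a path from x0 (t=0) to x1 (t=1).

    Without a correction this is the straight line. With one, the term
    t * (1 - t) * correction(...) bends the path while keeping both
    endpoints fixed, because the factor t * (1 - t) is zero at t=0 and t=1.
    """
    base = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    if correction is None:
        return base
    c = correction(x0, x1, t)
    return [p + t * (1.0 - t) * ci for p, ci in zip(base, c)]


def target_velocity(x0: Vector, x1: Vector, t: float,
                    correction: Optional[Correction] = None,
                    eps: float = 1e-5) -> Vector:
    """Central finite-difference time derivative of the path.

    In flow matching, a velocity field is regressed onto this quantity.
    """
    hi = interpolate(x0, x1, t + eps, correction)
    lo = interpolate(x0, x1, t - eps, correction)
    return [(h - l) / (2.0 * eps) for h, l in zip(hi, lo)]


def bend_up(x0: Vector, x1: Vector, t: float) -> Vector:
    """Hypothetical fixed correction: push the path along the second axis."""
    return [0.0, 1.0]


start, end = [0.0, 0.0], [1.0, 0.0]
straight_mid = interpolate(start, end, 0.5)            # midpoint of the line
curved_mid = interpolate(start, end, 0.5, bend_up)     # midpoint lifted off it
curved_vel = target_velocity(start, end, 0.25, bend_up)
```

With these inputs the straight midpoint stays on the segment between the endpoints, while the curved midpoint is displaced by a quarter of the correction vector. Both paths still start and end at the same points, which is the property that lets the curved path replace the straight one in a simulation-free training objective.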
Physics-Informed
Our next objective is to integrate physical constraints (e.g., Langevin dynamics [5]) to enhance molecular transition path sampling [2], molecular dynamics [6], or docking [7]. These constraints could be incorporated into generative models in the same way as in MFM, with physical biases, akin to those explored in Physics-Informed Neural Networks and Neural Operators, taking the place of the geometric metrics proposed in MFM.
For instance, in molecular dynamics, ensuring that the balance conditions characteristic of these dynamics are preserved, along with the corresponding Fokker-Planck equation, has the potential to significantly speed up chemical simulations, as initially presented in [6]. Furthermore, by designing suitable biases, these models could also better cover the complex distributions dictated by Boltzmann-type energies [8].
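As an illustration of the kind of physical dynamics referred to here, the following is a minimal sketch of overdamped Langevin dynamics discretised with a simple Euler-Maruyama step. This is an assumption made for brevity: it is not the integrator of [5], nor a method from any of the cited works. The function names are hypothetical, and the temperature argument stands for the thermal energy k_B T in the same units as the energy. For a harmonic energy U(x) = x^2 / 2, the stationary distribution of the continuous-time dynamics is the Boltzmann density proportional to exp(-U(x) / k_B T), a Gaussian with variance k_B T.

```python
import math
import random
from typing import Callable, List


def overdamped_langevin(grad_energy: Callable[[float], float],
                        x0: float,
                        step: float,
                        temperature: float,
                        n_steps: int,
                        rng: random.Random) -> List[float]:
    """Euler-Maruyama simulation of dx = -U'(x) dt + sqrt(2 kT) dW.

    Returns the list of visited positions (one per step). The finite step
    size introduces a small bias relative to the exact Boltzmann density.
    """
    x = x0
    noise_scale = math.sqrt(2.0 * temperature * step)
    samples = []
    for _ in range(n_steps):
        x = x - step * grad_energy(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def harmonic_grad(x: float) -> float:
    """Gradient of the harmonic energy U(x) = x**2 / 2."""
    return x


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = overdamped_langevin(harmonic_grad, x0=0.0, step=0.05,
                              temperature=1.0, n_steps=100_000, rng=rng)
burned = samples[1000:]  # discard an initial transient
mean = sum(burned) / len(burned)
variance = sum((s - mean) ** 2 for s in burned) / len(burned)
```

The empirical variance should land close to the temperature, up to sampling noise and the discretisation bias of the step size. A generative model that respects the same stationary density could in principle take much larger effective steps, which is the source of the speed-up discussed above.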
References
[1] Lavenant, Hugo, et al. "Towards a mathematical theory of trajectory inference." arXiv preprint arXiv:2102.09204 (2021).
[2] Holdijk, Lars, et al. "Stochastic optimal control for collective variable free sampling of molecular transition paths." Advances in Neural Information Processing Systems 36 (2024).
[3] Kapusniak, Kacper, et al. "Metric Flow Matching for Smooth Interpolations on the Data Manifold." arXiv preprint arXiv:2405.14780 (2024).
[4] Neklyudov, Kirill, et al. "A computational framework for solving Wasserstein Lagrangian flows." arXiv preprint arXiv:2310.10649 (2023).
[5] Bussi, Giovanni, and Michele Parrinello. "Accurate sampling using Langevin dynamics." Physical Review E (Statistical, Nonlinear, and Soft Matter Physics) 75.5 (2007): 056707.
[6] Klein, Leon, et al. "Timewarp: Transferable acceleration of molecular dynamics by learning time-coarsened dynamics." Advances in Neural Information Processing Systems 36 (2024).
[7] Corso, Gabriele, et al. "DiffDock: Diffusion steps, twists, and turns for molecular docking." arXiv preprint arXiv:2210.01776 (2022).
[8] Akhound-Sadegh, Tara, et al. "Iterated denoising energy matching for sampling from Boltzmann densities." arXiv preprint arXiv:2402.06121 (2024).

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/S02428X/1                                   31/03/2019  29/09/2027
2873903            Studentship   EP/S02428X/1  30/09/2023  30/03/2028  Kacper Kapusniak