RS Fellow - EPSRC grant (2016): Algebraic and topological approaches for genomic data in molecular biology
Lead Research Organisation:
University of Oxford
Department Name: Mathematical Institute
Abstract
Modern science generates data at an unprecedented rate, often including measurements of genetic sequence information over time. One aim in molecular biology is to understand the processes that generate these data; this can be achieved by exploring different hypotheses that are translated into mathematical equations called models. The main outcome of my research will be a range of new methods to understand models in different scenarios with varying amounts of data. The focus of this proposal is genetic data.
The molecular interactions at the genetic level often involve enzymes and can therefore be described as biochemical reactions (known and hypothesised). A family of proteins called recombinases rearranges DNA sequences. The focus here will be on the class of site-specific recombinases, which bind to the DNA only at certain sites. Biochemically, the DNA is the substrate and the recombinase is the enzyme that catalyses the change.
The mathematical models that study DNA focus either on changes to the DNA at the nucleotide level or on its global structure. Since DNA can be thought of as a string, a recombinase acting on the DNA can also change its knotting. The local-level analysis employs algebra, while the global-level analysis uses topology, a field of mathematics that studies shapes. Through recent work with a current PhD student, we have preliminary results indicating that ribbon categories and new theory are required to merge the local and global views of DNA.
The aim of this project is to develop the mathematical theory and methods further, build a database of known site-specific recombinases and the resulting DNA knots (such a database exists for a different class of enzymes called topoisomerases), and then create prediction software. Final extensions will address how to take into account uncertainty or noise in either the sequence-level data or the experimental image data of the global structure.
The second part of this project is to consider how a knot's configuration relates to its energy. Understanding knot energies relates to unknots, and hence to a large unsolved problem in knot theory: is there a polynomial-time algorithm to detect the unknot?
The methods that I will develop require marrying ideas from pure mathematics (in particular from algebra and topology) with computing, statistics, and techniques from applied mathematics. To combine ideas and techniques from different fields that traditionally do not intersect is an exciting opportunity for interdisciplinary research, and the development of new mathematical ideas. I have experience conducting research projects at this intersection, and employing new methods to gain a new understanding of biological systems.
The advances in mathematical methods and algorithms that result from this project, in combination with data-generating technologies, will enable us to approach and understand real-world biological systems in new ways.
Planned Impact
Please refer to attached Royal Society application.
People |
ORCID iD |
Heather Harrington (Principal Investigator / Fellow) |
Publications
Yeung E
(2020)
Inference of Multisite Phosphorylation Rate Constants and Their Modulation by Pathogenic Mutations.
in Current biology : CB
Vipond O
(2021)
Multiparameter persistent homology landscapes identify immune cell spatial patterns in tumors.
in Proceedings of the National Academy of Sciences of the United States of America
Thorne T
(2022)
Topological approximate Bayesian computation for parameter inference of an angiogenesis model.
in Bioinformatics (Oxford, England)
Stolz B
(2022)
Multiscale topology characterizes dynamic tumor vascular networks
in Science Advances
Stolz B
(2021)
Topological data analysis of task-based fMRI data from experiments on schizophrenia
in Journal of Physics: Complexity
Seigal A
(2021)
Principal Components along Quiver Representations
Seigal A
(2022)
Principal Components Along Quiver Representations
in Foundations of Computational Mathematics
Description | Barbensi defined the Reidemeister Graph for a knot type K as the collection of planar knot diagrams representing K, connected by Reidemeister moves, and showed that its isomorphism type is a complete knot invariant. We then extended this framework to include combinatorial representations of knot diagrams, known as grid diagrams. This allowed us to define a computational model to simulate the unknotting action of certain enzymes on circular DNA molecules. In parallel, we have been focusing on knotoids, a generalisation of knots that characterises entanglement in open curves and is often used to study the topology and geometry of proteins and other spatial curves. We developed new mathematical theory to detect and distinguish the topological type of knotoids. We then used our results to build new methods for studying the structure of knotted proteins. Applications of our pipeline include distinguishing knotted proteins that share the same global topological type but differ in geometric features, and obstructing certain folding pathways arising from simulation for a family of knotted proteins. More recently, we have been working on combining these knot-theoretical methods with topological data analysis techniques to study the geometry and topology of spatial curves with noisy input data. We developed a pipeline based on persistent homology to study the structure of knotted proteins. We showed that we can cluster trefoil proteins by structural homology class and by their geometric features, and we were able to detect local topological differences in homologous proteins. |
Exploitation Route | All URLs: https://arxiv.org/abs/1801.03313 https://arxiv.org/abs/1811.09121 https://arxiv.org/abs/1909.05937 https://arxiv.org/abs/1909.08556 We expect that others will use the computational grid diagram tool and knotoid structure for studying other biological processes. |
Sectors | Chemicals, Pharmaceuticals and Medical Biotechnology |
URL | https://arxiv.org/abs/1801.03313 |
Description | Application driven Topological Data Analysis |
Amount | £2,847,110 (GBP) |
Funding ID | EP/R018472/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2018 |
End | 08/2024 |
Description | Royal Society Enhancement award |
Amount | £90,000 (GBP) |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2020 |
End | 04/2021 |
Title | Knoto-EMD, tool for comparing entanglement of open curves |
Description | Improvements to research infrastructure |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | A statistical distance to compare the entanglement of open curves. We were able to obstruct folding pathways for a family of trefoil proteins. |
URL | https://www.mdpi.com/2073-8994/13/9/1670 |
Description | Deborah Olayide Ajayi |
Organisation | University of Ibadan |
Country | Nigeria |
Sector | Academic/University |
PI Contribution | Collaboration to study the geometry of piecewise-linear curves |
Collaborator Contribution | Collaboration to study the geometry of piecewise-linear curves, in particular suggesting directions related to knot theory, topological data analysis and higher-order networks. |
Impact | We are currently writing a manuscript |
Start Year | 2021 |
Description | Dimos Goundaroulis |
Organisation | Baylor College of Medicine |
Country | United States |
Sector | Hospitals |
PI Contribution | We are looking at simplification pathways on the graph associated with the knotoid spectrum of a knotted protein. |
Collaborator Contribution | DG provides software and contributes mathematical ideas. |
Impact | f-distance of knotoids and protein structure https://arxiv.org/abs/1909.08556 |
Start Year | 2019 |
Description | Dorothy Buck |
Organisation | University of Bath |
Department | Department of Mathematical Sciences |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We developed a collaboration with an expert in DNA topology. |
Collaborator Contribution | Dorothy contributed ideas, resulting in two joint papers (one published, another to appear). |
Impact | We have written two papers together. |
Start Year | 2017 |
Description | University of Melbourne |
Organisation | University of Melbourne |
Country | Australia |
Sector | Academic/University |
PI Contribution | With the group of Michael Stumpf (including a DPhil student Christian Madsen), we are studying the geometry of curves from biology, specifically chromatin. |
Collaborator Contribution | This collaboration has contributed biological expertise as well as a student to carry out computations. |
Impact | We are currently working on a series of manuscripts on geometry and piecewise-linear curves in biology. |
Start Year | 2021 |
Title | GridPythonModule |
Description | Barbensi and Celoria have developed a Python package to handle knotted grid diagrams, directly inspired by the mathematical model of DNA topoisomerases developed in Barbensi, Agnese, et al. "Grid diagrams as tools to investigate knot spaces and topoisomerase-mediated simplification of DNA topology." Science Advances 6.9 (2020): eaay1458. |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | This software has potential use and implications both in applied and pure topology. |
URL | https://github.com/agnesedaniele/GridPythonModule |
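For readers unfamiliar with grid diagrams, the following is a minimal illustrative sketch of the underlying combinatorial object. It is not taken from GridPythonModule and does not reflect that package's API. It assumes the common encoding of an n x n grid diagram as two permutations: column i holds an X in row `xs[i]` and an O in row `os_[i]`. The function names `is_valid_grid` and `count_components` are hypothetical and chosen only for this example.

```python
from typing import Sequence


def is_valid_grid(xs: Sequence[int], os_: Sequence[int]) -> bool:
    """Check that (xs, os_) encodes a grid diagram.

    Assumed encoding: column i has an X in row xs[i] and an O in row os_[i].
    Every row and column must contain exactly one X and one O, and no cell
    may contain both markings.
    """
    n = len(xs)
    if len(os_) != n:
        return False
    rows = list(range(n))
    return (
        sorted(xs) == rows
        and sorted(os_) == rows
        and all(x != o for x, o in zip(xs, os_))
    )


def count_components(xs: Sequence[int], os_: Sequence[int]) -> int:
    """Count the components of the link drawn by the grid diagram.

    The link is traced by going vertically from the X to the O in a column,
    then horizontally from that O to the X in the same row. This induces a
    permutation of the columns whose cycles are the link components.
    """
    if not is_valid_grid(xs, os_):
        raise ValueError("not a valid grid diagram")
    n = len(xs)
    # Invert xs: for each row, the column that holds its X.
    col_of_x_in_row = [0] * n
    for col, row in enumerate(xs):
        col_of_x_in_row[row] = col
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        col = start
        while not seen[col]:
            seen[col] = True
            col = col_of_x_in_row[os_[col]]
    return components


# A 2 x 2 grid (the smallest valid one) traces a single closed curve.
print(count_components([0, 1], [1, 0]))
# A 4 x 4 grid whose column permutation splits into two cycles.
print(count_components([0, 1, 2, 3], [2, 3, 0, 1]))
```

The permutation encoding is one reason grid diagrams suit computation: validity checks and traversal reduce to operations on integer lists, and moves on the diagram can be expressed as edits to the two permutations.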
Description | Minicourse for postgrads |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Four-hour minicourse on Persistent Homology for the EUTOPIA network |
Year(s) Of Engagement Activity | 2021 |
Description | TDA minicourse |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | We ran a 6-hour mini-course on Topological Data Analysis, with Agnese Barbensi (funded on this grant) as one of the two lecturers. Approximately 60 participants joined to learn about topological data analysis using only linear algebra (very accessible), which has led to widespread interest within the university (physics, medical sciences, engineering) as well as beyond (e.g. Durham, industry in Oxford, a couple of researchers in Germany). |
Year(s) Of Engagement Activity | 2021 |