RS Fellow - EPSRC grant (2016): Algebraic and topological approaches for genomic data in molecular biology

Lead Research Organisation: University of Oxford
Department Name: Mathematical Institute

Abstract

Modern science generates data at an unprecedented rate, often including the measurement of genetic sequence information in time. One aim in molecular biology is to understand the processes that generate these data; this can be achieved by exploring different hypotheses that are translated into mathematical equations called models. The main outcome of my research will be a range of new methods to understand models in different scenarios with varying amounts of data. The focus of this proposal is genetic data.

The molecular interactions at the genetic level often involve enzymes and therefore can be described as biochemical reactions (known and hypothesised). In DNA, a family of proteins called recombinases rearrange DNA sequences. The focus here will be on the class of site-specific recombinases, which only bind to the DNA at certain sites. Biochemically, the DNA is the substrate and the recombinase is the enzyme that catalyses the change.

The mathematical models that study DNA focus either on changes to the DNA at the nucleotide level or on its global structure. Since DNA can be thought of as a string, a recombinase acting on the DNA can also change its knotting. The local-level analysis employs algebra, while the global-level analysis uses topology, a field of mathematics that studies shapes. Preliminary results obtained with a current PhD student indicate that ribbon categories and new theory are required to merge the local and global views of DNA.

The aim of this project is to develop the mathematical theory and methods further, build a database of known site-specific recombinases and the resulting DNA knots (such a database already exists for a different class of enzymes called topoisomerases), and then create prediction software. Final extensions will address how to account for uncertainty or noise in either the sequence-level data or the experimental image data of the global structure.

The second part of this project is to consider how a knot's configuration relates to its energy. Knot energies are related to unknots, which in turn relate to a major unsolved problem in knot theory: is there a polynomial-time algorithm to detect the unknot?

The methods that I will develop require marrying ideas from pure mathematics (in particular from algebra and topology) with computing, statistics, and techniques from applied mathematics. To combine ideas and techniques from different fields that traditionally do not intersect is an exciting opportunity for interdisciplinary research, and the development of new mathematical ideas. I have experience conducting research projects at this intersection, and employing new methods to gain a new understanding of biological systems.

The advances in mathematical methods and algorithms that result from this project, in combination with data-generating technologies, will enable us to approach and understand real-world biological systems in new ways.

Planned Impact

Please refer to attached Royal Society application.

Publications

Barbensi A (2021) f-distance of knotoids and protein structure in Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences

Beers D (2023) Barcodes distinguishing morphology of neuronal tauopathy in Physical Review Research

Benjamin K (2023) Homology of homologous knotted proteins in Journal of The Royal Society Interface

Bick C (2023) What Are Higher-Order Networks? in SIAM Review

COvid-19 Multi-Omics Blood ATlas (COMBAT) Consortium (2022) A blood atlas of COVID-19 defines hallmarks of disease severity and specificity in Cell

Gross E (2020) Joining and decomposing reaction networks in Journal of Mathematical Biology

 
Description Barbensi defined the Reidemeister Graph for a knot type K as the collection of planar knot diagrams representing K, connected by Reidemeister moves, and showed that its isomorphism type is a complete knot invariant. We then extended this framework to include combinatorial representations of knot diagrams, known as grid diagrams. This allowed us to define a computational model to simulate the unknotting action of certain enzymes on circular DNA molecules. In parallel, we have been focusing on knotoids, a generalisation of knots that characterises entanglement in open curves and is often used to study the topology and geometry of proteins and other spatial curves. We developed new mathematical theory to detect and distinguish the topological type of knotoids. We then used our results to build new methods for studying the structure of knotted proteins. Applications of our pipeline include distinguishing knotted proteins that share the same global topological type but differ in geometric features, and obstructing certain folding pathways arising from simulation for a family of knotted proteins.
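As an illustration of the grid diagrams mentioned above, the following is a minimal Python sketch, not the project's own software. It assumes the usual convention that an n-by-n grid diagram has exactly one X and one O in every row and column, that each column's X and O are joined by a vertical segment and each row's by a horizontal segment, and that crossings occur where a vertical segment passes through the interior of a horizontal one. The function names and the 5-by-5 example (X markings on the diagonal, O markings shifted by two rows, a diagram commonly used for the trefoil) are chosen here for illustration only.

```python
from typing import List, Tuple


def is_valid_grid(xs: List[int], os_: List[int]) -> bool:
    """Column i holds an X in row xs[i] and an O in row os_[i].

    Valid means: both lists are permutations of 0..n-1 (one marking of each
    kind per row and per column) and no cell holds both an X and an O.
    """
    n = len(xs)
    return (
        len(os_) == n
        and sorted(xs) == list(range(n))
        and sorted(os_) == list(range(n))
        and all(x != o for x, o in zip(xs, os_))
    )


def crossings(xs: List[int], os_: List[int]) -> List[Tuple[int, int]]:
    """Return the (row, col) cells where a vertical and a horizontal segment cross."""
    x_col = {row: col for col, row in enumerate(xs)}   # column of the X in each row
    o_col = {row: col for col, row in enumerate(os_)}  # column of the O in each row
    found = []
    for col in range(len(xs)):
        low, high = sorted((xs[col], os_[col]))
        for row in range(low + 1, high):               # rows strictly inside the vertical segment
            left, right = sorted((x_col[row], o_col[row]))
            if left < col < right:                     # column strictly inside the horizontal segment
                found.append((row, col))
    return found


def count_components(xs: List[int], os_: List[int]) -> int:
    """Count closed curves by walking X -> O along a column, then O -> X along a row."""
    x_col = {row: col for col, row in enumerate(xs)}
    seen = set()
    components = 0
    for start in range(len(xs)):
        if start in seen:
            continue
        components += 1
        col = start
        while col not in seen:
            seen.add(col)
            col = x_col[os_[col]]                      # next column visited by the curve
    return components


# Hypothetical example data: a 5-by-5 grid commonly used for the trefoil.
TREFOIL_X = [0, 1, 2, 3, 4]
TREFOIL_O = [2, 3, 4, 0, 1]
```

Calling `crossings(TREFOIL_X, TREFOIL_O)` lists the crossing cells of this diagram, and `count_components` confirms that the diagram describes a single closed curve rather than a link.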
More recently, we have been combining these knot-theoretical methods with topological data analysis techniques to study the geometry and topology of spatial curves with noisy input data. We developed a persistent-homology-based pipeline to study the structure of knotted proteins. We showed that we can cluster trefoil proteins by structural homology class and by their geometric features, and we were able to detect local topological differences in homologous proteins.
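To give a flavour of persistent homology on a point cloud, the sketch below computes only the simplest case: the degree-0 barcode, which records the scales at which connected components merge. It is a hedged illustration and not the pipeline described above. It assumes a Vietoris-Rips-style filtration in which two points are joined once the scale reaches their Euclidean distance (some conventions use half that distance), and the function name `h0_death_times` is hypothetical.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence


def h0_death_times(points: Sequence[Sequence[float]]) -> List[float]:
    """Return the finite death times of the degree-0 persistence barcode.

    Every point is born as its own component at scale 0. Processing edges in
    order of increasing length, each edge that joins two different components
    kills one of them at that length. One component never dies and is omitted.
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:          # the edge merges two components
            parent[root_i] = root_j
            deaths.append(length)
    return deaths
```

For example, `h0_death_times([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)])` gives two short bars and one long bar, reflecting two well-separated clusters. Higher-degree features such as loops require a full simplicial complex and are usually computed with a dedicated library.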
Exploitation Route All URLs:
https://arxiv.org/abs/1801.03313
https://arxiv.org/abs/1811.09121
https://arxiv.org/abs/1909.05937
https://arxiv.org/abs/1909.08556

We expect that others will use the computational grid diagram tool and knotoid structure for studying other biological processes.
Sectors Chemicals, Pharmaceuticals and Medical Biotechnology

URL https://arxiv.org/abs/1801.03313
 
Description Application driven Topological Data Analysis
Amount £2,847,110 (GBP)
Funding ID EP/R018472/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 08/2018 
End 08/2023
 
Description Royal Society Enhancement award
Amount £90,000 (GBP)
Organisation The Royal Society 
Sector Charity/Non Profit
Country United Kingdom
Start 03/2020 
End 04/2021
 
Title Knoto-EMD, tool for comparing entanglement of open curves 
Description Improvements to research infrastructure 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact A statistical distance to compare entanglement of open curves. We were able to obstruct folding pathways for a family of trefoil proteins 
URL https://www.mdpi.com/2073-8994/13/9/1670
 
Description Deborah Olayide Ajayi 
Organisation University of Ibadan
Country Nigeria 
Sector Academic/University 
PI Contribution Collaboration to study geometry of piece-wise linear curves
Collaborator Contribution Collaboration to study geometry of piece-wise linear curves, in particular suggesting directions related to knot theory, topological data analysis and higher-order networks.
Impact We are currently writing a manuscript
Start Year 2021
 
Description Dimos Goundaroulis 
Organisation Baylor College of Medicine
Country United States 
Sector Hospitals 
PI Contribution We're looking at simplification pathways on the graph associated to the knotoid spectrum of a knotted protein.
Collaborator Contribution DG provides software and collaborates with mathematical ideas
Impact f-distance of knotoids and protein structure https://arxiv.org/abs/1909.08556
Start Year 2019
 
Description Dorothy Buck 
Organisation University of Bath
Department Department of Mathematical Sciences
Country United Kingdom 
Sector Academic/University 
PI Contribution We developed a collaboration with an expert in DNA topology.
Collaborator Contribution Dorothy contributed ideas, resulting in two joint papers (one published, another to appear).
Impact We have written two papers together.
Start Year 2017
 
Description University of Melbourne 
Organisation University of Melbourne
Country Australia 
Sector Academic/University 
PI Contribution With the group of Michael Stumpf (including a DPhil student Christian Madsen), we are studying the geometry of curves from biology, specifically chromatin.
Collaborator Contribution This collaboration has contributed biological expertise as well as a student to carry out computations.
Impact We are currently working on a series of manuscripts on geometry and piecewise-linear curves in biology.
Start Year 2021
 
Title GridPythonModule 
Description Barbensi and Celoria have developed a Python package to handle knotted grid diagrams, directly inspired by the mathematical model of DNA topoisomerases developed in Barbensi, Agnese, et al. "Grid diagrams as tools to investigate knot spaces and topoisomerase-mediated simplification of DNA topology." Science Advances 6.9 (2020): eaay1458. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This software has potential use and implications both in applied and pure topology. 
URL https://github.com/agnesedaniele/GridPythonModule
 
Description Minicourse for postgrads 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Four-hour minicourse on Persistent Homology for the EUTOPIA network
Year(s) Of Engagement Activity 2021
 
Description TDA minicourse 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We ran a 6-hour mini-course on Topological Data Analysis, with Agnese Barbensi (funded on this grant) as one of the two lecturers. Approximately 60 participants joined to learn about topological data analysis using only linear algebra (very accessible), which has led to widespread interest within the university (physics, medical sciences, engineering) as well as beyond (e.g. Durham, industry in Oxford, a couple of researchers in Germany).
Year(s) Of Engagement Activity 2021