Software for experimentally driven macromolecular modelling

Lead Research Organisation: Durham University
Department Name: Chemistry

Abstract

Every natural phenomenon, including life itself, is determined by the way simple atoms associate to form molecules. Molecules are intrinsically flexible: depending on local physical conditions they can change shape, associate into stable complexes or, conversely, dissociate. By studying a molecule's atomic arrangement (i.e. its structure) and its associated dynamics, direct insights into its function can be obtained. Acquiring this knowledge is an essential driver for the development of new biochemical and medical applications. A large palette of experimental techniques has been designed to provide information ranging from low resolution data about molecular shape to the coordinates and dynamics of individual atoms.

Nevertheless, a single experimental technique can rarely capture all aspects of a system under study. Furthermore, experiments can sometimes be too expensive, complex or dangerous to perform. In this context, computational approaches represent an essential tool. By combining information from different experimental techniques with physical knowledge of atomic interactions, models describing a system can be generated. Such models can help rationalize experimental data and produce new testable hypotheses to guide further experiments.

The overarching goal of my work is the creation of tools for interpreting and exploiting experimental data targeting the modelling of molecular systems at an atomistic level. My main focus will be an important class of biologically relevant molecules: proteins. Importantly, instead of representing them as a unique structure, I will treat them as ensembles of possible structures (a.k.a. conformations). My specific objectives are:
- assess and visualize experimental data against an ensemble of possible protein conformations. This will allow a subset of alternate conformations consistent with experimental data to be determined;
- predict the arrangement of multiple flexible proteins into a complex so that the produced model is consistent with experimental data. To reach this goal I will develop a powerful optimization engine running on high performance computing, and a new robust method to represent and assess electrostatic interactions.
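
As a minimal sketch of the first objective, and not the project's actual software, the snippet below filters an ensemble by comparing a quantity back-calculated from each conformation against a single experimental value. The function name `select_consistent`, the toy distances and the tolerance are all illustrative assumptions.

```python
def select_consistent(predicted, experimental, tolerance):
    """Return the indices of conformations whose predicted observable
    lies within `tolerance` of the experimental value."""
    return [i for i, value in enumerate(predicted)
            if abs(value - experimental) <= tolerance]


# Hypothetical inter-residue distances (in angstroms), one per conformation.
predicted_distances = [18.2, 24.9, 31.5, 26.1, 40.3]

# Hypothetical measurement of 25.0 angstroms with a 2.0 angstrom tolerance.
selected = select_consistent(predicted_distances, experimental=25.0, tolerance=2.0)
print(selected)  # indices of the conformations retained
```

In practice several experimental observables, each with its own uncertainty, would be combined; a single threshold is used here only to keep the idea visible.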

These new methodologies will be implemented in software allowing complex experimental data to be interpreted with unprecedented accuracy and clarity, and guide the design of new experiments. I will strive to make my software not only easy to access, but also easy to use by creating graphical user interfaces.

The resulting methods will be applied to the study of integrins. These proteins assemble into complexes that play an essential role in cell adhesion, migration and controlled death. Their malfunction leads to a wide range of diseases, most notably autoimmunity and cancer. A wealth of experimental data is available. However, owing to inconsistencies among these data, no agreement has been reached about the exact mechanism of action of integrins. Our methods will contribute to rationalizing the available data on the basis of molecular flexibility, shedding new light on integrin mechanisms and ultimately paving the way to new therapeutic approaches.

My research will take place in Durham University's chemistry department. There I will profit from interactions with a range of experimental collaborators studying molecular conformations with a variety of different techniques, and from the excellent local high performance computing centre. My workload will be shared with a postdoctoral research assistant.

Planned Impact

Molecular modelling makes it possible to perform in silico experiments that would otherwise be too expensive, complex or dangerous. I will make my software free, easy to access and easy to use. The objective is to allow non-computational scientists to operate it. I am convinced that achieving this goal will lead to considerable use of my methodology amongst a wide range of bioscientists in both academia and industry. I believe that this will greatly contribute to shedding light on important biological targets connected to potential biotechnological and biomedical applications. This is well represented by the potential outcomes of my planned collaborations. The structural characterization of enzymes involved in Apicomplexan sphingolipid biosynthesis, for instance, has the potential to open the door to the development of new therapeutic approaches targeting illnesses such as malaria or toxoplasmosis. As another example, the mechanism of small heat shock proteins has been linked to a wide range of diseases including cataract, cardiomyopathies, motor neuropathies and degenerative diseases caused by ageing. This research therefore has the potential to positively impact national and international health.

Publications

 
Description Proteins are a class of biopolymers directly responsible for the vast majority of essential cellular functions. Proteins are all composed of the same 20 amino acid building blocks, assembled into sequences that can fold into specific shapes. The shape of a protein determines its capacity to interact with specific binding partners as different as ions, DNA, lipids, drugs or other proteins. These tightly controlled interactions are key for life as we know it. Molecules should however not be considered as a single atomic arrangement, as their relationship with the environment (e.g. temperature, pressure, pH, binding to other molecules) results in specific conformational dynamics. The resulting conformational states, as well as the interconversion between them, are connected to specific protein functions in the organism. A full understanding of the function of any protein in an organism thus requires accurate knowledge of its conformational space, a difficult task for existing experimental techniques. I have been developing and applying computational methods to describe protein dynamics and predict protein assembly into biologically functional complexes. We have obtained the following results:

- We have demonstrated that a deep neural network can be trained to generate protein conformations (WP 1 of this award). We have demonstrated that the intermediates generated by the network can be predictive of a transition state, and that newly generated conformations can be leveraged in a protein docking scenario to predict large rearrangements upon binding (WP2).

- We have developed what, at the time of publication, was the most accurate protein-protein assembly prediction method for difficult targets (i.e. cases where binding partners are flexible). The method leverages a new volumetric descriptor of molecular structure, dynamics and electrostatics (WP2), and is embodied in the freely available software JabberDock. JabberDock was then extended from soluble to integral membrane proteins, becoming the most accurate method for this class of problems. The method was used to validate the results of collaborators who used mass photometry to study the dimeric protein BO3 oxidase.

- We have applied our protein docking methodologies to study small heat shock proteins (sHSPs), which are responsible for preventing disastrous protein denaturation in an organism subjected to stress conditions such as heat, oxidation or low/high pH. We have demonstrated that several plant sHSPs form tetrahedral assemblies. This required developing methodologies to assemble proteins according to arbitrary symmetries (WP2).
Exploitation Route All our software is freely available for academia, along with tutorials and user manuals, at https://github.com/Degiacomi-Lab. All data presented in any publication are available in the Durham University repository (DRO-DATA).
Sectors Digital/Communication/Information Technologies (including Software),Pharmaceuticals and Medical Biotechnology

 
Title Biobox 
Description Biobox is a Python toolbox for biomolecular modelling. Its unique characteristics are the ability to account for multiple conformations, to handle different levels of granularity, and to calculate key experimental quantities on each of these (including some measurement types unique to Biobox, e.g. x-linking physical distances). 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact Biobox underpins virtually all the modelling work carried out in the group. 
URL https://degiacomi.org/software/biobox/
 
Title DynamXL 
Description This software measures accurate cross-linking distances between amino acids on a given molecular structure. Its notable feature is that it measures distances along accessible paths, and not straight lines. Furthermore, it accounts for the fact that the side chain it links to is flexible, and may therefore rearrange. Finally, the software allows such distances to be measured on ensembles of alternative conformations, produced by techniques such as crystallography, NMR or molecular dynamics simulations. The software can be used via a graphical user interface, allowing easy comparison against available experimental data, or via a terminal. Furthermore, its backend (programmed in Python) can be imported as a package in external software. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Since its distribution began at the end of September 2017, DynamXL has been downloaded 29 times by international academic researchers. It is too early to directly assess its impact, as no publication using this software has appeared yet. I am personally preparing three, due in 2018. 
URL http://dynamxl.chem.ox.ac.uk
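
To illustrate the difference between a straight-line distance and a distance measured along an accessible path, as described for DynamXL above, here is a toy sketch. It is not DynamXL's algorithm or API: it runs a breadth-first search on a small hypothetical 2D grid in which `#` cells stand in for space occupied by the protein. The grid and the function name `path_length` are assumptions made for illustration.

```python
from collections import deque
import math


def path_length(grid, start, goal):
    """Shortest number of unit steps from start to goal through free cells
    (4-connected breadth-first search). Returns None if goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


# Hypothetical occupancy grid: A and B mark the two linked sites.
grid = [
    "A.#.B",
    "..#..",
    ".....",
]
start, goal = (0, 0), (0, 4)

straight = math.dist(start, goal)            # passes through the obstacle
along_path = path_length(grid, start, goal)  # goes around the obstacle
print(straight, along_path)
```

The straight line crosses occupied cells, whereas the path has to detour around them and is therefore longer; a cross-linker of limited length could span the former but not necessarily the latter.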
 
Title JabberDock 
Description JabberDock tackles the problem of protein-protein docking while accommodating rearrangements upon binding, including side chain re-orientations and backbone flexibility. To this end, JabberDock leverages Spatial and Temporal Influence Density (STID) maps, a representation of protein surface, electrostatics and local dynamics. Proteins represented as STID maps are docked by maximising their surface complementarity using the POW optimization engine. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact
URL https://degiacomi.org/software/jabberdock/
 
Title Molearn 
Description Molearn implements a convolutional neural network architecture trainable with ensembles of protein atomic conformations generated by experiments or molecular simulation. The trained network can generate new structures consistent with previous examples and physical laws, as defined by a molecular dynamics force field. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact Software distribution started in February 2021; assessment of its impact is premature. 
URL https://github.com/degiacom/molearn
 
Description Pint of Science 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Pint of Science event in Durham, attended by a general audience. I delivered a 20-minute presentation, followed by a Q&A session, on one of the central topics of this award (studying protein structure and dynamics with machine learning).
Year(s) Of Engagement Activity 2022
 
Description Scientist Next Door 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Scientist Next Door was founded in March 2020 by Dr. Valentina Erastova, Dr. Matteo Degiacomi and Dr. Basile Curchod, in response to the COVID-19 lockdown in the UK. Lockdowns and homeschooling removed the opportunity for children to learn through activities and to engage with peers, inevitably affecting their education. We have organised a pool of over 50 volunteer scientists holding video calls with children from 5 to 15 years old. During calls, scientists and families talk about science that impacts our daily lives, and share ideas and resources. The activity received funding from the Royal Society of Chemistry and Durham University's biosciences Institute. Up to now, we have reached more than 50 families and carried out more than 100 calls.
Year(s) Of Engagement Activity 2020,2021
URL https://www.scientist-next-door.org/
 
Description SoftComp network newsletter 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact The SoftComp network is a European network of institutions carrying out research in computational soft matter. The yearly newsletter is distributed in both printed and virtual formats to hundreds of stakeholders (academia and industry) across Europe. Each newsletter highlights ~8 research results produced by network members. The 2022 edition included our work on the application of neural networks to sample protein conformational spaces.
Year(s) Of Engagement Activity 2022
URL https://eu-softcomp.net/wp-content/uploads/2022/03/20220317_Soft-Comp-Newsletter-online.pdf