Automated modelling of protein-nucleotide complexes using X-ray data and AlphaFold models

Lead Research Organisation: University of York
Department Name: Chemistry

Abstract

Background:
Building an atomic model into an electron density map is a key stage in the solution of 3D structures by X-ray or EM methods. The new AlphaFold AI software from DeepMind provides theoretical models which can be used to start the model building process for simple protein structures, but not for complexes involving other proteins and nucleic acids. The York Structural Biology Laboratory (YSBL) has a history of developing software for automated model building, which may be able to fill this gap.
Objectives:
The aim of the project is to investigate how to combine AlphaFold models for different protein components of a complex with experimental observations using the YSBL-developed 'Buccaneer' and 'Nautilus' software to build atomic models for large complexes.
Novelty:
Previously, X-ray crystallography relied upon either homologous models or complex additional experiments to solve the 'crystallographic phase problem'. In 2021, the AlphaFold AI software was released, which provides accurate theoretical models for a wide range of protein molecules based only on their known sequence; while its results vary greatly in quality, its best models are on par with those obtained experimentally. Furthermore, the AlphaFold Protein Structure Database provides pre-calculated models for 20,000 human proteins and for many proteins from 19 other biologically relevant organisms. Both methods and data are so new that we are only beginning to discover how best to use them, and so this provides fertile ground for method development with immediate and wide-ranging impact.
Timeliness:
X-ray crystallographic structure solution is increasingly conducted by non-specialists, who often rely on software to produce an accurate structure with limited manual validation. It is therefore increasingly important that the software produces the most complete and accurate model possible. The possibilities recently opened by the AlphaFold method and its associated model database present a timely opportunity for a PhD student to make several world-leading contributions on a time scale commensurate with a PhD programme.
Experimental Approach:
The first step will be to assemble a library of solved protein-nucleotide test structures from public resources including the Protein Data Bank, and pick appropriate AlphaFold models for the protein components. The existing model building software will be tested on these structures to build the missing components, in order to identify where new work is required. Algorithm development and optimisation will focus on improving these areas. Experience in computer programming is a prerequisite.
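
As an illustration only, the first step (selecting protein-nucleotide test structures and listing the protein components that need AlphaFold models) might be sketched as follows. The `Chain` and `Entry` record types, the resolution cut-off and the example identifiers are all hypothetical and are not part of Buccaneer, Nautilus or any Protein Data Bank interface; a real pipeline would populate these records from PDB metadata.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PROTEIN = "protein"
NUCLEIC = "nucleic_acid"


@dataclass
class Chain:
    """One polymer chain of a deposited structure (hypothetical record type)."""
    chain_id: str
    polymer_type: str                  # PROTEIN or NUCLEIC
    uniprot_id: Optional[str] = None   # sequence accession, if known


@dataclass
class Entry:
    """One solved structure (hypothetical record type)."""
    pdb_id: str
    resolution: float                  # in angstroms
    chains: List[Chain] = field(default_factory=list)


def is_protein_nucleotide_complex(entry: Entry) -> bool:
    """True if the entry contains at least one protein and one nucleic acid chain."""
    types = {c.polymer_type for c in entry.chains}
    return PROTEIN in types and NUCLEIC in types


def select_test_set(entries: List[Entry], max_resolution: float = 3.5) -> List[Entry]:
    """Keep complexes at or better than an assumed resolution cut-off."""
    return [
        e for e in entries
        if is_protein_nucleotide_complex(e) and e.resolution <= max_resolution
    ]


def alphafold_targets(entry: Entry) -> List[str]:
    """Unique accessions of protein chains for which a predicted model is needed."""
    return sorted({
        c.uniprot_id for c in entry.chains
        if c.polymer_type == PROTEIN and c.uniprot_id
    })


# Made-up example entries; the identifiers are placeholders, not real depositions.
entries = [
    Entry("xxx1", 2.1, [Chain("A", PROTEIN, "ACC001"), Chain("B", NUCLEIC)]),
    Entry("xxx2", 1.8, [Chain("A", PROTEIN, "ACC002")]),
    Entry("xxx3", 4.2, [Chain("A", PROTEIN, "ACC003"), Chain("C", NUCLEIC)]),
]

test_set = select_test_set(entries)
targets = {e.pdb_id: alphafold_targets(e) for e in test_set}
```

In this sketch only the first entry survives the filter: the second has no nucleic acid chain and the third falls outside the assumed resolution cut-off. The nucleic acid chains of the selected entries are the "missing components" that the model building software would then be asked to build.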

Publications


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
BB/T007222/1      |              |              | 01/10/2020 | 30/09/2028 |
2741770           | Studentship  | BB/T007222/1 | 01/10/2022 | 30/09/2026 |