Pattern Recognition for Protein Crystallisation Strategies (AstraZeneca Crystal Atlas)

Lead Research Organisation: University of York
Department Name: Mathematics

Abstract

AstraZeneca Crystal Atlas will implement deep learning and other machine learning methods to use the data and knowledge from historical and ongoing crystallisation experiments, giving insight into the complex relationships between compounds, experiment conditions and outcomes. It will lead to more efficient crystallisation strategies that accelerate drug discovery at AstraZeneca and across the pharmaceutical industry.
Crystallisation is a trial-and-error process, and scientists lack practical tools for relating crystallogenesis conditions to their outcomes. In this proposal, we aim to create the AstraZeneca Crystal Atlas to reveal these complex relationships and improve future experiments, thereby accelerating the drug discovery process.
AstraZeneca Crystal Atlas is a comprehensive AI-driven data and knowledge warehouse combining crystallisation inspection images, outcome labels, protein information and crystallogenesis experiment conditions. Using a novel deep-learning-enabled Knowledge Graph and other data mining methods in AstraZeneca Crystal Atlas, scientists can identify optimised pathways and reveal the patterns and graph relationships between entities, and so make better-informed decisions on crystallisation strategies with high success rates.
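As a rough illustration of the kind of entity relationships described above, the sketch below stores experiments as (subject, relation, object) triples and counts which conditions co-occur with a given outcome. It is a minimal sketch only: the proposal does not specify a schema, so every entity name, relation name and value here is hypothetical.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples; all names and values are illustrative only.
TRIPLES = [
    ("experiment_1", "uses_protein", "protein_A"),
    ("experiment_1", "uses_condition", "PEG_3350_pH_7.5"),
    ("experiment_1", "has_outcome", "crystal"),
    ("experiment_2", "uses_protein", "protein_A"),
    ("experiment_2", "uses_condition", "ammonium_sulfate_pH_6.0"),
    ("experiment_2", "has_outcome", "precipitate"),
    ("experiment_3", "uses_protein", "protein_B"),
    ("experiment_3", "uses_condition", "PEG_3350_pH_7.5"),
    ("experiment_3", "has_outcome", "crystal"),
]


def build_index(triples):
    """Index triples as subject -> relation -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        index[subject][relation].add(obj)
    return index


def conditions_with_outcome(index, outcome):
    """Count how often each condition appears in experiments with the given outcome."""
    counts = defaultdict(int)
    for relations in index.values():
        if outcome in relations["has_outcome"]:
            for condition in relations["uses_condition"]:
                counts[condition] += 1
    return dict(counts)


index = build_index(TRIPLES)
crystal_conditions = conditions_with_outcome(index, "crystal")
```

A production knowledge graph would more likely live in a graph database and be queried with learned embeddings; the plain dictionary index is used here only to keep the example self-contained.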
To achieve this, we will first implement a deep learning (DL) method to automate the annotation of crystallisation images. The AstraZeneca crystallography team has accumulated a large number of historical crystallisation images that are not systematically labelled. Visually inspecting and labelling these images is time-consuming and can be subjective and inconsistent.
Therefore, we propose an AI-driven capability to identify crystals automatically. Deep Neural Networks (DNNs) have achieved better accuracy than human experts in many image recognition tasks. MARCO, a DNN prototype from Google [1], shows the feasibility of automated crystallisation image classification. Nonetheless, our evaluation suggested that its classification accuracy on AstraZeneca-generated images is poor, which indicates the importance of training a better DNN model using AstraZeneca datasets.
Our preliminary study, which created a transfer learning model using DenseNet, achieved better image classification results than MARCO. We propose to incorporate Active Learning (AL) [2] into the training process, with the aim of achieving better performance at only a fraction of the cost and time of data labelling. We will also use the rich information in UV images to achieve higher accuracy.
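The proposal does not state which active learning query strategy will be used. As one common possibility, the sketch below applies uncertainty sampling: it ranks unlabelled images by the entropy of the model's predicted class probabilities and sends the most uncertain ones for human labelling. The image ids, the four class labels and the probability values are all assumptions made for illustration.

```python
import math


def entropy(probabilities):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def select_for_labelling(predictions, budget):
    """Return the ids of the `budget` images with the highest predictive entropy."""
    ranked = sorted(
        predictions,
        key=lambda image_id: entropy(predictions[image_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical softmax outputs over four assumed classes
# (crystal, precipitate, clear, other) for unlabelled images.
predictions = {
    "img_001": [0.97, 0.01, 0.01, 0.01],
    "img_002": [0.40, 0.35, 0.15, 0.10],
    "img_003": [0.25, 0.25, 0.25, 0.25],
    "img_004": [0.70, 0.20, 0.05, 0.05],
}

# The two least confident predictions are queued for expert annotation.
to_label = select_for_labelling(predictions, budget=2)
```

In a full loop, the newly labelled images would be added to the training set, the model retrained, and the selection repeated until the labelling budget is spent.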
A successful project would lead to an AstraZeneca Crystal Atlas that contains annotated crystallisation images, compounds or proteins, experiment conditions and knowledge of their relationships. It will be searchable, explorable and open to inference using novel graph-based networks [3, 4]. AstraZeneca Crystal Atlas will improve crystallisation success rates and accelerate the drug discovery process, and is expected to lead to high-impact publications in scientific journals on i) active learning, ii) multi-modality deep learning for crystallography profile exploration and iii) graph inference on AstraZeneca Crystal Atlas.
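The abstract does not say which graph model will be adopted. For orientation only, the layer-wise propagation rule of the graph convolutional network in reference [3] is reproduced here, where A is the adjacency matrix of a graph with N nodes, H^{(l)} holds the node features at layer l, W^{(l)} is a trainable weight matrix and \sigma is a nonlinearity such as ReLU:

```latex
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2} \, \tilde{A} \, \tilde{D}^{-1/2} \, H^{(l)} \, W^{(l)} \right),
\qquad \tilde{A} = A + I_N,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Applied to a crystallisation graph, the nodes could be proteins, conditions and experiments, with H^{(0)} holding their input features; that mapping is an assumption, not something stated in the proposal.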
Key References
1. Bruno, Andrew E., et al. "Classification of crystallization outcomes using deep convolutional neural networks." PLOS ONE 13.6 (2018): e0198883.
2. Zhou, Shusen, Qingcai Chen, and Xiaolong Wang. "Active deep learning method for semi-supervised sentiment classification." Neurocomputing 120 (2013): 536-546.
3. Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv:1609.02907, ICLR 2017.
4. Yang, Zhilin, William W. Cohen, and Ruslan Salakhutdinov. "Revisiting semi-supervised learning with graph embeddings." Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016.

People

ORCID iD

Jamie Milne (Student)

Publications


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/V519807/1 | | | 01/10/2020 | 30/09/2025 |
2440749 | Studentship | EP/V519807/1 | 01/10/2020 | 30/09/2024 | Jamie Milne