Particle classification and identification in cryoET of crowded cellular environments

Lead Research Organisation: STFC - LABORATORIES
Department Name: Scientific Computing Department

Abstract

In situ cryogenic electron tomography (cryoET) promises to reveal the distribution and structures of macromolecular complexes across the cell with minimal disturbance to their native context. There have been several proof-of-principle studies, but the routine application of this technology is limited by the relatively noisy data, the crowded cellular environment, and the size of the datasets that can be collected. The problem is ideally suited to AI, which can learn from the large datasets and give bias-free interpretations of tomograms. There are nevertheless issues with the generalisability of trained models and their usability by research scientists.

In this proposal, we aim to look into AI techniques for 3D particle classification and identification from in situ tomograms. Specifically, we wish to establish a collaboration with the group of Min Xu at Carnegie Mellon University, who has worked in this area for more than 10 years. We will benchmark a selection of his methods on simulated and real datasets, considering factors from accuracy through to ease-of-use. Within the CCP-EM project, we are developing software pipelines for cryoET, and so we are particularly looking for AI tools that can enhance these pipelines. Part of our evaluation will be to quantify the improvement in downstream results, for example higher resolution sub-tomogram averages, providing essential feedback to Xu.

We also aim to strengthen our collaboration with Zachary Freyberg at the University of Pittsburgh, with whom we are processing in situ cryoET data on disease-associated cell lines and tissues. These datasets will be used to help benchmark the AI tools, while potentially leading to important research outcomes in their own right. Integrating novel AI tools into our CCP-EM tomography pipelines will give this work a much larger impact. That impact depends partly on practicalities such as the robustness of the software and the ease with which we can make trained models available, and addressing these will form an important part of the project.

Briefly, we will carry out three tasks: (1) install selected modules from Xu's AITom package and benchmark them on simulated and real datasets, (2) integrate these tools into the CCP-EM tomography pipeline and investigate how to optimise them in the context of a full investigation, and (3) look into the practicalities of making the software available for general usage, compare it with similar tools, and host a workshop for dissemination.

There is a significant amount of work needed to develop in situ cryoET into a routine technique. This proposal focusses on one specific aspect, namely the application and adaptation of AI approaches to improve the quality of information that can be obtained. As a proposal to the IPAP scheme, we look to expand our existing network of UK and European collaborators to bring in leading US groups. While the CCP-EM consortium is also developing AI tools, the expertise of Xu's group is complementary, covering different specific AI approaches and with a stronger focus on in situ tomography.

Publications

 
Description The overall aim of this award is to look into AI techniques for 3D particle classification and identification from in situ tomograms. The need is clear - imaging via electron tomography is revealing many details of biological cells, including the distribution of biological macromolecules, and identifying the latter would provide invaluable insights into the inner working of cells. In the everyday world, AI excels at identifying objects, benefitting from the large amounts of training data available online. For biological electron tomography, the situation is less favourable due to a combination of low resolution, noisy images and relatively low quantities of training data.

In this award, we have collated several machine learning models and their software implementations, as well as datasets for training and benchmarking. This has revealed several blockers to the routine use of AI, which we are attempting to address. Firstly, there is a lack of well annotated training data, meaning tomograms with particle positions and identities marked. We have collated and tidied data from several sources, and will make it available (and ML-ready) on the Open Datasets at the Franklin site.
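As an illustration of how annotated tomograms of this kind (particle positions plus identities) can be used for benchmarking, the following is a minimal sketch that scores predicted picks against annotations. It is not the project's benchmarking code: the function names, the `(x, y, z, label)` tuple layout, the greedy matching rule and the default tolerance are all assumptions made for this example.

```python
import math


def match_picks(predicted, annotated, tolerance):
    """Greedily match predicted picks to annotated particles.

    Each pick is a hypothetical (x, y, z, label) tuple in voxel units.
    A prediction counts as a true positive when it lies within `tolerance`
    voxels of a not-yet-matched annotation that carries the same label.
    """
    unmatched = list(annotated)
    true_positives = 0
    for px, py, pz, plabel in predicted:
        best_index, best_dist = None, tolerance
        for index, (ax, ay, az, alabel) in enumerate(unmatched):
            dist = math.dist((px, py, pz), (ax, ay, az))
            if alabel == plabel and dist <= best_dist:
                best_index, best_dist = index, dist
        if best_index is not None:
            # Each annotation can be matched at most once.
            unmatched.pop(best_index)
            true_positives += 1
    return true_positives


def precision_recall(predicted, annotated, tolerance=5.0):
    """Return (precision, recall) for one tomogram's picks."""
    tp = match_picks(predicted, annotated, tolerance)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(annotated) if annotated else 0.0
    return precision, recall


# Illustrative data only; the class names are placeholders.
annotated = [(10, 10, 10, "ribosome"), (50, 50, 50, "proteasome")]
predicted = [
    (11, 10, 10, "ribosome"),    # close to an annotation, correct identity
    (50, 52, 50, "ribosome"),    # close to an annotation, wrong identity
    (90, 90, 90, "proteasome"),  # no annotation nearby
]
precision, recall = precision_recall(predicted, annotated)
```

Scoring position and identity together, as here, penalises both missed particles and misclassified ones; a real benchmark might report the two separately.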

We have tested our in-house software as well as software from our US collaborator. Some of this software has become out of date, as standard machine learning libraries evolve rapidly. We have re-written one algorithm using state-of-the-art tools, and are getting good results with it. We will disseminate updated code via GitHub/GitLab.

Finally, while the software/models available can be used in individual studies, and published, they are far from being user friendly. We are in the process of creating user-orientated tutorials to broaden the application of these methods.
Exploitation Route The datasets, code and tutorials will be maintained by CCP-EM. They will be suitable for extension, and allow others to contribute additional items.
Sectors Digital/Communication/Information Technologies (including Software)

Healthcare

 
Description Co-development of in situ cryogenic electron tomography 
Organisation Carnegie Mellon University
Country United States 
Sector Academic/University 
PI Contribution We have developed computational pipelines for processing cryogenic electron tomograms obtained from cellular samples, extracting target particles and generating sub-tomogram averages. The workflows have made use of CCP-EM's Pipeliner technology, wrapping Relion software and other 3rd party programs. We have run these pipelines on datasets provided by Zach Freyberg at the University of Pittsburgh as part of their research. This provides us with challenging case studies, and returns in situ structures for their research. We are also working with Min Xu at Carnegie Mellon University on the development of AI tools for locating and identifying particles in tomograms. We have worked on the portability of software from Xu and are helping to update and benchmark some of these tools. Freyberg and Xu are Project Partners on a current BBSRC-funded grant.
Collaborator Contribution Zach Freyberg at the University of Pittsburgh has made available several tilt series from cryogenic electron tomography of cellular samples. His group has provided the metadata necessary for processing, and has provided feedback on the results of our processing. Min Xu at Carnegie Mellon University has provided early implementations of some AI tools, which we are now working to update and test. He is providing advice on some AI approaches.
Impact Outputs are in situ structures of macromolecular complexes of interest, which will be published, and also updated methods which are implemented in the CCP-EM software suite. This is a collaboration between our computational group, the experimental group of Zachary Freyberg at the University of Pittsburgh, and the machine learning group of Min Xu at Carnegie Mellon University. As part of this collaboration, Min Xu visited us in April 2024 and gave a seminar to the Research Complex at Harwell, and Zach Freyberg visited us for a week in October 2024.
Start Year 2023
 
Description Co-development of in situ cryogenic electron tomography 
Organisation University of Pittsburgh
Country United States 
Sector Academic/University 
PI Contribution We have developed computational pipelines for processing cryogenic electron tomograms obtained from cellular samples, extracting target particles and generating sub-tomogram averages. The workflows have made use of CCP-EM's Pipeliner technology, wrapping Relion software and other 3rd party programs. We have run these pipelines on datasets provided by Zach Freyberg at the University of Pittsburgh as part of their research. This provides us with challenging case studies, and returns in situ structures for their research. We are also working with Min Xu at Carnegie Mellon University on the development of AI tools for locating and identifying particles in tomograms. We have worked on the portability of software from Xu and are helping to update and benchmark some of these tools. Freyberg and Xu are Project Partners on a current BBSRC-funded grant.
Collaborator Contribution Zach Freyberg at the University of Pittsburgh has made available several tilt series from cryogenic electron tomography of cellular samples. His group has provided the metadata necessary for processing, and has provided feedback on the results of our processing. Min Xu at Carnegie Mellon University has provided early implementations of some AI tools, which we are now working to update and test. He is providing advice on some AI approaches.
Impact Outputs are in situ structures of macromolecular complexes of interest, which will be published, and also updated methods which are implemented in the CCP-EM software suite. This is a collaboration between our computational group, the experimental group of Zachary Freyberg at the University of Pittsburgh, and the machine learning group of Min Xu at Carnegie Mellon University. As part of this collaboration, Min Xu visited us in April 2024 and gave a seminar to the Research Complex at Harwell, and Zach Freyberg visited us for a week in October 2024.
Start Year 2023
 
Title Affinity-VAE 
Description Affinity-VAE is a framework for automatic clustering and classification of objects in multidimensional image data based on their similarity. The method expands on the concept of β-VAEs with an informed similarity-based loss component driven by an affinity matrix. The affinity-VAE is able to create rotationally-invariant, morphologically homogeneous clusters in the latent representation, with improved cluster separation compared with a standard β-VAE. The method has been implemented in a freely available software package. There is a tutorial on how to run Affinity-VAE on the MNIST dataset.
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact The software was implemented by a team across Science and Technology Facilities Council, the Rosalind Franklin Institute and the Alan Turing Institute. It has mainly been used for projects in these institutes, both applications and as a reference for the development of other methods. 
URL https://github.com/ccpem/affinity-vae
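To make the idea of a β-VAE objective extended with an affinity-driven term more concrete, here is a minimal sketch in plain Python. It is not taken from the affinity-vae package. The specific form of the penalty (mean absolute gap between the cosine similarity of two latent vectors and the affinity of their classes), the weight name `gamma`, and all function names are assumptions made for this example.

```python
import math


def kl_divergence(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to N(0, I) for one sample."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; defined as 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def affinity_penalty(latents, labels, affinity):
    """Mean absolute gap between latent similarity and target affinity.

    `affinity[i][j]` is the prior similarity between classes i and j
    (assumed here to be scaled like a cosine similarity).
    """
    total, count = 0.0, 0
    for i in range(len(latents)):
        for j in range(i + 1, len(latents)):
            target = affinity[labels[i]][labels[j]]
            total += abs(cosine_similarity(latents[i], latents[j]) - target)
            count += 1
    return total / count if count else 0.0


def total_loss(recon_error, mus, logvars, labels, affinity, beta=1.0, gamma=1.0):
    """Reconstruction error + beta * mean KL + gamma * affinity penalty."""
    kl = sum(kl_divergence(m, lv) for m, lv in zip(mus, logvars)) / len(mus)
    return recon_error + beta * kl + gamma * affinity_penalty(mus, labels, affinity)
```

With `gamma=0` this reduces to an ordinary β-VAE objective; the extra term pulls latent vectors of similar classes together and pushes dissimilar classes apart, which is the behaviour the description attributes to the affinity matrix.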
 
Title Cryoincept 
Description As part of a new collaboration with Min Xu (Carnegie Mellon University, US), we are testing and benchmarking a number of methods for identifying and classifying macromolecules imaged by electron cryo-tomography. Typically we are working with 3D tomograms containing multiple species and high levels of noise. One such method, based on supervised machine learning, was published in Xu et al. (2017) Bioinformatics, 33, i13-i22. The original software is now quite old, and is based on the Keras and TensorFlow packages for machine learning. We have therefore re-implemented the published CNN model, which is based on an Inception3D network, re-writing the code from scratch using the PyTorch library. This gives a more robust and efficient implementation. The method is now being tested against a separately developed benchmark set. The code is currently in a private GitLab repository. It will be made available under an Open Source licence when it is fully tested.
Type Of Technology Software 
Year Produced 2024 
Impact None yet.