Handling complexity in huge heterogeneous cryo electron microscopy datasets: visualising biological processes at quasi-atomic resolution

Lead Research Organisation: Imperial College London
Department Name: Life Sciences

Abstract

Biological molecules in solution are the basis of life. Biological macromolecules interact with each other in an aqueous environment, forming complexes that adopt different conformational states while performing their biological functions. Rapid freezing of biological solutions to liquid nitrogen temperatures solidifies the solution to a glassy phase of water ('vitrified' water) that perfectly preserves biological complexes. Vitrified solutions of biological complexes can be imaged directly in the cryogenic specimen holder of the electron microscope. From the images obtained one then calculates three-dimensional (3D) structures of the embedded complexes. The whole procedure is known as 'cryo-EM'. However, in spite of the excellent specimen preservation inside the vitrified solution, the raw images produced by cryo-EM are extremely noisy and difficult to process. This is because the biological samples are highly radiation sensitive and start decaying after receiving an electron exposure of only 10 e⁻/Å². At such low exposure levels the 'quantum noise' in the images exceeds the molecular information by orders of magnitude. The group of the applicant has been one of the leading groups in developing the 3D cryo-EM methodology. Cryo-EM has recently yielded impressive results at resolution levels better than ~10Å, with some highly symmetrical viral capsid structures resolved to ~4Å resolution. Single-particle techniques are thus now approaching resolution levels previously only achievable by X-ray crystallography. A reliable de-novo atomic interpretation of a structure is possible at ~3Å, a resolution hitherto achieved mainly by X-ray crystallography, where the biological molecules are necessarily confined to the rigidity of a crystal. Single-particle cryo-EM methods, in contrast, give a direct window into the full biological complexity of what is happening in solution. Cryo-EM images can contain a plethora of different complexes, in different functional states.
Extracting that structural information from the extremely noisy data at quasi-atomic resolution is a challenge in modern biology. Cryo-EM has the potential to provide information at resolution levels of ~3Å. To achieve this new level of resolution for observing biological machinery in action, we first need excellent state-of-the-art EM instrumentation and effective data-collection pipelines to harvest the required huge data sets. Above all, we need to create fast, massively parallel, new and improved computational tools. Our new astigmatic data-collection strategy significantly increases the high-resolution information content of single-particle images and has already led to a ~4Å resolution structure. Astigmatic data collection will also allow us to process small ~100 kDa proteins, a size hitherto not amenable to single-particle processing. Our new parallel eigenvector-eigenvalue 'MSA' methodology, capable of handling terabyte-sized data sets, will be exploited in various new procedures, including our new 'total data set' contrast-transfer-function (CTF) correction. The new parallel MSA pattern-recognition programs will also be exploited for separating the different complexes, in their different conformational states, from their noisy images in the vitrified solution. By parallelising the new overall procedures and mapping them onto fast new computing hardware, we want to greatly increase the computing power available. The specialist team of method developers is already in place. Advanced EM instrumentation is in place, centred on the CBEM national cryo-EM facility, which houses the CM 300 liquid-helium instrument. For testing our procedures on the latest cryo-EM technology we have access to one of the first FEI Titan Krios instruments available worldwide. The direct visualization, by cryo-EM, of complex biological machinery in action may open a new chapter in our understanding of biology.
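
The effect of the 10 e⁻/Å² exposure limit on image quality can be illustrated with a small numerical sketch. It assumes pure Poisson counting statistics; the pixel size (1 Å), the 5% image contrast, and the function name shot_noise_snr are hypothetical values and names chosen for illustration, not figures from the project.

```python
import numpy as np

def shot_noise_snr(dose_per_A2, pixel_size_A=1.0, contrast=0.05,
                   n_pixels=100_000, seed=0):
    """Estimate the per-pixel signal-to-noise power ratio under Poisson noise.

    dose_per_A2  : electron exposure in electrons per square angstrom
    pixel_size_A : pixel edge length in angstrom (hypothetical value)
    contrast     : fractional intensity modulation from the specimen
                   (hypothetical value)
    """
    rng = np.random.default_rng(seed)
    mean_counts = dose_per_A2 * pixel_size_A ** 2

    # Toy specimen: alternate pixels are slightly brighter or darker.
    sign = np.where(np.arange(n_pixels) % 2 == 0, 1.0, -1.0)
    expected = mean_counts * (1.0 + contrast * sign)

    # Each pixel records a Poisson-distributed number of electrons.
    counts = rng.poisson(expected)

    signal_power = (mean_counts * contrast) ** 2
    noise_power = np.var(counts - expected)
    return signal_power / noise_power

if __name__ == "__main__":
    for dose in (10, 100, 1000):
        print(dose, shot_noise_snr(dose))
```

Under these assumptions the signal-to-noise power ratio grows linearly with the exposure, which is why a dose limit set by radiation damage translates directly into very noisy raw images that have to be combined in large numbers.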

Technical Summary

Impressive cryo electron microscopy (cryo-EM) results have been obtained recently by various groups elucidating single-particle structures of biological complexes at resolution levels of ~10Å, with a few very recent exceptions of up to ~4Å resolution for highly symmetrical complexes (icosahedral viral capsids). Atomic-resolution structures (~3Å) have hitherto been solved mainly by X-ray crystallography, with the biological molecules confined to a rigid crystal. The diffraction data yield the average information over all molecules. In cryo-EM, information from each individual macromolecular complex is available digitally in very noisy form. Cryo-EM gives a direct window into the solution, potentially revealing a plethora of different complexes in different functional states. The challenge is to systematically extract that information. Several new ideas are proposed for achieving that. Using extreme astigmatism for the data collection yields strong high-frequency information content in the raw data, ensuring efficient data collection. Our new parallel eigenvector-eigenvalue 'MSA' methodology, capable of handling terabyte-sized datasets, is to be used as an integral part of all processing steps. This new approach allows one to optimize the data processing based on the principal information content of the whole data set for CTF correction, image alignment, Euler angle assignment, and the simultaneous analysis of 3D structures co-existing in the solution. In all cases, huge data sets are a prerequisite for revealing the subtle differences between complexes. The datasets will be collected traditionally on film and with the newest generation of automated EMs: the FEI Krios with new CMOS cameras. This application focuses on new methods, improved software infrastructure, and mapping the parallel software onto fast computing hardware. The expert scientists necessary for achieving the next sophistication level in single-particle cryo-EM are already in place.
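
The eigenvector-eigenvalue idea behind separating co-existing states can be sketched on a toy data set. The example below is a minimal in-memory illustration using NumPy's singular value decomposition; it is not the project's parallel MSA software. The simulated images, the noise level, and the function names (simulate_two_states, eigenimage_analysis, split_by_first_component) are all hypothetical.

```python
import numpy as np

def simulate_two_states(n_per_state=1000, size=16, noise_sigma=2.0, seed=0):
    """Simulate noisy images of a toy complex in two conformational states."""
    rng = np.random.default_rng(seed)
    state_a = np.zeros((size, size))
    state_a[4:12, 4:12] = 1.0            # core of the complex
    state_b = state_a.copy()
    state_b[4:12, 12:15] = 1.0           # extra domain present only in state B
    clean = np.concatenate([np.tile(state_a, (n_per_state, 1, 1)),
                            np.tile(state_b, (n_per_state, 1, 1))])
    labels = np.repeat([0, 1], n_per_state)
    # Additive Gaussian noise with a standard deviation above the signal level.
    noisy = clean + rng.normal(0.0, noise_sigma, size=clean.shape)
    return noisy, labels

def eigenimage_analysis(images, n_components=2):
    """Return leading eigenvalues, eigenimages and per-image coordinates."""
    data = images.reshape(len(images), -1).astype(float)
    data -= data.mean(axis=0)            # centre on the average image
    # Rows of vt are eigenvectors of the pixel covariance matrix.
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    eigenvalues = s ** 2 / (len(data) - 1)
    coords = u[:, :n_components] * s[:n_components]
    return eigenvalues[:n_components], vt[:n_components], coords

def split_by_first_component(coords):
    """Assign each image to one of two classes by the sign of its first coordinate."""
    return (coords[:, 0] > 0).astype(int)

if __name__ == "__main__":
    images, labels = simulate_two_states()
    eigenvalues, eigenimages, coords = eigenimage_analysis(images)
    predicted = split_by_first_component(coords)
    agreement = (predicted == labels).mean()
    # The sign of an eigenvector is arbitrary, so class labels may be swapped.
    print("agreement with true states:", max(agreement, 1 - agreement))
```

In this toy case the first eigenimage picks up the difference between the two states, so the images can be sorted even though each one is dominated by noise. A full SVD of this kind does not scale to terabyte-sized data sets, which is where a parallel implementation of the kind described above becomes necessary.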

Publications

Van Heel M (2013) Finding trimeric HIV-1 envelope glycoproteins in random noise. Proceedings of the National Academy of Sciences of the United States of America