Revision 36175 | Revision 44096 |
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
## page was renamed from e2tomo = EMAN2 Tomography mini-Workflow Tutorial = This version of the EMAN2 Tomography Pipeline tutorial is designed to run on well-equipped laptops or standard workstations, unlike the [[EMAN2/e2tomo|full tutorial]] which requires a well-equipped tomography workstation. It should be possible to complete this tutorial in a reasonable time on a computer with 16 GB of RAM and 4 cores, but resolution will be limited to ~15 Å, not the subnanometer resolution provided by the main tutorial. * This tutorial is best suited for EMAN2 built after 09/27/2018. Not everything described in the tutorial was functioning yet in the 2.22 release. * This version of the tutorial is based on a subset of reduced sampling data from the same EMPIAR data set as the main tutorial. This should be downloaded from the EMAN2 website. The pixel size for this data is 3.93 A/pix. |
= EMAN2 Tomography and Subtomogram/Subtilt Averaging Workflow Tutorial = '''''Important note: Throughout the tutorial you will see a split between (small) and (large) instructions. The (small) tutorial is designed for laptops or basic desktops and can achieve ~12 Å resolution in a time compatible with live tutorial sessions. The (large) tutorial really requires a proper tomography workstation, but can achieve subnanometer resolution. Do not intermix (small) and (large) options below!''''' * This tutorial requires EMAN 2.91 at a minimum. The newer pipeline, which will become the primary pipeline in EMAN3, is also discussed. To use those instructions you should have a current EMAN2.99 snapshot version. * Some additional documentation for the newer pipeline for integrated subtomogram and subtilt refinement can be found in [[https://blake.bcm.edu/emanwiki/EMAN2/e2tomo_new|TomoNew]] * Documentation of some other recently developed tools can be found in [[https://blake.bcm.edu/emanwiki/EMAN2/e2tomo_more|TomoMore]]. |
Line 10: | Line 9: |
* Documentation of some newly developed tools can be found in [[https://blake.bcm.edu/emanwiki/EMAN2/e2tomo_more | TomoMore]]. * There is now a newer pipeline for integrated subtomogram and subtilt refinement. Some documentation can be found in [[https://blake.bcm.edu/emanwiki/EMAN2/e2tomo_new | TomoNew]] (frequently updated). |
|
Line 14: | Line 11: |
* This reduced version of the tutorial can be completed on a well-equipped laptop or standard desktop workstation. * Minimum recommended configuration (timing estimates based on a single quad-core computer): * 16 GB RAM * 4 cores @ >2 GHz * 40 GB free disk space |
* Minimum recommended configuration: * 16 GB RAM (small), 64 GB RAM (large) * 4 cores @ >2 GHz (small), 16 cores (large) * ~40 GB free disk space (small), ~100 GB free disk space due to the larger raw data (large) |
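As a quick sanity check before starting, the minimum configurations listed above can be encoded in a small table and compared against your machine. This is a minimal sketch, not part of EMAN2: the requirement values are copied from the list above (the >2 GHz clock requirement is not checked), `shutil.disk_usage` measures free space on the filesystem holding the given path, and RAM and physical core counts are passed in by hand because the Python standard library has no portable way to read them. GB is taken here as 10^9 bytes.

```python
import shutil

# Minimum configurations from the tutorial text:
# RAM in GB, physical cores, free disk space in GB.
MINIMUM = {
    "small": {"ram_gb": 16, "cores": 4, "disk_gb": 40},
    "large": {"ram_gb": 64, "cores": 16, "disk_gb": 100},
}


def shortfalls(tier, ram_gb, cores, disk_gb):
    """Return human-readable shortfalls for a tier (empty list if the machine qualifies)."""
    req = MINIMUM[tier]
    have = {"ram_gb": ram_gb, "cores": cores, "disk_gb": disk_gb}
    return [
        f"{key}: have {have[key]}, need {req[key]}"
        for key in ("ram_gb", "cores", "disk_gb")
        if have[key] < req[key]
    ]


def free_disk_gb(path="."):
    """Free space on the filesystem holding `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    # Example: a 16 GB, 4-core laptop checked against both tiers,
    # using the real free space in the current directory.
    disk = free_disk_gb(".")
    for tier in MINIMUM:
        problems = shortfalls(tier, ram_gb=16, cores=4, disk_gb=disk)
        print(tier, "OK" if not problems else problems)
```

Remember that the (small) and (large) paths should not be mixed, so pick one tier and stay with it.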
Line 21: | Line 17: |
''Note: Any place in EMAN2 where you are requested to enter the number of threads to use, you should specify the number of cores your machine has. Computers are often advertised as 4 core/8 thread or 8 core/16 thread. Trying to run image processing using this advertised number of threads will usually make processing run slower, not faster. You may optionally increase this number by ~25%; e.g., on a 4 core machine, 5 may be a reasonable number to specify.'' | ''Note: Any place in EMAN2 where you are requested to enter the number of threads to use, you should specify the number of '''cores''' your machine has. Computers are often advertised as 4 core/8 thread or 8 core/16 thread. Trying to run image processing using this advertised number of threads will usually make processing run slower, not faster. You may optionally increase this number by ~25%; e.g., on a 4 core machine, specifying 5 may give a 5-10% speedup over 4.'' |
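The thread-count rule above is simple arithmetic. The sketch below is a hypothetical helper (not an EMAN2 function) that applies it, rounding the optional ~25% increase down, and formats the result in the `thread:N` form used for the ''parallel'' option in this tutorial, including the optional `thread:N:/path/to/tmp` scratch-folder variant. You must supply the number of '''physical''' cores yourself: Python's `os.cpu_count()` reports logical CPUs, which is exactly the advertised thread count the note warns against.

```python
def recommended_threads(physical_cores, oversubscribe=False):
    """Thread count per the note: the number of physical cores,
    optionally increased by ~25% (rounded down)."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    if not oversubscribe:
        return physical_cores
    return max(physical_cores, int(physical_cores * 1.25))


def parallel_spec(threads, scratch_dir=None):
    """Format a value for the ''parallel'' option, e.g. thread:4 or thread:4:/home/username/tmp."""
    spec = f"thread:{threads}"
    if scratch_dir:
        spec += f":{scratch_dir}"
    return spec


if __name__ == "__main__":
    # A "4 core/8 thread" laptop: use 4, or optionally 5, never 8.
    print(parallel_spec(recommended_threads(4)))
    print(parallel_spec(recommended_threads(4, oversubscribe=True)))
```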
Line 24: | Line 20: |
* Download the data from the [[http://eman2.org/Tutorials|EMAN2 Tutorials page]] | * (Large) Download only the 4 '''mixed CTEM''' tilt series from EMPIAR: [[https://www.ebi.ac.uk/pdbe/emdb/empiar/entry/10064|EMPIAR 10064]] * (Small) Download the downsampled data from the [[http://eman2.org/Tutorials|EMAN2 Tutorials page]] |
Line 29: | Line 26: |
Line 33: | Line 31: |
* You may use "Edit Project" from the Project menu to set default values for the project. | * You may use "Edit Project" from the Project menu to set default values for the project. |
Line 35: | Line 33: |
* For this project use 3000 kDa, 2.7 mm Cs, 300 keV and 3.93 A/pix. | * For this project use 300 kDa, 2.7 mm Cs, 300 keV and 3.93 (Small) or 1.97 (Large) A/pix. |
Line 39: | Line 37: |
{{attachment:e2pm.png|Project Manager|width=600}} | {{attachment:e2pm.png|Project Manager|width="600"}} |
Line 42: | Line 40: |
* ''Files'' -> select the 3 provided .hdf files | * ''Files'' -> select the 3 (Small) or 4 (Large) provided .hdf files |
Line 45: | Line 43: |
* ''apix'' = 3.93, in later steps you can use -1, which tells it to use the known value | * ''apix'' = 3.93 (Small) or 1.97 (Large), in later steps you can use -1, which tells it to use the known value |
Line 53: | Line 51: |
* It is critical that the filenames for your data not contain any spaces (replace with underscore) or periods (other than the final period used for the file extension). * "__" (double underscore) is also reserved for describing modified versions of the same file, and should not be used in your image filenames. |
* It is critical that the filenames for your data not contain any spaces (replace with underscore) or periods (other than the final period used for the file extension). * {{{"__"}}} (double underscore) is also reserved for describing modified versions of the same file, and should not be used in your image filenames. |
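If you have many files to import, the naming rules above are easy to check automatically. The following is a minimal sketch in plain Python, not an EMAN2 tool. `filename_problems` reports spaces, extra periods, and double underscores. `suggest_name` proposes a fixed name: replacing spaces with underscores follows the rule above, while also replacing extra periods with underscores and collapsing repeated underscores are our own choices, since the tutorial does not say what to substitute.

```python
from pathlib import PurePath


def _split(name):
    """Split a filename (directories ignored) into (stem, extension) at the final period."""
    base = PurePath(name).name
    stem, dot, ext = base.rpartition(".")
    if not dot:  # no period at all: the whole name is the stem
        return base, ""
    return stem, ext


def filename_problems(name):
    """List the naming-rule violations in `name` (empty list if it is acceptable)."""
    stem, ext = _split(name)
    problems = []
    if " " in stem or " " in ext:
        problems.append("contains a space")
    if "." in stem:
        problems.append("contains a period other than the one before the extension")
    if "__" in stem or "__" in ext:
        problems.append("contains a double underscore")
    return problems


def suggest_name(name):
    """Propose a compliant name: spaces and extra periods become single underscores."""
    stem, ext = _split(name)
    stem = stem.replace(" ", "_").replace(".", "_")
    while "__" in stem:
        stem = stem.replace("__", "_")
    return f"{stem}.{ext}" if ext else stem


if __name__ == "__main__":
    for n in ["tomo1_ctem.hdf", "tilt series.1.hdf", "a__b.hdf"]:
        print(n, filename_problems(n), "->", suggest_name(n))
```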
Line 57: | Line 55: |
== Tiltseries Alignment and Tomogram Reconstruction (10 min) == | == Tiltseries Alignment and Tomogram Reconstruction (~10 min Small) == |
Line 61: | Line 59: |
Line 65: | Line 64: |
* ''tltax'' = -4.0 | |
Line 66: | Line 66: |
* ''clipz'' = 64 * ''threads'' = number of physical cores on your machine, optionally *1.25. * If you wish to look at the intermediate aligned tilt-series and other files, uncheck ''notmp'', but note that this will significantly increase disk requirements * This is not required for the remaining steps in the tutorial, but can be used to help understand how the tomogram alignment works. This requires significant additional disk space. You may consider doing this for only one tomogram. * In each ''tomorecon_XX'' folder * ''landmark_0X.txt'' has the location of the landmarks (which may be fiducials if present) in each iteration * ''samples_0X.hdf'' shows the top and side view of those landmarks * ''ptclali_0X.hdf'' has the trace of each landmark throughout the tilt series (they should stay at the center of the image all the time if the alignment is good) * ''tomo_0X.hdf'' is the reconstruction after each iteration |
* ''clipz'' = 96 * ''threads'' = number of physical cores on your machine, optionally multiplied by 1.25. * ''notmp'' = checked by default. If you wish to look at the intermediate aligned tilt-series and other files, uncheck this, but note that this will significantly increase disk requirements. * These files are not required for the remaining steps in the tutorial, but can be used to help understand how the tomogram alignment works. You may consider doing this for only one tomogram. * ''patchtrack'' = 2 (not really necessary for the tutorial, but shouldn't hurt. This is 3-D patchtracking, not the typical 2-D patchtracking) * ''extrapad'' = selected (this will make the tomogram reconstructions take a little longer, but look a bit nicer visually. No impact on the final subtomogram averages) |
Line 77: | Line 74: |
{{attachment:tomorecon.png| Tomogram reconstruction |width=600}} | If you unchecked ''notmp'' for one or more tilt series: * You will see a ''tomorecon_XX'' folder for each of those tilt series, containing: * ''landmark_0X.txt'' has the location of the landmarks (which may be fiducials if present) in each iteration * ''samples_0X.hdf'' shows the top and side view of those landmarks * ''ptclali_0X.hdf'' has the trace of each landmark throughout the tilt series (if the alignment is good, each landmark should stay at the center of the image throughout) * ''tomo_0X.hdf'' is the reconstruction after each iteration * Again, note that these files are not required by any downstream processing. If you have difficulty getting a particular tilt series to reconstruct, and the ''patchtrack=2'' option doesn't help, these files may help debug the problem. If you do encounter problems, please ask about them, either publicly on the mailing list or privately to the developers. {{attachment:tomorecon.png|Tomogram reconstruction|width="600"}} |
Line 80: | Line 85: |
* We strongly recommend reconstructing at most 5-10 tilt series to start, then completing the handedness check in the next step before returning to this step to reconstruct the full set of tilt series. This will allow you to specify the correct ''tltax'' value. |
Line 81: | Line 87: |
* While the program can automatically compute the orientation of the tilt axis, it can lead to a handedness ambiguity in the tomogram (it happens to be correct in the tutorial data). For your own data, it is recommended to confirm the handedness in a few good tomograms, then provide the correct ''tltax'' value for the reconstruction of all tomograms. To determine the handedness computationally, try the [[https://blake.bcm.edu/emanwiki/EMAN2/e2tomo_more#Determine_the_handedness_of_a_tomogram | tutorial here]] for EMAN2 built after 05/23/2019 (or EMAN>=2.31). |
Line 84: | Line 89: |
* The graphical interface only permits 1k or 2k reconstruction sizes. In our experience this is normally sufficient for segmentation or particle picking. * When the sample is thick, some grid-like tiling pattern can be seen in the reconstruction. Checking '''extrapad''' can greatly reduce these artifacts. In versions after 2/3/2020, there is also a '''moretile''' option that further eliminates them. Note these artifacts will NOT impact the subtomogram averaging results because the particles are extracted in a separate process. Checking these options can make the reconstruction process more memory-intensive, and up to 5 times slower. |
* The graphical interface only permits 1k or 2k reconstruction sizes. In our experience this is normally sufficient for segmentation or particle picking. * When the sample is thick, some grid-like tiling pattern can be seen in the reconstruction. Checking '''extrapad''' can greatly reduce these artifacts. In versions after 2/3/2020, there is also a '''moretile''' option that further eliminates them. Note these artifacts will NOT impact the subtomogram averaging results because the particles are extracted in a separate process. Checking these options can make the reconstruction process more memory-intensive, and up to 5 times slower. |
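The handedness ambiguity in an automatically determined tilt axis is a 180° rotation of ''tltax'', and the same ''tltax'' should be used for all tilt series collected under the same conditions on the same instrument. The sketch below is plain Python, not an EMAN2 function. `flipped_tilt_axis` gives the alternative value, and `consensus_tilt_axis` takes a median over several tilt series after unwrapping the angles relative to the first value, so that values straddling ±180° are not combined incorrectly. The -4.0 in the example is the ''tltax'' given for the tutorial data.

```python
import statistics


def wrap_angle(deg):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def flipped_tilt_axis(tltax):
    """The other candidate tilt-axis angle under the 180-degree ambiguity."""
    return wrap_angle(tltax + 180.0)


def consensus_tilt_axis(values):
    """Median tilt axis over several tilt series, robust to +/-180 wrap-around."""
    if not values:
        raise ValueError("need at least one tilt-axis value")
    ref = values[0]
    # Express every angle as the equivalent value closest to the first one.
    unwrapped = [ref + wrap_angle(v - ref) for v in values]
    return wrap_angle(statistics.median(unwrapped))


if __name__ == "__main__":
    print(flipped_tilt_axis(-4.0))                     # the flipped alternative
    print(consensus_tilt_axis([-4.2, -3.9, -4.1]))     # approximately -4.1
```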
Line 88: | Line 93: |
== Handedness Check (can be skipped in the tutorial) == EMAN2 includes a novel procedure for determining the correct tilt axis for a tilt series based on defocus estimates across the tilted images in a tilt series. The tutorial data set comes out correctly without running this check, but when working with your own data, this step is highly recommended. Once you know the correct tilt axis direction to use for a given microscope/camera, you shouldn't need to run this test on every data set, but it may not be a bad idea even then, as there are various possible configuration/software errors on the instrument which could potentially cause inconsistent results. For the tutorial tilt-series: |
* There is a new ''patchtrack'' option which may help with refinements that otherwise don't seem to be aligning well. Note that this is not conventional 2-D patchtracking, but a novel 3-D patchtracking routine. This acts as a preprocessing step if specified (2 is recommended). == Handedness Check == EMAN2 will automatically locate the tilt axis in a tilt series if it is not provided, but there is a 180° ambiguity in this determination. An incorrect choice will lead to structures with the incorrect handedness, and may produce suboptimal CTF correction. In some data sets (not the tutorial) this may lead to particles with mixed handedness. Since we specified ''tltax'' above, this step isn't necessary for the tutorial, but you can run it to see what the results look like. EMAN2 includes a novel procedure for resolving this ambiguity from a tilt series based on defocus estimates across the tilted images. The tutorial data set comes out correctly without running this check, but when working with your own data, this step is highly recommended. Once you know the correct tilt axis direction to use for a given microscope/camera, you shouldn't need to run this test on every data set, but it may not be a bad idea even then, as there are various possible configuration/software errors on the instrument which could potentially cause inconsistent results, particularly with a change of magnification. For the tutorial tilt-series: |
Line 97: | Line 106: |
* ''dfrange'' = 1.0,4.0,0.02 | |
Line 108: | Line 118: |
If you run this check on multiple images and it seems that they indicate a tilt axis/handedness error, then you need to return to the previous step (Tomogram Reconstruction) and run this again for all of your tomograms, with the correct tilt axis entered in the corresponding box. The same tilt axis should be used for all tilt series collected under the same conditions on the same instrument. ''Note: This method removes __almost__ all of the ambiguity about particle handedness. The one potential issue is that the MRC file format uses a non-conventional origin for images. If the data collection software doesn't take this into account, the images may be flipped when written to disk. The easiest way to check the software would be to collect 2 images of the same target and save them directly into different file formats, then check (in different software) whether the two images appear to have the same handedness'' == CTF Estimation (<10 min) == For the tutorial tilt-series: |
If you run this check on multiple images and it seems that they consistently indicate a flipped tilt axis/handedness, then you need to return to the previous step (Tomogram Reconstruction) and redo the reconstruction for all tomograms, with the correct tilt axis entered in the corresponding box. The same tilt axis should be used for all tilt series collected under the same conditions on the same instrument. The automatic value may vary a little among tilt series; just use the approximate average or median value. ''Note: This method removes __almost__ all of the ambiguity about particle handedness. The one potential issue is that the MRC file format uses a non-conventional origin for images. If the data collection software doesn't take this into account, the images may be flipped when written to disk. The easiest way to check the software would be to collect 2 images of the same target and save them directly into different file formats, then check (in EMAN2) whether the two images appear to have the same handedness''. If not, it is likely that the MRC files are incorrect. == CTF Estimation (<5 min) == '''Do NOT forget this step!''' This step will determine the defocus as a function of location for each tilt in each tilt series. This information is stored in the headers of particles as they are extracted from the tilt series, and used for CTF correction during subtomogram averaging. If you forget to do this, you will need to re-run the particle extraction step, which is quite time-consuming. For the tutorial tilt-series: |
Line 118: | Line 131: |
* ''checkhand'' = not selected | * ''dfrange'' = 1.0,4.0,0.02 * ''' ''checkhand'' = not selected ''' |
Line 122: | Line 136: |
* The first two options, ''dfrange'' and ''psrange'', indicate the defocus and phase shift ranges to search. They take the format of “start, end, step”, so “2, 5, .1” will search defocus from 2 to 5 µm with a step size of 0.1 µm. Phase shift is in degrees. | * The first two options, ''dfrange'' and ''psrange'', indicate the defocus and phase shift ranges to search. It is critical that the actual defocus be within the search range (obviously). They take the format of “start, end, step”, so “2, 5, .1” will search defocus from 2 to 5 µm with a step size of 0.1 µm. Phase shift is in degrees. |
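To make the “start, end, step” format concrete, the sketch below expands a range string into the values that would be tested and checks whether an expected defocus lies inside it. It is an illustration only, not EMAN2's parser: in particular, including the end value in the list is our assumption, as the tutorial only says the search runs "from 2 to 5". The tutorial's ''dfrange'' of 1.0,4.0,0.02 is used in the example.

```python
def parse_search_range(spec):
    """Expand a "start, end, step" string into the list of values searched.
    Assumes both endpoints are included (not confirmed by the tutorial)."""
    start, end, step = (float(part) for part in spec.split(","))
    if step <= 0 or end < start:
        raise ValueError(f"bad range {spec!r}")
    n_steps = round((end - start) / step)
    # Round each value to hide floating-point noise such as 2.3000000000000003.
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


def in_search_range(defocus_um, spec):
    """True if defocus_um (in micrometers) lies within the start..end of spec."""
    start, end, _ = (float(part) for part in spec.split(","))
    return start <= defocus_um <= end


if __name__ == "__main__":
    values = parse_search_range("2, 5, .1")
    print(len(values), values[:3], values[-1])
    print(in_search_range(2.5, "1.0,4.0,0.02"), in_search_range(4.5, "1.0,4.0,0.02"))
```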
Line 127: | Line 142: |
''Note: In >2022 snapshots of EMAN2 it is possible after CTF correction to return to the 3-D reconstruction step and produce CTF corrected whole tomograms, but this does nothing useful when following the EMAN2 pipeline. If you wish to compare EMAN2 tomograms with other software doing CTF correction, this could potentially be useful'' == Tomogram reconstruction evaluation (optional) == {{attachment:tomo_evaluation.png| Tomogram evaluation |width=600}} |
''Note: In >=2022 snapshots of EMAN2 it is possible after CTF correction to return to the 3-D reconstruction step and produce CTF corrected whole tomograms, but this does nothing useful when following the EMAN2 pipeline. If you wish to compare EMAN2 tomograms with other software doing CTF correction, this could potentially be useful'' == Tomogram reconstruction evaluation == {{attachment:tomo_evaluation.png|Tomogram evaluation|width="600"}} |
Line 135: | Line 149: |
* On the left is a list of tomograms in the project. | The ''correctrot'' option often does not work well on tutorial tomo3. If you go through the tomograms you should see ribosomes spanning the plane of the image for all of the tomograms. If tomo3 shows only a narrow band containing ribosomes, return to the tomogram reconstruction step above, and re-run the process for only that tomogram (uncheck ''alltiltseries'', and select tiltseries/*tomo3.hdf. Also uncheck ''correctrot''). * On the left is a list of tomograms in the project. * Selecting a tomogram will show a thumbnail in the image pane on the right. |
Line 145: | Line 162: |
* Please note that most tomograms include some out-of-plane tilt (the actual rotation isn't a simple tilt along a single axis), which is taken into account during alignment. This may make it visually appear that the tilt series alignment is not as robust as it actually is. | * Please note that most tomograms include some out-of-plane tilt (the actual rotation isn't a simple tilt along a single axis), which is taken into account during alignment. This may make it visually appear that the tilt series alignment is not as robust as it actually is. |
Line 149: | Line 166: |
* ''Tiltparams'' is a bit more complicated. It plots a point list with 6 columns and a number of rows corresponding to the images in the selected tilt series. These are the alignment parameters for the tilt series. | * ''Tiltparams'' is a bit more complicated. It plots a point list with 6 columns and a number of rows corresponding to the images in the selected tilt series. These are the alignment parameters for the tilt series. |
Line 160: | Line 177: |
== Tomogram annotation (optional alternative to process below, GPU recommended) == {{attachment:annotation.png| 2D particle picking |width=600}} * Since the tutorial data set is purified ribosomes, this step can be skipped for the tutorial data, and you can move on to template-based particle picking. For cells or other types of complex specimens, tomogram annotation can be used to produce locations of different types of objects. There is a newer tool which can be used for deep-learning based particle picking, which is really a different task than annotation. If you have a GPU and prefer this over the reference based approach outlined in the next section, see: [[EMAN2/e2tomo_more#Automated_particle_selection|New automatic particle picking]]. The annotation tool is still functional and available, but is targeted more at annotation of cellular features: [[http://eman2.org/Programs/tomoseg|TomoSeg]] This is a brief summary of the annotation-based approach: |
== Particle Picking Choices == There are 4 different tools you can use for particle picking in EMAN2 as of Feb 2022: 1. A new deep-learning based 3-D picker. Not available in 2.91; you must use a recent (2022+) snapshot. See: [[EMAN2/e2tomo_more#Automated_particle_selection|New automatic particle picking]] 1. Abusing the deep-learning based segmentation tool for particle picking purposes 1. Manual particle picking 1. Template based picking (usually seeded with some manual picking results) For live versions of this tutorial, we use the older manual+template based approach as it requires no specific hardware, and is a good learning experience, but the deep learning 3-D picker is a good choice for typical situations. For cellular tomograms, the annotation tool approach may still be a good choice. === (2) Tomogram annotation (GPU recommended) === {{attachment:annotation.png|2D particle picking|width="600"}} * Since the tutorial data set is purified ribosomes, this step can be skipped for the tutorial data, and you can move on to template-based particle picking. For cells or other types of complex specimens, tomogram annotation can be used to more easily distinguish locations of different types of objects. For a detailed description of how to use the annotation tool, see: [[http://eman2.org/Programs/tomoseg|TomoSeg]] Here is a brief summary of the annotation-based approach: |
Line 171: | Line 197: |
* This step is not always necessary for tomograms reconstructed in EMAN2, but may slightly improve results. | * This step is not always necessary for tomograms reconstructed in EMAN2, but may slightly improve results. |
Line 173: | Line 199: |
* This is a newer interface than previously used for this step. Select a few "Good" (regions containing the feature of interest) and "Bad" (regions not containing the feature of interest) boxes. | * This is a newer interface than previously used for this step. Select a few "Good" (regions containing the feature of interest) and "Bad" (regions not containing the feature of interest) boxes. |
Line 183: | Line 209: |
* Input both the tomogram and its corresponding segmentation, and the particle coordinates will be written into the metadata file. * Slightly tweaking the threshold parameters may yield better results. |
* Input both the tomogram and its corresponding segmentation, and the particle coordinates will be written into the metadata file. * Slightly tweaking the threshold parameters may yield better results. |
Line 187: | Line 213: |
== Particle picking (10-15 min) == {{attachment:ptclpicking.png| 3D particle picking |width=600}} |
== Manual particle picking (10-15 min) == {{attachment:ptclpicking.png|3D particle picking|width="600"}} |
Line 197: | Line 223: |
* The box size can be set in the main window at the left bottom corner, for this tutorial, use 32 for ribosomes (the unbinned box size is 128). | * The box size can be set in the main window at the left bottom corner, for this tutorial, use 32 for ribosomes (the unbinned box size is 128). |
Line 200: | Line 226: |
* Hold down Shift when clicking to delete existing boxes. * Boxes are shown as circles, which vary in size depending on the Z distance from the center of the particle. * If you accidentally include one or more particles with nearby gold fiducials or other high contrast artifacts, it may cause some issues with your generated model (do not do that). |
* Hold down Shift when clicking to delete existing boxes. * Boxes are shown as circles, which vary in size depending on the Z distance from the center of the particle. * If you accidentally include one or more particles with nearby gold fiducials or other high contrast artifacts, it may cause some issues with your generated model (do not do that). |
Line 205: | Line 231: |
* The interface supports different box types within a single tomogram. Each type has a label. Make sure the label is consistent if selecting the same feature in different tomograms. * If you skipped the tomogram annotation step, we will pick a few particles here to generate an initial model, and use the initial model as a reference for template matching. * Select 30-50 particles from one tomogram, then close the boxer window. * Do '''not''' use the ''Save'' button in the Options window or any of the menu items related to saving data. Those are available for special purposes. When you change the boxes, your changes are saved immediately and automatically. When you are done, simply close the main window. |
* The interface supports different box types within a single tomogram. Each type has a label. Make sure the label is consistent if selecting the same feature in different tomograms. * If you skipped the tomogram annotation step, we will pick a few particles here to generate an initial model, and use the initial model as a reference for template matching. * Select 30-50 particles from one tomogram, then close the boxer window. * Do '''not''' use the ''Save'' button in the Options window or any of the menu items related to saving data. Those are available for special purposes. When you change the boxes, your changes are saved immediately and automatically. When you are done, simply close the main window. |
Line 214: | Line 240: |
In this pipeline, the full 1k or 2k tomograms are used only as a reference to identify the location of the objects to be averaged. Now that we have particle locations, the software returns to the original tilt-series, extracts a per-particle tilt-series, and reconstructs each particle in 3-D independently. For the tutorial tilt-series: |
In this pipeline, the full 1k or 2k tomograms are used only as a reference to identify the location of the objects to be averaged. Now that we have particle locations, the software returns to the original tilt-series, extracts a per-particle tilt-series, and reconstructs each particle in 3-D independently. For the tutorial tilt-series: |
Line 220: | Line 246: |
* set ''boxsz_unbin'' to 128. | * set ''boxsz_unbin'' to 128. |
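The box sizes in this tutorial are related by simple arithmetic: 32 when picking in the binned tomogram, ''boxsz_unbin'' = 128 for extraction, and the tutorial's separate requirement that the full-sampling box size divided by ''shrink'' be divisible by 2. The hypothetical helpers below (not part of EMAN2) make this explicit, using the 3.93 Å/pix sampling of the (small) data set. Requiring box/''shrink'' to be a whole number in `shrink_ok` is our interpretation of that rule.

```python
def binning_factor(unbinned_box, binned_box):
    """Ratio between the unbinned extraction box and the box used in the binned tomogram."""
    if unbinned_box % binned_box:
        raise ValueError("unbinned box is not an integer multiple of the binned box")
    return unbinned_box // binned_box


def box_extent_angstrom(box_px, apix):
    """Physical edge length of a cubic box, in Angstroms."""
    return box_px * apix


def shrink_ok(unbinned_box, shrink):
    """True if unbinned_box / shrink is a whole number divisible by 2."""
    return unbinned_box % shrink == 0 and (unbinned_box // shrink) % 2 == 0


if __name__ == "__main__":
    print(binning_factor(128, 32))                 # 4
    print(round(box_extent_angstrom(128, 3.93), 2))  # ~503 Angstrom box edge
    print([s for s in range(1, 9) if shrink_ok(128, s)])
```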
Line 227: | Line 253: |
* '''Subtomogram Averaging -> Build Sets''' | * '''Subtomogram Averaging -> Build Sets''' |
Line 231: | Line 257: |
Line 233: | Line 259: |
Line 237: | Line 264: |
Line 239: | Line 266: |
Intuitively, since the particles are already in 3-D, it may seem that an "initial model" should not be necessary. Unfortunately, due to the missing wedge, and the low resolution of one individual particle (particularly from cells), it is actually critical to make a good starting average. Historically it has been challenging to get a good starting model, depending on the shape of the molecule. This new procedure based on stochastic gradient descent has proven to be quite robust, but it is difficult for the computer to tell when it has converged sufficiently. For this reason, the default behavior is to run much longer than is normally required, and have a human decide when it's "good enough" and terminate the process. If you use a small ''shrink'' value and let it run to completion, it can take some time to run. This is harmless, but unnecessary. While the section below the solid line remains fully functional, a new program available since 2021 does a much more efficient job of making initial models. It hasn't been integrated into e2projectmanager yet, but it is enough of an improvement that we will go ahead and use it here regardless. The original instructions are preserved below the horizontal line if you prefer the older approach. If you didn't launch e2projectmanager with an & at the end of the line, you will need to exit it (close the windows) to run the following command. Replace the 4 in thread:4 with the appropriate number of threads for your computer. If you called your set something other than initribo, you may need to change that as well. {{{ e2spt_sgd_new.py sets/initribo.lst --res 50 --parallel thread:4 }}} |
Intuitively, since the particles are already in 3-D, it may seem that an "initial model" should not be necessary. Unfortunately, due to the missing wedge, and the low resolution of one individual particle (particularly from cells), it is actually critical to make a good starting average. Historically it has been challenging to get a good starting model, depending on the shape of the molecule. This new procedure based on stochastic gradient descent has proven to be quite robust, but it is difficult for the computer to tell when it has converged sufficiently. For this reason, the default behavior is to run much longer than is normally required, and have a human decide when it's "good enough" and terminate the process. If you use a small ''shrink'' value and let it run to completion, it can take some time to run. This is harmless, but unnecessary. While the older program remains fully functional, a new program available since 2021 does a much more efficient job of making initial models, so we use it here. The original instructions are preserved in the Old Initial Model Generator section below if you prefer the older approach. === New Initial Model Generator === {{attachment:initial_model.png|Initial model generation|width="600"}} The new initial model generator was only added to e2projectmanager in Feb 2022, so if you have an older version, you may need to run it from the command line. If it isn't available from the command line either, you will need to use the older tool below. * '''Subtomogram Averaging -> New initial model generator''' * ''particles'' = ''Browse'' for sets/initribo.lst (or whatever you called your set) * ''res'' = 50. This is a low resolution initial model; there is no reason to push it. * ''niter'' = 10. You may increase this to as much as 100, and simply kill the job when the model seems good enough. 10 is fine for this tutorial, and often you can even stop after ~5. * ''shrink'' = 2, for speed during live tutorials. * ''parallel'' = thread:4. Replace the 4 with the correct number of threads for your machine, as above. If you have limited disk space on /tmp, you can also specify a temporary folder: thread:4:/home/username/tmp * ''Launch'' If you do not see ''New initial model generator'' in e2projectmanager, you can run it from the command line, adjusting the options as appropriate: {{{ e2spt_sgd_new.py sets/initribo.lst --res 50 --niter 10 --shrink 2 --parallel thread:4 }}} |
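If you want to script several runs (for example, to try different ''res'' or ''shrink'' values), the command line shown above can be assembled in Python. This is a hypothetical convenience wrapper, not part of EMAN2. It uses only the options that appear in the tutorial's command (`--res`, `--niter`, `--shrink`, `--parallel`) and assumes `e2spt_sgd_new.py` is on your PATH. Building an argument list rather than a shell string avoids quoting problems with paths.

```python
import shlex
import subprocess


def sgd_new_command(particles, res=50, niter=10, shrink=2, threads=4, scratch_dir=None):
    """Argument list for e2spt_sgd_new.py using the tutorial's options and defaults."""
    parallel = f"thread:{threads}" + (f":{scratch_dir}" if scratch_dir else "")
    return [
        "e2spt_sgd_new.py", particles,
        "--res", str(res),
        "--niter", str(niter),
        "--shrink", str(shrink),
        "--parallel", parallel,
    ]


def run(cmd):
    """Run the command, raising CalledProcessError if it fails (requires EMAN2)."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = sgd_new_command("sets/initribo.lst", threads=4)
    print(shlex.join(cmd))  # prints the command; call run(cmd) to execute it
```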
Line 249: | Line 287: |
Line 252: | Line 291: |
iter 0, class 0: | iter 0, class 0: |
Line 254: | Line 293: |
iter 1, class 0: | iter 1, class 0: |
Line 256: | Line 295: |
iter 2, class 0: | iter 2, class 0: |
Line 259: | Line 298: |
Once it gets past 3-4 iterations, you can use the browser to look in ''sptsgd_00'', and double-click on ''output_cls0.hdf''. This file will change as more iterations complete. It contains the results of the most recent iteration. If you double click on it again later, it will load another map into the same 3-D display. You can then open the control-panel for the 3-D display (middle-click) and use the ''Seq'' slider to cycle through the maps. When you are satisfied with the quality of the initial model, press ctrl-C, which will kill the initial model generating job. At this point you can also close the browser window and relaunch ''e2projectmanager.py''. ---- {{attachment:initial_model.png| Initial model generation | width=600}} This section is the older program, which is still functional, and is integrated into the project manager. If you completed the section above the line, you can skip to the Template Matching section. For the tutorial tilt-series: |
Once it gets past 3-4 iterations, you can use the browser to look in ''sptsgd_00'', and double-click on ''output_cls0.hdf''. This file will change after each iteration completes. It contains the results of the most recent iteration. When you are satisfied with the quality of the initial model, you can kill the job with the task manager in e2projectmanager. For your own data: * If your particle has known ''symmetry'', specify it (see [[EMAN2/Symmetry]]) * setting ''shrink'' to something in the range of 2-4 will make the runtime faster but, depending on the shape, could potentially cause problems. * using more than the minimal 30-50 particles is fine. If you have a very large set of selected particles, go ahead and use them all. This will not slow the process down at all, since it's stochastic. * it is critical that the full sampling box size of the extracted particles divided by ''shrink'' be divisible by 2. If not, the program will crash. === Old Initial Model Generator === This section describes the older program, which is still functional and is integrated into the project manager. For the tutorial tilt-series: |
Line 273: | Line 317: |
* The default ''niter'' of 5 is typically much more than is required | * The default ''niter'' of 5 is typically much more than is required |
Line 278: | Line 322: |
Line 285: | Line 330: |
In this step, we will use the initial model you just produced as a template for finding all of the ribosomes in all 4 tomograms. If you completed the '''Tomogram Annotation''' step above, and have already extracted a full set of 1000+ particles, then you can skip this step, as we already have all of the particles. Note that here, and everywhere else in the tomography pipeline, reconstructed particles have positive contrast (look white in projection) and tomograms/tilt series have negative contrast (look dark in projection). If you wish to use a reference volume from the PDB or somesuch, then it should have positive contrast as is normal in the single particle CryoEM field. |
In this step, we will use the initial model you just produced as a template for finding all of the ribosomes in all 3 tomograms. If you completed the '''Tomogram Annotation''' step above, and have already extracted a full set of 1000+ particles, then you can skip this step, as we already have all of the particles. Note that here, and everywhere else in the tomography pipeline, reconstructed particles have positive contrast (look white in projection) and tomograms/tilt series have negative contrast (look dark in projection). If you wish to use a reference volume from the PDB or somesuch, then it should have positive contrast as is normal in the single particle CryoEM field. |
Line 292: | Line 336: |
* ''nptcl'' = 150 | * ''nptcl'' = 150 |
Line 296: | Line 340: |
* This threshold is in terms of the number of standard deviations above the mean. The definition was slightly different when the default was 10. | |
Line 299: | Line 344: |
* when this finishes, you can use the same '''Manual Boxing''' tool you used before to look at the particles which were selected. You may wish to manually remove any bad particles it selected. For the tutorial data set or other tomograms of purified protein, this process should work pretty well. For cells you might wish to use the '''Tomogram Annotation''' method above. | * Note: if you see an error, with some mention of zero, it is likely you forgot to reduce vthr to 5. * when this finishes, you can use the same '''Manual Boxing''' tool you used before to look at the particles which were selected. You may wish to manually remove any bad particles it selected. For the tutorial data set or other tomograms of purified protein, this process should work pretty well. |
Line 302: | Line 348: |
For your own data: * Frankly, in most cases the deep learning based picker will do a better job. For cells, you may consider the full Tomogram Annotation tool. This program works with highly downsampled tomograms and references for speed. * Picking a good ''vthr'' and a reasonable ''nptcl'' maximum may take a little trial and error. Test on 1-3 tomograms before the full run. |
|
Line 303: | Line 353: |
Again, if you already did '''Tomogram Annotation''' above, this step isn't necessary. It is only required if you just did '''Template Matching'''. Since this involves several thousand particles instead of 30-50, it will take quite a lot longer to run. The actual time will depend partially on the speed of your storage. For the tutorial tilt-series: |
If you already extracted a complete set of particles (not just the few initial references) above, you don't need to repeat it again here. Since this involves several hundred particles instead of 30-50, it will take quite a lot longer to run. For the tutorial tilt-series: |
Line 311: | Line 361: |
* set ''boxsz_unbin'' to 128. * set ''label'' to "ribo" * Launch * '''Subtomogram Averaging -> Build Sets''' |
* ''boxsz_unbin'' = 128 * ''label'' = ribo * ''compressbits'' = 5 * Launch == Build Sets (again) == * '''Subtomogram Averaging -> Build Sets''' |
Line 317: | Line 369: |
* You can leave ''label'' blank. It will just regenerate all of the .lst files. | |
Line 319: | Line 372: |
* Generally just takes a few seconds. If you have a large project with hundreds of tomograms it will take a bit longer. | |
Line 321: | Line 375: |
There is a new refinement program which implements both traditional subtomogram averaging and subtilt refinement in a single program. Like the other new software referenced above, this new program is not yet integrated into e2projectmanager, and must be run from the command line. This is an alternative to the next two major sections (Subtomogram Refinement and Subtilt Refinement). The full tutorial on the new program is [[https://blake.bcm.edu/emanwiki/EMAN2/e2tomo_new | here]]. You may need to replace ''sets/ribo.lst'' with whatever you named your set. Replace both ''4''s with the number of threads for your machine. If you didn't use the "new" style initial model generation above, you may also need to alter --ref. Run the following with necessary changes: {{{ e2spt_refine_new.py --ptcls sets/ribo.lst --ref sptsgd_01/output_cls0.hdf --iters p,p,p,t,p,t,r --parallel thread:4 --threads 4 }}} == Subtomogram refinement (~1 hr/iteration) == {{attachment:refinement.png| 3D refinement | width=600}} |
There is a new refinement program which implements both traditional subtomogram averaging and subtilt refinement in a single program. This is an alternative to the next two major sections (Subtomogram Refinement and Subtilt Refinement). The full tutorial on the new program is [[https://blake.bcm.edu/emanwiki/EMAN2/e2tomo_new|here]]. Integration of this program into e2projectmanager isn't quite complete yet (but should be by mid-March 2022). For the moment, we will run this command directly from the command-line. We will do this as 2 sequential refinement runs for efficiency, and to show how you can continue a refinement. The initial refinement should turn the initial model into something ribosome shaped, but at low resolution. The second refinement should get to ~12 Å resolution. * You may need to replace ''sets/ribo.lst'' with whatever you named your set. * Replace both ''4''s with the number of threads for your machine. * If you didn't use the "new" style initial model generation above, you may also need to alter --ref. {{{ e2spt_refine_new.py --ptcls sets/ribo.lst --ref sptsgd_00/output_cls0.hdf --iters p,p,t --goldstandard --startres 50 --tophat local --parallel thread:4 --threads 4 }}} When that's done, you should have a ''spt_00'' folder containing the results of the 3 iterations we requested. You may take a look at: * fsc_maskedtight_XX.txt - These are the masked gold-standard resolution curves after each iteration. Double click in the EMAN2 browser to plot them. * threed_XX.hdf - These are the 3D maps after each iteration. Again, double-click in the browser to visualize. The 3rd map should look considerably more like a ribosome than the initial model (threed_00.hdf), but the resolution will still be limited. In the next run, we will do more subtilt refinement to push the resolution. We could have done this all in a single run, but simply including the additional ''--iters'' letters except for one key addition. 
The ''--maxres 12'' argument indicates the highest resolution information to consider in the refinement. If this is not specified, it will be determined automatically based on the results of the previous iteration. This will make the iterations run faster, but it may make them progress towards the final resolution more slowly. So, we use the automatic method for 3 iterations to get the shape correct quickly, then in the second run, push the resolution. * note that ''--ref'' points at the final map from the previous run * ''--loadali2d'' and ''--loadali3d'' load the final alignment parameters from the first run. Note that aliptcls3d files are only produced for 'p' iterations. * ''--goldcontinue'' tells the refinement to continue with the even/odd pairs from the previous refinement, rather than phase randomizing the data at high resolution again * ''--maxres'' specifies the maximum resolution information to consider during alignment. If omitted, it uses an automatic scheme. * ''--tophat local'' enables local resolution measurement and filtration, it adds ~2 minutes per iteration. ''global'' will filter everything uniformly. ''e2help.py tophat'' {{{ e2spt_refine_new.py --ptcls sets/ribo.lst --ref spt_00/threed_03.hdf --loadali2d spt_00/aliptcls2d_03.lst --loadali3d spt_00/aliptcls3d_02.lst --goldcontinue --iters t,p,t,r,d --keep 0.95 --tophat local --parallel thread:4 --threads 4 --maxres 12 }}} That's it. You hopefully have a ~12 Å resolution map. If you wish to try and push the resolution further, you can download the original 4k tilt series from EMPIAR and pick all of the particles (instead of just ~450), and go through the process again. Skip the next 2 sections about the old refinement. Other notes are below that. == Old Subtomogram refinement (~1 hr/iteration) == {{attachment:refinement.png|3D refinement|width="600"}} As an alternative to the new integrated tool above, the older pair of programs is still available. You shouldn't need to do both approaches. 
This step is similar to the "p" iterations above, though it uses an older algorithm. |
Line 335: | Line 417: |
Line 343: | Line 426: |
Results will gradually appear in spt_XX/ Feel free to look at intermediate results with the EMAN2 file browser as they appear. |
Results will gradually appear in spt_XX/ Feel free to look at intermediate results with the EMAN2 file browser as they appear. |
Line 347: | Line 429: |
Line 351: | Line 434: |
== Subtilt refinement (~9 hr/iteration) == {{attachment:subtlt_dir.png| Subtilt refinement directory |width=600}} |
== Old Subtilt refinement (~9 hr/iteration) == {{attachment:subtlt_dir.png|Subtilt refinement directory|width="600"}} This is the second half of the old refinement strategy. It is conceptually similar to the t,p and r iterations in the newer integrated program above. |
Line 358: | Line 441: |
Line 366: | Line 450: |
Line 369: | Line 454: |
Line 373: | Line 458: |
{{attachment:refinement_evaluation.png| Refinement evaluation |width=600}} This tool helps visualize and compare results from multiple subtomogram refinement runs. |
{{attachment:refinement_evaluation.png|Refinement evaluation|width="600"}} This tool helps visualize and compare results from multiple subtomogram refinement runs. |
Line 378: | Line 461: |
* In the GUI, you can look at all ''spt_XX'' or ''sptsgd_XX'' folders and compare the parameters which were used for each, as well as the resulting maps. * Switch between folder types using the menu at top right. * Columns can be sorted by clicking on the corresponding header. |
* In the GUI, you can look at all ''spt_XX'' or ''sptsgd_XX'' folders and compare the parameters which were used for each, as well as the resulting maps. * Switch between folder types using the menu at top right. * Columns can be sorted by clicking on the corresponding header. |
Line 382: | Line 465: |
* ''!ShowBrowser'' will bring up the ''[[http://blake.bcm.edu/emanwiki/EMAN2/Programs/e2display|e2display.py]]'' browser in the folder of the selected row. | * ''!ShowBrowser'' will bring up the ''[[http://blake.bcm.edu/emanwiki/EMAN2/Programs/e2display|e2display.py]]'' browser in the folder of the selected row. |
EMAN2 Tomography and Subtomogram/Subtilt Averaging Workflow Tutorial
Important note: Throughout the tutorial you will see a split between (small) and (large) instructions. The (small) tutorial is designed for laptops or basic desktops and can achieve ~12 Å resolution in a time compatible with live tutorial sessions. The (large) tutorial really requires a proper tomography workstation, but can achieve subnanometer resolution. Do not intermix (small) and (large) options below!
- This tutorial requires EMAN 2.91 at a minimum. The newer pipeline, which will become the primary pipeline in EMAN3, is also discussed. To use those instructions you should have a current EMAN2.99 snapshot version.
Some additional documentation for the newer pipeline for integrated subtomogram and subtilt refinement can be found in TomoNew.
Documentation of some other recently developed tools can be found in TomoMore.
- To cite:
- Chen, M., Bell, J.M., Shi, X. et al. A complete data processing workflow for cryo-ET and subtomogram averaging. Nat Methods 16, 1161–1168 (2019)
Computer Requirements
- Minimum recommended configuration:
- 16 GB RAM (small), 64 GB RAM (large)
4 cores @ >2 GHz (small), 16 cores (large)
- ~40 GB free disk space (small), ~100 GB free disk space due to the larger raw data (large)
- a high performance disk (SSD or RAID) will significantly reduce runtimes
Note: Wherever EMAN2 asks for the number of threads to use, specify the number of physical cores your machine has. Computers are often advertised as 4 core/8 thread or 8 core/16 thread. Running image processing with this advertised number of threads will usually make processing slower, not faster. You may optionally increase the thread count by ~25%; for example, on a 4-core machine, specifying 5 may give a 5-10% speedup over 4.
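The rule of thumb above can be written as a small helper. This is a sketch, not part of EMAN2: `suggested_threads` is a hypothetical name, and it takes the physical core count as an input because Python's `os.cpu_count()` reports logical CPUs (the advertised thread count), which is exactly the number the note warns against using.

```python
def suggested_threads(physical_cores: int, oversubscribe: bool = False) -> int:
    """Suggest a value for EMAN2 'threads' fields (hypothetical helper).

    physical_cores: number of physical cores, NOT the advertised thread count.
    oversubscribe:  if True, apply the optional ~25% increase from the note.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    if not oversubscribe:
        return physical_cores
    # ~25% more threads than cores, never fewer than the core count itself
    return max(physical_cores, int(physical_cores * 1.25))


if __name__ == "__main__":
    for cores in (4, 16):
        print(cores, "cores ->", suggested_threads(cores),
              "or", suggested_threads(cores, oversubscribe=True))
```

For a 4-core laptop this gives 4 (or 5 with the optional increase); for a 16-core workstation, 16 (or 20).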
Download Data
(Large) Download only the 4 mixed CTEM tilt series from EMPIAR: EMPIAR 10064
(Small) Download the downsampled data from the EMAN2 Tutorials page
Prepare input files (~2 minutes)
- Make a new empty folder for the project and 'cd' into that folder
run e2projectmanager.py:
e2projectmanager.py&
- Make sure any EMAN2 commands you run are executed from within this folder (not any subfolder)
- You may use "Edit Project" from the Project menu to set default values for the project.
If you downloaded our prepared data set, it will already contain an info folder containing the project settings, so you should not need to change anything.
- For this project use 300 kDa, 2.7 mm Cs, 300 keV and 3.93 (Small) or 1.97 (Large) Å/pix.
- The mass need not be precise, it is only used to keep isosurface values roughly self-consistent.
- Make sure the workflow mode is set to "TOMO" not "SPR"
Raw Data -> Import tilt series
Files -> select the 3 (Small) or 4 (Large) provided .hdf files
rawtlt, mdoc -> leave these blank
invert should not be selected
apix = 3.93 (Small) or 1.97 (Large). In later steps you can use -1, which tells the program to use the known value.
import_tiltseries = selected
importation = copy
compressbits = 5 (8 is fine as well, but will make file sizes slightly larger)
Once the options are set, press Launch
When working with your own data:
- It is critical that the filenames for your data not contain any spaces (replace with underscore) or periods (other than the final period used for the file extension).
"__" (double underscore) is also reserved for describing modified versions of the same file, and should not be used in your image filenames.
If your tilt series isn't a single stack file, but is many individual images instead, you will need to use Generate tiltseries to build an image stack
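A quick way to screen filenames before importing is sketched below. The function names are hypothetical, and the checks simply encode the three rules above (no spaces, no periods other than the extension separator, no double underscore); EMAN2 itself does not ship this helper.

```python
import re


def filename_problems(name: str) -> list:
    """Return reasons why `name` violates the tutorial's filename rules."""
    stem, dot, _ext = name.rpartition(".")
    if not dot:  # no extension at all
        stem = name
    problems = []
    if " " in name:
        problems.append("contains a space")
    if "." in stem:
        problems.append("contains a period before the extension")
    if "__" in name:
        problems.append("contains a double underscore")
    return problems


def suggest_name(name: str) -> str:
    """Propose a compliant name: spaces/extra periods -> '_', collapse '__'."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        stem, ext = name, ""
    stem = stem.replace(" ", "_").replace(".", "_")
    stem = re.sub(r"_{2,}", "_", stem)  # avoid the reserved '__'
    return stem + dot + ext


if __name__ == "__main__":
    for f in ["tomo1.hdf", "my tilt.series.hdf", "tomo__v2.hdf"]:
        print(f, filename_problems(f), "->", suggest_name(f))
```

Renaming files this way before import is safer than renaming them afterwards, since EMAN2 derives later file names from the imported ones.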
Tiltseries Alignment and Tomogram Reconstruction (~10 min Small)
Alignment of the tilt-series is performed iteratively in conjunction with tomogram reconstruction. Tomograms are not normally reconstructed at full resolution, generally limited to 1k x 1k or 2k x 2k, but the tilt-series are aligned at full resolution. For high resolution subtomogram averaging, the raw tilt-series data is used, based on coordinates from particle picking in the downsampled tomograms. On a typical workstation reconstruction takes about 4-5 minutes per tomogram.
For the tutorial tilt-series:
3D Reconstruction -> Reconstruct Tomograms
alltiltseries = selected
alternatively you can select one or more tilt series from the tiltseries folder
correctrot = selected
tltax = -4.0
tltstep = 2
clipz = 96
threads = number of physical cores on your machine, optionally increased by ~25%.
notmp = checked by default. If you wish to look at the intermediate aligned tilt-series and other files, uncheck this, but note that this will significantly increase disk requirements
- These files are not required for the remaining steps in the tutorial, but can be used to help understand how the tomogram alignment works. You may consider doing this for only one tomogram.
- patchtrack = 2 (not really necessary for the tutorial, but shouldn't hurt. This is 3-D patchtracking, not the typical 2-D patchtracking)
- extrapad = selected (this will make the tomogram reconstructions take a little longer, but look a bit nicer visually. No impact on the final subtomogram averages)
- Launch
If you opted to run without notmp on one or more tilt series:
You will see a tomorecon_XX folder for each tilt series, containing:
landmark_0X.txt has the location of the landmarks (which may be fiducials if present) in each iteration
samples_0X.hdf shows the top and side view of those landmarks
ptclali_0X.hdf has the trace of each landmark throughout the tilt series (they should stay at the center of image all the time if the alignment is good)
tomo_0X.hdf is the reconstruction after each iteration
Again, note that these files are not required by any downstream processing. If you have difficulty getting a particular tilt series to reconstruct, and the patchtrack=2 option doesn't help, this may help debug the problem. However, if you do encounter problems, we would like to invite you to ask about it either publicly on the mailing list or privately to the developers.
When working with your own data:
We strongly recommend reconstructing at most 5-10 tilt series to start, then completing the handedness check in the next step before returning to this step to reconstruct the full set of tilt series. This will allow you to specify the correct tltax value.
Either specify the correct tltstep if the tilt series is in order from one extreme to the other, or specify the name of a rawtlt file (as produced by serialem/IMOD).
In most cases, the default npk should be fine. If fiducials are present, it is not necessary to adjust this number to match the number of fiducials. The program will use any high contrast areas it finds as potential landmarks.
bytile should normally be selected, as it will normally produce better quality reconstructions at higher speed. If 2k or larger tomograms are created, memory consumption may be high, and you should check the program output for the anticipated RAM usage.
- The graphical interface only permits 1k or 2k reconstruction sizes. In our experience this is normally sufficient for segmentation or particle picking.
When the sample is thick, a grid-like tiling pattern may be visible in the reconstruction. Checking extrapad can greatly reduce these artifacts. In versions after 2/3/2020, there is also a moretile option that further eliminates them. Note that these artifacts will NOT affect the subtomogram averaging results, because the particles are extracted in a separate process. Checking these options can make the reconstruction more memory-intensive and up to 5 times slower.
When the sample is thin (purified protein, not cells), it is useful to check correctrot to automatically position tomograms flat in ice
It can also be helpful with thin ice to specify a clipz value to generate thinner tomograms (perhaps 64 or 96 for a 1k tomogram).
There is a new patchtrack option which may help with refinements that otherwise don't seem to be aligning well. Note that this is not conventional 2-D patchtracking, but a novel 3-D patchtracking routine. This acts as a preprocessing step if specified (2 is recommended).
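tltstep only describes a series collected in order from one extreme to the other; for any other collection order you need a rawtlt file. Our understanding is that an IMOD-style .rawtlt file is plain text with one tilt angle (in degrees) per line, in the same order as the images in the stack; check this against your own SerialEM/IMOD output before relying on it. The sketch below generates such a list for the sequential case, assuming the range is symmetric about 0°. `sequential_tilts` and `write_rawtlt` are hypothetical helper names.

```python
from typing import List, Optional


def sequential_tilts(n_images: int, step: float, start: Optional[float] = None) -> List[float]:
    """Tilt angles for a series collected in order from one extreme to the other.

    If `start` is omitted, the range is assumed to be symmetric about 0 degrees.
    A negative `step` describes a series running from + to -.
    """
    if n_images < 1:
        raise ValueError("n_images must be at least 1")
    if start is None:
        start = -step * (n_images - 1) / 2.0
    return [round(start + i * step, 3) for i in range(n_images)]


def write_rawtlt(path: str, angles: List[float]) -> None:
    """Write one angle per line (assumed IMOD-style .rawtlt layout)."""
    with open(path, "w") as fh:
        for angle in angles:
            fh.write(f"{angle:.2f}\n")


if __name__ == "__main__":
    tilts = sequential_tilts(61, 2.0)  # e.g. -60..+60 in 2 degree steps
    print(tilts[0], tilts[-1], len(tilts))
```

If your series was not collected sequentially, the angles must come from the acquisition metadata rather than from a formula like this.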
Handedness Check
EMAN2 will automatically locate the tilt axis in a tilt series if it is not provided, but there is a 180° ambiguity in this determination. An incorrect choice will lead to structures with the incorrect handedness, and may produce suboptimal CTF correction. In some data sets (not the tutorial) this may lead to particles with mixed handedness. Since we specified tltax above, this step isn't necessary for the tutorial, but you can run it to see what the results look like.
EMAN2 includes a novel procedure for resolving this ambiguity from a tilt series based on defocus estimates across the tilted images. The tutorial data set comes out correctly without running this check, but when working with your own data, this step is highly recommended. Once you know the correct tilt axis direction to use for a given microscope/camera, you shouldn't need to run this test on every data set, but it may not be a bad idea even then, as there are various possible configuration/software errors on the instrument which could potentially cause inconsistent results, particularly with a change of magnification.
For the tutorial tilt-series:
Subtomogram Averaging -> CTF Estimation
tiltseries = select any one tilt series
alltiltseries = not selected
voltage and cs (double check that values are correct)
dfrange = 1.0,4.0,0.02
checkhand = selected
- Launch
You will need to look at the console where you launched e2projectmanager to see the results of the test. It should look something like:
Average score: Current hand - 4.133, flipped hand - 3.290
Defocus std: Current hand - 0.110, flipped hand - 0.165
Current hand is better than the flipped hand in 86.4% tilt images
The handedness (--tltax=-4.1) seems to be correct. Rerun CTF estimation without the checkhand option to finish the process.
If you run this check on multiple tilt series and they consistently indicate a flipped tilt axis/handedness, you need to return to the previous step (Tomogram Reconstruction) and redo the reconstruction for all tomograms, entering the correct tilt axis in the corresponding box. The same tilt axis should be used for all tilt series collected under the same conditions on the same instrument. The automatic value may vary a little among tilt series; just use the approximate average or median value.
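When you run the check on several tilt series, the bookkeeping can be sketched as follows. This assumes the console report has the wording shown above (the wording for a flipped result may differ), that the opposite hand corresponds to adding 180° to tltax (the 180° ambiguity described earlier), and that your values do not straddle ±180°. All function names are hypothetical.

```python
import re
import statistics
from typing import List


def parse_checkhand(text: str) -> dict:
    """Pull the key numbers out of a checkhand console report."""
    pct = re.search(r"flipped hand in\s+([\d.]+)%", text)
    tltax = re.search(r"--tltax=(-?[\d.]+)", text)
    return {
        "pct_current_better": float(pct.group(1)) if pct else None,
        "tltax": float(tltax.group(1)) if tltax else None,
        "hand_ok": "seems to be correct" in text,
    }


def flipped_tltax(tltax: float) -> float:
    """Tilt axis rotated by 180 degrees, wrapped into (-180, 180]."""
    t = (tltax + 180.0) % 360.0
    return t - 360.0 if t > 180.0 else t


def consensus_tltax(values: List[float]) -> float:
    """Median of per-series tilt-axis values, to use for every series."""
    return statistics.median(values)


if __name__ == "__main__":
    report = ("Current hand is better than the flipped hand in 86.4% tilt images "
              "The handedness (--tltax=-4.1) seems to be correct.")
    print(parse_checkhand(report))
    print(consensus_tltax([-4.1, -3.8, -4.3, -4.0]), flipped_tltax(-4.1))
```

The median is used rather than the mean so that a single poorly aligned tilt series does not pull the shared value.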
Note: This method removes almost all of the ambiguity about particle handedness. The one potential issue is that the MRC file format uses a non-conventional origin for images. If the data collection software doesn't take this into account, the images may be flipped when written to disk. The easiest way to check the software is to collect 2 images of the same target, save them directly in different file formats, and then check (in EMAN2) whether the two images appear to have the same handedness. If not, it is likely that the MRC files are incorrect.
CTF Estimation (<5 min)
Do NOT forget this step!
This step will determine the defocus as a function of location for each tilt in each tilt series. This information is stored in the headers of particles as they are extracted from the tilt series, and used for CTF correction during subtomogram averaging. If you forget to do this, you will need to re-run the particle extraction step, which is quite time consuming.
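Defocus has to be estimated as a function of location because of simple tilted-plane geometry: for a flat specimen tilted by θ, a point at projected distance x from the tilt axis sits x·tan θ above or below the axis, so its defocus shifts by that amount. The sketch below illustrates only this textbook relationship; it is not EMAN2's estimation code, and the sign convention (which side of the axis is further from focus) is an assumption.

```python
import math


def local_defocus(df_axis_um: float, x_um: float, tilt_deg: float) -> float:
    """Defocus (um) at projected distance x_um from the tilt axis.

    Assumes a flat specimen and a sign convention where positive x moves
    further from focus; flip the sign of x_um for the opposite convention.
    """
    return df_axis_um + x_um * math.tan(math.radians(tilt_deg))


if __name__ == "__main__":
    # 2 um defocus on the axis, points 0.5 um either side, at several tilts
    for tilt in (0, 30, 60):
        left = local_defocus(2.0, -0.5, tilt)
        right = local_defocus(2.0, 0.5, tilt)
        print(f"tilt {tilt:>2} deg: {left:.2f} .. {right:.2f} um")
```

At 60°, points only 0.5 µm from the axis already differ from the on-axis defocus by about 0.87 µm, which is why a per-location estimate matters at high tilt.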
For the tutorial tilt-series:
Subtomogram Averaging -> CTF Estimation
alltiltseries = selected, note that doing this will override anything present in the tiltseries field
dfrange = 1.0,4.0,0.02
checkhand = not selected
- Launch
When working with your own data:
The first two options, dfrange and psrange, indicate the defocus and phase shift ranges to search. The actual defocus must, of course, lie within the search range. Both take the format “start, end, step”, so “2, 5, .1” will search defocus from 2 to 5 µm with a step size of 0.1 µm. Phase shift is specified in degrees.
For images taken with a Volta phase plate, we usually use a dfrange of “0.2,2,0.1” and a psrange of “60,120,2”.
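The “start, end, step” format can be expanded as below to see how many values will be tried and to confirm that your expected defocus falls inside the range. This is an illustrative sketch with hypothetical function names; in particular, whether EMAN2 includes the end value itself is an assumption (the sketch includes it).

```python
from typing import List


def search_grid(spec: str) -> List[float]:
    """Expand a 'start,end,step' string (e.g. '2, 5, .1') into the values searched."""
    start, end, step = (float(v) for v in spec.split(","))
    if step <= 0 or end < start:
        raise ValueError(f"bad range specification: {spec!r}")
    n = int(round((end - start) / step)) + 1  # endpoint assumed inclusive
    return [round(start + i * step, 6) for i in range(n)]


def covers(spec: str, value: float) -> bool:
    """True if `value` lies within the start..end span of `spec`."""
    start, end, _step = (float(v) for v in spec.split(","))
    return start <= value <= end


if __name__ == "__main__":
    for spec in ("1.0,4.0,0.02", "0.2,2,0.1", "60,120,2"):
        grid = search_grid(spec)
        print(spec, "->", len(grid), "values from", grid[0], "to", grid[-1])
```

Rounding the step count avoids floating-point drift (for example, 3.0/0.02 is not exactly 150 in binary floating point).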
Note: this program is only estimating CTF parameters, taking tilt into account. It is not performing any phase-flipping corrections on whole tomograms. CTF correction is performed later as a per-particle process. This process requires metadata determined during tilt-series alignment, so it cannot be used with tomograms reconstructed using other software packages.
Note: In >=2022 snapshots of EMAN2 it is possible after CTF correction to return to the 3-D reconstruction step and produce CTF corrected whole tomograms, but this does nothing useful when following the EMAN2 pipeline. If you wish to compare EMAN2 tomograms with those from other software that performs CTF correction, this could potentially be useful.
Tomogram reconstruction evaluation
Analysis and visualization -> Evaluate tomograms can be used to evaluate the quality of your tilt series alignments and tomogram reconstructions. This tool will show more information as you progress through the tutorial, but can be used already at this point to make various assessments of your tomograms. Note that some of this information may not be available if you had notmp checked during the reconstruction.
The correctrot option often does not work well on tutorial tomo3. If you go through the tomograms you should see ribosomes spanning the plane of the image for all of the tomograms. If tomo3 shows only a narrow band containing ribosomes, return to the tomogram reconstruction step above, and re-run the process for only that tomogram (uncheck alltiltseries, and select tiltseries/*tomo3.hdf. Also uncheck correctrot).
- On the left is a list of tomograms in the project.
- Selecting a tomogram will show a thumbnail in the image pane on the right.
- Clicking the header of any column will sort the table by that attribute.
#box is the number of boxes in the tomogram
loss is the average landmark uncertainty in nm. You should not try to compare this number to, for example, the fiducial alignment error in IMOD, as it is computed in a completely different way. This number can be useful to identify specific tilt series within a project which aren't aligning as well as others, but the absolute number is not a useful value to report/analyze. Even if this number were >5 nm, it is still quite possible to achieve a subnanometer resolution subtomogram average.
defocus is the average defocus of the tilt series.
- On the right
- The image at the top is the central slice through the tomogram
the show2d button displays the selected tomogram in a slice-wise view.
ShowTilts shows the corresponding raw tilt series
- Please note that most tomograms include some out-of-plane tilt (the actual rotation isn't a simple tilt along a single axis), which is taken into account during alignment. This may make it visually appear that the tilt series alignment is not as robust as it actually is.
Boxer opens the 3D particle picker
PlotLoss will plot the fiducial error for each tilt
PlotCtf plots the defocus and phase shift at the center of each tilt image
Tiltparams is a bit more complicated. It plots a point list with 6 columns and a number of rows corresponding to the images in the selected tilt series. These are the alignment parameters for the tilt series.
You can adjust X Col and Y Col in the plot control panel (middle click the plot). The columns represent:
- 0 - tilt ID
- 1 - translation along x
- 2 - translation along y
- 3 - tilt angle around y
- 4 - tilt angle around x
- 5 - tilt angle around z
- The first panel below the buttons lists the types of particles and how many of each type are in the project
- The last box is reserved for comments for each tomogram. You can fill in any comments you have on a specific tomogram and it will be saved for future reference.
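If you copy the Tiltparams point list into a whitespace-delimited text file (how you export it is up to you; this sketch does not assume any particular EMAN2 export feature), the six columns listed above can be summarized as follows. Treating column 4 (rotation around x) as the out-of-plane tilt is our reading of the column list, and the function and column names are hypothetical.

```python
from typing import Dict, List

# Column order as listed in the Tiltparams description above
COLUMNS = ("tilt_id", "trans_x", "trans_y", "tilt_y", "tilt_x", "tilt_z")


def parse_tiltparams(text: str) -> List[Dict[str, float]]:
    """Parse whitespace-delimited rows of the 6 Tiltparams columns."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        values = [float(v) for v in line.split()]
        if len(values) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}: {line!r}")
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def summarize(rows: List[Dict[str, float]]) -> dict:
    """Tilt range plus the image with the largest rotation around x."""
    tilts = [r["tilt_y"] for r in rows]
    worst = max(rows, key=lambda r: abs(r["tilt_x"]))
    return {
        "min_tilt": min(tilts),
        "max_tilt": max(tilts),
        "worst_tilt_id": int(worst["tilt_id"]),
        "max_abs_tilt_x": abs(worst["tilt_x"]),
    }


if __name__ == "__main__":
    # Made-up sample rows, for illustration only
    sample = """
    0  1.5 -0.3 -60.0  2.1  -4.0
    1  0.2  0.1   0.0  0.4  -4.1
    2 -1.1  0.6  60.0 -3.2  -3.9
    """
    print(summarize(parse_tiltparams(sample)))
```

As noted above, some out-of-plane tilt is normal and is accounted for during alignment; this only helps spot which images carry the most of it.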
Particle Picking Choices
There are 4 different tools you can use for particle picking in EMAN2 as of Feb 2022:
A new deep-learning based 3-D picker. Not available in 2.91, must use a recent (2022+) snapshot. See: New automatic particle picking
- Abusing the deep-learning based segmentation tool for particle picking purposes
- Manual particle picking
- Template based picking (usually seeded with some manual picking results)
For live versions of this tutorial, we use the older manual+template based approach as it requires no specific hardware, and is a good learning experience, but the deep learning 3-D picker is a good choice for typical situations. For cellular tomograms, the annotation tool approach may still be a good choice.
(1) Tomogram annotation (GPU recommended)
- Since the tutorial data set is purified ribosomes, this step can be skipped for the tutorial data, and you can move on to template-based particle picking. For cells or other types of complex specimens, tomogram annotation can be used to more easily distinguish locations of different types of objects.
For a detailed description of how to use the annotation tool, see: TomoSeg
Here is a brief summary of the annotation-based approach:
Segmentation -> Preprocess tomogram
- This step is not always necessary for tomograms reconstructed in EMAN2, but may slightly improve results.
Segmentation -> Box Training References
- This is a newer interface than previously used for this step. Select a few "Good" (regions containing the feature of interest) and "Bad" (regions not containing the feature of interest) boxes.
- "~" and "1" on the keyboard can be used to move along the Z axis.
- The new interface permits different types of features to be identified in a single session and in the same tomogram.
If the different features of interest have very different scales, it is better to keep the box size at 64 and rescale the tomogram instead. As long as the rescaling is done using EMAN2 utilities, the program will correctly keep track of the geometry relative to the original tomogram and tilt series.
- If you are doing this with the tutorial data, you would only have 2 classes of particles: "ribo_good" and "ribo_bad".
When you press Save, all visible particles (those whose class has its box checked next to the class name) will be saved.
The rest of the annotation process remains unchanged from the original tutorial, except that all trained neural networks and training results are now saved in the neuralnets folder, and all segmented maps are in the segmentations folder. You now only specify the label of the output file instead of the full file name.
Segmentation -> Find particles from segmentation to turn segmented maps into particle coordinates.
- Input both the tomogram and its corresponding segmentation, and the particles coordinates will be written into the metadata file.
- Slightly tweaking the threshold parameters may yield better results.
featurename will become the label of particles generated. Those particles can be viewed in the particle picking step and processed in the following protocols.
Manual particle picking (10-15 min)
- Time above is to manually select 30-50 reference particles.
- You can launch the particle picker in two equivalent ways:
Subtomogram averaging -> Manual boxing, select tomogram, Launch
Analysis and visualization -> Evaluate tomograms as above, press the "Boxer" button
two windows will appear Main Window and Options. If you don't see Options it is probably hiding behind the main window.
in the Options window, rename the set of boxes to "initribo". This will be used as the label in later stages
- The box size can be set in the main window at the left bottom corner, for this tutorial, use 32 for ribosomes (the unbinned box size is 128).
This is NOT the same as the size listed near the word erase in another window, which is the size of the eraser.
- left click and drag to place and reposition boxes in any of the 3 views
- Hold down Shift when clicking to delete existing boxes.
- Boxes are shown as circles, which vary in size depending on the Z distance from the center of the particle.
- Avoid selecting particles with nearby gold fiducials or other high-contrast artifacts; even one or two such particles may cause problems with your generated model.
Go through slices along z-axis using ‘~’ and ‘1’ on the keyboard, or using the slider in the lower right of the window
It may be easier to locate particles if you adjust the Filt slider to ~70, but it will slow things down
- The interface supports different box types within a single tomogram. Each type has a label. Make sure the label is consistent if selecting the same feature in different tomograms.
- If you skipped the tomogram annotation step, we will pick a few particles here to generate an initial model, and use the initial model as a reference for template matching.
- Select 30-50 particles from one tomogram, then close the boxer window.
Do not use the Save button in the Options window or any of the menu items related to saving data. Those are available for special purposes. When you change the boxes, your changes are saved immediately and automatically. When you are done, simply close the main window.
- If you did the previous optional annotation step above, you will be able to see the selected particles here, and if you like, manually update them.
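The picker works on the binned tomogram, so the 32-pixel box above and the 128-pixel unbinned box differ by the tomogram's binning factor (128 / 32 = 4 for the tutorial data, reading the ratio as a binning factor). Below is a minimal Python sketch of that bookkeeping. The function names are hypothetical helpers, not part of EMAN2.

```python
def tomogram_binning(unbinned_box, tomogram_box):
    """Infer the tomogram binning factor from the two box sizes (hypothetical helper)."""
    if unbinned_box % tomogram_box:
        raise ValueError("unbinned box is not an integer multiple of the tomogram box")
    return unbinned_box // tomogram_box


def to_unbinned(tomogram_box, binning):
    """Box size in tilt-series pixels for a box drawn on the binned tomogram."""
    return tomogram_box * binning


binning = tomogram_binning(128, 32)       # tutorial values
print(binning, to_unbinned(32, binning))  # 4 128
```

If the picker box is already right, this product is what boxsz_unbin = -1 reproduces in the extraction step.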
Particle extraction (2 min)
In this pipeline, the full 1k or 2k tomograms are used only as a reference to identify the location of the objects to be averaged. Now that we have particle locations, the software returns to the original tilt-series, extracts a per-particle tilt-series, and reconstructs each particle in 3-D independently.
For the tutorial tilt-series:
Subtomogram Averaging -> Extract Particles
check alltomograms
set boxsz_unbin to 128.
- If you had the correct size in the previous step this should be the same as leaving the default -1
- It is fine to use a different (usually larger) box size here if you find it easier to select particles with a smaller box size. For the tutorial, stick with 128, though.
- enter the label you used when picking particles ("initribo" if you followed the instructions above)
threads = value for your machine
- Launch
Subtomogram Averaging -> Build Sets
check allparticles
- Launch
- This will generate particle sets, which are virtual particle stacks that consist of particles with the same label from different tomograms.
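A "virtual" stack refers to particles that already exist in per-tomogram files and does not copy them. The sketch below shows the grouping idea in plain Python. The file names and the (label, path, count) tuples are hypothetical, and this is not EMAN2's .lst file format.

```python
from collections import defaultdict


def build_sets(particle_files):
    """Group per-tomogram particle files by label.

    particle_files: iterable of (label, path, n_particles) tuples.
    Returns {label: [(path, index), ...]}, i.e. references to particles, not copies.
    """
    sets = defaultdict(list)
    for label, path, n in particle_files:
        sets[label].extend((path, i) for i in range(n))
    return dict(sets)


# hypothetical file names, for illustration only
files = [("initribo", "particles3d/tomo_a__initribo.hdf", 2),
         ("initribo", "particles3d/tomo_b__initribo.hdf", 1),
         ("ribo", "particles3d/tomo_a__ribo.hdf", 3)]
sets = build_sets(files)
print(len(sets["initribo"]), len(sets["ribo"]))  # 3 3
```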
For your own data
If the box size is correct when you select particles from the GUI, you can leave boxsz_unbin as -1, so the program will keep that box size (scaled to the original tilt series)
If your particles are deeply buried in other densities, using a bigger padtwod may help, but doing so may significantly increase the memory usage and slow down the process.
With CTF information present, it generally does not hurt to check wiener, which filters the 2D particles by SSNR before reconstructing them in 3D.
Specify a binning factor in shrink to produce downsampled particles if your memory/storage/CPU time is limited, but it will also limit the resolution you can achieve.
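The resolution cost of shrink comes from the Nyquist limit. After binning by a factor s, the pixel size becomes s × apix, and no detail finer than twice that can be represented. Achieved resolution is normally well short of this bound. A quick check in Python follows. The 2.0 Å/pixel value is a hypothetical example, not the tutorial's pixel size.

```python
def nyquist_limit(apix, shrink=1):
    """Best representable resolution (Angstroms) for pixel size apix after binning by shrink."""
    return 2.0 * apix * shrink


for s in (1, 2, 4):
    print(s, nyquist_limit(2.0, s))
# 1 4.0
# 2 8.0
# 4 16.0
```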
Initial model generation (10 - 60 min)
Because the particles are already in 3-D, it may seem that an "initial model" should not be necessary. Unfortunately, because of the missing wedge and the low resolution of any individual particle (particularly from cells), a good starting average is actually critical. Historically, getting a good starting model has been challenging, depending on the shape of the molecule. The procedure used here, based on stochastic gradient descent, has proven quite robust, but it is difficult for the computer to tell when it has converged sufficiently. For this reason, the default behavior is to run much longer than is normally required and let a human decide when the result is "good enough" and terminate the process. If you use a small shrink value and let it run to completion, it can take a long time. This is harmless, but unnecessary. A newer program, available since 2021, does a much more efficient job of making initial models, so we use it here. The original program remains fully functional, and its instructions are preserved in the Old Initial Model Generator section if you prefer the older approach.
New Initial Model Generator
The new initial model generator was only added to e2projectmanager in Feb 2022, so if you have an older version, you may need to run it from the command-line. If it isn't available from the command-line either, you will need to use the older tool below.
Subtomogram Averaging -> New initial model generator
particles = Browse for sets/initribo.lst (or whatever you called your set)
res = 50. This is a low-resolution initial model, so there is no reason to push it.
niter = 10. You may increase this to 100 and simply kill the job when the model seems good enough. 10 is fine for this tutorial, and often you can stop after ~5.
shrink = 2, for speed during live tutorials.
parallel = thread:4. Replace this with the correct number of threads for your machine, as above. If you have limited disk space on /tmp, you can also specify a temporary folder: thread:4:/home/username/tmp
Launch
If you do not see New initial model generator in e2projectmanager, you can run it from the command line, replacing appropriate options:
e2spt_sgd_new.py sets/initribo.lst --res 50 --niter 10 --shrink 2 --parallel thread:4
The program will produce output like:
Gathering metadata... 69/69
iter 0, class 0: 17 jobs on 4 CPUs
iter 1, class 0: 17 jobs on 4 CPUs
iter 2, class 0: 17 jobs on 4 CPUs
Once it gets past 3-4 iterations, you can use the browser to look in sptsgd_00, and double-click on output_cls0.hdf. This file will change after each iteration completes. It contains the results of the most recent iteration. When you are satisfied with the quality of the initial model, you can kill it with the task manager in e2projectmanager.
For your own data:
If your particle has known symmetry, specify it (see EMAN2/Symmetry).
setting shrink to something in the range of 2-4 will make the runtime faster but, depending on the shape, could potentially cause problems.
- using more than the minimal 30-50 particles is fine. If you have a very large set of selected particles, go ahead and use them all. This will not slow the process down at all, since it's stochastic.
it is critical that the full sampling box size of the extracted particles divided by shrink be divisible by 2. If not, the program will crash.
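The box-size rule above can be checked before launching. The sketch below assumes a non-integer result of box / shrink is rounded down. That assumption is consistent with shrink = 3 being offered for a 128 box in the older generator. If your version requires exact division, also check box_unbin % shrink == 0. The helper is hypothetical, not an EMAN2 function.

```python
def check_shrink(box_unbin, shrink):
    """Return the shrunken box size, raising ValueError if it would be odd."""
    small = box_unbin // shrink  # assumption: the program rounds down
    if small % 2:
        raise ValueError(f"box {box_unbin} / shrink {shrink} -> {small}, which is odd")
    return small


print([check_shrink(128, s) for s in (2, 3, 4)])  # [64, 42, 32]
```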
Old Initial Model Generator
This section is the older program, which is still functional, and is integrated into the project manager
For the tutorial tilt-series:
Subtomogram Averaging -> Generate Initial Model
particles should be set to the sets/initribo.lst file you just created (or whatever name you used).
set shrink to 2, 3 or 4
- 2 will run slowly but will produce a more detailed initial model (not really necessary)
increasing batchsize will use more cores (if you have more than 12), and may cause it to converge to the correct answer in fewer iterations, but each iteration will not become faster.
The default niter of 5 is typically much more than is required
- Launch
You can terminate the job as soon as sptsgd_00/output.hdf looks reasonable. If you display the progress monitor (4th icon on the right side of the project manager), you can easily kill the job when you're happy. Usually this will take about 10 minutes for the tutorial data.
For your own data:
If your particle has known symmetry, specify it (see EMAN2/Symmetry).
The symmetry you specify will not be imposed on the map unless you also check applysym, but the map will be rotationally aligned so the symmetry axes are in the correct direction, which will make it easier to apply symmetry in later steps. We do not generally recommend checking this box in this step.
setting shrink to something in the range of 2-4 will make the runtime faster but, depending on the shape, could potentially cause problems.
- using more than the minimal 30-50 particles is fine. If you have a very large set of selected particles, go ahead and use them all. This will not slow the process down at all, since it's stochastic.
it is critical that the full sampling box size of the extracted particles divided by shrink be divisible by 2. If not, the program will crash.
Template matching (5 min)
In this step, we will use the initial model you just produced as a template to find all of the ribosomes in all 3 tomograms. If you completed the Tomogram Annotation step above and have already extracted a full set of 1000+ particles, you can skip this step, since you already have all of the particles. Note that here, and everywhere else in the tomography pipeline, reconstructed particles have positive contrast (look white in projection) and tomograms/tilt series have negative contrast (look dark in projection). If you wish to use a reference volume from the PDB or another source, it should have positive contrast, as is normal in the single particle CryoEM field.
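A reference volume with the wrong contrast can be flipped by negation. The sketch below also includes a crude sign check, which assumes the particle sits near the center of the box. Both functions are hypothetical helpers written with numpy and are not part of EMAN2.

```python
import numpy as np


def to_positive_contrast(vol):
    """Flip a negative-contrast volume (dark density) to positive contrast (white density)."""
    return -np.asarray(vol, dtype=np.float32)


def looks_positive(vol, radius=None):
    """Crude check: is the central cube brighter than the whole-volume mean?"""
    vol = np.asarray(vol, dtype=np.float32)
    r = radius or min(vol.shape) // 4
    c = [s // 2 for s in vol.shape]
    center = vol[c[0] - r:c[0] + r, c[1] - r:c[1] + r, c[2] - r:c[2] + r]
    return bool(center.mean() > vol.mean())
```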
Subtomogram Averaging -> Reference Based Boxing
tomograms -> browse and select all 3 tomograms.
reference = the initial model you produced in the previous step
label = ribo
nptcl = 150
IMPORTANT NOTE: There are more particles than this in these images. We limit this to 150 from each tomogram so the tutorial runs faster. If you are unconcerned with speed, you can increase this number, but if you're doing that, you may consider running the full tutorial instead.
vthr = 5
- if using an older version of EMAN2, you may wish to use the default of 10, but with the current version, 10 may not find any particles and crash.
- This threshold is in terms of the number of standard deviations above the mean. The definition was slightly different when the default was 10.
threads = the usual number
- Launch
- Note: if you see an error, with some mention of zero, it is likely you forgot to reduce vthr to 5.
when this finishes, you can use the same Manual Boxing tool you used before to look at the particles which were selected. You may wish to manually remove any bad particles it selected. For the tutorial data set or other tomograms of purified protein, this process should work pretty well.
- note that this process stores 3-D particle locations in the appropriate info/* files, but does not extract particles from the micrographs
For your own data:
- Frankly, in most cases the deep learning based picker will do a better job. For cells, you may consider the full Tomogram Annotation tool. This program works with highly downsampled tomograms and references for speed.
Picking a good vthr and a reasonable nptcl maximum may take a little trial and error. Test on 1-3 tomograms before the full run.
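To see how vthr and nptcl interact, consider the sketch below. It keeps voxels of a correlation map that lie more than vthr standard deviations above the mean, takes the strongest first, suppresses neighbors closer than a minimum distance, and stops at nptcl. This is a generic numpy illustration and not EMAN2's reference-based picker. min_dist is a hypothetical parameter. An empty result here corresponds to the "no particles found" situation mentioned above.

```python
import numpy as np


def pick_peaks(ccmap, vthr=5.0, nptcl=150, min_dist=8):
    """Return up to nptcl (z, y, x) peaks above mean + vthr * std, strongest first."""
    thresh = ccmap.mean() + vthr * ccmap.std()
    cand = np.argwhere(ccmap > thresh)               # (N, 3) voxel coordinates
    order = np.argsort(ccmap[tuple(cand.T)])[::-1]   # sort candidates by value, descending
    picked = []
    for idx in order:
        p = cand[idx]
        # simple non-maximum suppression: skip candidates too close to an accepted peak
        if all(np.sum((p - q) ** 2) >= min_dist ** 2 for q in picked):
            picked.append(p)
            if len(picked) == nptcl:
                break
    return [tuple(int(v) for v in p) for p in picked]
```

Lowering vthr admits weaker peaks, and nptcl only caps how many of the strongest survive. This is why testing on 1-3 tomograms first is worthwhile.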
Particle extraction (~15 min)
If you already extracted a complete set of particles above (not just the few initial references), you don't need to repeat extraction here.
Since this involves several hundred particles instead of 30-50, it will take quite a lot longer to run.
For the tutorial tilt-series:
Subtomogram Averaging -> Extract Particles
check alltomograms
boxsz_unbin = 128
label = ribo
compressbits = 5
- Launch
Build Sets (again)
Subtomogram Averaging -> Build Sets
check allparticles
You can leave label blank. It will just regenerate all of the .lst files.
- Launch
- This will generate particle sets, which are virtual particle stacks that consist of particles with the same label from different tomograms.
- Generally just takes a few seconds. If you have a large project with hundreds of tomograms it will take a bit longer.
New integrated refinement program
There is a new refinement program which implements both traditional subtomogram averaging and subtilt refinement in a single program. This is an alternative to the next two major sections (Subtomogram Refinement and Subtilt Refinement). The full tutorial on the new program is here.
Integration of this program into e2projectmanager isn't quite complete yet (but should be by mid-March 2022). For the moment, we will run this command directly from the command-line.
We will do this as 2 sequential refinement runs for efficiency, and to show how you can continue a refinement. The initial refinement should turn the initial model into something ribosome shaped, but at low resolution. The second refinement should get to ~12 Å resolution.
You may need to replace sets/ribo.lst with whatever you named your set.
Replace both 4s with the number of threads for your machine.
- If you didn't use the "new" style initial model generation above, you may also need to alter --ref.
e2spt_refine_new.py --ptcls sets/ribo.lst --ref sptsgd_00/output_cls0.hdf --iters p,p,t --goldstandard --startres 50 --tophat local --parallel thread:4 --threads 4
When that's done, you should have a spt_00 folder containing the results of the 3 iterations we requested. You may take a look at:
- fsc_maskedtight_XX.txt - These are the masked gold-standard resolution curves after each iteration. Double click in the EMAN2 browser to plot them.
- threed_XX.hdf - These are the 3D maps after each iteration. Again, double-click in the browser to visualize.
The 3rd map should look considerably more like a ribosome than the initial model (threed_00.hdf), but the resolution will still be limited. In the next run, we will do more subtilt refinement to push the resolution.
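A resolution number can be read off an FSC curve by finding where it first drops below the 0.143 gold-standard threshold. The sketch below interpolates that crossing linearly. The loader assumes the .txt file holds two whitespace-separated columns (spatial frequency in 1/Å, then FSC). That layout is an assumption, so check it against your files.

```python
import numpy as np


def load_fsc(path):
    """Assumed layout: column 0 = spatial frequency (1/A), column 1 = FSC."""
    data = np.loadtxt(path)
    return data[:, 0], data[:, 1]


def resolution_at(freqs, fsc, threshold=0.143):
    """Resolution (A) where the FSC first crosses below threshold, linearly interpolated."""
    freqs = np.asarray(freqs, dtype=float)
    fsc = np.asarray(fsc, dtype=float)
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0:
        return 1.0 / freqs[-1]  # never crosses: limited by the last sampled frequency
    i = below[0]
    if i == 0:
        raise ValueError("FSC starts below threshold")
    f0, f1, c0, c1 = freqs[i - 1], freqs[i], fsc[i - 1], fsc[i]
    return 1.0 / (f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1))


# usage: freqs, fsc = load_fsc("spt_00/fsc_maskedtight_03.txt"); print(resolution_at(freqs, fsc))
```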
We could have done all of this in a single run simply by adding more --iters letters, except for one key addition: the --maxres 12 argument, which sets the highest-resolution information to consider in the refinement. If it is not specified, it is determined automatically from the results of the previous iteration. The automatic scheme makes the iterations run faster, but may make them progress toward the final resolution more slowly. So, we use the automatic method for 3 iterations to get the shape correct quickly, then push the resolution in the second run.
note that --ref points at the final map from the previous run
--loadali2d and --loadali3d load the final alignment parameters from the first run. Note that aliptcls3d files are only produced for 'p' iterations.
--goldcontinue tells the refinement to continue with the even/odd pairs from the previous refinement, rather than phase randomizing the data at high resolution again
--maxres specifies the maximum resolution information to consider during alignment. If omitted, it uses an automatic scheme.
--tophat local enables local resolution measurement and filtration, adding ~2 minutes per iteration. --tophat global filters everything uniformly. See e2help.py tophat for details.
e2spt_refine_new.py --ptcls sets/ribo.lst --ref spt_00/threed_03.hdf --loadali2d spt_00/aliptcls2d_03.lst --loadali3d spt_00/aliptcls3d_02.lst --goldcontinue --iters t,p,t,r,d --keep 0.95 --tophat local --parallel thread:4 --threads 4 --maxres 12
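The continuation arguments follow from the first run's --iters string. Iterations are numbered from 1 (threed_00.hdf is the starting model). --ref and --loadali2d take the last iteration. --loadali3d takes the last 'p' iteration, because aliptcls3d files are only written for 'p' iterations. The helper below is hypothetical. It derives the file names from the naming convention shown in the commands above and assumes aliptcls2d is written every iteration, as the command implies.

```python
def continuation_args(folder, iters):
    """Build --ref/--loadali2d/--loadali3d/--goldcontinue for continuing a previous run."""
    steps = iters.split(",")
    last = len(steps)
    p_iters = [i for i, s in enumerate(steps, start=1) if s == "p"]
    if not p_iters:
        raise ValueError("no 'p' iteration, so no aliptcls3d file to load")
    return ["--ref", f"{folder}/threed_{last:02d}.hdf",
            "--loadali2d", f"{folder}/aliptcls2d_{last:02d}.lst",
            "--loadali3d", f"{folder}/aliptcls3d_{p_iters[-1]:02d}.lst",
            "--goldcontinue"]


print(" ".join(continuation_args("spt_00", "p,p,t")))
```

For the first run's p,p,t, this reproduces the threed_03 / aliptcls2d_03 / aliptcls3d_02 files used in the command above.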
That's it. You hopefully have a ~12 Å resolution map. If you wish to try and push the resolution further, you can download the original 4k tilt series from EMPIAR and pick all of the particles (instead of just ~450), and go through the process again.
Skip the next 2 sections about the old refinement. Other notes are below that.
Old Subtomogram refinement (~1 hr/iteration)
As an alternative to the new integrated tool above, the older pair of programs is still available. You shouldn't need to do both approaches. This step is similar to the "p" iterations above, though it uses an older algorithm.
This step performs a conventional iterative subtomogram averaging using the full set of particles. Typically it will achieve resolutions in the 15-25 A range with a reasonable number of particles. As it involves 3-D alignment of the full set of particles multiple times, it takes a significant amount of compute time. Higher resolutions are achieved in the next stage after this (subtilt refinement).
For the tutorial tilt-series:
Subtomogram Averaging -> 3D Refinement
set particles to "sets/ribo.lst"
set reference to "output.hdf" from Initial Model Generation
set goldstandard to 30
set mass to 3000
set threads to the number of CPUs on your machine
- Launch
Results will gradually appear in spt_XX/ Feel free to look at intermediate results with the EMAN2 file browser as they appear.
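goldstandard = 30 is a resolution in Å. As we understand it, the reference is phase-randomized beyond that resolution independently for the even and odd halves, so the two half-maps share no high-resolution information at the start. The --goldcontinue option of the new program avoids repeating this. The numpy sketch below shows phase randomization beyond a cutoff. It is a generic illustration, not EMAN2's implementation. Amplitudes are kept and only phases at spatial frequencies above 1/res are replaced.

```python
import numpy as np


def phase_randomize(vol, apix, res, seed=0):
    """Randomize Fourier phases beyond 1/res (res and apix in Angstroms)."""
    rng = np.random.default_rng(seed)
    ft = np.fft.rfftn(vol)
    # spatial frequency (1/A) of every coefficient on the rfftn grid
    fz = np.fft.fftfreq(vol.shape[0], d=apix)
    fy = np.fft.fftfreq(vol.shape[1], d=apix)
    fx = np.fft.rfftfreq(vol.shape[2], d=apix)
    freq = np.sqrt(fz[:, None, None] ** 2 + fy[None, :, None] ** 2 + fx[None, None, :] ** 2)
    high = freq > 1.0 / res
    phases = rng.uniform(0.0, 2.0 * np.pi, size=int(high.sum()))
    ft[high] = np.abs(ft[high]) * np.exp(1j * phases)
    # irfftn returns a real volume; low-frequency coefficients are unchanged
    return np.fft.irfftn(ft, s=vol.shape)
```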
For your own data:
- If your molecule has symmetry, you should specify it, but it's important that the alignment reference you provide has been properly aligned to the symmetry axes of whichever symmetry you specify.
localfilter will use e2fsc.py to compute a local resolution map after each iteration and filter the map accordingly. This is useful for molecules with significant variability.
If you suspect that a large fraction of your particles are "bad" in some way, you may wish to try reducing pkeep, which will hopefully exclude bad particles preferentially over "good" particles.
Old Subtilt refinement (~9 hr/iteration)
This is the second half of the old refinement strategy. It is conceptually similar to the t,p and r iterations in the newer integrated program above.
With the results of a good subtomogram alignment/average, we are now ready to switch to alignment of the individual particle images in each tilt, along with per-particle-per-tilt CTF correction and other refinements. This is effectively a hybrid of single particle analysis and subtomogram averaging, and it can readily achieve subnanometer resolution IF the data is of sufficient quality. The tutorial data set is, but many cellular tomograms, for example, are not collected with high resolution in mind, and even with this sort of refinement they will be unable to reach resolutions better than 10-30 A, depending on the data. This process is completely automatic, based on all of the metadata collected up to this point. While it is possible to perform "subtomogram refinement" with subtomograms from any tomogram, Subtilt Refinement cannot operate properly unless all preceding steps were performed within EMAN2.
For the tutorial tilt series:
Subtomogram Averaging -> Sub-tilt Refinement
path should be set to the name of an "spt_XX" folder to use as the starting point for the refinement
iter can be -1 to use the last complete iteration in the "spt_XX" folder. Alternatively you can specify a specific iteration to use
parallel should be "thread:N" where N is the number of cores you wish to use on a single machine. This job can be run on a linux cluster if you like: EMAN2/Parallel.
threads should also be set to the number of cores to use on a single machine
- Launch
For your own data:
niters is the number of iterations to run. The default of 4 should achieve convergence in most cases.
keep is the fraction of tilt images to use in the final map. This defaults to 0.5, meaning the worst 1/2 of the tilts for each particle will be discarded. This permits tilts which contain, for example, projections of fiducials or other strong densities, or with large amounts of motion to be automatically excluded in the final reconstruction.
maxalt specifies the maximum tilt angle to include from each particle. Most tilt series are collected such that the small tilt angles will have the least radiation damage, and very often high tilts suffer from more motion artifacts. If you enter, for example, "45" in this box then tilts <-45 and >45 will be discarded automatically. In most cases keep will already serve a similar purpose.
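The combined effect of maxalt and keep on one particle can be sketched as follows. Tilts beyond ±maxalt are dropped, then the best keep fraction of the rest is retained. The per-tilt "quality" value (higher is better) is hypothetical. So is the order of the two filters, and whether keep applies before or after maxalt in EMAN2 is not stated here.

```python
def select_tilts(tilt_angles, quality, keep=0.5, maxalt=None):
    """Return sorted indices of the tilts retained for one particle."""
    idx = list(range(len(tilt_angles)))
    if maxalt is not None:
        idx = [i for i in idx if abs(tilt_angles[i]) <= maxalt]
    idx.sort(key=lambda i: quality[i], reverse=True)  # best tilts first
    return sorted(idx[:int(len(idx) * keep)])


angles = [-60, -45, -30, -15, 0, 15, 30, 45, 60]
quality = [0.1, 0.5, 0.6, 0.9, 1.0, 0.8, 0.7, 0.2, 0.3]
print(select_tilts(angles, quality))             # [3, 4, 5, 6]
print(select_tilts(angles, quality, maxalt=45))  # [3, 4, 5]
```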
Congratulations! The final result of the tutorial will be found in "subtlt_00/". The final 3-D map will be "threed_04.hdf" with the default parameters. The final gold standard resolution curve will be "fsc_maskedtight_04.txt". The optional steps below are tools you can use to evaluate your results in more detail.
Refinement evaluation (optional)
This tool helps visualize and compare results from multiple subtomogram refinement runs.
Analysis and Visualization -> Evaluate SPT Refinements
In the GUI, you can look at all spt_XX or sptsgd_XX folders and compare the parameters which were used for each, as well as the resulting maps.
- Switch between folder types using the menu at top right.
- Columns can be sorted by clicking on the corresponding header.
- Uncheck items in the list at bottom-right to hide corresponding columns
ShowBrowser will bring up the e2display.py browser in the folder of the selected row.
PlotFSC will display the "tight" FSC curve over all iterations.
PlotParams will plot the Euler angle distribution and other alignment parameters
- The 8 columns in the plot are:
- 0 - az (EMAN convention Euler angle)
- 1 - alt
- 2 - phi
- 3 - translation in X
- 4 - Y
- 5 - Z
- 6 - alignment score
- 7 - missing wedge coverage