Diff for "EMAN2/Programs/e2refine2d"

Differences between revisions 1 and 9 (spanning 8 versions)

e2refine2d

e2refine2d.py runs in much the same way as EMAN1's refine2d.py, though it has been improved in a number of subtle ways.

This program will take a set of boxed out particle images and perform iterative reference-free classification to produce a set of representative class-averages. The point of this process is to reduce noise levels, so the overall shape of the particle views present in the data can be better observed. Generally cryo-EM single particles are noisy enough that it is difficult to distinguish subtle, or even not-so-subtle differences between particle images. By aligning and averaging similar particles together, less noisy versions of representative views are created. The class-averages produced by this program are typically used for:

Direct observation to look for heterogeneity or discover symmetry
Building initial models for single particle reconstruction
Separating particles into subgroups for additional analysis

This last point can be used to produce 'population-dynamics' movies of a particle in very close to the same orientation.

This program is quite fast for as many as a few thousand particles and ~100 classes. For most purposes, if your data set is large (>10,000 particles), you might consider using only a subset of the data for speed, though this clearly isn't appropriate for the third use listed above. For large numbers of classes, specify the --fastseed option, or you will wait a very long time.

Options:

	--path	string	Path for the refinement, default=auto
	--iter	int	The total number of refinement iterations to perform
	--automask	bool	This will perform a 2-D automask on class-averages to help with centering. May be useful for negative stain data particularly.
	--input	string	The name of the file containing the particle data
	--ncls	int	Number of classes to generate
	--maxshift	int	Maximum particle translation in x and y
	--naliref	int	Number of alignment references to use when determining particle orientations
	--exclude	string	The named file should contain a set of integers, each representing an image from the input file to exclude.
	--resume	bool	This will cause a check of the files in the current directory, and the refinement will resume after the last completed iteration. It's ok to alter other parameters.
	--initial	string	File containing starting class-averages. If not specified, will generate starting averages automatically
	--nbasisfp	int	Number of MSA basis vectors to use when classifying particles
	--minchange	int	Minimum number of particles that change group before deciding to terminate. Default = -1 (auto)
	--fastseed	bool	Will seed the k-means loop quickly, but may produce less consistent results.
	--simalign	string	The name of an 'aligner' to use prior to comparing the images (default=rotate_translate_flip)
	--simaligncmp	string	Name of the comparator used by the first stage aligner, along with its construction arguments (default=frc)
	--simralign	string	The name and parameters of the second stage aligner which refines the results of the first alignment
	--simraligncmp	string	The name and parameters of the comparator used by the second stage aligner (default=dot).
	--simcmp	string	The name of a 'cmp' to be used in comparing the aligned images (default=frc:nweight=1)
	--shrink	int	Optionally shrink the input particles by an integer amount prior to computing similarity scores. For speed purposes.
	--classkeep	float	The fraction of particles to keep in each class, based on the similarity score generated by the --classcmp argument (default=0.85).
	--classkeepsig	bool	Change the keep ('--classkeep') criterion from fraction-based to sigma-based.
	--classiter	int	Number of iterations to use when making class-averages (default=5)
	--classalign	string	If doing more than one iteration, this is the name and parameters of the 'aligner' used to align particles to the previous class average.
	--classaligncmp	string	This is the name and parameters of the comparator used by the first stage aligner. Default is dot.
	--classralign	string	The second stage aligner which refines the results of the first alignment in class averaging. Default is None.
	--classraligncmp	string	The comparator used by the second stage aligner in class averaging. Default is dot:normalize=1.
	--classaverager	string	The averager used to generate the class averages. Default is 'mean'.
	--classcmp	string	The name and parameters of the comparator used to generate similarity scores when class averaging. Default is frc.
	--classnormproc	string	Normalization applied during class averaging
	--classrefsf	bool	Use the setsfref option in class averaging to produce better filtered averages.
	--normproj	bool	Normalizes each projected vector into the MSA subspace. Note that this is different from normalizing the input images since the subspace is not expected to fully span the image
-P	--parallel	string	Run in parallel, specify type:<option>=<value>:<option>=<value>
	--dbls	string	data base list storage, used by the workflow. You can ignore this argument.
-v	--verbose	int	verbose level [0-9], higher number means higher level of verbosity
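
As a rough illustration of how the options above combine, the following Python sketch assembles an argument list for e2refine2d.py. It only builds the command and does not run it. The helper name build_refine2d_command, the file name particles.hdf, and the numeric values are assumptions made for this example; the --name=value form follows the style of the commands shown elsewhere on this page.

```python
def build_refine2d_command(options):
    """Turn a dict of option names into an argument list for e2refine2d.py.

    Hypothetical helper for illustration; it is not part of EMAN2.
    """
    cmd = ["e2refine2d.py"]
    for name, value in options.items():
        if value is True:
            # bool options are passed as bare flags
            cmd.append(f"--{name}")
        elif value is not False and value is not None:
            # all other options are passed as --name=value
            cmd.append(f"--{name}={value}")
    return cmd


cmd = build_refine2d_command({
    "input": "particles.hdf",  # hypothetical particle stack
    "ncls": 32,                # illustrative values only
    "iter": 8,
    "naliref": 5,
    "fastseed": True,          # recommended above for large numbers of classes
    "automask": False,         # False means the flag is simply omitted
})
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run on a machine where EMAN2 is installed.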

This program uses an iterative MSA-based reference-free classification algorithm. The names in parentheses below are the filenames produced by each step. The files will be found in bdb:r2d_XX (XX is incremented each time e2refine2d.py is run). A brief outline of the process follows:

1. Initialize the iterative process by making some initial guesses at class-averages. These are invariant-based, so even with MSA this initial classification is not exceptionally good.
   a. Generate rotational/translational invariants for each particle (input_fp)
   b. Perform MSA on the invariants to define an orthogonal subspace representing the most important differences among the classes (input_fp_basis)
   c. Reproject the particles into the MSA subspace using --nbasisfp vectors (input_fp_basis_proj)
   d. Classify the particles into --ncls classes using K-means (classmx_00)
   e. Iteratively class-average the particles in each class to produce a set of initial averages (classes_init)
2. Align the current class-averages to each other, and sort them by similarity, keeping them centered (allrefs_YY). Note that YY starts with 01 and is incremented after each iteration.
3. Perform MSA on the (aligned) class-averages. Again, this represents the largest differences, but is now performed on images, not invariants (basis_YY)
4. Select a subset of --naliref averages to use as alignment references for this iteration (aliref_YY)
5. Align each particle to each of the reference averages from the last step. Keep the orientation corresponding to the best-matching reference (simmx_YY)
6. Project aligned particles using reference MSA vectors from basis_YY (input_YY_proj)
7. K-means classification of input_YY_proj (classmx_YY)
8. New iterative class-averages (classes_YY)
9. Loop back to step 2 until --iter loops are complete
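
The K-means steps above (the ones producing classmx_00 and classmx_YY) can be sketched in a few lines of plain Python. This is a toy illustration of K-means on vectors that are assumed to have already been projected into the MSA subspace. It is not the code e2refine2d.py uses, and the function name, seeding strategy, and sample coordinates are all made up for the example.

```python
import random


def kmeans(points, ncls, niter=20, seed=0):
    """Toy K-means: return (labels, centers) for a list of coordinate tuples."""
    rng = random.Random(seed)
    # Seed the classes with ncls randomly chosen points (an assumption;
    # the real program has its own seeding, see --fastseed).
    centers = [list(p) for p in rng.sample(points, ncls)]

    def nearest(p):
        # Index of the center with the smallest squared Euclidean distance to p
        return min(range(ncls),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))

    for _ in range(niter):
        labels = [nearest(p) for p in points]
        for k in range(ncls):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty class keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the converged centers
    labels = [nearest(p) for p in points]
    return labels, centers


# Made-up 2-D projections: three particles near the origin, four near (5, 5)
proj = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels, centers = kmeans(proj, ncls=2)
print(labels)
```

In the real procedure each vector would have --nbasisfp components rather than two, and the resulting class assignments would then drive the class-averaging step.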

The primary files you would normally look at after a run is complete are:

classes_YY - the highest numbered file. This contains the final unaligned class-averages
allrefs_YY - Class-averages which have been sorted and aligned
basis_YY - This contains the MSA basis vectors, which may be useful if looking for signs of specific symmetries, etc.
aliref_YY - May be useful to look at which averages were used as 2-D alignment references
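
To make the XX/YY naming concrete, here is a small Python sketch that spells out the names to look for after a given run and iteration. The two-digit zero padding follows the "YY starts with 01" note above. The "#" separator between the bdb directory and the image name is an assumption about EMAN2's bdb path syntax, and output_names is a hypothetical helper, not an EMAN2 function.

```python
def output_names(run, iteration):
    """Return the main result names for run XX and iteration YY.

    Hypothetical helper; the 'bdb:dir#name' form is an assumed path syntax.
    """
    prefix = f"bdb:r2d_{run:02d}"
    yy = f"{iteration:02d}"
    return {name: f"{prefix}#{name}_{yy}"
            for name in ("classes", "allrefs", "basis", "aliref")}


names = output_names(run=1, iteration=3)
for key, path in names.items():
    print(key, path)
```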

-  ⇤ ← Revision 1 as of 2009-09-17 18:42:12 → 
  Size: 9624
  Editor: SteveLudtke
  Comment:
+   ← Revision 9 as of 2011-08-22 13:32:16 → ⇥
  Size: 7730
  Editor: SteveLudtke
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-| [[#args|Command line arguments]] |  [[#checkfunc|Check functionality]] | [[EMAN2/e2refinefaq|e2refine FAQ]] |
+== e2refine2d ==
 Line 3:
-== e2refine ==
+e2refine2d.py runs in much the same way as EMAN1's refine2d.py, though it has been improved in a number of subtle ways.
 Line 5:
+This program will take a set of boxed out particle images and perform iterative reference-free classification to produce a set of representative
class-averages. The point of this process is to reduce noise levels, so the overall shape of the particle views present in the data can
be better observed. Generally cryo-EM single particles are noisy enough that it is difficult to distinguish subtle, or even not-so-subtle differences
between particle images. By aligning and averaging similar particles together, less noisy versions of representative views are created. The class-averages
produced by this program are typically used for:
-Line 6:
+Line 11:
-||<35%><<TableOfContents>>||
+ * Direct observation to look for heterogeneity or discover symmetry
 * Building initial models for single particle reconstruction
 * Separating particles into subgroups for additional analysis
-Line 8:
+Line 15:
-|| {{attachment:e2refine.png}} ||
+This last point can be used to produce 'population-dynamics' movies of a particle in very close to the same orientation.
-Line 10:
+Line 17:
+This program is quite fast for as many as a few thousand particles and ~100 classes. For most purposes if your data set is large (>10,000 particles)
you might consider using only a subset of the data for speed, though this clearly isn't appropriate for the 3rd use above. For large numbers of classes,
specify the --fastseed option, or you will wait a very long time.
-Line 11:
+Line 21:
-e2refine.py runs in much the same way as [[EMAN1/Programs/Refine|refine]] in [[EMAN1]]
+Options:
|| ||--path||string||Path for the refinement, default=auto||
|| ||--iter||int||The total number of refinement iterations to perform||
|| ||--automask||bool||This will perform a 2-D automask on class-averages to help with centering. May be useful for negative stain data particularly.||
|| ||--input||string||The name of the file containing the particle data||
|| ||--ncls||int||Number of classes to generate||
|| ||--maxshift||int||Maximum particle translation in x and y||
|| ||--naliref||int||Number of alignment references to use when determining particle orientations||
|| ||--exclude||string||The named file should contain a set of integers, each representing an image from the input file to exclude.||
|| ||--resume||bool||This will cause a check of the files in the current directory, and the refinement will resume after the last completed iteration. It's ok to alter other parameters.||
|| ||--initial||string||File containing starting class-averages. If not specified, will generate starting averages automatically||
|| ||--nbasisfp||int||Number of MSA basis vectors to use when classifying particles||
|| ||--minchange||int||Minimum number of particles that change group before deciding to terminate. Default = -1 (auto)||
|| ||--fastseed||bool||Will seed the k-means loop quickly, but may produce less consistent results.||
|| ||--simalign||string||The name of an 'aligner' to use prior to comparing the images (default=rotate_translate_flip)||
|| ||--simaligncmp||string||Name of the comparator used by the first stage aligner, along with its construction arguments (default=frc)||
|| ||--simralign||string||The name and parameters of the second stage aligner which refines the results of the first alignment||
|| ||--simraligncmp||string||The name and parameters of the comparator used by the second stage aligner (default=dot).||
|| ||--simcmp||string||The name of a 'cmp' to be used in comparing the aligned images (default=frc:nweight=1)||
|| ||--shrink||int||Optionally shrink the input particles by an integer amount prior to computing similarity scores. For speed purposes.||
|| ||--classkeep||float||The fraction of particles to keep in each class, based on the similarity score generated by the --classcmp argument (default=0.85).||
|| ||--classkeepsig||bool||Change the keep ('--classkeep') criterion from fraction-based to sigma-based.||
|| ||--classiter||int||Number of iterations to use when making class-averages (default=5)||
|| ||--classalign||string||If doing more than one iteration, this is the name and parameters of the 'aligner' used to align particles to the previous class average.||
|| ||--classaligncmp||string||This is the name and parameters of the comparator used by the first stage aligner. Default is dot.||
|| ||--classralign||string||The second stage aligner which refines the results of the first alignment in class averaging. Default is None.||
|| ||--classraligncmp||string||The comparator used by the second stage aligner in class averaging. Default is dot:normalize=1.||
|| ||--classaverager||string||The averager used to generate the class averages. Default is 'mean'.||
|| ||--classcmp||string||The name and parameters of the comparator used to generate similarity scores when class averaging. Default is frc.||
|| ||--classnormproc||string||Normalization applied during class averaging||
|| ||--classrefsf||bool||Use the setsfref option in class averaging to produce better filtered averages.||
|| ||--normproj||bool||Normalizes each projected vector into the MSA subspace. Note that this is different from normalizing the input images since the subspace is not expected to fully span the image||
||-P||--parallel||string||Run in parallel, specify type:<option>=<value>:<option>=<value>||
|| ||--dbls||string||data base list storage, used by the workflow. You can ignore this argument.||
||-v||--verbose||int||verbose level [0-9], higher number means higher level of verbosity||
-Line 13:
+Line 57:
-This program oversees iterative single particle reconstruction. The overall process is to take a pre-existing 3D image and a set of 2D images and to run a variety of (often intensive) image processing applications which produce a refined 3D model. In particular, the program iteratively executes a sequence of python scripts which perform specific tasks, starting with 3D projection (e2project3d.py), comparison of particle data to projections (e2simmx.py), classification (e2classify.py), the generation of class averages (e2classaverage.py), and finally the generation of a new 3D model (e2make3d.py). This pipeline is depicted graphically in '''Figure 1''' below, along with accompanying data inputs and outputs.
|| {{attachment:refinepipeline_small.png}} ||
+This program uses an iterative MSA-based reference-free classification algorithm. The names in parentheses below are the filenames produced by each step. The files will be found in bdb:r2d_XX (XX is incremented each time e2refine2d.py is run).
A brief outline of the process follows:
-Line 16:
+Line 60:
+. Initialize the iterative process by making some initial guesses at class-averages. These are invariant-based, meaning even with MSA, this initial classification is not exceptionally good.
  a. Generate rotational/translational invariants for each particle (input_fp)
  a. Perform MSA on the invariants to define an orthogonal subspace representing the most important differences among the classes (input_fp_basis)
  a. Reproject the particles into the MSA subspace using ''--nbasisfp'' vectors (input_fp_basis_proj)
  a. Classify the particles into ''--ncls'' classes using K-means (classmx_00)
  a. Iterative class-averaging of the particles in each class to produce a set of initial averages (classes_init)
 1. Align the current class-averages to each other, and sort them by similarity, keeping them centered (allrefs_YY) (Note that YY starts with 01 and is incremented after each iteration)
 1. Perform MSA on the (aligned) class-averages. Again, this represents largest differences, but now performed on images, not invariants. (basis_YY)
 1. Select a subset of ''--naliref'' averages to use as alignment references for this iteration (aliref_YY)
 1. Align each particle to each of the reference averages from the last step. Keep the orientation corresponding to the best-matching reference. (simmx_YY)
 1. Project aligned particles using reference MSA vectors from basis_YY (input_YY_proj)
 1. K-means classification of input_YY_proj (classmx_YY)
 1. New iterative class-averages (classes_YY)
 1. Loop back to step 2 until ''--iter'' loops are complete
-Line 17:
+Line 75:
-'''Fig. 1. Overview of data inputs and outputs in the EMAN2 refinement pipeline. Pink objects are data supplied by the user, blue objects are programs, and green objects are data output by EMAN2 programs.'''



<<Anchor(args)>>

=== Command Line Arguments ===
Most of the command line arguments have defaults; a few are absolutely required. The user must at least specify the total number of iterations, the symmetry, and the proportional distribution or number of projections.

==== General parameters ====

|| version || bool || Show program's version number and exit ||
|| h, help || bool || Show help ||
|| c, check || bool || Checks the contents of the current directory to verify that e2refine.py will work ||
|| v, verbose || int || Toggle verbose mode - prints extra information to the command line while executing ||
|| input|| string || The input image stack of 2D particles||
|| iter|| int|| The number of refinement iterations ||
|| lowmem|| boolean || A low memory flag used to indicate memory should be used as sparsely as possible ||
|| model|| string || The seeding 3D model ||
|| path || string || The directory where output will be stored ||
|| sym|| string|| The [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Symmetry3D.html|symmetry]] imposed on the output 3D models, which also limits the range of generated projections ||

==== Arguments used to execute e2project3d.py ====

See [[e2project3d|e2project3d.py.]]

|| orientgen|| string:args || The [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1OrientationGenerator.html|OrientationGenerator]] and parameters used for generating orientations in the asymmetric unit of the 3D model ||
|| projector || string:args || The [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Projector.html|projector]] used for generating projections ||


==== Arguments used to execute e2simmx.py ====

See [[e2simmx|e2simmx.py.]]

|| shrink || int || The shrink factor applied to particles prior to generation of the similarity matrix (e2simmx.py) ||
|| simalign || string:args || The main [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Aligner.html|aligner]] used during similarity matrix generation  ||
|| simaligncmp || string:args || The [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Cmp.html|comparator]] used by the main aligner during similarity matrix generation ||
|| simcmp|| string:args || The [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Cmp.html|comparator]] used to generate the final score which is stored in the similarity matrix ||
|| simralign|| string:args || The refinement [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Aligner.html|aligner]] used during similarity matrix generation ||
|| simraligncmp || string:args || The [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Cmp.html|comparator]] used by the refine align in similarity matrix generation ||

==== Arguments used to execute e2classify.py ====


See [[e2classify|e2classify.py]]

|| sep || int || The number of classes each particle can be associated with ||

==== Arguments used to execute e2classaverage.py ====

See [[e2classaverage|e2classaverage.py]].

|| classalign || string:args || The [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Aligner.html|aligner]] used for alignment during iterative class averaging ||
|| classaligncmp|| string:args || [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Cmp.html|Comparator]] used by the main aligner during iterative class averaging ||
|| classaverager|| string:args || [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Averager.html|Averager]] used for averaging the images in each class ||
|| classcmp || string:args || The main [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Cmp.html|comparator]] used to assess quality and exclude bad particles in iterative class averaging ||
|| classiter || int || The number of class averaging iterations ||
|| classkeep|| float || The keep threshold used for excluding bad particles in iterative class averaging ||
|| classnormproc|| string:args || The normalization [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Processor.html|processor]] used in class averaging ||
|| classralign|| string:args || The refinement [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Aligner.html|aligner]] used in iterative class averaging ||
|| classraligncmp || string:args || The [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Cmp.html|comparator]] used by the refinement aligner in iterative class averaging ||

==== Arguments used to execute e2make3d.py  ====

See [[e2make3d|e2make3d.py.]]

|| m3diter|| int || The number of iterations used by make3d when performing the Fourier inversion method of 3D reconstruction ||
|| m3dkeep|| float || The keep threshold used by e2make3d for the purpose of slice exclusion during 3D reconstruction ||
|| m3dpreprocess|| string:args || The normalization [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Processor.html|processor]] applied prior to slice insertion during 3D reconstruction ||
|| pad || int || The amount of padding used by the Fourier inversion 3D reconstruction technique ||
|| recon|| string:args || The [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1Reconstructor.html|reconstructor]] used for performing 3D reconstruction ||

==== Arguments used to post process the 3D reconstruction  ====

The ByMass links will resolve on January 22

|| mass || float || The estimated mass of the particle in kilodalton that, along with the apix argument, is used to run the  [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1NormalizeByMassProcessor.html| normalize.bymass processor]] immediately after 3D reconstruction ||
|| apix || float || The physical distance represented by a single pixel. This parameter, along with the mass argument, is used to run the  [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1NormalizeByMassProcessor.html| normalize.bymass processor]] immediately after 3D reconstruction. The apix argument is also used for generating the x-axis of the automatically generated convergence plots. ||
|| automask3d || float,int,int,int || The threshold, radius, nshells and nshellsgauss parameters, respectively, of the  [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1AutoMask3D2Processor.html| mask.auto3d processor]], which is applied directly after the application of the [[http://blake.bcm.edu/eman2/doxygen_html/classEMAN_1_1NormalizeByMassProcessor.html| normalize.bymass processor]]. ||



<<Anchor(checkfunc)>>

=== Check functionality ===
By specifying the --check flag, e2refine.py checks only whether the specified parameters are valid, and nothing more. Example output is shown below.

{{{
[someone@localhost]$ e2refine.py --check
#### Testing directory contents and command line arguments for e2refine.py
Error: you must specify the --it argument
start.img contains 2498 images of dimensions 100x100
threed.0a.mrc has dimensions 100x100x100
e2refine.py test.... FAILED
#### Test executing projection command: e2project3d.py threed.0a.mrc -f --sym=None --projector=standard --out=e2proj.img --check
Error: you must specify one of either --prop or --numproj
Error: none is an invalid symmetry type. You must specify the --sym argument
e2project3d.py command line arguments test.... FAILED
#### Test executing simmx command: e2simmx.py e2proj.img start.img e2simmx.img -f --saveali --cmp=dot:normalize=1 --align=rotate_translate --aligncmp=dot --check --nofilecheck
e2simmx.py command line arguments test.... PASSED
#### Test executing classify command: e2classify.py e2simmx.img e2classify.img --sep=2 -f --check --nofilecheck
e2classify.py command line arguments test.... PASSED
#### Test executing classaverage command: e2classaverage.py start.img e2classify.img e2classes.1.img --ref=e2proj.img --iter=3 -f --keepsig=1.000000 --cmp=dot:normalize=1 --align=rotate_translate --aligncmp=phase --check --nofilecheck
e2classaverage.py command line arguments test.... PASSED
#### Test executing make3d command: e2make3d.py e2classes.1.img --sym=None --iter=4 -f --recon=fourier --out=threed.0a.mrc --keepsig=1.000000 --check --nofilecheck
Error: none is an invalid symmetry type. You must specify the --sym argument
e2make3d.py command line arguments test.... FAILED
}}}
This functionality will be useful for people who have to submit their jobs to queues - being able to check that the script will work will ensure its successful execution.
+The primary files you would normally look at after a run is complete are:
 * classes_YY - the highest numbered file. This contains the final unaligned class-averages
 * allrefs_YY - Class-averages which have been sorted and aligned
 * basis_YY - This contains the MSA basis vectors, which may be useful if looking for signs of specific symmetries, etc.
 * aliref_YY - May be useful to look at which averages were used as 2-D alignment references