refine2d.py
This program performs 2-D refinement of particle sets. No 3-D references or models are used.
Usage
refine2d.py [--iter=<iterations>] [--ninitcls=<# initial classes>] [--finalsep=<# split each class>] [--minptcl=<min ptcl/class>] [--proc=<# processors>] [--ctfcw=<SF file for ctf cor>]
[--nosvd] [--nbasis=<# basis images>] [--logptcl]
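The usage line above can be assembled programmatically. The following is a minimal sketch, not part of refine2d.py itself: the helper name build_refine2d_args and its keyword names are hypothetical, and only the flags listed in the usage line are emitted. The usage line does not show where the input file goes, so any positional arguments are passed through unchanged via the hypothetical extra parameter, and appending them at the end is an assumption.

```python
def build_refine2d_args(iterations=None, ninitcls=None, finalsep=None,
                        minptcl=None, proc=None, ctfcw=None, nbasis=None,
                        nosvd=False, logptcl=False, extra=()):
    """Assemble an argument list using only the options in the usage line.

    Hypothetical helper for illustration; it does not run anything.
    """
    args = ["refine2d.py"]
    # Options that take a value are written in the documented --name=value form.
    valued = [
        ("--iter", iterations),
        ("--ninitcls", ninitcls),
        ("--finalsep", finalsep),
        ("--minptcl", minptcl),
        ("--proc", proc),
        ("--ctfcw", ctfcw),
        ("--nbasis", nbasis),
    ]
    for flag, value in valued:
        if value is not None:
            args.append(f"{flag}={value}")
    # Boolean switches are emitted only when requested.
    if nosvd:
        args.append("--nosvd")
    if logptcl:
        args.append("--logptcl")
    # Assumption: positional arguments (e.g. the input stack) are appended last.
    args.extend(extra)
    return args
```

The resulting list can be joined with spaces for display or handed to a process launcher; options left at None are omitted so the program's own defaults apply.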
Parameters
--version | show program's version number and exit
-h, --help | show this help message and exit
--debug | debugging output
--iter=ITER | Number of refinement iterations
-I NINITCLS, --ninitcls=NINITCLS | Number of initial classes for alignment iterations
-F FINALSEP, --finalsep=FINALSEP | Number of additional class subsplits in final iteration
--minptcl=MINPTCL | Minimum number of particles in a final class-average
-P PROC, --proc=PROC | Processors to use
-C CTFCW, --ctfcw=CTFCW | Structure factor file for full CTF correction
-B NBASIS, --nbasis=NBASIS | Number of basis vectors to use in classification
--nosvd | Use straight k-means for classification instead of SVD based vectorization
--nofinalsort | Do not sort the final class-averages (sorting can be very slow)
--logptcl | Makes a logfile containing the identity of the class-average for each particle
Description
refine2d.py performs 2-D refinement of a stack of particles with no reference to 3-D models, with or without CTF correction. The overall process is:
- make a small set of initial rough class-averages using startnrclasses (these are not intended to be good)
- align each particle to each class-average, and keep the alignment from the best match (particles are aligned, not classified)
- perform SVD on the set of particles (for this purpose, equivalent to MSA)
- project each particle into the SVD basis, and perform k-means classification on the result
- make new averages, and sort/align them
- iterate (return to step 2, the alignment step)
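The classification steps in the list above (SVD, projection into the basis, k-means, new averages) can be sketched with NumPy. This is an illustration under stated assumptions, not the code used by refine2d.py: all function names are hypothetical, the particles are assumed to be already aligned and held in an array of shape (n, height, width), and the k-means initialisation (evenly spaced ranks along the first basis coordinate) was chosen here only to keep the sketch deterministic.

```python
import numpy as np

def svd_basis(images, nbasis):
    """Return the mean image vector and the top `nbasis` basis vectors."""
    flat = images.reshape(len(images), -1).astype(float)
    mean = flat.mean(axis=0)
    # Rows of vt are orthonormal, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:nbasis]

def project(images, mean, basis):
    """Project each mean-subtracted image onto the basis vectors."""
    flat = images.reshape(len(images), -1).astype(float)
    return (flat - mean) @ basis.T

def kmeans(points, k, iterations=20):
    """Plain k-means on the projected coordinates; returns one label per point."""
    # Assumption: deterministic start from evenly spaced ranks along axis 0.
    order = np.argsort(points[:, 0])
    centers = points[order[np.linspace(0, len(points) - 1, k).astype(int)]].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; empty classes stay put.
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def class_averages(images, labels, k):
    """Average the images in each class; empty classes are left as zeros."""
    averages = np.zeros((k,) + images.shape[1:])
    for j in range(k):
        members = images[labels == j]
        if len(members):
            averages[j] = members.mean(axis=0)
    return averages
```

In the workflow described above, the new averages would then serve as alignment references for the next iteration; the alignment and sorting steps are deliberately left out of this sketch.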
After several iterations this will produce a very robust set of class-averages without any requirement that they form a consistent 3-D model. This is a very good way to test for heterogeneity among your particles, and a useful cross-check to ensure that the results of a 3-D refinement agree with the original data (projections of the 3-D model should look like the class-averages from refine2d).
Running refine2d.py
refine2d.py is best run in an empty directory, as many intermediate files are created. Unlike refine, refine2d cannot resume an interrupted refinement in the middle. Each time you run the command, it starts from scratch. The input file may have any name. The program works well with phase-flipped images, with or without CTF correction enabled. If CTF correction is used, it is applied only at the very end of processing, and both corrected and uncorrected averages are produced. Output files are as follows:
- iter.final.hed, iter.final.ctfc.hed, iter.final.sort.hed - The final results of the refinement, with or without CTF correction, and with or without a final sort
- iter.*.hed - The results after each iteration
- basis.*.hed - The basis sets (SVD results) after each iteration
- cls???? - directories used for 'finalsep' if present
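Because many intermediate files are created, it can help to group a directory listing by the name patterns in the list above. The sketch below is a hypothetical helper (group_outputs is not part of EMAN) that uses only those patterns; the example file names in the usage note are illustrative, since the exact per-iteration names are not spelled out here.

```python
from fnmatch import fnmatchcase

# Final result names exactly as listed in the output file description.
FINAL_NAMES = ("iter.final.hed", "iter.final.ctfc.hed", "iter.final.sort.hed")

def group_outputs(names):
    """Sort file or directory names into the documented output groups."""
    groups = {"final": [], "iteration": [], "basis": [],
              "finalsep_dirs": [], "other": []}
    for name in sorted(names):
        if name in FINAL_NAMES:
            # Checked first, because these also match the iter.*.hed pattern.
            groups["final"].append(name)
        elif fnmatchcase(name, "iter.*.hed"):
            groups["iteration"].append(name)
        elif fnmatchcase(name, "basis.*.hed"):
            groups["basis"].append(name)
        elif fnmatchcase(name, "cls????"):
            groups["finalsep_dirs"].append(name)
        else:
            groups["other"].append(name)
    return groups
```

For example, passing the result of os.listdir(".") after a run would separate the final averages from per-iteration results; a listing such as ["iter.final.hed", "iter.1.hed", "basis.1.hed", "cls0003"] (hypothetical names) would place one entry in each of the first four groups.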
--iter=<n> : The number of iterations to use depends on the data. For less-noisy/homogeneous data, 4-5 is likely fine. A typical value is 10.
--ninitcls=<n> : The number of classes to generate in each iteration, until the very last iteration. Typically you want to have at least 10-20 particles per class. Much larger numbers are also fine, i.e., if you have 100,000 particles, making 100 classes is fine. Large numbers of classes will, of course, slow the refinement down proportionally.
--finalsep=<nsplit> : In the final iteration, each class can (optionally) be split into several subclasses. The final result will contain (ninitcls * finalsep) classes, minus the number of bad classes. A 'bad class' is one with too few particles in it (see --minptcl).
--minptcl=<n> : If a class has fewer than n particles it won't be included in the final results.
--proc=<n> : Number of processors to use during processing. Parallelism works the same way it does for all other EMAN1 programs.
--ctfcw=<sffile> : Enables CTF correction on the final results, using the structure factor file (sffile) for filtration, as in the refine program.
--nbasis=<n> : Number of basis images to use for classification. Normally the default value is fine. In older versions of refine2d.py this option was broken. If you want to experiment with this, please use a post-1.8 snapshot version of EMAN.
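The class-count arithmetic described for --ninitcls, --finalsep and --minptcl can be checked with a few lines of Python. This is a sketch with hypothetical function names; it only restates the rules given above (at most ninitcls * finalsep final classes, and classes with fewer than minptcl particles are dropped).

```python
def particles_per_class(n_particles, ninitcls):
    """Average number of particles per class for a given --ninitcls."""
    return n_particles / ninitcls

def max_final_classes(ninitcls, finalsep):
    """Upper bound on final classes before bad classes are removed."""
    return ninitcls * finalsep

def kept_classes(class_sizes, minptcl):
    """Number of classes that survive the --minptcl cut.

    A class is dropped when it has fewer than minptcl particles.
    """
    return sum(1 for size in class_sizes if size >= minptcl)

# The example from the text: 100,000 particles in 100 classes.
print(particles_per_class(100000, 100))  # 1000.0
```

With ninitcls=100 and finalsep=3, for instance, max_final_classes gives 300, and the actual count is that figure minus however many subclasses fall below minptcl.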