EMAN1/FAQ/Refine - EMAN Wiki

Refine command options

The refine command is at the heart of single particle reconstruction in EMAN1. It has many options, some of which can produce truly awful results if misused, so it is important not to go into the process blindly. This is an overview of the various parameters:

refine <total iter> mask=<rad> [proc=<maxproc>] [hard=<maxpr>] [simple] [ctfc=<res in A>] [ctfcw=<SF file>] [phaseopt] [setsf=<lowpass res>[,<highpass res>[,<SF file>]]] [median] [nweight] [sym=<c2,c4,etc>] [ang=<dang>] [maxshift=<rad>] [pad=<size>] [classkeep=<sigma coef>] [classiter=<iter>] [refmaskali] [filt3d=<lp rad>] [sep=<n>] [xfiles=<a/pix>,<mass in kd>,<ali to>] [3dit=<n it>] [3dit2=<n it>] [speed=<1-5>] [sigfilt] [euler2=<oversmp>] [tree=<2,3>] [imask=<rad>] [amask=<r>,<thr>,<n iter>] [goodbad] [slow] [refine] [shrink=<n>] [projbatches=<n>] [ca3] [classfp=<msa#>] [perturb] [usefilt] [rfp] [precen] [phasecls] [fscls] [axialfilt=<res>] [msa=<proj>] [mra2] [3dr[=<range>]] [d3s=<sigmult>] [collapse=<ang>] [dfilt] [continue]
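Because the option list is long, it can help to assemble the command programmatically. The following is a minimal Python sketch and is not part of EMAN1. build_refine_args is a hypothetical helper. The mask radius, the structure factor file name sf.txt and the other values are placeholders for illustration. The sketch only encodes what the synopsis shows: <total iter> comes first, followed by key=value options and bare flags. The order in which the options and flags follow is an assumption.

```python
import shlex


def build_refine_args(total_iter, mask, options=None, flags=()):
    """Assemble an argv list for EMAN1's refine command (hypothetical helper).

    total_iter -- the positional <total iter> argument, placed first
    mask       -- radius for the required mask=<rad> option
    options    -- dict of key=value options, e.g. {"sym": "c1", "ang": 7.5}
    flags      -- bare flags such as "refine" or "dfilt"
    """
    if total_iter < 1:
        raise ValueError("total_iter must be a positive integer")
    args = ["refine", str(total_iter), f"mask={mask}"]
    for key, value in (options or {}).items():
        args.append(f"{key}={value}")  # key=value options, in dict order
    args.extend(flags)  # bare flags appended last (order assumed not to matter)
    return args


if __name__ == "__main__":
    # Placeholder values for illustration only.
    argv = build_refine_args(
        8, 30,
        options={"sym": "c1", "ang": 7.5, "pad": 96, "hard": 25,
                 "ctfcw": "sf.txt", "classkeep": 1, "classiter": 8},
        flags=("refine", "dfilt"),
    )
    print(shlex.join(argv))
```

The sketch only prints the command line. Actually running refine would require an EMAN1 installation and a properly prepared refinement directory.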

Normally used

<total iter> - Final number of iterations you want completed in this directory, i.e. if 5 iterations are complete and you specify 6, one more iteration will run.
mask=<rad> - Radius (in pixels) of a circular mask that is applied at virtually every stage of processing. It should be a few pixels larger than the largest radius of your particle.
ctfc=, ctfcw= OR median - CTF correction options; use one of them. ctfc takes a filter resolution in Angstroms, but ctfcw is far superior. ctfcw takes the name of the 1D structure factor file that was used when fitting the data (it MUST be the same file). median does no CTF correction at all. For ctfc and ctfcw, the data must have been properly preprocessed; otherwise, crashes or invalid results are likely.
hard=<phase err> - Rather obscure. Determines when a class-average should be excluded from the 3D reconstruction. 25 is generally a good value. See 'make3d' for more information.
sym=<cn,dn,oct,icos> - Particle symmetry. For asymmetric objects, either omit this option or use 'c1'. cn denotes a single n-fold rotational symmetry (about the z-axis). dn denotes n-fold dihedral symmetry (cn plus n 2-fold axes in the x-y plane). oct is octahedral (2-3-4, the symmetry of a cube). icos is icosahedral (2-3-5).
ang=<dang> - Angular spacing (in degrees) between projections. Smaller values produce more projections; typical values are between 1 and 10. The value is usually chosen as 90/n, where n is an integer.
pad=<size> - Used to reduce artifacts in Fourier reconstruction. In most cases it should be at least ~25% larger than your model and have only small prime factors, e.g. model size 48 -> pad=64, 64 -> pad=96, 100 -> pad=128.
classkeep=<sig mult> - Determines how many raw particles are discarded from each class-average. The threshold is defined in terms of the standard deviation of the self-similarity of the particle set. A value close to 0 (but not exactly 0) will discard about 50% of the data. 1 is a typical value and will usually discard ~10-20% of the data.
refine - Almost always specified. Increases alignment accuracy at some cost in speed. May be omitted in early rounds of refinement.
classiter=<n> - Generation of class-averages is an iterative process. Rather than simply being aligned to a reference, the raw particles are iteratively aligned to each other. This produces a class-average that represents the data rather than the model, which eliminates initial model bias. Typical values are 8 in the early rounds and 3 in later rounds. 0 may be used at the end, but model bias may result.
dfilt, fscls, phasecls - Mutually exclusive options that determine the similarity criterion used for classification and quality comparisons. dfilt is an optimized variance with a matched filter. It requires ctfcw= but generally produces the best results. phasecls uses mean phase error. fscls uses Fourier ring correlation.
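A few of the numeric rules of thumb above can be checked with simple arithmetic. The Python sketch below is illustrative only, and its assumptions are stated explicitly:

- suggest_pad picks the smallest even size that is at least 25% larger than the model and has only factors of 2 and 3. This heuristic was chosen because it reproduces the pad= examples above; it is not taken from EMAN1.
- snap_ang turns a desired spacing into 90/n by choosing the integer n closest to 90/desired.
- The classkeep= helpers assume that a particle is rejected when its similarity falls more than classkeep standard deviations below the class mean. This is a simplified model; EMAN1's actual scoring may differ.

```python
import math
import statistics


def is_23_smooth(n):
    """True if n > 0 has no prime factors other than 2 and 3."""
    for p in (2, 3):
        while n % p == 0:
            n //= p
    return n == 1


def suggest_pad(model_size):
    """Smallest even 2,3-smooth size >= 1.25 * model_size (assumed heuristic)."""
    if model_size < 1:
        raise ValueError("model_size must be positive")
    n = (5 * model_size + 3) // 4  # ceil(1.25 * model_size) with integers
    if n % 2:
        n += 1
    while not is_23_smooth(n):
        n += 2  # step by 2 so the size stays even
    return n


def snap_ang(desired):
    """Turn a desired angular step (degrees) into 90/n, n the nearest integer to 90/desired."""
    n = max(1, round(90.0 / desired))
    return 90.0 / n


def keep_particles(scores, classkeep):
    """Keep particles whose similarity (assumed: higher = better) is no worse
    than mean - classkeep * (population) standard deviation."""
    cutoff = statistics.mean(scores) - classkeep * statistics.pstdev(scores)
    return [s for s in scores if s >= cutoff]


def gaussian_discard_fraction(classkeep):
    """Fraction keep_particles would reject if the scores were Gaussian."""
    return 0.5 * (1.0 - math.erf(classkeep / math.sqrt(2.0)))
```

With these helpers:

- suggest_pad(48), suggest_pad(64) and suggest_pad(100) return 64, 96 and 128.
- snap_ang(7.5) stays at 7.5 (90/12).
- Under the Gaussian model, gaussian_discard_fraction(0) is 0.5 and gaussian_discard_fraction(1) is about 0.16. Both are consistent with the ~50% and ~10-20% figures quoted for classkeep=.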

Optional

filt3d=<rad> - Applies a lowpass filter to the 3D model between iterations. This can correct problems that cause high-resolution terms to be upweighted. <rad> is the same as for the 'lp=' option in proc3d.
sep=<n> - This interesting option causes each particle to be assigned to the n best classes, not just the single best class. This may be used to smooth and improve fine details in the final stages of a high resolution refinement. Generally used with oversampled ang= values.
tree=<2,3> - This can be a risky option, but it can produce dramatic speedups in the refinement process. Rather than comparing each particle to every reference, it decimates the reference population to 1/4 or 1/9 of its original size, classifies, then locally determines which of the matches is best. It is safest in conjunction with very small angular steps, i.e. large numbers of projections. The safest way to use it is for the initial iterations of refinement, turning it off for the last couple of iterations. It may not work on certain cluster configurations.
shrink=<n> - Another option that can produce dramatic speed improvements. In some cases it can even improve classification accuracy. It scales the particles and references down by a factor of n before classification. Data is often heavily oversampled, and classification is dominated by low-resolution terms. Shrinking can therefore be both safe and beneficial, because it 'filters' out high-resolution noise. shrink=2 is generally safe and effective, especially for early refinement. In cases of extreme oversampling, larger values may be OK. This option should not be used for the final rounds of a high-resolution refinement.
usefilt - This flag allows arbitrarily filtered raw particles to be used for classification, while the unfiltered data is still used to generate the actual reconstruction. To use this option, filter the data from start.hed into start.filt.hed. The particles should be filtered with proc2d wiener=. This option is now recommended for most reconstructions where ctfcw= is used.
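To make the speed-oriented options more concrete, here is a Python sketch of the bookkeeping behind shrink=, tree= and sep=. It is an illustration built on assumptions, not EMAN1 code:

- shrink_image uses plain n x n block averaging as a stand-in for however EMAN1 actually downsamples.
- tree_reference_count only models the 1/4 or 1/9 reduction of the reference set. It does not model the local search that follows.
- best_n_classes assumes that higher scores mean more similar.

```python
import math


def shrink_image(img, n):
    """Downscale a 2D image (list of equal-length rows) by an integer factor n
    using n x n block averaging; edge pixels that do not fill a block are dropped."""
    if n < 1:
        raise ValueError("shrink factor must be >= 1")
    rows, cols = len(img) // n, len(img[0]) // n
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [img[r * n + i][c * n + j] for i in range(n) for j in range(n)]
            row.append(sum(block) / (n * n))
        out.append(row)
    return out


def nyquist_after_shrink(apix, n):
    """Finest resolution (in A) representable once the pixel size becomes n * apix."""
    return 2.0 * n * apix


def tree_reference_count(n_refs, tree):
    """Approximate number of references left after tree=2 (1/4) or tree=3 (1/9)."""
    if tree not in (2, 3):
        raise ValueError("tree must be 2 or 3")
    return math.ceil(n_refs / tree ** 2)


def best_n_classes(similarities, sep):
    """Indices of the sep best-matching references (assumed: higher score = better)."""
    order = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
    return order[:sep]
```

nyquist_after_shrink shows why shrink= should be avoided in the final high-resolution rounds. Shrinking by n multiplies the pixel size by n, so the finest representable detail coarsens from 2*apix to 2*n*apix. For example, data at 1.5 A/pix with shrink=2 cannot represent anything finer than 6 A during classification.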

Suggested

xfiles=<A/pix>,<mass>,<ali to> - A convenience option. For each 3D model it produces a corresponding x-file, e.g. threed.1a.mrc -> x.1.mrc. Based on A/pix and mass (in kDa), the x-file is scaled so that an isosurface threshold of 1 encloses approximately the specified mass. 'ali to' is an iteration number. For example, if 'ali to' is 4, then x.7.mrc is aligned in 3D to x.4.mrc, while x.3.mrc is not aligned at all. Often this is set to a large value, like 99.
amask=<r>,<threshold>,<iter> - Applies an automatically generated, 'form fitting' soft (Gaussian) mask to the model after each iteration. The mask generation is generally quite good; see the proc3d option automask2 for details. This option can only be used in conjunction with xfiles=, since selecting the threshold requires proper volume normalization. It can have a profound effect on proper convergence, but should be used with caution: too tight a mask will produce artificially inflated resolution values. To use it properly, ensure that the mask does not cut through any significant density in the final model and that the mean value is already ~0 at the mask position. This is somewhat akin to solvent flattening.
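Several options on this page interact. The sketch below collects the constraints stated above into one checking function, which can catch mistakes before a long run starts. check_refine_options is hypothetical and not part of EMAN1. It only knows the rules written on this page, not everything refine itself checks. Treating ctfc=, ctfcw= and median as mutually exclusive is an assumption based on the "OR" in their description.

```python
import os


def check_refine_options(options, flags, workdir="."):
    """Return a list of problems with a refine option set (hypothetical helper).

    options -- dict of key=value options, e.g. {"ctfcw": "sf.txt"}
    flags   -- iterable of bare flags, e.g. {"refine", "dfilt"}
    Only the rules described on this page are checked.
    """
    flags = set(flags)
    present = set(options) | flags
    problems = []

    ctf = present & {"ctfc", "ctfcw", "median"}
    if len(ctf) > 1:  # assumption: these are alternatives
        problems.append(f"use only one of ctfc=, ctfcw=, median (got {sorted(ctf)})")

    sim = present & {"dfilt", "fscls", "phasecls"}
    if len(sim) > 1:
        problems.append(f"dfilt, fscls and phasecls are mutually exclusive (got {sorted(sim)})")
    if "dfilt" in present and "ctfcw" not in options:
        problems.append("dfilt requires ctfcw=")

    if "amask" in options and "xfiles" not in options:
        problems.append("amask= can only be used together with xfiles=")

    if "classkeep" in options and float(options["classkeep"]) == 0:
        problems.append("classkeep= should be close to 0, but not exactly 0")

    if "tree" in options and int(options["tree"]) not in (2, 3):
        problems.append("tree= must be 2 or 3")

    # usefilt expects the filtered copy of start.hed to exist already.
    if "usefilt" in present and not os.path.exists(os.path.join(workdir, "start.filt.hed")):
        problems.append("usefilt expects filtered particles in start.filt.hed")

    return problems
```

For example, check_refine_options({"amask": "30,1.1,3"}, {"dfilt", "phasecls"}) reports three problems: conflicting similarity criteria, dfilt without ctfcw=, and amask= without xfiles=.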