EMAN1/FAQ/DfiltOption

Q: What is the 'dfilt' option for the refine command, and how does it work? How does it compare to phasecls and fscls?

A: In ~2002, when we were first moving to subnanometer resolution, we were struggling with some problems with similarity criteria. It turned out that for GroEL, using a simple weighted correlation coefficient to compare raw particles to model projections produced a bias, which tended to shift the side views of GroEL to orientations a few degrees away from the true side view. After some effort, we realized that the cause was a difference in filtration between the particles and the reference model. The reference model takes advantage of the data from all of the particles in the raw data set, and has been CTF corrected and Wiener filtered to produce a final model of appropriate 'clarity'. Individual particles, while they have been phase-corrected, still retain their original envelope function, and in some cases have been further filtered to reduce noise. When you compare a 'blurry' side view of GroEL to a much less 'blurry' projection of a side view, they don't match perfectly: the blurriness around the edges makes the particle seem slightly taller than the reference. If the reference is tilted very slightly, it too appears slightly taller, and with a correlation coefficient as the similarity metric, this tilted view ends up with a higher similarity score. The result is a slight misclassification of the particles, which only intensifies as you iterate. The effect is, of course, highly geometry dependent, and also depends on the various filters that have been applied.

The question then becomes how best to compare particles to references while taking this filtration difference into account. The obvious solution is to drop back to an oft-used criterion, the phase residual (or the closely related mean phase error), as it is not sensitive to differences in Fourier amplitudes. One could even use a weighted mean phase error to take some slight advantage of amplitudes without biasing the process too much. While this did, in fact, work reasonably well, it also completely discards the Fourier amplitude information, which is in principle valuable and should still be used to achieve better alignments and thus higher resolutions. This mean phase error method is embodied in the 'phasecls' option.
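
As a rough illustration of the idea (not the EMAN1 implementation), a weighted mean phase error can be sketched with NumPy as follows. The function name, the choice of weights (the product of the two amplitudes), and the exclusion of the DC term are all assumptions made for this example.

```python
import numpy as np

def mean_phase_error(particle, projection, weighted=True):
    """Mean absolute Fourier phase difference between two images, in radians.

    Hypothetical helper: the amplitude-product weighting is an assumed
    choice, not necessarily what 'phasecls' does.
    """
    F = np.fft.rfft2(particle)
    G = np.fft.rfft2(projection)
    # Phase difference per Fourier pixel, wrapped into [-pi, pi]
    dphi = np.angle(F * np.conj(G))
    w = np.abs(F) * np.abs(G) if weighted else np.ones(F.shape)
    w[0, 0] = 0.0  # the DC term carries no useful phase information
    return float(np.sum(w * np.abs(dphi)) / np.sum(w))
```

Because only phases enter the score (amplitudes appear at most as weights), rescaling one of the images leaves the result essentially unchanged, while two unrelated images give a value near pi/2.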

In considering how to take advantage of some of the Fourier amplitude information while still avoiding possible orientation bias, another fairly simple idea occurred to us: why not use a Fourier ring correlation (FRC) curve and integrate it in some fashion? The FRC and the FSC (in 3-D) are both immune to filtration artifacts, as each ring/shell is independently normalized. This allows us to take advantage of relative amplitude variations around the ring/shell. There is still a lot of flexibility in how one integrates the resulting curve, along with other issues, but our variant of this routine is implemented as the 'fscls' option.
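
The ring-by-ring normalization can be sketched as below. This is a minimal NumPy example for square 2-D images, not the 'fscls' code itself; the function names and the integration rule (an unweighted mean over rings, skipping the DC ring) are assumptions, since the text leaves the integration open.

```python
import numpy as np

def frc_curve(a, b):
    """Fourier ring correlation between two equally sized square images."""
    F = np.fft.fftshift(np.fft.fft2(a))
    G = np.fft.fftshift(np.fft.fft2(b))
    n = a.shape[0]
    y, x = np.indices(a.shape)
    # Integer ring index of each Fourier pixel, measured from the DC pixel
    r = np.hypot(x - n // 2, y - n // 2).astype(int).ravel()
    nbins = n // 2
    keep = r < nbins  # drop the corners beyond the Nyquist ring
    cross = np.bincount(r[keep], weights=np.real(F * np.conj(G)).ravel()[keep],
                        minlength=nbins)
    pow_a = np.bincount(r[keep], weights=(np.abs(F) ** 2).ravel()[keep],
                        minlength=nbins)
    pow_b = np.bincount(r[keep], weights=(np.abs(G) ** 2).ravel()[keep],
                        minlength=nbins)
    # Each ring is normalized by its own power, so radial filters cancel
    return cross / np.sqrt(pow_a * pow_b + 1e-30)

def frc_similarity(a, b, rmin=1):
    """Assumed integration rule: unweighted mean of the FRC from ring rmin up."""
    return float(np.mean(frc_curve(a, b)[rmin:]))
```

Larger values mean greater similarity here, the opposite sense to a phase error or an RMSD.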

Both of these options worked well and seemed to solve the orientation bias problem, at least. However, in testing on a variety of data sets, it became clear that in some cases fscls worked better and in other cases phasecls worked better, and it was far from clear why this was the case or how to go about optimizing the possible variants of these routines. Then another idea occurred to us: why not try to take advantage of all of the information present in the particle, using a fairly complicated method? This is the method implemented in the 'dfilt' and 'dfilt2' options. The algorithm incorporates the following steps:

1. Match the filtration of the reference image to the particle. This involves computing the 1-D power spectrum of the particle and imposing it on the projection. If the two images were in the same orientation, this should be the optimal filtration. In the dfilt option, noise is ignored in this process, making it theoretically suboptimal. The dfilt2 option tries to rectify this somewhat, but there is no clear winner between the two.
2. Normalize the density of the filtered projection to the density of the particle. This finds the optimal linear transform to apply to the density (in real space) for the two images to match. The key is that this is done under a mask supplied with the projection, so it isn't strictly a linear operation.
3. Finally, compute the RMSD between the two images. This becomes the similarity metric (smaller is better).
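
The steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the EMAN1 code: all function names are hypothetical, noise is ignored (as in plain dfilt), the 1-D power spectrum is taken as the sum over integer-radius rings, the density fit is an ordinary least-squares line fitted under a boolean mask, and the RMSD is taken over the whole image.

```python
import numpy as np

def radial_index(shape):
    """Integer ring index of every pixel in an unshifted 2-D FFT."""
    fy = np.fft.fftfreq(shape[0]) * shape[0]
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    return np.hypot(*np.meshgrid(fy, fx, indexing="ij")).astype(int)

def match_power_spectrum(projection, particle):
    """Step 1: impose the particle's ring-summed power spectrum on the projection."""
    r = radial_index(particle.shape).ravel()
    P = np.fft.fft2(projection)
    Q = np.fft.fft2(particle)
    pow_proj = np.bincount(r, weights=(np.abs(P) ** 2).ravel())
    pow_part = np.bincount(r, weights=(np.abs(Q) ** 2).ravel())
    # Per-ring amplitude gain; phases of the projection are left untouched
    gain = np.sqrt(pow_part / np.maximum(pow_proj, 1e-30))
    return np.real(np.fft.ifft2(P * gain[r].reshape(P.shape)))

def normalize_density(projection, particle, mask):
    """Step 2: least-squares fit particle ~ a * projection + b under the mask."""
    a, b = np.polyfit(projection[mask], particle[mask], 1)
    return a * projection + b

def dfilt_like_score(particle, projection, mask):
    """Step 3: RMSD between particle and adjusted projection; smaller is better."""
    ref = match_power_spectrum(projection, particle)
    ref = normalize_density(ref, particle, mask)
    return float(np.sqrt(np.mean((particle - ref) ** 2)))
```

In this sketch, a particle that differs from the projection only by a density scale and offset scores essentially zero, whereas an unrelated image scores much higher.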

Over recent years, it has become clear that 'dfilt' is the best overall performer among the available criteria, and it has been used for all of the structures in our recent publications, including the 4 Å resolution structure of GroEL. One important caveat: the 'dfilt' option requires full CTF correction (ctfcw=) to function, whereas the other two do not. phasecls and fscls are also quite reasonable choices, and it is unclear which of the two is generally better.

Hope that helps.