I have processed my data set to high resolution, but I'm sure the raw data is heterogeneous. Is there any way to resolve this heterogeneity to get a higher-resolution structure, or to produce multiple models from my single data set?

Two approaches have evolved in EMAN for dealing with heterogeneous data sets.

mkdir r0 r1
cd 0
foreach i (cls*lst)
proc2d $i ../r0/start.hed first=1
end
cp threed.4a.mrc ../r0/threed.0a.mrc    (use the last iteration)
cd ../1
foreach i (cls*lst)
proc2d $i ../r1/start.hed first=1
end
cd ../r0
refine ...
cd ../r1
refine ...

foreach i (cls*lst)
proc2d $i avgs.hed average
end

Then either manually, or using multirefine (on avgs.hed), separate the data you want from avgs.hed. For each image in avgs.hed that you want to keep, you could run:

proc2d cls0023.lst mygooddata.hed

The problem with this is that, while you would get the particles you want, they are already aligned in 2-D to the class average. If you used these particles for further processing, they would be rotated a second time, which may adversely affect your final resolution. Instead, run:

~/EMAN/python/lstsub.py cls0023.lst mygooddata.hed ptcl2orig.lst

This will copy the original unrotated particles corresponding to each class.

Once you have the particles you want in mygooddata.hed, you can run refine or other postprocessing on it.

I have run refine2d.py just as above and am trying to separate my data. When I run v2 on iter.final.img and decide that, say, image 6 reflects a 'bad particle', it turns out that cls0005.lst is not the corresponding list file. I did an "iminfo *lst | grep myparticlenumber" and determined that the right lst file is cls0019.lst. Is there something that I'm doing wrong?

The iter.final.hed file is actually sorted, so its image numbers do not line up with the cls*lst file numbers. It probably shouldn't be sorted, but it is. You need to recreate the averages file in class-file order with the foreach loop shown above, then look at avgs.img instead of iter.final.hed.
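
Once avgs.hed has been rebuilt with the foreach loop over cls*lst, the image number seen in v2 can be mapped back to a class file by position. The following is a minimal Python sketch of that lookup, not part of EMAN. It assumes that the shell expanded cls*lst in alphabetical order, that proc2d added exactly one average per class file, and that images are numbered from 0. The helper name cls_file_for_image is hypothetical.

```python
import glob
import os


def cls_file_for_image(index, directory="."):
    """Return the cls*lst file that should correspond to image `index` of avgs.hed.

    Assumption: avgs.hed was built by looping over cls*lst in alphabetical
    order, one average per file, with image numbering starting at 0.
    """
    # Sort explicitly so the order mirrors the shell's alphabetical glob expansion.
    files = sorted(glob.glob(os.path.join(directory, "cls*lst")))
    if not 0 <= index < len(files):
        raise IndexError(f"image {index} out of range ({len(files)} class files found)")
    return os.path.basename(files[index])


if __name__ == "__main__":
    # Example: which class file produced image 6 of avgs.hed?
    print(cls_file_for_image(6))
```

If the name returned does not match what iminfo reports for a known particle, one of the assumptions above does not hold for your run and the mapping should be checked by hand.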

It seems as though refine2d.py will only work on 2000 images at a time - is there some way around it?

The bootstrapping procedure which is used for the first round works on only the first 2000 particles. The remaining iterations use all of the data. startnrclasses in the first iteration is not intended to produce very good results, so it doesn't bother working with all of the data.
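
The behaviour described above can be pictured with a small Python illustration. This is only a sketch of the idea (first round on a capped subset, later rounds on everything), not refine2d.py's actual code. The function name particles_for_iteration and the constant BOOTSTRAP_LIMIT are hypothetical; the value 2000 is the figure quoted in the answer.

```python
BOOTSTRAP_LIMIT = 2000  # figure quoted in the answer above; name is hypothetical


def particles_for_iteration(n_particles, iteration):
    """Return the particle indices used in a given round.

    Round 0 stands for the bootstrapping round, which only looks at the
    first BOOTSTRAP_LIMIT particles; every later round uses the full set.
    """
    if iteration == 0:
        return range(min(n_particles, BOOTSTRAP_LIMIT))
    return range(n_particles)


if __name__ == "__main__":
    for it in range(3):
        used = particles_for_iteration(5000, it)
        print(f"iteration {it}: {len(used)} particles")
```

In other words, a data set larger than 2000 particles is still fully used; only the rough initial classification is restricted.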

EMAN1/FAQ/Heterogeneous (last edited 2008-11-26 04:42:30 by localhost)