Should the data set be centered before subjecting it to refine2d.py?
No. The only impact centering will have is on the quality of the 'bootstrap' averages (iter.1.img). If the particles are already roughly centered, the impact will be negligible. Even if the first set of averages is affected, the effect would be eliminated after a few iterations. So, basically, no, they don't need to be precentered, and it's possible (depending on the precentering method) that precentering would actually have a negative effect.
In my case the results of refine2d.py (iter.final.img) have many averages that are definitely off center. Does this matter?
While it doesn't really 'matter', it shouldn't be happening, since the iterative refinement loop includes a centering procedure. There are two possible problems here. First, are you using EMAN 1.8 or newer? One of the major improvements in EMAN 1.8 was a more robust refine2d.py. Second, if your images have negative contrast (i.e. protein appears black rather than white), the centering routine may not work properly. If neither of these issues applies in your case, I may need to see the results to identify the problem.
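For a rough check of the contrast question, and to put a number on how far off center an average is, here is a minimal NumPy sketch. It is not part of EMAN and is not how refine2d.py centers anything; the function names are made up for illustration, and it assumes each image has already been loaded as a 2D array (reading the .img/.hed files is left out).

```python
import numpy as np


def edge_mean(img):
    """Mean of the outermost ring of pixels, used as a background estimate."""
    img = np.asarray(img, dtype=float)
    edge = np.concatenate([img[0, :], img[-1, :], img[:, 0], img[:, -1]])
    return edge.mean()


def protein_is_white(img):
    """Heuristic: the particle is 'white' if the image as a whole is
    brighter than its border. Assumes the particle does not touch the edge."""
    img = np.asarray(img, dtype=float)
    return img.mean() > edge_mean(img)


def invert_contrast(img):
    """Flip the contrast so that black protein becomes white."""
    return -np.asarray(img, dtype=float)


def center_of_mass_offset(img):
    """Return (dy, dx): offset of the density center of mass from the
    geometric center of the box, in pixels. Only density brighter than
    the background estimate is counted."""
    img = np.asarray(img, dtype=float)
    weights = np.clip(img - edge_mean(img), 0.0, None)
    total = weights.sum()
    if total == 0:
        # Nothing brighter than background (e.g. a negative-contrast image)
        return (0.0, 0.0)
    ys, xs = np.indices(img.shape)
    cy = (weights * ys).sum() / total
    cx = (weights * xs).sum() / total
    return (cy - (img.shape[0] - 1) / 2.0, cx - (img.shape[1] - 1) / 2.0)
```

If protein_is_white() comes back False for most of your particles, that would point to the negative-contrast case, and inverting the stack before refinement is worth trying. The offset from center_of_mass_offset() is only a crude indicator, but it makes it easy to list which averages are badly off center.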
In retrospect I was using EMAN 1.7. I updated to 1.8 and have since not had the same problem.
I have had a similar problem. It only occurs with one of two similar data sets, and I am using v1.8 (although typing 'refine2d.py --version' gives EMAN2 v1.9). Some of the iter.# and iter.final classes are very badly off centre. Both data sets were roughly centred first (using alignment to rot-av total sums in IMAGIC). Do you think it is affecting the alignment, and if so, do you have any suggestions to counter this?