Q: What symmetry should I specify during refinement if I don't know what it is?
A: There is no single good answer to this question. It can be a very complicated issue. Take the case of TriC/CCT, for example:
In this case, the object in question really has no symmetry, or at most D1 symmetry, since it is constructed of 8 unique proteins. However, these proteins have extremely high homology, and in fact, at resolutions worse than 4-6 Å, the structure effectively has D8 symmetry, because the differences between subunits are too subtle to observe. Even for particles which don't have different proteins, the concept of true symmetry at all resolutions is a myth. At SOME resolution the symmetry will be broken, but the resolution at which this appears is unpredictable.
There is an additional complication. Say, for example, that you have a multimeric molecule with an ATP (or other functional ligand) binding site. Unless you do something to make these binding sites either saturated or completely empty, you should expect the different subunits to be in subtly different conformations. In fact, if you had a ring of 6 subunits and only 1 had bound ATP, not only would that subunit be in a slightly different conformation, but its neighbors might also be perturbed by it. This is one mechanism by which things like cooperative binding are accomplished. Regardless, the structures might remain effectively symmetric to, say, 12 Å, but be asymmetric at higher resolutions. However, that doesn't mean you couldn't impose the symmetry anyway and achieve a structure which evaluates to 8 Å resolution, despite the fact that the final structure represents the average of several conformations.
Subtle symmetry arguments aside, for most symmetric particles, the symmetry can be fairly easily observed in reference-free class averages, so this is a good place to start when assessing the symmetry of an object. Of course, it is possible that even if your object is symmetric, the symmetric view of the object may be absent in your particle set due to problems with preferred orientation. This can make accurate symmetry assessment much more challenging.
So, look at your reference-free class averages and consider the following possibilities:
- icos : icosahedral symmetry. You should have 5-folds, 3-folds and 2-folds, though often all won't be readily seen. Generally the objects will be ball-like, such as virus capsids.
- oct : octahedral symmetry. You should see 4-folds, 3-folds and 2-folds. This is the symmetry of a cube or an octahedron.
- tet : tetrahedral or cubic symmetry. 3-folds and 2-folds (no 4-fold). This is the symmetry of a tetrahedron.
- Cn : A single rotational symmetry axis with n-fold symmetry.
- Dn : Same as Cn but with a 2-fold symmetry axis 90 degrees from the Cn axis. Generally this would mean 2 ring-like structures attached to each other, for example most chaperonins, apoptosome, etc. It is observed as one class-average with n-fold symmetry and another class-average with a clear 2-fold symmetry (usually also pseudo-mirror symmetry).
- C1 : No symmetry. In fact, you can simply omit the sym= option in EMAN2 programs, but if you need to specify something, just say C1.
It is worth noting that three other methods have also commonly been used in the literature to make this assessment. First, you can compute the rotational correlation for your (centered) particle data for each possible symmetry. For example, for 4-fold symmetry, you would take each particle and compute a correlation between the particle and itself rotated 90, 180 and 270 degrees. All of these correlation scores are summed, and plotted for each possible symmetry. The true symmetries should have the highest scores. Keep in mind, though, that in the case of icosahedral symmetry, for example, this means you should see peaks for 2, 3 AND 5, which may not be readily distinguished. It also requires that the symmetric views of the assembly in question are present in your data, which is not always the case.
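The rotational correlation test described above can be sketched for a single image as follows. This is a minimal NumPy illustration, not an EMAN2 routine: the function names `make_blobs`, `rotate_nearest` and `rotational_correlation` are hypothetical, the rotation uses crude nearest-neighbor sampling, and the score is averaged over the n-1 rotations (rather than summed) so that different values of n can be compared directly, which is an assumption of this sketch.

```python
import numpy as np

def make_blobs(n_fold, size=64, radius=15.0, sigma=2.0):
    """Synthetic test image: n_fold Gaussian blobs evenly spaced on a ring."""
    c = (size - 1) / 2.0
    y, x = np.indices((size, size))
    img = np.zeros((size, size))
    for k in range(n_fold):
        a = 2.0 * np.pi * k / n_fold
        dx = x - c - radius * np.cos(a)
        dy = y - c - radius * np.sin(a)
        img += np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
    return img

def rotate_nearest(img, angle_deg):
    """Rotate a square image about its center using nearest-neighbor sampling."""
    n = img.shape[0]
    c = (n - 1) / 2.0
    y, x = np.indices(img.shape)
    t = np.deg2rad(angle_deg)
    # Inverse mapping: for each output pixel, look up its source pixel.
    xs = np.cos(t) * (x - c) + np.sin(t) * (y - c) + c
    ys = -np.sin(t) * (x - c) + np.cos(t) * (y - c) + c
    xi = np.rint(xs).astype(int)
    yi = np.rint(ys).astype(int)
    inside = (xi >= 0) & (xi < n) & (yi >= 0) & (yi < n)
    out = np.zeros_like(img)
    out[inside] = img[yi[inside], xi[inside]]
    return out

def rotational_correlation(img, n_fold):
    """Mean normalized correlation of img with itself rotated by k*360/n_fold."""
    n = img.shape[0]
    c = (n - 1) / 2.0
    y, x = np.indices(img.shape)
    mask = (x - c) ** 2 + (y - c) ** 2 <= c ** 2  # compare inside a circle only
    a = img[mask] - img[mask].mean()
    total = 0.0
    for k in range(1, n_fold):
        rot = rotate_nearest(img, 360.0 * k / n_fold)
        b = rot[mask] - rot[mask].mean()
        total += float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return total / (n_fold - 1)

if __name__ == "__main__":
    particle = make_blobs(4)  # stand-in for a centered particle image
    for n in range(2, 9):
        print(n, round(rotational_correlation(particle, n), 3))
```

For real data you would accumulate the score over all centered particles for each candidate n and plot the totals. Note that a 4-fold test image also scores highly for n=2, which is the same ambiguity mentioned above for icosahedral particles.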
The second method is similar: a "rotational power spectrum" is computed for each particle and summed. Again, you expect to see peaks corresponding to the true symmetry.
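One possible way to compute such a rotational power spectrum is sketched below, under stated assumptions: the image is sampled on concentric rings (nearest-neighbor, for brevity), each ring is Fourier transformed along the angular direction, and the power at each angular harmonic is accumulated over rings. The names `make_blobs` and `rotational_power_spectrum`, and the parameters `r_min`, `r_max`, `n_angles` and `max_harmonic`, are hypothetical and are not taken from EMAN2.

```python
import numpy as np

def make_blobs(n_fold, size=64, radius=15.0, sigma=2.0):
    """Synthetic test image: n_fold Gaussian blobs evenly spaced on a ring."""
    c = (size - 1) / 2.0
    y, x = np.indices((size, size))
    img = np.zeros((size, size))
    for k in range(n_fold):
        a = 2.0 * np.pi * k / n_fold
        dx = x - c - radius * np.cos(a)
        dy = y - c - radius * np.sin(a)
        img += np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
    return img

def rotational_power_spectrum(img, r_min, r_max, n_angles=360, max_harmonic=12):
    """Power at angular harmonics 0..max_harmonic, summed over rings r_min..r_max."""
    n = img.shape[0]
    c = (n - 1) / 2.0
    if r_max >= c:
        raise ValueError("r_max must be smaller than the image half-width")
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    power = np.zeros(max_harmonic + 1)
    for r in range(r_min, r_max + 1):
        # Sample one ring of the image (nearest pixel to each polar coordinate).
        xi = np.rint(c + r * np.cos(theta)).astype(int)
        yi = np.rint(c + r * np.sin(theta)).astype(int)
        ring = img[yi, xi]
        # 1-D FFT along the angle; index k is the k-fold angular harmonic.
        spectrum = np.abs(np.fft.rfft(ring)) ** 2
        power += spectrum[: max_harmonic + 1]
    return power

if __name__ == "__main__":
    # Stand-ins for centered particles; real data would be noisy images.
    particles = [make_blobs(7) for _ in range(3)]
    total = sum(rotational_power_spectrum(p, r_min=10, r_max=20) for p in particles)
    # Skip harmonic 0, which only reflects the mean density on each ring.
    print("strongest harmonic:", int(np.argmax(total[1:])) + 1)
```

Harmonic 0 is ignored when looking for peaks because it carries no information about rotational symmetry.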
The third method is related to the reference-free class-average method, but rather than looking at the averages themselves, you look at the eigenimages of the particle data. This approach can produce useful results. However, it can also be deceptive, as one possible set of eigenimages that can come out of this analysis corresponds to various circular harmonics, particularly if the rotational alignment of the particles is poor.
In the end, none of these 2-D methods can really provide a decisive answer, due to the fact that they are only 2-D, and mathematically it is possible to have symmetry (or near-symmetry) in a 2-D projection, even if the underlying 3-D object is not symmetric. While these situations are not very likely in protein biology, it still means that 2-D answers are not mathematically robust. At best these methods should be used for determining putative symmetries, which can then be tested in 3-D.
Once a putative symmetry has been identified, you can try running e2initialmodel.py using this symmetry on your class-averages, and assess the results. This program is relatively fast, so you can try different options if more than one possibility exists. Once you have a reasonable guess at the symmetry, try using this symmetry for a full refinement using your raw data.
A common approach proposed by users new to the single particle field is to just dump the raw data into a single particle refinement with no imposed symmetry and see what comes out. Unfortunately, there are many problems with this approach:
- Model bias. If the object has symmetry and it is not imposed, you give model bias a lot more room to come up with incorrect structures which still seem to match the data, particularly if you give the refinement a bad starting model.
- Achieving the same resolution without symmetry requires n times more data than it does in the presence of an n-fold symmetry.
- Achieving the same resolution without symmetry requires ~ n^2 times more compute time.
That is, for an icosahedral virus, an asymmetric reconstruction requires ~60x more data and 3600x more compute time to complete (naively).
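The arithmetic behind these estimates can be written out as a small sketch. It assumes that "n" means the number of symmetry-related copies in the point group (n for Cn, 2n for Dn, 12 for tet, 24 for oct, 60 for icos), which matches the 60x figure quoted for icosahedral viruses. The function names `symmetry_order` and `asymmetric_cost` are hypothetical and not part of EMAN2.

```python
import re

def symmetry_order(sym):
    """Number of symmetry-related copies for a symmetry string such as 'c7' or 'icos'."""
    s = sym.strip().lower()
    fixed = {"icos": 60, "oct": 24, "tet": 12}
    if s in fixed:
        return fixed[s]
    m = re.fullmatch(r"([cd])(\d+)", s)
    if m is None or int(m.group(2)) < 1:
        raise ValueError("unrecognized symmetry: %r" % sym)
    n = int(m.group(2))
    # Dn has the n rotations of Cn plus n perpendicular 2-fold operations.
    return n if m.group(1) == "c" else 2 * n

def asymmetric_cost(sym):
    """Naive (data factor, compute factor) for dropping the given symmetry."""
    n = symmetry_order(sym)
    return n, n * n

if __name__ == "__main__":
    for sym in ("c1", "c7", "d7", "tet", "oct", "icos"):
        data, compute = asymmetric_cost(sym)
        print("%-5s ~%dx data, ~%dx compute" % (sym, data, compute))
```

These are the same naive scaling factors as in the list above, so they should be read as rough guides rather than measured costs.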
For this reason, the suggested approach is to refine your structure with your best guess at the correct symmetry, then, after the model has fully converged, relax the symmetry and continue refining for 3-4 more cycles (generally with parameters targeting lower resolution). If you have the symmetry correct, the symmetric structure will remain largely intact, with only very small variations (which may be true symmetry-breaking effects, or model bias effects). If you had the symmetry incorrect, you should see rapid divergence from your initial refined structure. For example, it is completely possible to refine GroEL imposing 8-fold rather than 7-fold symmetry. One can even obtain a structure with good evaluated 'resolution'. However, if symmetry is then 'turned off', the structure will rapidly 'fall apart' under continued refinement (eventually reaching its true 7-fold arrangement).
While it may be unsatisfying that it is impossible to achieve a 'truly symmetric' particle without imposing the symmetry in question, it is unfortunately inevitable due to the noise level present in TEM data. Even if the particles were PERFECTLY symmetric, our noisy data is not. While methods such as maximum likelihood can minimize noise bias effects, in the end you must decide whether the biological questions you are answering are better served by a symmetry-imposed average or an asymmetric structure. The answer may well be both in many cases. Look at icosahedral viruses, where, despite the fact that effectively all capsids will have a disruption to the symmetry at one or more of the 5-fold vertices, symmetry-imposed reconstructions remain the norm. Despite the known inaccuracy in ~1/12 of the vertices, the final high resolution structures remain highly accurate overall. However, it has also become quite common to relax the symmetry of such structures to look for the symmetry-breaking vertex, albeit at much lower resolution.