Making Simulated Single Particle Data

If your goal is to develop software and have a good test data set where you know the ground truth, just give up now. All simulated data sets I have ever seen fall far short of real data. That is, any reasonable algorithm will perform extremely well even with very noisy simulated data, far better than it does with real data.

However, if your goal is to better understand how the software works through use of some artificial, but somewhat realistic simulated data, or if you need to perform initial testing on algorithms to make sure they at least work with simulated data, then this is the page to read. It is a good idea to use HDF format as much as possible in this process to preserve all metadata in the header.

Making simulated data isn't all that difficult. If you wish to start with a structure from the EMDB instead of the PDB, you can jump directly to the second step:

  1. If you want to start from a PDB file, the first thing to do is convert the PDB file to a density map. There may be some issues with non-crystallographic symmetry, etc. Run e2pdb2mrc.py --help for the full set of options, but the command below will work for most purposes. Note that there is no mmCIF support at present:

e2pdb2mrc.py <input.pdb> <output.hdf> --center --apix <apix> --res <resolution> --box <boxsize>
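One sanity check worth doing before running the command above: a map sampled at a given A/pixel cannot carry information finer than the Nyquist limit of 2 x apix, so the value passed to --res should not be smaller than that. The sketch below checks this and prints (rather than runs) the command. The function name, file names and numeric values are illustrative assumptions, not recommended settings.

```shell
#!/bin/sh
# Hypothetical helper: print an e2pdb2mrc.py command after checking that the
# requested resolution is not finer than Nyquist (2 * apix).
pdb2mrc_cmd() {
    # $1 = input PDB, $2 = output HDF, $3 = apix (A/pixel),
    # $4 = resolution (A), $5 = box size (pixels)
    if ! awk -v a="$3" -v r="$4" 'BEGIN { exit !(r >= 2 * a) }'; then
        echo "resolution $4 A is finer than Nyquist for apix $3" >&2
        return 1
    fi
    echo "e2pdb2mrc.py $1 $2 --center --apix $3 --res $4 --box $5"
}

# Example values only (assumed, not taken from any real data set).
pdb2mrc_cmd model.pdb model.hdf 1.5 5 128
```

Once the printed command looks right, it can be run directly or piped to sh.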
  2. Next, you will likely want to make a set of projections in different orientations. It is usually worth repeating this step several times to simulate data from different "micrographs", each with a different defocus:

e2project3d.py <input.hdf> --output <output.hdf> --orientgen=rand:phitoo=1:n=200
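Repeating the projection step for several simulated "micrographs" can be scripted. The sketch below only prints one e2project3d.py command per micrograph so they can be reviewed first; the output naming scheme (proj_mgNN.hdf), the function name and the counts are assumptions made for illustration. The per-micrograph defocus is applied later, in the CTF step.

```shell
#!/bin/sh
# Print one e2project3d.py command per simulated "micrograph".
# Output file names (proj_mgNN.hdf) and the counts are illustrative assumptions.
projection_cmds() {
    # $1 = input map, $2 = number of micrographs, $3 = projections per micrograph
    i=1
    while [ "$i" -le "$2" ]; do
        printf 'e2project3d.py %s --output proj_mg%02d.hdf --orientgen=rand:phitoo=1:n=%s\n' \
            "$1" "$i" "$3"
        i=$((i + 1))
    done
}

# Three stacks of 200 random projections each.
projection_cmds model.hdf 3 200
```

Pipe the output to sh once the commands look right.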
  3. Once you have projections, you will need to modify them to simulate the CTF/MTF of the instrument, and add noise. While EMAN2 does have e2ctfsim.py --apply <projections> for interactive CTF simulation/visualization, this isn't very useful in producing simulated data. Instead, I suggest using e2filtertool.py to interactively figure out what CTF and noise parameters you want to use, then using e2proc2d.py to apply these to the individual projection stacks you just created. You should be familiar with e2filtertool.py before attempting this; if you aren't, I highly recommend watching the video tutorial.

    • Run e2filtertool.py <projection stack>. This will show a subset of the projections suitable

    • Create a math -> simulatectf entry
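As a rough sketch of the "apply with e2proc2d.py" part of step 3, the helper below prints one e2proc2d.py command per projection stack, each with a different defocus. It assumes the filter created in e2filtertool.py corresponds to a processor named math.simulatectf; the parameter names shown (defocus, voltage, cs, apix), the values and the file names are assumptions for illustration, so copy the exact names and values displayed in e2filtertool.py rather than relying on these.

```shell
#!/bin/sh
# Print an e2proc2d.py command applying a simulated CTF to one stack.
# Parameter names after "math.simulatectf" are assumptions; use the names
# and values shown in e2filtertool.py for your own data.
ctf_cmd() {
    # $1 = input stack, $2 = output stack, $3 = defocus,
    # $4 = remaining parameters as a colon-separated string
    echo "e2proc2d.py $1 $2 --process math.simulatectf:defocus=$3:$4"
}

# One command per simulated micrograph, each with its own defocus.
ctf_cmd proj_mg01.hdf sim_mg01.hdf 1.0 "voltage=300:cs=2.7:apix=1.5"
ctf_cmd proj_mg02.hdf sim_mg02.hdf 2.0 "voltage=300:cs=2.7:apix=1.5"
```

Printing the commands first makes it easy to compare them against the settings chosen interactively before anything is written to disk.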