Making Simulated Single Particle Data
If your goal is to develop software and have a good test data set where you know the ground truth, just give up now. All simulated data sets I have ever seen fall far short of real data. That is, any reasonable algorithm will perform extremely well even with very noisy simulated data, far better than it would with real data.
However, if your goal is to better understand how the software works through use of some artificial, but somewhat realistic simulated data, or if you need to perform initial testing on algorithms to make sure they at least work with simulated data, then this is the page to read. It is a good idea to use HDF format as much as possible in this process to preserve all metadata in the header.
Making simulated data isn't all that difficult. If you wish to start with a structure from the EMDB instead of the PDB, you can jump directly to the second step (making projections).
If you want to start from a PDB file, the first thing to do is convert the PDB file to a density map. There may be some issues with non-crystallographic symmetry, etc. Run e2pdb2mrc.py --help for the full set of options, but the command below will work for most purposes. Note that there is no mmCIF support at present:
e2pdb2mrc.py <input.pdb> <output.hdf> --center --apix <apix> --res <resolution> --box <boxsize>
- You will need to pick a good box size and A/pix value. Note that resolution here refers to the half-width of a Gaussian blurring operation. This is not at all what resolution means in CryoEM, where it is a measure of noise level. Generally, resolution should be a larger number than 2*A/pix.
- Next, you will likely want to make a set of projections in different orientations. You will likely want to repeat this process multiple times to simulate data from different "micrographs" each with a different defocus:
e2project3d.py <input.hdf> --output <output.hdf> --orientgen=rand:phitoo=1:n=200
- You can replace 200 with the number of particles you wish to create. Run this command once (with a different output file each time) for each 'micrograph' you wish to simulate.
- This orientgen will produce completely random orientations. Run e2help.py orientgen for other possibilities.
- If your input map possesses symmetry, you can specify --sym <symmetry>, though this isn't very meaningful with completely random orientations.
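The two command-line steps above are easy to script when you need many 'micrographs'. The following is a minimal sketch that only assembles the command lines shown above and prints them; it does not run EMAN2. The helper names, the file names (model.pdb, model.hdf, projections_N.hdf) and the example A/pix, resolution and box values are hypothetical placeholders. It also checks the rule of thumb that the resolution value should be larger than 2*A/pix.

```python
def pdb2mrc_command(pdb, out_map, apix, res, box):
    """Assemble the e2pdb2mrc.py command line described above."""
    # Rule of thumb from the text: resolution should exceed 2*A/pix.
    if res <= 2.0 * apix:
        raise ValueError(f"res={res} should be larger than 2*apix={2.0 * apix}")
    return ["e2pdb2mrc.py", pdb, out_map, "--center",
            "--apix", str(apix), "--res", str(res), "--box", str(box)]


def project_commands(in_map, n_micrographs, n_particles):
    """Assemble one e2project3d.py command per simulated 'micrograph'."""
    cmds = []
    for i in range(1, n_micrographs + 1):
        out_stack = f"projections_{i}.hdf"  # hypothetical naming scheme
        cmds.append(["e2project3d.py", in_map, "--output", out_stack,
                     f"--orientgen=rand:phitoo=1:n={n_particles}"])
    return cmds


if __name__ == "__main__":
    # Example values only; pick A/pix, resolution and box size for your project.
    cmds = [pdb2mrc_command("model.pdb", "model.hdf", apix=1.2, res=4.0, box=256)]
    cmds += project_commands("model.hdf", n_micrographs=3, n_particles=200)
    for cmd in cmds:
        print(" ".join(cmd))
```

To actually execute the commands rather than print them, each list could be passed to subprocess.run(cmd, check=True) from the Python standard library.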
Once you have projections you will need to modify them to simulate the CTF/MTF of the instrument, and add noise. While EMAN2 does have e2ctfsim.py --apply <projections> for interactive CTF simulation/visualization, this isn't very useful in producing simulated data. Instead, I suggest using e2filtertool.py to interactively figure out what sorts of CTF and noise parameters you want to use, then use e2proc2d.py to apply these to the individual projection stacks you just created. You should be familiar with e2filtertool.py before attempting this. I highly recommend watching the video tutorial if you aren't familiar with this tool.
Run e2filtertool.py <projection stack>. This will show a subset of the projections, which will be updated interactively as you change parameters.
Create a math -> simulatectf entry
- Typical values for parameters for a high-end microscope and detector would be:
- ampcont=10 %
- bfactor=50 A^2
- cs=2.7 mm
- defocus= 1 - 2 um
- phaseflip=1 (otherwise CTF phase flipping isn't performed)
- For noise simulation, you can use the simple "noiseamp" parameter above, or add another noise-adding processor, which may give you more control. Noiseamp values are project-dependent. If you have a defocus of 1 um and adjust noiseamp such that the projections are barely visible through the noise, this is probably a fairly realistic noise level.
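To get a feel for how the parameters listed above interact, here is a minimal sketch of a textbook 1D CTF with a B-factor envelope. It is not EMAN2's implementation: the function names are hypothetical, and the sign convention (positive defocus = underfocus), the overall sign, and the envelope form exp(-B s^2 / 4) are assumptions that may differ from what math.simulatectf does internally.

```python
import math


def electron_wavelength(voltage_kv):
    """Relativistic electron wavelength in Angstroms for a voltage in kV."""
    v = voltage_kv * 1000.0
    return 12.2639 / math.sqrt(v + 0.97845e-6 * v * v)


def ctf_1d(s, defocus_um, voltage_kv=300.0, cs_mm=2.7,
           ampcont_pct=10.0, bfactor=50.0):
    """Textbook CTF value at spatial frequency s (in 1/A).

    Assumed conventions: defocus > 0 is underfocus, low-frequency
    contrast is negative, and the envelope is exp(-B s^2 / 4).
    """
    lam = electron_wavelength(voltage_kv)
    dz = defocus_um * 1.0e4   # um -> A
    cs = cs_mm * 1.0e7        # mm -> A
    a = ampcont_pct / 100.0   # amplitude contrast as a fraction
    # Phase shift from defocus and spherical aberration.
    chi = math.pi * lam * dz * s ** 2 - 0.5 * math.pi * cs * lam ** 3 * s ** 4
    envelope = math.exp(-bfactor * s * s / 4.0)
    return -(math.sqrt(1.0 - a * a) * math.sin(chi) + a * math.cos(chi)) * envelope


if __name__ == "__main__":
    # Compare the two ends of the typical 1 - 2 um defocus range.
    for defocus in (1.0, 2.0):
        row = [f"{ctf_1d(i * 0.01, defocus):+.3f}" for i in range(0, 11)]
        print(f"defocus={defocus} um:", " ".join(row))
```

Printing the curve for a few defocus values shows the oscillations moving to lower spatial frequency as defocus increases, which is why each simulated 'micrograph' should get its own defocus.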
The goal of e2filtertool.py is to interactively adjust the parameters until you are happy with them. While there is a menu item which will allow you to apply this processor to the full set of projections and save it, repeating that process for N stacks of projections would be annoying. Instead, once you have a set of parameters you like:
Look at the filtertool_default.txt file in the local folder. This text file contains the parameters you need in order to apply the CTF to many sets of projections with e2proc2d.py, e.g.:
e2proc2d.py <projections_1.hdf> <simulated_1.hdf> --process=math.simulatectf:ampcont=10.0:bfactor=50.0:cs=2.7:defocus=1.5:noiseamp=0.02:phaseflip=1:voltage=300.0
Repeat this process for each stack of projections, changing the defocus= option for each stack (again, a range of 1-2 microns would be typical for most projects).
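Repeating the e2proc2d.py command by hand for many stacks gets tedious, so the per-stack commands can be generated instead. The sketch below only builds and prints the command lines; it assumes stacks named projections_N.hdf and outputs named simulated_N.hdf as in the example above, draws each defocus uniformly from the 1-2 micron range, and reuses the other parameter values from the example command. The helper names and the fixed random seed are hypothetical choices.

```python
import random


def simulatectf_command(in_stack, out_stack, defocus, noiseamp=0.02):
    """Assemble the e2proc2d.py command shown above for one stack."""
    processor = ("math.simulatectf:ampcont=10.0:bfactor=50.0:cs=2.7:"
                 f"defocus={defocus:.2f}:noiseamp={noiseamp}:"
                 "phaseflip=1:voltage=300.0")
    return ["e2proc2d.py", in_stack, out_stack, f"--process={processor}"]


def commands_for_stacks(n_stacks, defocus_range=(1.0, 2.0), seed=0):
    """One command per projection stack, each with its own defocus (um)."""
    rng = random.Random(seed)  # fixed seed so the defocus values are reproducible
    cmds = []
    for i in range(1, n_stacks + 1):
        defocus = rng.uniform(*defocus_range)
        cmds.append(simulatectf_command(f"projections_{i}.hdf",
                                        f"simulated_{i}.hdf", defocus))
    return cmds


if __name__ == "__main__":
    for cmd in commands_for_stacks(3):
        print(" ".join(cmd))
```

Replace the parameter values with the ones from your own filtertool_default.txt before running the printed commands.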