

Making Simulated Single Particle Data

If your goal is to develop software and you want a good test data set with a known ground truth, just give up now. All simulated data sets I have ever seen fall far short of real data. That is, any reasonable algorithm will perform extremely well on simulated data, even very noisy simulated data, and much better than it will on real data.

However, if your goal is to better understand how the software works by using artificial but somewhat realistic simulated data, or if you need to perform initial testing to make sure an algorithm at least works with simulated data, then this is the page to read.

Making simulated data isn't all that difficult. If you wish to start with a structure from the EMDB instead of the PDB, you can jump directly to the second step:

  1) If you want to start from a PDB file, the first thing to do is convert the PDB file to a density map. There may be some issues with non-crystallographic symmetry, etc. Run e2pdb2mrc.py --help for the full set of options, but the following will work for most purposes. Note that there is no mmCIF support at present:

e2pdb2mrc.py <input.pdb> <output.hdf> --center --apix <apix> --res <resolution> --box <boxsize>

You will need to pick a good box size and A/pix value. Note that resolution here refers to the half-width of a Gaussian blurring operation. This is not at all what resolution means in CryoEM, where it is a measure of noise level. Generally, the resolution value should be larger than 2*A/pix.

  2) Next, you will likely want to make a set of projections in different orientations. You will likely want to repeat this process multiple times to simulate data from different "micrographs", each with a different defocus:

e2project3d.py <input.hdf> --output 
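
The parameter choices in the two steps above can be sketched in a few lines of Python. This is only an illustration, not part of EMAN2: the helper names, the 1.5x padding factor, the even-box rounding, and the 0.8 to 3.0 micrometer defocus range are all assumptions made for the example. The only command-line flags used are the e2pdb2mrc.py ones shown in step 1, and the script only builds and prints the command; it does not run it.

```python
import math
import random


def pdb2mrc_command(pdb_path, out_path, apix, res, max_dim_angstrom, pad=1.5):
    """Build an e2pdb2mrc.py argument list after sanity-checking the inputs.

    Hypothetical helper: max_dim_angstrom is the longest dimension of the
    model in Angstroms, and pad is an assumed padding factor.
    """
    # Rule of thumb from the text: --res should be larger than 2*A/pix.
    if res <= 2.0 * apix:
        raise ValueError(f"res={res} should be larger than 2*apix={2.0 * apix}")
    # Assumption: pad the longest dimension, convert to pixels, then round
    # up to the next even number of pixels.
    box = math.ceil(max_dim_angstrom * pad / apix)
    box += box % 2
    return ["e2pdb2mrc.py", pdb_path, out_path, "--center",
            "--apix", str(apix), "--res", str(res), "--box", str(box)]


def defocus_plan(n_micrographs, lo_um=0.8, hi_um=3.0, seed=0):
    """Pick one defocus value (micrometers) per simulated "micrograph".

    The range is an assumed example; the seed makes the plan repeatable.
    """
    rng = random.Random(seed)
    return [round(rng.uniform(lo_um, hi_um), 2) for _ in range(n_micrographs)]


if __name__ == "__main__":
    cmd = pdb2mrc_command("input.pdb", "output.hdf",
                          apix=1.5, res=5.0, max_dim_angstrom=120.0)
    print(" ".join(cmd))
    for i, defocus in enumerate(defocus_plan(5)):
        print(f"micrograph {i:02d}: defocus {defocus} um")
```

Each defocus value in the plan would correspond to one repetition of the projection step, so that each simulated "micrograph" gets its own set of projections.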

EMAN2/SimulatedData (last edited 2019-09-19 15:29:48 by SteveLudtke)