Threaded Parallelism in EMAN2

Most modern computers have at least 4 cores. A decade or two ago these would have been called 4 CPUs; today your computer may still have 2 physical CPUs, but each may contain multiple cores, effectively multiple CPUs in a single package. For that reason, we often distinguish between the CPU (the physical package) and the number of cores (the effective number of CPUs). Typical computers in 2025 have 4-12 cores, but can potentially have as many as 96.

Also please note that many vendors advertise the number of threads the CPU supports. In most cases this is 2x the number of cores, and the extra hardware threads really aren't very useful for heavy-duty math and image processing. When specifying the number of threads in EMAN2, use the number of cores instead, though you may see a small benefit from specifying ~25% more.

This page explains how to make efficient use of multiple compute cores on your computer when running EMAN2 jobs. This parallelism mode is by far the most efficient, but will only work on a SINGLE computer at a time. Historically there was an additional mode for running EMAN2 jobs using MPI on multiple nodes at once. While this mode still exists, it requires some knowledge to configure, and with so many cores per node and increasing reliance on GPU computation, the MPI mechanism is no longer much of a focus.

If you absolutely need to make use of multiple computers or clusters together, please see the main Parallelism page.

Quickstart

Programs with parallelism support will take the --parallel command-line option as follows:

--parallel=thread:<n>

where <n> should be replaced by the number of cores you wish to use. That's it. Quite simple.
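
For example, a refinement run on 8 cores might look like the following. (e2refine_easy.py is shown only as one common program supporting --parallel; the other options are elided and depend on your project.)

e2refine_easy.py <your other options> --parallel=thread:8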

If using the project manager, any parallel boxes should contain thread:<n>. Any threads boxes should contain just the number of threads.

Large /tmp file problem

This parallelism mode will put (sometimes large) temporary files in /tmp. On some systems /tmp is now a ramdisk, which can cause real problems. You can use an alternate folder for these temporary files, but make sure it is on the local computer, not on a remote filesystem shared among machines for the same account:

--parallel=thread:<n>:<tmp_path>

for example:

--parallel=thread:32:/home/stevel/tmp
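
If you are unsure whether /tmp on your machine is a ramdisk, you can check with a standard system command (not part of EMAN2):

df -h /tmp

If the Filesystem column reports tmpfs, /tmp lives in RAM, and you should supply an alternate local path as shown above.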

The --threads option should not have this problem.

Details

As above, in essence all you need to do is say, for example:

--parallel=thread:4

to make use of 4 cores on your computer.

If the specific program supports it, you should also specify:

--threads=4

This option is for cases where --parallel (which also supports MPI and other types of parallelism) cannot be used.
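
For instance, a program supporting only thread-based parallelism might be invoked like this (the program name is a placeholder; check a program's --help output to see which option it accepts):

e2someprogram.py <options> --threads=4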

Specifying a number of threads significantly larger than the number of cores your computer has will quite probably cause the job to run more slowly, and in some cases may cause it to run disastrously slowly.
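
If you aren't sure how many physical cores your machine has, standard system tools (not part of EMAN2) will tell you:

nproc                              # Linux: logical CPUs (often 2x cores with hyperthreading)
lscpu | grep -iE 'socket|core'     # Linux: physical sockets and cores per socket
sysctl -n hw.physicalcpu           # macOS: physical cores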

Note about disk space: This parallelism option will put a number of scratch files in /tmp. These files can get quite large, so if your /tmp filesystem is small, you may wish to put the scratch files elsewhere. You can specify --parallel=thread:<n>:</path/to/scratch> to do this, but be warned: the scratch directory MUST be on a local hard drive, NOT a filesystem shared from another computer!!! Violating this could lead to database corruption!

IMPORTANT WARNING ABOUT MEMORY - For most tasks, if you specify thread:8, the job will use 8x as much memory (RAM, NOT disk space) as it would with a single thread. For example, if you have only 16 gigs of RAM and your job used 4 gigs when run on a single processor, then specifying thread:8 will probably exhaust your system memory, likely causing either excessive swapping (making your machine seem to run like molasses) or possibly even crashing the entire machine. This is particularly important if you have a machine with, say, 64 cores.
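
A quick way to budget threads against memory, using the numbers above as an illustration: first run a single-threaded test job and note its RAM usage (e.g. with free -g or top on Linux), then:

16 GB total RAM / 4 GB per thread  =  4 threads maximum
64 cores available                 ->  use thread:4, not thread:64

That is, use the smaller of (cores available) and (RAM divided by the per-thread footprint), leaving some headroom for the operating system.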

Note that not all programs will run in parallel, and some will intrinsically use the GPU rather than threads on the CPU. If a program does not accept the --parallel option or the --threads option, then it isn't parallelized.
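
One quick way to check a given program is to look for these options in its help text, for example:

e2proc2d.py --help | grep -E 'parallel|threads'

(e2proc2d.py is shown only as an example; substitute the program you intend to run.)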