
Threaded Parallelism in EMAN2

Most modern computers have at least 2 'cores'. A decade ago these would have been called 2 CPUs. Now multiple CPUs are effectively packaged together into a single physical device, though it is often still possible to have multiple physical CPUs, each with multiple cores, in a single computer. Typical computers in 2010 will have 4 cores, but the total number can be 12 (or even higher with some AMD configurations).

This page explains how to make efficient use of multiple compute cores on your computer when running EMAN2 jobs. This parallelism mode is the most efficient by far, but will only work on a SINGLE computer at a time. If you need to make use of multiple computers or clusters, please see the main Parallelism page. For most jobs you should be able to achieve close to an N-fold speedup when using this mechanism (i.e., a job using 4 cores will run close to 4x faster than a job running on one core). However, this will vary with the size of the job. In general, the larger the project, the more efficiently you will be able to make use of multiple cores.

Quickstart

Programs with parallelism support will take the --parallel command line option as follows:

--parallel=<type>:<option>=<value>:<option>=<value>:...

To make use of multiple cores on your computer simply specify:

--parallel=thread:<n>

where <n> should be replaced by the number of cores you wish to use. That's it. Quite simple.
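To make the general format above concrete, here is a minimal Python sketch that splits a --parallel value into its type, any bare values (such as the 4 in thread:4), and any <option>=<value> pairs. The function name parse_parallel_spec is made up for illustration. This is not EMAN2's own parser, just one way to read the syntax described on this page.

```python
def parse_parallel_spec(spec):
    """Split '<type>:<option>=<value>:...' into (type, positional, options).

    Illustrative only: this is NOT EMAN2's internal parser.
    """
    parts = spec.split(":")
    ptype = parts[0]
    positional = []  # bare values, e.g. the '4' in 'thread:4'
    options = {}     # '<option>=<value>' pairs
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            options[key] = value
        else:
            positional.append(part)
    return ptype, positional, options


print(parse_parallel_spec("thread:4"))
```

For thread:4 the type is 'thread' and the single bare value is the number of cores to use.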

Details

As said above, in essence all you need to do is say, for example:

--parallel=thread:4

to make use of 4 cores on your computer. However, if you are running on your desktop computer or workstation, you might wish to use one fewer core than you actually have, to keep your machine responsive for normal interactive use while the job is running. This is completely up to you.

Specifying a number of threads larger than the number of cores your computer has will quite probably cause the job to run more slowly, and in some cases may cause it to run disastrously slowly.
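If you launch jobs from a script, the advice above (leave one core free, never ask for more threads than cores) can be sketched in a few lines of Python. The helper name choose_threads and the reserve parameter are assumptions for illustration, not part of EMAN2. Note that os.cpu_count() reports logical CPUs, which on a hyperthreaded machine may be twice the number of physical cores.

```python
import os


def choose_threads(total_cores, reserve=1):
    """Return a thread count that leaves `reserve` cores free, never below 1."""
    return max(1, total_cores - reserve)


# os.cpu_count() counts logical CPUs and may return None on some platforms.
cores = os.cpu_count() or 1
print(f"--parallel=thread:{choose_threads(cores)}")
```

On a dedicated compute machine you could pass reserve=0 to use every core.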

How do I know how many cores my machine has? - This depends on what OS you are using. On a Mac, simply use the 'About this Mac' item on the Apple menu. It may say something like "2 x 2.66 6 core Xeons" (meaning, in this case, 12 cores). Under Linux, you can 'cat /proc/cpuinfo', and it will give information on each core. Processors are numbered starting with 0, so if you see 'processor : 3' as the last entry, you have 4 cores on your machine.

However, there is a possibility that this number may be 2x larger than your actual number of cores. Intel has a technology called 'hyperthreading' which they use to market their chips. This will make the machine appear to have 2x as many cores as it physically has. It can give a performance advantage in some specific situations (like word-processing, etc.), but is actually quite detrimental for large computational jobs. So if you have only 4 physical cores, with hyperthreading making it look like you have 8, you should only specify 4 threads to EMAN2, or you will almost certainly make your job run slower, and perhaps even crash your machine in certain situations.
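On Linux, one way to tell logical processors from physical cores is to count the distinct ('physical id', 'core id') pairs in /proc/cpuinfo. The Python sketch below does this on a small made-up sample. The function name count_cores is hypothetical, and the sketch assumes those two fields are present (they normally are on x86 Linux but may be missing on other architectures, in which case it falls back to the logical count).

```python
def count_cores(cpuinfo_text):
    """Return (logical, physical) core counts from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()
    # Each processor entry is a block of 'key : value' lines ended by a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        if "physical id" in fields and "core id" in fields:
            physical.add((fields["physical id"], fields["core id"]))
    # Fall back to the logical count if the physical fields are absent.
    return logical, (len(physical) or logical)


# Made-up sample: 4 logical processors on 2 physical cores (hyperthreading on).
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

print(count_cores(SAMPLE))
```

On a real machine you would pass the contents of /proc/cpuinfo instead of SAMPLE, and give the second number to --parallel=thread:<n>.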

IMPORTANT WARNING ABOUT MEMORY - For most tasks, if you specify thread:4, that job will use 4x as much memory (RAM, not disk space) as it would with a single thread. If you have, for example, only 2 GB of RAM, and your job was using 1 GB when you ran it on a single processor, specifying thread:4 will probably exhaust your system memory. This will likely cause either excessive swapping (making your machine seem to run like molasses), or possibly even crash the entire machine. This is particularly important if you have a machine with, say, 12 cores. The effect can be mitigated to some extent through use of the '--lowmem' option provided for commands like 'e2refine.py', but this will not eliminate the problem. The issue is extremely problem-dependent, though. If you are refining something like the demo GroEL data set with a box size <200, and only ~5,000 particles, memory isn't likely to even approach being exhausted. However, if you're processing a virus with a box size of 800x800 to very high resolution, you will almost certainly have issues unless you have a LOT of RAM.
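As a rough back-of-the-envelope check, the arithmetic above can be written out in Python. This sketch assumes memory grows linearly with the number of threads (true for "most tasks", as described above, but not all) and ignores memory used by the OS and other programs, so treat the answer as an upper bound. The function name max_threads_for_memory is made up for illustration.

```python
def max_threads_for_memory(total_ram_gb, per_thread_gb, cores):
    """Largest thread count whose estimated memory fits in RAM, capped at `cores`.

    Assumes memory use scales linearly with thread count; always returns >= 1.
    """
    if per_thread_gb <= 0:
        raise ValueError("per_thread_gb must be positive")
    fit = int(total_ram_gb // per_thread_gb)
    return max(1, min(cores, fit))


# The example from the text: 2 GB of RAM, 1 GB per single-threaded job, 4 cores.
print(max_threads_for_memory(2, 1, 4))  # 2
```

In that example only 2 threads fit in memory, even though 4 cores are available.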

Note that not all programs will run in parallel. If a program does not accept the --parallel option, then it is not parallelized.