eman2:parallel_threaded
Differences
This shows you the differences between two versions of the page.
eman2:parallel_threaded [2025/07/02 23:59] – created steveludtke | eman2:parallel_threaded [2025/07/03 01:27] (current) – steveludtke | ||
---|---|---|---|
Line 1: | Line 1: | ||
==== Threaded Parallelism in EMAN2 ==== | ==== Threaded Parallelism in EMAN2 ==== | ||
- | Most modern computers have at a minimum | + | Most modern computers have at a minimum |
- | This page explains how to make use of multiple compute cores on your efficiently when running EMAN2 jobs. This parallelism mode is the most efficient by far, but will only work on a SINGLE computer at a time. If you need to make use of multiple computers or clusters, please see the main [[EMAN2: | + | Also please note that many vendors advertise |
- | Please | + | This page explains how to make efficient use of multiple compute cores on your computer when running EMAN2 jobs. This parallelism mode is the most efficient by far, but will only work on a SINGLE computer at a time. EMAN2 also has an older mode for running jobs using MPI on multiple nodes at once. While this mode still exists, it requires some knowledge to configure, and with so many **cores** on a node and with increasing reliance on GPU computations, |
+ | |||
+ | If you do absolutely need to make use of multiple computers or clusters together, please see the main [[EMAN2:Parallel|Parallelism]] page. | ||
=== Quickstart === | === Quickstart === | ||
Programs with parallelism support will take the --parallel command line option as follows: | Programs with parallelism support will take the --parallel command line option as follows: | ||
+ | < | ||
--parallel=thread:< | --parallel=thread:< | ||
+ | </ | ||
where <n> should be replaced by the number of cores you wish to use. That's it. Quite simple. | where <n> should be replaced by the number of cores you wish to use. That's it. Quite simple. | ||
Line 19: | Line 21: | ||
This will put (sometimes large) temporary files in /tmp. On some systems now /tmp is a ramdisk, which can cause real problems. You can use an alternate folder for these temporary files, but make sure they are on the local computer, not a remote filesystem shared among machines for the same account: | This will put (sometimes large) temporary files in /tmp. On some systems now /tmp is a ramdisk, which can cause real problems. You can use an alternate folder for these temporary files, but make sure they are on the local computer, not a remote filesystem shared among machines for the same account: | ||
+ | < | ||
--parallel=thread:< | --parallel=thread:< | ||
+ | </ | ||
for example: | for example: | ||
+ | < | ||
--parallel=thread: | --parallel=thread: | ||
+ | </ | ||
The --threads option should not have this problem. | The --threads option should not have this problem. | ||
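Before choosing where the temporary files go, it can help to check how much free space the candidate folder has. The following is a minimal Python sketch and is not part of EMAN2: the helper name `scratch_free_gib` and the 50 GiB threshold are assumptions made purely for illustration, and the check does not detect whether /tmp is a ramdisk.

```python
import shutil
import tempfile


def scratch_free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 2**30


if __name__ == "__main__":
    tmp = tempfile.gettempdir()  # usually /tmp on Linux and macOS
    free = scratch_free_gib(tmp)
    print(f"{tmp}: {free:.1f} GiB free")
    # Hypothetical threshold: the space a real job needs depends on the data.
    if free < 50:
        print("Consider putting the scratch files on a larger local disk.")
```

The same function can be pointed at any candidate folder on a local disk to compare it with /tmp.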
Line 31: | Line 37: | ||
As above, in essence all you need to do is say, for example: | As above, in essence all you need to do is say, for example: | ||
+ | < | ||
--parallel=thread: | --parallel=thread: | ||
+ | </ | ||
- | to make use of 4 cores on your computer. However, if you are running, for example, on your desktop computer or workstation, | + | to make use of 4 cores on your computer. |
- | As mentioned on the previous page you should also specify: | + | If the specific program supports it, you should also specify: |
+ | < | ||
--threads=4 | --threads=4 | ||
+ | </ | ||
- | for any programs that support it. This option is for cases where --parallel (which also supports MPI and other types of parallelism) cannot be used. | + | This option is for cases where --parallel (which also supports MPI and other types of parallelism) cannot be used. |
- | + | ||
- | Specifying a number of threads larger than the number of cores your computer has will quite probably cause the job to run more slowly, and in some cases may cause it to run disastrously slowly. | + | |
- | //What about hyperthreading// | + | Specifying a number |
- | //How do I know how many cores my machine has ?// - This depends on what OS you are using. On a Mac, simply use the 'About this Mac' item on the apple menu. It may say something like "2 x 2.66 6 core Xeons" or somesuch (meaning, in this case, 12 cores). Under linux, you can 'cat / | + | * //What about hyperthreading// - Some computers support the concept of hyperthreading. This is when a CPU pretends to have more cores than it actually has, and tries to run 2 jobs using the same core. Sometimes |
- | //Caveat -// Under linux there is a possibility that this number may be 2x larger than your actual number of cores. Intel has a technology called | + | * //How do I know how many cores my machine has ?// - This depends on what OS you are using. |
+ | * On a Mac, simply use the 'About this Mac' | ||
+ | * Under Linux, you can 'cat / | ||
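As a cross-platform alternative to the OS-specific checks above, Python can report the core count directly. This is a minimal sketch, not an EMAN2 utility: the function names are hypothetical, `os.cpu_count()` returns logical CPUs (so it may include hyperthreads), and halving that number is only a conservative guess for a hyperthreaded machine.

```python
import os


def logical_cores():
    """Number of logical CPUs reported by the OS (may include hyperthreads)."""
    return os.cpu_count() or 1  # cpu_count() can return None


def parallel_option(n):
    """Format the --parallel option described on this page for n threads."""
    if n < 1:
        raise ValueError("need at least one thread")
    return f"--parallel=thread:{n}"


if __name__ == "__main__":
    n = logical_cores()
    print(f"logical cores: {n}")
    # Assumption: half the logical count approximates the physical cores
    # on a hyperthreaded CPU; adjust this for your own hardware.
    print(parallel_option(max(1, n // 2)))
```

For example, `parallel_option(4)` produces the string `--parallel=thread:4`.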
**Note about disk space:** - This parallelism option will put a bunch of scratch files in /tmp. These files can get quite large, so if your /tmp filesystem is small, you may wish to put the scratch files elsewhere. You can just specify --parallel=thread:< | **Note about disk space:** - This parallelism option will put a bunch of scratch files in /tmp. These files can get quite large, so if your /tmp filesystem is small, you may wish to put the scratch files elsewhere. You can just specify --parallel=thread:< | ||
- | **IMPORTANT WARNING ABOUT MEMORY** - For most tasks, if you specify thread:4, that job will use 4x as much memory (RAM, NOT disk space) as if you use only a single thread. If you have, for example, only 2 gigs of RAM, and your job was using 1 gig when you ran it on a single processor, if you then specify thread:4, you will probably exhaust your system memory, and likely cause either excessive swapping (making your machine seem to run like molasses), or possibly even crash the entire machine. This is particularly important if you have a machine with, say, 12 cores. This effect can be mitigated to some extent through use of the ' | + | **IMPORTANT WARNING ABOUT MEMORY** - For most tasks, if you specify thread:8, that job will use 8x as much memory (RAM, NOT disk space) as if you use only a single thread. If you have, for example, only 16 gigs of RAM, and your job was using 4 gigs when you ran it on a single processor, if you then specify thread:8, you will probably exhaust your system memory, and likely cause either excessive swapping (making your machine seem to run like molasses), or possibly even crash the entire machine. This is particularly important if you have a machine with, say, 64 cores. |
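The memory warning amounts to simple arithmetic, sketched below in Python. This is an illustration only, assuming the rule of thumb stated above (n threads use roughly n times the memory of a single-threaded run); the function name is hypothetical and the estimate leaves no headroom for the operating system or other programs.

```python
def max_threads_for_ram(ram_gb, single_thread_gb, cores):
    """Largest thread count whose estimated memory use fits in RAM.

    Assumes n threads need about n times the memory of one thread,
    and never returns more threads than there are cores.
    """
    if single_thread_gb <= 0:
        raise ValueError("single_thread_gb must be positive")
    by_memory = int(ram_gb // single_thread_gb)
    return max(1, min(cores, by_memory))


if __name__ == "__main__":
    # The example from the warning: 16 GB of RAM, 4 GB per single-thread run.
    print(max_threads_for_ram(16, 4, 64))
```

With 16 GB of RAM and a job that needs 4 GB on one thread, this gives 4 threads even on a 64-core machine; thread:8 would need roughly 32 GB.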
- | Note that not all programs will run in parallel. If a program does not accept the --parallel option, then it is not parallelized. | + | Note that not all programs will run in parallel, and some will intrinsically use the GPU rather than threads on the CPU. If a program does not accept the --parallel |
eman2/parallel_threaded.1751500787.txt.gz · Last modified: by steveludtke