Diff for "EMAN2/Parallel"

Differences between revisions 3 and 23 (spanning 20 versions)

Parallel Processing in EMAN2

EMAN2 uses a modular strategy for running commands in parallel. That is, you can choose different ways to run EMAN2 programs in parallel, depending on your environment. We now support 3 distinct methods for parallelism, and each has its own page of documentation.

Which option is best? If you are running on a single machine/node, then Threaded is by far the most efficient option, and the easiest to use as well. If you are running on a few nodes of a single cluster, I would suggest MPI as probably the easiest option, and the one that will cause your sysadmin the fewest headaches, though this may not be true on all clusters. DC (the Distributed option below) is most appropriate when you are trying to use multiple independent computers, or to combine the resources from multiple clusters. In a sense it is the most flexible, as nodes can be added and removed at any time during the job, and DC will make efficient use of whatever is available at that moment. However, it takes a lot more work to use, is somewhat complicated, and the network policies on some clusters will not permit its use.

Please follow the appropriate link:

Threaded - This is for use on a single computer with multiple processors (cores). For example, the Core2Duo processors of a few years ago had 2 cores. By 2010, individual computers often had one or two processors with 2, 4 or 6 cores each, for a total of up to 12 cores. EMAN2 can make very efficient use of all of these cores, but this mode ONLY works on a single computer.
MPI - This is the standard parallelism method used on virtually all large clusters nowadays. It will require a small amount of custom installation for your specific cluster, even if you are using a binary distribution of EMAN2. See the MPI page for more details.
Distributed - This was the original parallelism method developed for EMAN2. It can be used on anything from sets of workstations to multiple clusters, and can dynamically change how many processors it is using during a single run. This allows you, for example, to make use of idle cycles at night on lab workstations, but reduce the load during the day for normal use. It is very flexible, but requires a bit of effort and a knowledgeable user to configure and use.

Note: All 3 parallelism options have been fully supported and stable since early 2011. Both MPI and DC have been tested on jobs using at least 256 cores, for multiple days, and are in routine use on large refinement jobs at multiple sites. That said, DC and MPI can both take a little effort to set up on a new system, particularly if you have no past experience with cluster computing. We are happy to help if you have difficulties.

-  Revision 3 as of 2009-05-21 16:28:50
  Size: 4313
  Editor: SteveLudtke
  Comment:
+   Revision 23 as of 2011-09-03 01:23:15
  Size: 2829
  Editor: SteveLudtke
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 4:
-environment. Unfortunately, as of May, 2009, the parallelism infrastructure is just beginning to come together. This should be gradually fleshed out over
summer 2009. At the moment, only one parallelism infrastructure is fully functional.
+environment. We now support 3 distinct methods for parallelism, and each has its own page of documentation.
-Line 7:
+Line 6:
-Programs with parallelism support will take the --parallel command line option as follows:
+Which option is best? If you are running on a single machine/node, then Threaded is by far the most efficient option,
and the easiest to use as well. If you are running on a few nodes on a single cluster, I
would suggest MPI as probably the easiest option, and the one that will cause your sysadmin
the fewest headaches, but this may not be true on all clusters. DC is most appropriate when you
are trying to use multiple independent computers, or combine the resources from multiple clusters. 
In a sense it is the most flexible, as nodes can be added and removed during the
job at any time and DC will make efficient use of what's available at any moment in time.
However, it takes a lot more work to use it, is somewhat complicated, and the network policies on
some clusters will not permit its use.
-Line 9:
+Line 16:
---parallel=<type>:<option>=<value>:<option>=<value>:...
+Please follow the appropriate link:
-Line 11:
+Line 18:
-for example, for the distributed parallelism model: ''--parallel=dc:localhost:9990''
+ * [[EMAN2/Parallel/Threaded|Threaded]] - This is for use on a single computer with multiple processors (cores). For example, the Core2Duo processors of a few years ago had 2 cores. In 2010, individual computers often have single or dual processors with 2, 4 or 6 cores each, for a total of up to 12 cores. EMAN2 can make very efficient use of all of these cores, but this mode will ONLY work if you want to run on a single computer.
 * [[EMAN2/Parallel/Mpi|MPI]] - This is the standard parallelism method used on virtually all large clusters nowadays. It will require a small amount of custom installation for your specific cluster, even if you are using a binary distribution of EMAN2. Follow this link for more details.
 * [[EMAN2/Parallel/Distributed|Distributed]] - This was the original parallelism method developed for EMAN2. It can be used on anything from sets of workstations to multiple clusters, and can dynamically change how many processors it's using during a single run, allowing you, for example, to make use of idle cycles at night on lab workstations, but reduce the load during the day for normal use. It is very flexible, but requires a bit of effort, and a knowledgeable user to configure and use.
-Line 13:
+Line 22:
-=== Local Machine (multiple cores) ===
Not yet implemented, please use Distributed Computing

=== Distributed Computing ===

==== Introduction ====
This is the sort of parallelism made famous by projects like SETI@home and Folding@home. The general idea is that you have a list of small jobs to do,
and a bunch of computers with spare cycles willing to help out with the computation. The number of computers willing to do computations may vary with time, and
a computer may agree to do a computation but then fail to complete it. This is a very flexible parallelism model, which can be adapted to individual computers
with multiple cores as well as to linux clusters or sets of workstations lying around the lab.

There are 3 components to this system:

User Application (customer) <==> Server <==> Compute Nodes (client)

The user application builds a list of computational tasks that it needs to have completed, then sends the list to the server. Compute nodes with nothing to do then
contact the server and request tasks to compute. The server sends the tasks out to the clients. When the client finishes the requested computation, results are sent
back to the server. The user application then requests the results from the server and completes processing. As long as the number of tasks to complete is larger than the
number of clients servicing requests, this is an extremely efficient infrastructure.

Internally things are somewhat more complicated and tackle issues such as data caching on the clients, how to handle clients that die in the middle of processing, etc., but
the basic concept is quite straightforward.

==== How to use Distributed Computing in EMAN2 ====
To use distributed computing, there are three basic steps:
 * Run a server on a machine that the clients can communicate with
 * Run some number of clients pointing at the server
 * Run an EMAN2 program with the --parallel option

===== Using DC on a single multi-core workstation =====
 * Ideally your data will be stored on a hard drive physically connected to the workstation (not on a shared network drive)
 * Run a server on the workstation ''e2parallel.py dcserver'' 
 * The server will print a message saying what port it's running on. This will usually be 9990. If it is something else, make a note of it.
 * Run one client for each core you want to use for processing : ''e2parallel.py dcclient --server=localhost --port=9990'' (replace the port with the correct number if necessary)
 * Run your EMAN2 programs with the option ''--parallel=dc:localhost:9990'' (again, use the right port number)

===== Using DC on a linux cluster =====
 * The server should run on the node (often the head node or a specialized 'storage node') with a direct physical connection to the storage
 * If you want to use clients from multiple clusters, then remember all of the clients must be able to make a network connection to the server machine
 * Run a server on that node ''e2parallel.py dcserver''
 * The server will print a message saying what port it's running on. This will usually be 9990. If it is something else, make a note of it.
 * Run one client for each core you want to use for processing : ''e2parallel.py dcclient --server=<server> --port=9990'' (replace the server hostname and port with the correct values)
 * Run your EMAN2 programs with the option ''--parallel=dc:<server>:9990'' (again, use the right port number and server hostname)


=== MPI ===
Sorry, we haven't had a chance to finish this yet. For the moment you will have to use the Distributed Computing mode on clusters.
+Note: All 3 parallelism options have been fully supported and stable since early 2011. Both MPI and DC have been tested on jobs using at least 256 cores,
for multiple days, and are in routine use on large refinement jobs at multiple sites. That said, DC and MPI can both take a little effort to establish on 
a new system, particularly if you have no past experience with cluster computing. We are happy to help if you have difficulties.
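
The revision 3 text shown as deleted above described the ''--parallel'' option as a colon-separated string, with ''--parallel=dc:localhost:9990'' as its example. As a rough illustration only, such a string could be split apart as in the Python sketch below. This is not EMAN2's actual parser: the function name parse_parallel is made up for this sketch, and the option names in the second call (demo, alpha, beta) are hypothetical placeholders rather than real EMAN2 options.

```python
def parse_parallel(spec):
    """Split '<type>:<field>:<field>:...' into (type, positional, options).

    Illustrative only: fields containing '=' are treated as
    <option>=<value> pairs, all other fields as positional values.
    """
    kind, *fields = spec.split(":")
    positional = []
    options = {}
    for field in fields:
        if "=" in field:
            # Split on the first '=' only, so values may contain '='.
            key, value = field.split("=", 1)
            options[key] = value
        else:
            positional.append(field)
    return kind, positional, options


# The example string from the revision 3 text: type, host, port.
print(parse_parallel("dc:localhost:9990"))

# A made-up string in the <type>:<option>=<value> form (names are placeholders).
print(parse_parallel("demo:alpha=1:beta=2"))
```

All values are returned as strings, so a caller that needs the port as a number would have to convert it.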