EMAN2 Python Programming Style Guide

This document describes the structure of programs distributed with EMAN2. If you plan to write a program for inclusion with EMAN2 in the 'bin' directory, you must follow these general guidelines.

Python Style

There are some deviations from PEP8, which we believe is a bit dated and contains a few simply bad suggestions.

<TAB> must be used uniformly for line indentation. The 4-space indentation proposed in PEP8 is NOT permitted in EMAN2. Altering this can cause utter chaos in the GitHub repository. If your editor automatically converts tabs to spaces in a document and you then push the change to master, you will show up as the last person to change every line of code, making tracing problems nearly impossible.
The line length restriction suggested in PEP8 is NOT recommended. Use of dynamic word wrap in editors is preferred, though comments should be avoided at the end of long lines of text.
Single character index variables are permissible, particularly as loop variables, though some care should be used with respect to nesting. When possible, longer 2-4 letter loop variables with some implicit meaning are preferred.
In general, most variable names should be human-readable with "_" (underscore) used to separate words when necessary.
CamelCase may be used in some cases, particularly GUI code, because this is the convention used by Qt. This can lead to some confusion (was that method called SetData or set_data?), but is unavoidable at this point. Going forward, the underscore-separated approach is preferred.
Excessively long variable names are not encouraged. While this can make code more understandable, it can also make mathematical expressions hard to parse. A happy medium using abbreviations is suggested.
All methods, functions and classes must have a docstring (which will be used by help() in Python). The docstring should take the place of comments at the beginning of a function.
Comments are encouraged: at the end of a line when brief, or on lines by themselves preceding the code being commented on when longer.
Class variables should be used in favor of globals whenever possible. Sometimes globals are unavoidable, but they should be minimized.
Code should be written with thread safety in mind whenever possible. Most of the EMAN2 C code releases the GIL, so threading is somewhat widespread in the system.
Please make an attempt to find existing functionality within the EMData class or Processors before rewriting things in Python, even if using NumPy as an alternative for speed. While EMAN and NumPy communicate well with each other, mixing them can occasionally lead to problems.
When working with NumPy, it is generally preferred to go from EMData -> NumPy array rather than the other way around. Calling image.numpy() will produce a NumPy array which shares memory with the EMData object. When an EMData object is created from a NumPy object, a copy of the data is made, and there is no connection!
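
The difference between a shared-memory view and a copy, described in the last point above, can be illustrated without EMAN2 or NumPy installed. The sketch below is only an analogy built on the Python standard library: the array named pixels stands in for an EMData pixel buffer, the memoryview plays the role of the array returned by image.numpy(), and the copied array plays the role of an EMData object built from a NumPy array. None of these names are EMAN2 API.

```python
from array import array

# Stand-in for the pixel buffer owned by an EMData object (assumption, not EMAN2 API)
pixels = array("f", [1.0, 2.0, 3.0])

# A view shares memory with the original buffer, like the result of image.numpy()
shared = memoryview(pixels)

# A copy owns its own memory, like an EMData object created from a NumPy array
copied = array("f", pixels)

shared[0] = 10.0	# writes through to pixels
copied[1] = 20.0	# leaves pixels untouched

print(pixels[0], pixels[1])
print(copied[0], copied[1])
```

After the two assignments, the first element of pixels has changed because it was written through the view, while the second element is unchanged because only the copy was modified.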

Program naming and options

All programs must be named e2<program>.py. This helps distinguish them from non-EMAN2 programs and from SPARX programs, which are named sx<program>.py.

Where possible, the same options should be used across programs. For example, the '--verbose' option is required for all programs. See StandardParms for details.
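
One way to keep options identical across programs is to define them once and attach them to each parser. The following is a minimal sketch using the standard-library argparse module as a stand-in for EMArgumentParser, on the assumption that the two accept the same add_argument() calls. The helper add_standard_options() and both program names are hypothetical and are not part of EMAN2.

```python
import argparse

def add_standard_options(parser):
	"""Attach the options every program shares (hypothetical helper, not EMAN2 API)."""
	parser.add_argument("--verbose", "-v", dest="verbose", action="store", metavar="n", type=int, default=0, help="verbose level [0-9], higher number means higher level of verboseness")
	return parser

def build_parser(usage):
	"""Return a parser that already carries the shared options."""
	parser = argparse.ArgumentParser(usage=usage)
	return add_standard_options(parser)

# Two hypothetical programs reuse the same definition of --verbose
align_parser = build_parser("e2alignexample.py [options]")
align_parser.add_argument("--input", type=str, default=None, help="The name of the input particle stack")
average_parser = build_parser("e2averageexample.py [options]")

align_opts = align_parser.parse_args(["--input", "particles.hdf", "-v", "2"])
average_opts = average_parser.parse_args([])
print(align_opts.verbose, average_opts.verbose)
```

Because both parsers get --verbose from the same helper, the flag name, type and default cannot drift apart between programs.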

Program sample code

This little example shows how all EMAN2 programs are expected to be structured.

Each program should include:

#!/usr/bin/env python
# The first line is critical, and must be exactly this

# Example Author block:
# Author: Steven Ludtke (sludtke@bcm.edu), 10/27/2010 - rewritten almost from scratch
# Author: David Woolford (woolford@bcm.edu), 9/7/2007
# Copyright (c) 2000-2010 Baylor College of Medicine

# Official copyright notice. EMAN2 is distributed under a joint GPL/BSD license.
# Please copy the actual notice from the top of one of the other EMAN2 programs.
#
# You must agree to use this license if your
# code is distributed with EMAN2. While you may use your own institution for the copyright notice
# the terms of the GPL/BSD license permit us to redistribute it.

# import block, any necessary import statements
from EMAN2 import *
import math
import os
import sys

# main() block. Each program will have a single function called main() which is executed when the
# program is used from the command-line. Programs must also be 'import'able themselves, so you
# must have main()
def main():
	"""Parse the command-line options, then call the function that does the real work."""

	progname = os.path.basename(sys.argv[0])
	usage = """prog [options]

	This is the main documentation string for the program, which should define what it does and how to use it.
	"""

	# You MUST use EMArgumentParser to parse command-line options
	parser = EMArgumentParser(usage=usage,version=EMANVERSION)

	parser.add_argument("--input", type=str, help="The name of the input particle stack", default=None)
	parser.add_argument("--output", type=str, help="The name of the output particle stack", default=None)
	parser.add_argument("--oneclass", type=int, help="Create only a single class-average. Specify the number.",default=None)
	parser.add_argument("--verbose", "-v", dest="verbose", action="store", metavar="n",type=int, default=0, help='verbose level [0-9], higher number means higher level of verboseness')

	(options, args) = parser.parse_args()

	# Now we have a call to the function which actually implements the functionality of the program
	# main() is really just for parsing command-line arguments, etc.  The actual program algorithms
	# must be implemented in additional functions so this program could be imported as a module and
	# the functionality used in another context

	E2n=E2init(sys.argv)

	data=EMData.read_images(options.input)

	results=myfunction(data,options.oneclass)

	# an image number of -1 appends each image to the output stack
	for im in results:
		im.write_image(options.output,-1)

	E2end(E2n)

def myfunction(data,oneclass):
	"""Return a new list with each image in data multiplied by 5.0. oneclass is accepted but unused in this example."""
	# do some stuff
	ret = [im*5.0 for im in data]

	return ret

# This block must always be the last thing in the program and calls main()
# if the program is executed, but not if it's imported
if __name__ == "__main__":
	main()

Revision 12 as of 2021-06-28 18:33:27 (size: 3496, editor: ErikAnderson)
Revision 13 as of 2022-05-06 18:14:36 (size: 6286, editor: SteveLudtke)