Eman2BdbStorage - EMAN Wiki

Using the database from Python (for programmers or advanced users)

The usual way to access image data on disk is with the read_image, read_images and write_image methods, for example:

    # in the e2.py interactive shell
    img=EMData()
    img.read_image("test.hdf",5)  # reads the 6th image from test.hdf (the first image is 0)
    img.write_image("test2.hdf",-1)   # appends (-1) the image to the end of test2.hdf
    img_list=EMData.read_images("test.hdf",range(50))   # reads the first 50 images from test.hdf into a list of EMData objects
    n=EMUtil.get_image_count("test.hdf")   # counts the number of images in test.hdf

When writing to a (typically) 8-bit file format such as JPEG, PNG or PGM, the floating-point values in the image must be converted to an 8-bit scale. By default this is done with an algorithm that excludes outliers (i.e., the output does not span the full range of the image values). To override this behavior, set the "render_min" and "render_max" dictionary elements on the image to be saved, and the specified range will be used instead. Here is a simple example:

    a=test_image()
    a["render_min"]=a["minimum"]   # use the full range of the image, from its minimum...
    a["render_max"]=a["maximum"]   # ...to its maximum
    a.write_image("a.png")
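To make the effect of render_min and render_max concrete, here is a minimal sketch in plain Python of mapping a chosen [render_min, render_max] range onto the 8-bit values 0-255, with values outside the range clipped. It assumes a simple linear mapping; the function name `to_8bit` is hypothetical, and this is not EMAN2's conversion code (in particular it does not reproduce the default outlier-excluding algorithm).

```python
def to_8bit(values, render_min, render_max):
    """Hypothetical helper: linearly map floats in [render_min, render_max]
    to integers 0..255, clipping anything outside that range."""
    if render_max <= render_min:
        raise ValueError("render_max must be greater than render_min")
    scale = 255.0 / (render_max - render_min)
    out = []
    for v in values:
        clipped = min(max(v, render_min), render_max)  # clamp to the render range
        out.append(int(round((clipped - render_min) * scale)))
    return out

pixels = [-3.0, 0.0, 0.5, 1.0, 4.0]
# Full data range, analogous to render_min=minimum and render_max=maximum
print(to_8bit(pixels, min(pixels), max(pixels)))
# Narrower range: values below 0 or above 1 saturate at 0 or 255
print(to_8bit(pixels, 0.0, 1.0))
```

Choosing a narrower range trades saturation at the extremes for more gray levels in the region of interest, which is the same trade-off the default outlier-excluding conversion makes automatically.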

File I/O can also be performed on databases by using the "bdb:" prefix:

    img.read_image("bdb:test",5)
    img.write_image("bdb:test2",-1)

However, this is not the preferred way to use the database interface, since databases support many more powerful operations, such as:

    # Started from the e2.py shell, which implicitly performs 'from EMAN2db import *'; this opens the local environment: DB=EMAN2DB.open_db()
    testdb = db_open_dict("bdb:test")   # opens a specific database called "test" in the local directory
    testdb[0]=test_image()              # stores an EMData object in the 'test' database
    img=testdb[0]                       # reads the EMData object back from the database
    testdb.set_attr(0,"mykey",5.5)      # sets the attribute "mykey" on the EMData object with key 0 in database 'test';
                                        # this is MUCH faster than doing the same thing with any flat file
    testdb.get_attr(0,"mykey")          # retrieves an attribute of image 0 from database 'test' without
                                        # loading the image data
    testdb["testimg"]=test_image()      # keys in the database need not be integers, though
                                        # read_image and similar methods can only access integer keys
    testdb["alist"]=[1,2,3,4,5]         # the database can also store arbitrary metadata, not just
                                        # images; this assigns a list to the key 'alist'
    db_close_dict("test")               # databases are closed cleanly and automatically unless Python is
                                        # forcibly terminated (^C is OK), but it is a good idea to close
                                        # them once you know you won't use them again
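Python's standard `shelve` module provides a similar pickle-backed, dictionary-like store, which may help when reasoning about this interface outside EMAN2. The sketch below loosely mirrors the session above; it is only an analogy, not the BDB backend EMAN2 uses, and a plain dictionary stands in for an EMData image. One difference to keep in mind: `shelve` keys must be strings, whereas the EMAN2 database also accepts integer keys.

```python
import os
import shelve
import tempfile

# Temporary location for the example store (analogous to the local "test" database)
path = os.path.join(tempfile.mkdtemp(), "test")

db = shelve.open(path)  # opens (creating if needed) a pickle-backed, dict-like store
db["0"] = {"header": {"mykey": 5.5}, "data": [0.0] * 16}  # stand-in for an image
db["alist"] = [1, 2, 3, 4, 5]  # arbitrary picklable metadata, not just images
print(db["0"]["header"]["mykey"])
print(sorted(db.keys()))
db.close()  # flush and close explicitly once finished
```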

Basically, each database object can be treated as a Python dictionary. Any Python object that can be pickled (which is almost any Python object) can be stored as a value in these dictionaries. It is even possible to mix images of different sizes within a single database.
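Since whether a value can be stored comes down to whether it can be pickled, one quick check before storing an unusual object is simply to try `pickle.dumps` on it. The helper name `is_storable` is hypothetical, and it tests only picklability (the condition stated above), not any other limit the database might impose.

```python
import pickle

def is_storable(obj):
    """Hypothetical check: True if obj can be pickled."""
    try:
        pickle.dumps(obj)
    except Exception:  # pickling failures surface as PicklingError, TypeError, AttributeError, ...
        return False
    return True

print(is_storable([1, 2, 3, 4, 5]))  # a list of ints pickles fine
print(is_storable({"apix_x": 2.0}))  # so does a dict of floats
print(is_storable(lambda x: x))      # lambdas cannot be pickled
```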

The attribute mechanism (set_attr, get_attr) is tied into the EMData object's attribute dictionary. That is, the following operations are functionally equivalent, but the second version is MUCH faster:

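    img=testdb[0]
    img.set_attr("mykey",5.5)
    testdb[0]=img                      # writes the whole image back under the same key
    # OR
    testdb.set_attr(0,"mykey",5.5)     # equivalently, via the DB environment: DB.test.set_attr(0,"mykey",5.5)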
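As noted earlier, get_attr works without loading the image data, which suggests headers can be read and updated independently of the pixels. The toy class below illustrates why that makes the second version cheaper: it keeps headers and pixel data in separate dictionaries, so set_attr never touches the (potentially large) pixel array. `ToyImageDB` and its internals are hypothetical and are not how EMAN2 actually lays out its databases; only the `set_attr`/`get_attr` call shapes mirror the examples above.

```python
class ToyImageDB:
    """Hypothetical in-memory store that keeps headers apart from pixel data."""

    def __init__(self):
        self._headers = {}  # key -> dict of attributes (small)
        self._data = {}     # key -> list of pixel values (potentially large)

    def __setitem__(self, key, image):
        header, pixels = image
        self._headers[key] = dict(header)
        self._data[key] = list(pixels)

    def __getitem__(self, key):
        # Full read: copies both the header and the pixels (the expensive path)
        return dict(self._headers[key]), list(self._data[key])

    def set_attr(self, key, name, value):
        # Cheap path: only the header is modified; pixels are untouched
        self._headers[key][name] = value

    def get_attr(self, key, name):
        return self._headers[key][name]

db = ToyImageDB()
db[0] = ({"nx": 4}, [0.0] * 4)

# Slow route: read everything, modify, write everything back
hdr, pix = db[0]
hdr["mykey"] = 5.5
db[0] = (hdr, pix)

# Fast route: touch only the header
db.set_attr(0, "mykey", 6.5)
print(db.get_attr(0, "mykey"))
```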

Unlike with Python dictionaries, if a value stored in the database is a mutable object, modifying the retrieved object does not write the change back to the database unless you explicitly store it again. For example:

    # With a dictionary
    test={1:["a","b","c"],2:3}
    test[1][1]="c"
    print(test[1])      # prints ['a', 'c', 'c']
    # With a database
    testdb = db_open_dict("bdb:test")
    testdb[1]=["a","b","c"]
    testdb[2]=3
    testdb[1][1]="c"    # this effectively does nothing: it modifies a temporary copy
    print(testdb[1])    # prints ['a', 'b', 'c']
    # To make the above actually work, read the value, modify it, and store it again
    d=testdb[1]
    d[1]="c"
    testdb[1]=d
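The same behavior can be reproduced with the standard-library `shelve` module under its default `writeback=False`, which, like the database, hands back a freshly unpickled copy on every read. This is an analogy rather than EMAN2 code, and the key is a string because `shelve` requires string keys.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test")
db = shelve.open(path)  # writeback=False by default
db["1"] = ["a", "b", "c"]

db["1"][1] = "c"  # modifies a temporary copy; the stored value is unchanged
print(db["1"])

d = db["1"]       # read-modify-write: fetch a copy,
d[1] = "c"        # change it,
db["1"] = d       # and store it again
print(db["1"])
db.close()
```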

You can write/read the full header for an EMData object inexpensively with:

    testdb[2]=test_image()
    hdr=testdb.get_header(2)     # returns the equivalent of get_attr_dict() on an EMData object
                                 # (for an on-disk database, get_header requires the key, here image number 2)
    hdr["apix_x"]=2.0
    testdb.set_header(2, hdr)    # hdr can be either a dictionary or an EMData object
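Because set_header accepts either a dictionary or an EMData object, one way to picture it is as normalizing its argument to a plain dictionary before storing it. The sketch below shows that idea with a small in-memory header store; `header_as_dict`, `FakeImage`, `headers`, `set_header` and `get_header` here are all hypothetical stand-ins, and only the `get_attr_dict()` method name comes from the text above.

```python
def header_as_dict(hdr):
    """Hypothetical helper: accept a plain dict or any object exposing
    get_attr_dict() (as EMData does) and return a dict copy of the header."""
    if hasattr(hdr, "get_attr_dict"):
        return dict(hdr.get_attr_dict())
    return dict(hdr)

class FakeImage:
    """Stand-in for EMData, used only for this example."""
    def __init__(self, attrs):
        self._attrs = dict(attrs)

    def get_attr_dict(self):
        return dict(self._attrs)

headers = {}  # hypothetical header store: key -> header dict

def set_header(key, hdr):
    headers[key] = header_as_dict(hdr)

def get_header(key):
    return dict(headers[key])  # return a copy, so edits need an explicit write-back

set_header(2, FakeImage({"nx": 64, "apix_x": 1.0}))  # store from an image-like object
hdr = get_header(2)
hdr["apix_x"] = 2.0   # edit the local copy...
set_header(2, hdr)    # ...then write it back; a plain dict works too
print(get_header(2)["apix_x"])
```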

Opening a database has a small cost, so for performance it is generally best to open each database once, keep it open, and close it only when you don't expect to use it again for some time.