
JSON Files

(BDB Replacement)

JSON files replace the much-despised BDB database mechanism for storing metadata in EMAN2. These files offer a number of advantages over BDB, but there are also a few tradeoffs.

Advantages:

  • Human-readable, and human-editable.
  • Can be renamed, deleted, copied, etc., just like any other file.
  • Standard file format, a subset of JavaScript, so interfaces easily with the web.

  • Persistence and thread safety. Through the use of file locking, it should be safe to read a single JSON file from multiple threads or processes.

Tradeoffs:

  • Speed. Making any change to a JSON file requires re-writing the entire file. If writing is deferred, then other processes won't see the changes until the write actually happens.
  • Speed. Not really designed for very large 'databases' of information. While you could have a dictionary with 1,000,000 items in it, deferred writing would be critical to maintaining any sort of performance.
  • Not designed for images. While it is possible to store images and/or any other (pickleable) object in a JSON file, since opening the file results in reading the ENTIRE file, storing a large stack of images in JSON format is not advisable. Use HDF instead when you need to store images.
  • Concurrent writers. While multiple writers should technically be safe, if multiple processes are all writing to .js files at high speed, it is conceivable that some corruption could occur. We have never observed this happening, but it isn't completely impossible.
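
The page does not say how EMAN2 implements its file locking. Purely as an illustration of the idea, the sketch below guards a read-modify-write cycle with a POSIX advisory lock from the Python standard library (fcntl.flock, Unix only). The helper name locked_update is hypothetical and is not part of EMAN2.

```python
import fcntl
import json
import os
import tempfile


def locked_update(path, key, value):
    """Set one key in a JSON file while holding an exclusive advisory lock.

    Hypothetical helper for illustration; this is not EMAN2 code.
    """
    # Open for reading and writing, creating the file if it is missing.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # waits until other holders release
        try:
            text = f.read()
            data = json.loads(text) if text else {}
            data[key] = value
            f.seek(0)
            f.truncate()                   # the whole file is rewritten
            json.dump(data, f)
            f.flush()                      # push the data out before unlocking
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return data


demo_path = os.path.join(tempfile.mkdtemp(), "demo.js")
locked_update(demo_path, "apix", 1.5)
locked_update(demo_path, "voltage", 300)
```

Advisory locks only protect against other programs that also take the lock, and they can be unreliable on some network filesystems, which is one way that locking "may not work in all situations".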

Basic usage

The basic methods for accessing JSON files (which should be fairly familiar to those with BDB experience):

The main object is the JSDict class. An instance of this class represents a single file on disk with a '.js' extension.

  • js_open_dict(filename)

    • Opens a JSON file as a dict-like database object (JSDict). Writes to JSON dictionaries are inefficient unless deferred writing is used. Default behavior is to write the entire dictionary to disk when any element is changed. File locking is attempted to avoid conflicts, but may not work in all situations. While it is possible to store images in JSON files, it is not recommended, because it is inefficient and produces files that are difficult to read.

  • js_close_dict(filename)

    • This does not need to be called explicitly, but will free some resources associated with the database. Not associated with closing a file pointer.
  • js_remove_dict(filename)

    • Closes and deletes a database using the same specification as js_open_dict. Unlike the BDB functions, this will actually remove the associated file on disk.
  • js_check_dict(filename, readonly=True)

    • Checks for the existence of the named JSON file and ensures that it can be opened for reading [and writing]. It does not check the contents of the file, just its existence and permissions.
  • js_list_dicts(path)

    • Gives a list of readable JSON files at a given path.
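
To make the last two descriptions concrete, here is a minimal standard-library sketch of what "check existence and permissions" and "list readable files" could look like. The function names check_json_file and list_json_files are hypothetical stand-ins, not the EMAN2 implementations, and the '.js' extension is taken from the description above.

```python
import glob
import os
import tempfile


def check_json_file(filename, readonly=True):
    """True if the file exists and is readable (and writable unless readonly).

    The file contents are never inspected, only existence and permissions.
    """
    mode = os.R_OK if readonly else (os.R_OK | os.W_OK)
    return os.path.isfile(filename) and os.access(filename, mode)


def list_json_files(path, ext=".js"):
    """Sorted list of readable files in `path` whose names end with `ext`."""
    candidates = glob.glob(os.path.join(path, "*" + ext))
    return sorted(f for f in candidates if os.access(f, os.R_OK))


# Small demonstration in a scratch directory.
scratch = tempfile.mkdtemp()
for name in ("register.js", "notes.txt"):
    with open(os.path.join(scratch, name), "w") as f:
        f.write("{}")

found = list_json_files(scratch)   # only register.js matches the extension
```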

The JSDict class acts much like a standard Python dictionary, once opened. Default behavior is to sync with the file on disk (only if necessary) on each read or write, giving it a high level of persistence and making it feasible to use in multi-process and shared-filesystem environments. However, this scheme can be extremely inefficient, so mechanisms exist for deferred writing of changes and for reading without checking the file for changes (though this second task is fairly inexpensive anyway). In addition to all of the standard dictionary methods, JSDict provides:

  • get(self,key,noupdate=False)

    • This will retrieve a value from the dictionary, exactly like dict[key], but permits skipping the check to see if the file has changed on disk since the last access.
  • setval(self,key,val,deferupdate=False)

    • This will set a value in the dictionary. This is identical to dict[key]=value, unless deferupdate is set, in which case the change is made in memory, but not immediately committed back to the JSON file on disk. To commit changes made with deferupdate set, either call sync(self) or make another change without deferupdate set. All changes are committed at the next sync(self).

  • update(self,otherdict)

    • Just like the normal dictionary update method, but will only do one sync(self) at the end of the update.

  • delete(self,key,deferupdate=False)

    • Deletes a key. As with setval, you can defer the actual key deletion in the file on disk.

  • Note: remember that while any pickleable object can be stored in a JSDict, storing a stack of 10,000 images probably isn't a very good idea. Use HDF files for anything other than incidental image storage.
  • Note: If two processes are changing the JSON file at the same time
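
The interplay of get, setval and sync can be modelled with a small toy class. This is a sketch under stated assumptions, not JSDict itself: the class name ToyJSONDict is hypothetical, change detection is assumed to be a comparison of file modification times, and there is no locking or merging of concurrent edits.

```python
import json
import os
import tempfile


class ToyJSONDict:
    """Toy model of a file-backed dictionary (NOT EMAN2's JSDict)."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self.mtime = None    # modification time seen at our last read/write
        self.dirty = False   # True while deferred changes are pending
        self.writes = 0      # number of whole-file writes performed
        self._reload_if_changed()

    def _reload_if_changed(self):
        # Re-read the file only if it exists and its mtime differs from ours.
        if not os.path.exists(self.path):
            return
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self.mtime:
            with open(self.path) as f:
                self.data = json.load(f)
            self.mtime = mtime

    def get(self, key, noupdate=False):
        # Skip the disk check on request, or while deferred changes are
        # pending (a reload would discard them in this toy model).
        if not noupdate and not self.dirty:
            self._reload_if_changed()
        return self.data[key]

    def setval(self, key, val, deferupdate=False):
        self.data[key] = val
        self.dirty = True
        if not deferupdate:
            self.sync()

    def sync(self):
        # Commit all pending changes with a single whole-file write.
        if not self.dirty:
            return
        with open(self.path, "w") as f:
            json.dump(self.data, f)
        self.mtime = os.stat(self.path).st_mtime_ns
        self.dirty = False
        self.writes += 1


path = os.path.join(tempfile.mkdtemp(), "toy.js")
reader = ToyJSONDict(path)
writer = ToyJSONDict(path)
writer.setval("x", 1)                     # written to disk immediately
seen = reader.get("x")                    # reader notices the new file
writer.setval("y", 2, deferupdate=True)   # held in memory only
```

Until writer.sync() is called, the file on disk still lacks the key "y", so a second process would not see it.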

Examples

Convert a BDB to a JSON file

Pretty trivial:

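from EMAN2 import *

a = db_open_dict("bdb:refine_01#register")   # existing BDB dictionary
b = js_open_dict("refine01/register.js")     # JSON file to write

b.update(a)   # copies every key, with a single sync at the end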

Make a large JSON file efficiently

Consider this:

from EMAN2 import *

a=list(range(1000))
d=js_open_dict("tst.js")

# every assignment rewrites the entire file
for i in range(500):
    d[i]=a
    print(i)

now consider this:

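from EMAN2 import *

a=list(range(1000))
d=js_open_dict("tst.js")

# changes are kept in memory instead of being written one by one
for i in range(500):
    d.setval(i,a,deferupdate=True)
    print(i)

d.sync()   # commit the deferred changes explicitly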

Both produce exactly the same tst.js file in the end, but the first version takes almost 2 minutes to run as compared to 0.5 seconds for the second version. Of course, if another program were to try to access the file during that 0.5 seconds, it wouldn't see any of the changes...
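
The size of that gap follows from how much data each approach writes. The sketch below does not use EMAN2 at all. It assumes only that every non-deferred change rewrites the whole file, as described above, and counts the characters written with the standard json module on a smaller problem (100 items of 100 integers). The names write_all and build are hypothetical.

```python
import json
import os
import tempfile


def write_all(path, data):
    """Rewrite the whole file and return the number of characters written."""
    text = json.dumps(data)
    with open(path, "w") as f:
        f.write(text)
    return len(text)


def build(path, n_items, payload, deferred):
    """Fill a dictionary, writing after every change or only once at the end."""
    data = {}
    total = 0
    for i in range(n_items):
        data[str(i)] = payload
        if not deferred:
            total += write_all(path, data)   # whole file, every time
    if deferred:
        total = write_all(path, data)        # one write for everything
    return total


tmp = tempfile.mkdtemp()
payload = list(range(100))
immediate = build(os.path.join(tmp, "immediate.js"), 100, payload, deferred=False)
deferred = build(os.path.join(tmp, "deferred.js"), 100, payload, deferred=True)

print("characters written, immediate:", immediate)
print("characters written, deferred: ", deferred)
```

Because each immediate write contains everything stored so far, the total written grows roughly with the square of the number of items, while the deferred version writes the final file once. Both files end up with identical contents.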

Eman2JSStorage (last edited 2023-09-29 11:40:22 by SteveLudtke)