JSON Files

(BDB Replacement)

JSON files replace the much-despised BDB database mechanism for storing metadata in EMAN2. These files offer a number of advantages over BDB, but there are also a few tradeoffs.

Advantages:

Tradeoffs:

Command Line Program

e2procjson.py can be used to perform a range of manipulations on JSON files. Use --help for a list of options.

Basic Python Usage

The main object is the JSDict class, which provides dictionary-like access to the .json file. Each instance of this class represents a single file on disk with a '.json' extension.

Once opened, the JSDict class acts much like a standard Python dictionary. By default it syncs with the file on disk (only if necessary) on each read or write, giving it a high level of persistence and making it feasible to use in multi-process and shared-filesystem environments. However, this scheme can be extremely inefficient, so mechanisms exist for deferred writing of changes and for reading without checking the file for changes (though this second task is fairly inexpensive anyway). In addition to all of the standard dictionary methods:

Examples

Write some metadata to a JSON file

from EMAN2 import *

js = js_open_dict("info/mytest.json")
js["key1"] = 123.5
js["key2"] = "alphabet"
js["key3"] = test_image()    # images can be stored as values too

print(js["key1"])
display(js["key3"])

But see below for possible efficiency issues.

Convert a BDB to a JSON file

Pretty trivial:

a = db_open_dict("bdb:refine_01#register")
b = js_open_dict("refine01/register.json")

b.update(a)

Now try looking at register.json in a text editor. You'll see that it is nicely formatted text, and can be edited by hand. Formatting is not required for a valid file, and if you make changes by hand that break the formatting, the next time a program changes something it will get automatically reformatted for you.
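The point that formatting is optional can be checked with Python's standard json module, independently of EMAN2. This is only a sketch: the sample keys and values are made up for illustration, and it does not reproduce the exact layout EMAN2 writes.

```python
import json

# Hypothetical metadata; the keys and values are invented for this example.
pretty = """{
    "key1": 123.5,
    "key2": "alphabet"
}"""
compact = '{"key1":123.5,"key2":"alphabet"}'

# Whitespace between tokens is not significant in JSON, so both parse identically.
data_pretty = json.loads(pretty)
data_compact = json.loads(compact)
print(data_pretty == data_compact)

# Re-serializing with an indent restores a readable layout after a messy hand edit.
reformatted = json.dumps(data_compact, indent=4, sort_keys=True)
print(reformatted)
```

A hand edit that breaks the JSON syntax itself (a missing quote or comma, for instance) is a different matter: json.loads would raise an error on such text.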

Gotcha - All keys must be strings!

e2.py
js = js_open_dict("test.json")
js[5] = "testing"    # this will work, but 5 will be converted to a string
print(js[5])         # this will also work at the top level
js["5"] = "new test" # this will replace the original!
print(js[5])         # now "new test"

js["test"] = {1:2, 2:3, 3:4}  # this is where the real danger lies!
print(js["test"])             # exactly what we put in, BUT
js[5] = 1                     # this change triggers a re-sync with the JSON file
print(js["test"])             # note all of the keys have been converted from integers to strings!

Bottom line: if you are storing dictionaries in JSON files, make sure you have converted all keys to strings yourself, or you will get very odd behavior. This applies only to dictionary keys. Dictionary values, list items, and any other Python object stored in a JSON file will be preserved without change!
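The key conversion is a property of the JSON format itself, so it can be reproduced with Python's standard json module and no EMAN2 installation. The sketch below shows the round trip and one possible way to normalize keys up front. The helper name stringify_keys is hypothetical and is not an EMAN2 function.

```python
import json

def stringify_keys(obj):
    """Recursively convert dictionary keys to strings (hypothetical helper, not part of EMAN2)."""
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [stringify_keys(v) for v in obj]
    return obj

original = {1: 2, 2: 3, 3: 4}

# JSON object keys are always strings, so integer keys do not survive a round trip.
round_trip = json.loads(json.dumps(original))
print(round_trip)

# Converting keys before storing makes the stored and reloaded forms agree.
safe = stringify_keys({"test": original})
print(json.loads(json.dumps(safe)) == safe)
```

Normalizing keys before storing means the in-memory dictionary already matches what will come back from disk, so a later re-sync cannot change how lookups behave.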

JSON write performance (large files)

Consider this:

from EMAN2 import *

a=list(range(1000))
d=js_open_dict("tst.json")

for i in range(500):
    d[i]=a      # each assignment syncs with the file on disk
    print(i)

Now consider this:

from EMAN2 import *

a=list(range(1000))
d=js_open_dict("tst.json")

for i in range(500):
    d.setval(i,a,deferupdate=True)      # defer writing the change to disk
    print(i)

Both produce exactly the same tst.json file in the end, but the first version takes almost 2 minutes to run, compared to 0.5 seconds for the second version. Of course, if another program were to try to access the file during that 0.5 seconds, it wouldn't see any of the changes...
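The reason for the gap can be illustrated with a toy file-backed dictionary built only on the standard library. This is a rough sketch, not EMAN2's implementation: the class TinyJSONDict, its sync method, and its write counter are all hypothetical. Only the setval(key, value, deferupdate=...) call shape mirrors the example above. It assumes each non-deferred write rewrites the whole file, which is the behavior the timings suggest.

```python
import json
import os
import tempfile

class TinyJSONDict:
    """Toy stand-in for a file-backed dictionary (hypothetical, not EMAN2's JSDict)."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self.writes = 0          # number of times the whole file was rewritten

    def sync(self):
        # Rewrite the entire file; the cost grows with the size of the dictionary.
        with open(self.path, "w") as f:
            json.dump(self.data, f)
        self.writes += 1

    def setval(self, key, val, deferupdate=False):
        self.data[str(key)] = val
        if not deferupdate:
            self.sync()          # eager mode: every change hits the disk

def fill(path, defer):
    d = TinyJSONDict(path)
    a = list(range(1000))
    for i in range(50):
        d.setval(i, a, deferupdate=defer)
    d.sync()                     # flush so deferred changes reach the disk
    return d

with tempfile.TemporaryDirectory() as tmp:
    eager = fill(os.path.join(tmp, "eager.json"), defer=False)
    lazy = fill(os.path.join(tmp, "lazy.json"), defer=True)
    with open(eager.path) as f1, open(lazy.path) as f2:
        same_content = json.load(f1) == json.load(f2)

print(eager.writes, lazy.writes, same_content)
```

In the eager case every assignment rewrites a file that keeps growing, so the total work grows roughly with the square of the number of entries. The deferred case writes the file once at the end.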

Eman2JSStorage (last edited 2023-09-29 11:40:22 by SteveLudtke)