eman2:eman2hdf
Differences
This shows you the differences between two versions of the page.
eman2:eman2hdf [2025/07/04 17:28] – steveludtke | eman2:eman2hdf [2025/07/04 19:30] (current) – steveludtke | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== HDF5 ====== | + | This page has [[eman2:hdf5format|moved]] |
- | [[https:// | + | |
- | + | ||
- | The advantage of HDF5 over other formats used in CryoEM is that it can store arbitrary metadata (header information) with every image, and can store stacks of images of any dimensionality using any number format (8- to 32-bit integers, floating point, etc.). | + |
- | + | ||
- | If you convert images from some other file format into HDF5 in EMAN2, it will retain most of the header information from the other format following [[EMAN2: | + | |
- | + | ||
- | ==== EMAN2 HDF5 Specifications ==== | + | |
- | Note, this is still a very rough draft provided so other implementers have at least something to go on... | + | |
- | + | ||
- | HDF5 is an interdisciplinary file format standard used by a wide range of scientific communities to represent N-dimensional data efficiently and accurately. EMAN2 uses this format by default for all data storage. However, the format is extremely flexible, so it is necessary to define the conventions used within the file. | + |
- | + | ||
- | We followed a draft standard for interdisciplinary image storage in HDF when we implemented it, and my hope was that it would be some sort of official or semi-official standard by now, but I don't think that's happened yet. However, the specifications are pretty simple. HDF files are structured much like a filesystem, with a GROUP representing a folder, an ATTRIBUTE representing a single piece of metadata, and a DATASET containing actual data: | + | |
- | < | + | |
- | HDF5 " | + | |
- | GROUP "/" | + | |
- | GROUP " | + | |
- | GROUP " | + | |
- | | + | |
- | DATATYPE | + | |
- | DATASPACE | + | |
- | DATA { | + | |
- | (0): 70 | + | |
- | } | + | |
- | } | + | |
- | GROUP " | + | |
- | ATTRIBUTE " | + | |
- | | + | |
- | | + | |
- | DATA { | + | |
- | (0): 2.1 | + | |
- | } | + | |
- | } | + | |
- | ... other attributes | + | |
- | + | ||
- | DATASET " | + | |
- | | + | |
- | | + | |
- | DATA { | + | |
- | | + | |
- | | + | |
- | ... rest of image | + | |
- | + | ||
- | } | + | |
- | GROUP " | + | |
- | ... | + | |
- | + | ||
- | </ | + | |
- | + | ||
- | That is: | + | |
- | * there is a top level GROUP called " | + | |
- | * inside that is a GROUP called " | + | |
- | * which contains a single integer attribute " | + | |
- | * 0 is always the lowest numbered image | + | |
- | * There is no guarantee that all of 0-n will be present at all times, but no image >n will be present. | + | |
- | * following this are the actual images. | + | |
- | * each image is a GROUP with an integer name. | + | |
- | * that group contains a list of named attributes | + | |
- | * all of the attributes used in EMAN2 should be listed here: [[http:// | + | |
- | * the image attributes defined by EMAN are prefixed with " | + | |
- | * we request that others making use of this specification update this page if they add their own metadata items (ask for edit permission or email updates) | + | |
- | * at least apix_x, apix_y, and apix_z are recommended | + | |
- | * " | + | |
- | * followed by a DATASET, containing the actual image data | + | |
- | + | ||
- | + | ||
- | If something I said above is ambiguous, you can take an EMAN2 HDF5 file and run h5dump on it, and it will give you human-readable output. | + | |
- | + | ||
- | Also, please note that there was an earlier HDF convention EMAN1 used for a while in the early 2000s, which didn't follow this standard. EMAN2 will still read the old format, but always writes the new format. Chimera is capable of reading this format, but also supports another simpler HDF structure, which it will write by default. | + | |
- | + | ||
- | ==== Data Compression ==== | + | |
- | Data compression is now used in all of the EMAN2 data processing pipelines by default. HDF5 includes native, transparent support for standard lossless GZIP compression. EMAN2 combines this with explicit bit-reduction (we have a paper on this pending), where the truncated bits should be pure noise. Any HDF5 reading software can transparently read these compressed files, but without further processing the resulting pixel values will be integers rather than the (typical) floating point values. EMAN2 stores 4 header parameters when writing compressed data (also documented on the parameter page): EMAN.stored_rendermin, | + |
- | + | ||
- | Stored integer values can be converted to the original floating point values using the following formula: | + |
- | + | ||
- | float=(stored_integer_pixel/ | + | |
- | + | ||
- | This process is handled automatically and transparently when reading HDF5 files in EMAN2. That is, a floating point image written to a compressed HDF5 file in EMAN2 is scaled back to its original values when read, though the histogram will clearly be more discrete than that of the original data. The formula above is provided for those wishing to support EMAN2 compressed images optimally in their own software. | + |
- | + | ||
- | When compressing, | + | |
- | + | ||
- | + | ||
- | Please let me know if you need any more information (sludtke@bcm.edu)... | + |
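The image numbering rules in the removed specification (0 is the lowest image number, gaps are allowed, and no image is numbered above the stored maximum) can be sketched in plain Python. This is only an illustration for implementers, not EMAN2 code: the function names `present_image_ids` and `missing_image_ids` and the parameter `max_id` are hypothetical, and the caller is assumed to have already read the child names of the image container group and its integer maximum attribute with whatever HDF5 library they use.

```python
from typing import Iterable, List


def present_image_ids(child_names: Iterable[str], max_id: int) -> List[int]:
    """Return the sorted image numbers found among a group's child names.

    child_names: names of the children of the image container group
                 (hypothetical input; obtain it from your HDF5 library).
    max_id:      value of the container's integer attribute that records
                 the highest image number (hypothetical parameter name).
    """
    # Image groups have integer names; anything else is ignored.
    ids = sorted(
        int(name) for name in child_names if name.isascii() and name.isdigit()
    )
    # The specification says no image numbered above the maximum is present.
    too_big = [i for i in ids if i > max_id]
    if too_big:
        raise ValueError(f"image ids above the stored maximum {max_id}: {too_big}")
    return ids


def missing_image_ids(ids: List[int], max_id: int) -> List[int]:
    """List the numbers in 0..max_id with no image group (gaps are legal)."""
    present = set(ids)
    return [i for i in range(max_id + 1) if i not in present]
```

A reader would typically call `present_image_ids` first and iterate over the returned numbers, rather than assuming every number from 0 to the maximum exists.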
eman2/eman2hdf.1751650129.txt.gz · Last modified: by steveludtke