Table of supported image formats in EMAN2

Type

Extension

Read

Write

3D

Image Stacks

Volume Stacks

Bit Trunc.

Region I/O

Comments

Primary EMAN2 Format

HDF5

hdf

Y

Y

Y

Y

Y

Y

Y

HDF5 is an international standard for scientific data (http://www.hdfgroup.org/HDF5/). It supports arbitrary metadata (header info) and is very portable. This is the standard interchange format for EMAN2. Chimera can read EMAN2 style HDF files.

LST

lst

Y

Y

Y

Y

N

N

(See below for usage tips.) An ASCII file containing a list of image file names and image numbers with optional metadata. EMAN2 treats these files as actual image files. There are two variants, LST and LSX; LSX has the additional constraint that all lines have the same length. These files can be manipulated with the LSXFile class in Python or treated like any other image file, e.g. opened with EMData.read_image.

Cryo-EM Formats

DM2 (Gatan)

dm2

Y

N

N

N

N

N

N

Proprietary Gatan format (older version)

DM3 (Gatan)

dm3

Y

N

N

N

N

N

Proprietary Gatan format from Digital Micrograph

DM4 (Gatan)

dm4

Y

N

Y

Y

N

N

Proprietary Gatan format from Digital Micrograph, used with K2 cameras

SER (FEI)

ser

Y

N

N

Y

N

N

Proprietary FEI format (Falcon camera ?)

EER (TF)

eer

Y

N

N

Y

N

N

N

Falcon 4 camera counting mode format. Extremely large frame count with RLE compression to make frames very small. Supports up to 4x oversampling of counting data. Default reader is without oversampling. See below for details.

EM

em

Y

Y

Y

N

N

Y

As produced by the EM software package

ICOS

icos

Y

Y

Y

N

N

Y

Old icosahedral format

Imagic

img/hed

Y

Y

Y

Y

Y

N

Y

This format stores header and image data in 2 separate files. Region I/O is only available for 2D. The Imagic format in EMAN2 is fully compatible with Imagic4D standard since the 2.0 release.

MRC

mrc

Y

Y

N

Y

N

Y

Largely compatible with CCP4. Note that some programs (like IMOD) will treat 3D MRC files as stacks of 2D images. This behavior is partially supported in EMAN2, but be aware that it is impossible to store metadata about each image in the stack when doing this, so it is not suitable as an export format for single particle work. EMAN2 supports reading of FEI MRC, which is an extended MRC format for tomography. The extra header information will be read into the header. All FEI MRC images will be 2-byte integer.

MRCS

mrcs

Y

Y

N

Y

soon?

N

Y

Identical to MRC format above. If the filename is .mrcs, then a 3-D volume file will automatically be treated as a stack of 2-D images. If any other extension is used, it will appear to be a single 3-D volume.

Spider Stack

spi

Y

Y

Y

Y

N

Y

To read the overall image header in a stacked spider file, use image_index = -1.

Spider Single

spi

Y

Y

Y

N

N

Y

Specify "--outtype=spidersingle" to use with e2proc2d/3d

SER

ser

Y

N

N

Y

N

N

Also known as TIA (Emospec) file format, used by FEI Tecnai and Titan microscopes for acquiring and displaying scanned images and spectra

PIF

pif

Y

Y

Y

Y

N

N

Purdue Image Format. This will read most, but not all PIF images. Recent support added for mode 40 and 46 (boxed particles). Some of the FFT formats cannot be read by EMAN2. PIF writing is normally done in FLOAT mode, which is not used very often in PIF. PIF technically permits only images with odd dimensions, but EMAN2 does not enforce this.

BDB

N/A

Y

Y

Y

Y

N

Y

This entry is for EMAN2's (retired) embedded database system. While it is still possible to read/write BDBs for backwards compatibility, we do not suggest any new use of this format in EMAN2 (SPARX still uses it for many operations).

Other Supported Formats

Amira

am

Y

Y

Y

N

N

N

A native format for the Amira visualization package

DF3

df3

Y

Y

Y

N

N

N

File format for POV-Ray; supports 8-, 16-, or 32-bit integers per pixel

FITS

fts

Y

N

Y

N

N

N

Widely used file format in astronomy

JPEG

jpg/jpeg

N

Y

N

N

N

N

Note that JPEG images use lossy compression and are NOT suitable for quantitative analysis. PNG (lossless compression) is a better alternative unless file size is of critical importance.

OMAP

omap

Y

N

Y

N

N

N

Also called DSN6 map, 1 byte integer per pixel

PGM

pgm

Y

Y

N

N

N

N

Standard graphics format with 8 bit greyscale images. No compression.

PNG

png

Y

Y

N

N

N

N

Excellent format for presentations. Lossless data compression, 8 bit or 16 bit per pixel

SAL

hdr/img

Y

N

N

N

N

N

Scans-A-Lot. Old proprietary scanner format. Separate header and data file

SITUS

situs

Y

Y

Y

N

N

N

Situs-specific ASCII format on a cubic lattice. Used by Situs programs

TIFF

tiff/tif

Y

Y

N

Y

N

N

Good format for use with programs like Photoshop. Some variants are good for quantitative analysis, but JPEG compression should be avoided.

V4L

v4l

Y

N

N

N

N

N

Used by some video-capture boards in Linux. Acquires images from the V4L2 (video4linux) interface in real time.

VTK

vtk

Y

Y

Y

N

N

N

Native format from Visualization Toolkit

XPLOR

xplor

Y

Y

Y

N

N

N

8 bytes integer, 12.5E float ASCII format

Image files in EMAN

Virtually all cryo-EM file formats are supported as well as many generic image formats. The default format used in EMAN2 processing is HDF5, which supports stacks of 2-D and 3-D images as well as arbitrary header information for each image in the file. If you convert an image to a format like MRC, you will lose any metadata not compatible with that format.

Read Images with New-Syntax File Specification

EMAN2 understands a new-syntax file name specification for reading. The syntax is:

filename[:image_number(s)[^image_numbers_to_exclude]]

where image_number(s) can be a single integer, a comma-separated list of integers, or a Python-like slice specification. Image numbers start at index 0.

out.hdf:5      will read image with index 5
out.hdf:15,21  will read images with index 15 and 21
out.hdf:::2    will read all images with even index
out.hdf:1::2   will read all images with odd index

out.hdf:2:    -> 2,3,...,N-1
out.hdf::3:   -> 0,1,2
out.hdf:::5   -> 0,5,10,15,...,<N
out.hdf::10:2 -> 0,2,4,6,8
out.hdf:::-1  -> N-1,N-2,N-3,...,2,1,0
out.hdf:2:9   -> 2,3,4,5,6,7,8
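
The expansion rules above can be sketched in pure Python. This is only an illustration of the notation, not EMAN2's own parser: the function names `expand_image_spec` and `_expand_part` are hypothetical, the number of images `n` in the file is passed in by hand, and treating the part after `^` as an exclusion list written in the same notation is an assumption. File names that themselves contain a colon (such as Windows drive letters) are not handled.

```python
def _expand_part(part, n):
    """Expand one integer, a comma-separated list, or a slice into indices."""
    if "," in part:
        return [int(tok) for tok in part.split(",")]
    if ":" in part:
        # Empty fields mean "use the default", exactly as in a Python slice
        fields = [int(f) if f else None for f in part.split(":")]
        if len(fields) > 3:
            raise ValueError("too many ':' in image specification: %r" % part)
        return list(range(n)[slice(*fields)])
    return [int(part)]


def expand_image_spec(spec, n):
    """Split 'filename:numbers^excluded' and return (filename, index list).

    n is the number of images in the file (supplied by the caller here).
    """
    filename, _, numbers = spec.partition(":")
    if not numbers:
        return filename, list(range(n))
    include, _, exclude = numbers.partition("^")
    keep = _expand_part(include, n) if include else list(range(n))
    # Assumption: the exclusion list uses the same notation as the inclusion list
    drop = set(_expand_part(exclude, n)) if exclude else set()
    return filename, [i for i in keep if i not in drop]


print(expand_image_spec("out.hdf::10:2", 20))   # ('out.hdf', [0, 2, 4, 6, 8])
print(expand_image_spec("out.hdf:::-1", 4))     # ('out.hdf', [3, 2, 1, 0])
```

Mapping the notation onto Python's own `slice` keeps the sketch consistent with the "Python-like" behavior shown in the examples, including negative steps.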

Compression and Bit Truncation

EMAN2 supports bit truncation with lossless compression as a mechanism for reducing file size without information loss. The file size reductions can be quite dramatic for raw data. Compression is currently supported only in HDF5 format, but bit truncation is supported for all formats, and bit-truncated files can be very effectively (losslessly) compressed with command-line tools, such as gzip, bzip2 or similar. The main use case for bit truncation is gain normalized movie averages being stored as 32 bit floating point values (or even 16 bit integers).

For raw counting mode movie data collected on direct detectors, we recommend using the manufacturer's recommended storage mechanism (compressed TIFF, EER, etc.). Bit truncation is only appropriate for movie averages, not for the individual movie frames.
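
To see why reducing the number of stored bits helps a lossless compressor so much, here is a generic sketch using only the Python standard library. It is not EMAN2's implementation: the `quantize` helper, the choice of 5 bits, the min/max scaling and the use of zlib are all assumptions made for illustration. The point is only that noisy 32-bit floats barely compress, while the same pixels reduced to a few significant bits compress well, with an error bounded by half a quantization step.

```python
import random
import struct
import zlib


def quantize(values, bits):
    """Map floats linearly onto integer codes 0 .. 2**bits - 1 (illustrative only)."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step


random.seed(0)
pixels = [random.gauss(0.0, 1.0) for _ in range(4096)]   # stand-in for a noisy image

# Full-precision storage: 4 bytes per pixel, mostly incompressible noise bits
as_float32 = struct.pack("<%df" % len(pixels), *pixels)

# Truncated storage: 5 significant bits per pixel, held in one byte each
codes, lo, step = quantize(pixels, bits=5)
as_5bit = bytes(codes)

float_size = len(zlib.compress(as_float32))
trunc_size = len(zlib.compress(as_5bit))

# Reconstruct and measure the worst-case error introduced by truncation
restored = [lo + c * step for c in codes]
max_error = max(abs(a - b) for a, b in zip(pixels, restored))

print(float_size, trunc_size, max_error <= step / 2 + 1e-9)
```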

Writing images with new-syntax notation (Bit Truncation)

Examples:

General recommendations

File format conversions

LST/LSX files

These files are simple text files containing references to particle data in other true image files, along with optional per-particle metadata. They can be used to define subsets of the data in a particle project without having to make a copy of all of the image data. They are functionally similar to the STAR files used in some other software packages, but are more flexible.

LST files can be treated as if they were actual image files (read-only), and any metadata will appear as part of the image header. For example the first few lines of a r3d_00/ptcls_01.lst file look like:

#LSX
# This file is in fast LST format. All lines after the next line have exactly the number of characters shown on the next line. This MUST be preserved if editing.
# 240
0       particles/50ca-ND.hdf   {"class":0,"score":-0.128,"xform.projection":{"__class__":"Transform","matrix":"[0.282377,-0.0917493,-0.954906,-3.5,0.512193,0.856077,0.0692077,10.5,0.811124,-0.508639,0.28873,0]"}}                    
5       particles/APO-ND.hdf    {"class":1,"score":-0.215,"xform.projection":{"__class__":"Transform","matrix":"[0.954661,-0.237476,-0.179519,0,0.0696826,-0.40802,0.91031,0,-0.289424,-0.881547,-0.372973,0]"}}                           
2       particles/CIA-ND.hdf    {"class":0,"score":-0.133,"xform.projection":{"__class__":"Transform","matrix":"[-0.367687,0.219952,-0.903564,-3.5,0.283238,0.951951,0.116473,-42,0.885767,-0.213098,-0.412319,0]"}}                      

The first 3 lines define the file format and line length (for rapid seeking). Next are 3 particles referenced from HDF files in the particles/ folder. Each of these images has additional metadata, which will override any metadata stored in the original .HDF file. For example, if the first image in particles/50ca-ND.hdf had score set to 0.1, when reading this LST file, the score would instead be -0.128:

>>> from EMAN2 import *
>>> img=EMData("r3d_00/ptcls_01.lst",0)
>>> print(img["score"])
-0.128
>>> print(img["xform.projection"])
Transform({'az':57.909,'alt':73.218,'phi':-85.855,'tx':-3.50,'ty':10.50,'tz':0.00,'mirror':0,'scale':1.0000,'type':'eman'})

Note that this Transform object supports all common CryoEM conventions and can be used to do mathematical manipulations like "what is the orientation difference between these two Transforms".

Manipulating the LST file itself rather than reading the referenced image can be accomplished easily with the LSXFile class, which handles all issues related to line-length, etc.

>>> lsx=LSXFile("r3d_00/ptcls_01.lst")
>>> print(len(lsx))
399000
>>> print(lsx[0])
[0, 'particles/50ca-ND.hdf', {'class': 0, 'score': -0.128, 'xform.projection': Transform({'az':57.909,'alt':73.218,'phi':-85.855,'tx':-3.50,'ty':10.50,'tz':0.00,'mirror':0,'scale':1.0000,'type':'eman'})}]

# this will update the header for the first particle (class=99)
>>> lsx[0]=[0, 'particles/50ca-ND.hdf', {'class': 99, 'score': -0.128, 'xform.projection': Transform({'az':57.909,'alt':73.218,'phi':-85.855,'tx':-3.50,'ty':10.50,'tz':0.00,'mirror':0,'scale':1.0000,'type':'eman'})}]

# note that assigning to [-1] will append to the end of the file:
>>> lsx[-1]=[12,'particles/xyz.hdf',{"class":3}]
>>> print(len(lsx))
399001

help() can provide additional information.
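
For readers who want to see the line layout and the metadata override without EMAN2 installed, here is a minimal sketch using only the standard library. It is not a replacement for LSXFile: the helper name `parse_lst_line` is hypothetical, the sample line is a shortened version of the first entry shown above, the stored header is invented for the example, and splitting on arbitrary whitespace is an assumption (it would fail for file names containing spaces).

```python
import json

# Shortened version of the first data line from the example above,
# padded with trailing spaces as in a fixed-length LSX file
SAMPLE = '0\tparticles/50ca-ND.hdf\t{"class":0,"score":-0.128}      \n'


def parse_lst_line(line):
    """Split one data line into (image number, file name, metadata dict)."""
    fields = line.rstrip().split(None, 2)
    index, path = int(fields[0]), fields[1]
    # The metadata column is optional
    meta = json.loads(fields[2]) if len(fields) > 2 else {}
    return index, path, meta


index, path, meta = parse_lst_line(SAMPLE)

# Hypothetical header stored in the referenced HDF file
stored_header = {"score": 0.1, "nx": 256}

# Values from the LST line take precedence over the stored header
effective_header = {**stored_header, **meta}
print(index, path, effective_header["score"])
```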

Special issues for MRC/MRCS/CCP4 files

Stack files

MRC/CCP4 format supports a single 1-D, 2-D or 3-D image, with an associated header. At some point in time, someone decided it would be a good idea to store sets of 2-D particle images as "stacks" in 3-D. That is, a set of NZ identically sized NX x NY images are stacked to make a single 3-D pseudo-volume image. The problem is that the original format was not designed for this, and historically there was no consistent way a program could tell if an MRC file contains a true volume (like a 3-D reconstruction) or a stack of 2-D images. While a number of developers have recently agreed upon a standard way of doing this in future, the last 30 years of files floating around in the community don't have this information stored in a consistent way. As of EMAN2.1, stack files should use the ".mrcs" extension and single volumes should use the ".mrc" extension. ".mrc" files will always be read as if they contain a single image, and ".mrcs" files can never be 3-D. This may evolve in the future as the new standards become more refined.
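
The extension rule described above can be summarized in a few lines of Python. This is a sketch of the convention only, not EMAN2 code: the function name `interpret_mrc` and the returned dictionary are hypothetical, and the header dimensions are passed in by hand.

```python
import os


def interpret_mrc(filename, nx, ny, nz):
    """Describe how a file with header dimensions nx, ny, nz is interpreted."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".mrcs":
        # Stack: nz separate 2-D images of nx x ny
        return {"images": nz, "shape": (nx, ny)}
    # Any other extension: one image containing the whole volume
    return {"images": 1, "shape": (nx, ny, nz)}


print(interpret_mrc("particles.mrcs", 256, 256, 1000))
print(interpret_mrc("map.mrc", 256, 256, 256))
```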

Additionally, there are options in the e2proc2d.py command which will treat single volume files as stacks of images without the .mrcs extension (again, if you just use the .mrcs extension, these methods should not be required) :

These options can also be used with other file formats.

Old IMOD 8 bit MRCs

For many years IMOD used the opposite signed vs. unsigned convention for 8 bit MRC files from everyone else. This was fixed sometime in the 2010s, but if you have MRC images/volumes created with one of the older software versions, and read them in other MRC compatible software (like EMAN2), you may find odd contrast with black regions in the middle of white regions or vice-versa. There is a processor available in EMAN2, which can fix this issue:

math.fixmode             :  byte_stou(BOOL)   byte_utos(BOOL)

This can be used with e2proc2d/3d or from Python, e.g.:

e2proc3d.py imodtomo.mrc imodtomo_fixed.hdf:8 --process math.fixmode:byte_utos=1
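
The odd contrast comes from the same bytes being interpreted under the wrong sign convention. The following sketch shows the effect on a few pixel values; it does not reproduce what math.fixmode does internally, and the helper name is hypothetical.

```python
import struct


def reinterpret_unsigned_as_signed(values):
    """Read bytes that were written as unsigned 8-bit as if they were signed."""
    raw = struct.pack("%dB" % len(values), *values)
    return list(struct.unpack("%db" % len(values), raw))


# A row of pixels that gets steadily brighter when read correctly
row = [100, 120, 127, 128, 140, 200]
misread = reinterpret_unsigned_as_signed(row)
print(misread)   # [100, 120, 127, -128, -116, -56]
```

Values up to 127 are unaffected, while everything brighter wraps around to the darkest end of the range, which is why bright regions acquire black patches.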

Special issues for EER files

EER is a specialized format for the Falcon4 camera which records actual pixel events at a very high effective framerate. Individual frames are RLE encoded, so despite storing up to 4x superresolution counting mode images, file size is still smaller than a gain corrected MRC stack averaged to 30 FPS.

The default reader will operate without oversampling. You will need to specify an option, --eer2x or --eer4x, with e2proc2d.py to read super-resolution data instead (8k x 8k or 16k x 16k).

To make use of these files you normally also need to have the appropriate gain reference image from the Falcon 4, which at the time of this writing is stored in FEIRAW format. There is a program in examples which can convert FEIRAW files to any other format you like, but FEIRAW files aren't natively read by other EMAN2 programs.

Several bugs were fixed in mid-October 2020, so it is critical that you use a version dated 10/22/20 or later!

Here is a workflow for processing gain references with EER files:

examples/feiraw2hdf.py gain_post_ec_eer.raw gain_post_ec_eer.hdf
e2proc2d.py gain_post_ec_eer.hdf gain_norm.hdf --process math.reciprocal
e2proc2d.py myimage.eer moviestack.mrcs --avgseq 60
e2proc2d.py moviestack.mrcs moviestack.mrcs --inplace --mult gain_norm.hdf
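
The arithmetic behind this workflow can be sketched on tiny 4-pixel "images" in plain Python. This is an illustration under stated assumptions, not what e2proc2d.py executes: the helper names are hypothetical, --avgseq is read here as averaging each run of N sequential frames, and a final incomplete group is simply averaged as-is, which may differ from the real option.

```python
def reciprocal(image):
    """Pixel-wise 1/x, the role played by math.reciprocal on the gain reference."""
    return [1.0 / v for v in image]


def average_sequential(frames, n):
    """Average each run of n sequential frames (assumed meaning of --avgseq n)."""
    averages = []
    for start in range(0, len(frames), n):
        group = frames[start:start + n]
        averages.append([sum(px) / len(group) for px in zip(*group)])
    return averages


def multiply(image, other):
    """Pixel-wise product, the role played by --mult."""
    return [a * b for a, b in zip(image, other)]


gain = [2.0, 4.0, 1.0, 0.5]            # toy gain reference
frames = [
    [2.0, 4.0, 1.0, 0.5],
    [4.0, 8.0, 2.0, 1.0],
    [6.0, 12.0, 3.0, 1.5],
    [8.0, 16.0, 4.0, 2.0],
]

gain_norm = reciprocal(gain)
stack = average_sequential(frames, 2)
corrected = [multiply(img, gain_norm) for img in stack]
print(corrected)   # [[1.5, 1.5, 1.5, 1.5], [3.5, 3.5, 3.5, 3.5]]
```

After correction every pixel in an averaged frame has the same value, because the toy frames were constructed as a uniform signal multiplied by the gain.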

Reading and Writing images in Python (for programmers)

The main image object in EMAN2 is called EMData(). EMData objects represent an image in an arbitrary file format in the computer's memory, with arbitrary associated tag-based metadata.

Simple Image Reading/Writing

The specification for reading/writing images is:

# note that optional arguments [ ] below require all previous arguments to be specified

# Read multiple images at once, class-method
imagelist=EMData.read_images(filename,[image#_list],[header_only])

# Read a single image
img=EMData()
img.read_image(filename,[image#],[header_only],[Region])
# or
img=EMData(filename,[image#],[header_only])


# write a single image (called on an existing EMData object)
img.write_image(filename,image#,[filetype],[header_only],[Region],[Datatype])

Here filename is the name of the file containing the image data, in any supported format; image# is the zero-indexed image number within the file; image#_list is a Python list or tuple of image numbers; header_only is a boolean flag indicating that only the header should be read/written from/to the file; and Region is a Region(x0,y0,xsize,ysize) or Region(x0,y0,z0,xsize,ysize,zsize) object.

Filetype can be : IMAGE_UNKNOWN, IMAGE_AMIRA, IMAGE_IMAGIC, IMAGE_PIF, IMAGE_SPIDER, IMAGE_VTK, IMAGE_DM3, IMAGE_GATAN2, IMAGE_LST, IMAGE_PNG, IMAGE_TIFF, IMAGE_XPLOR, IMAGE_DM4, IMAGE_HDF, IMAGE_MRC, IMAGE_SAL, IMAGE_EM, IMAGE_ICOS, IMAGE_PGM, IMAGE_SINGLE_SPIDER, IMAGE_V4L. If IMAGE_UNKNOWN is used on write, then the file extension will be used to determine the filetype. Note that since MRC format does not distinguish between 3-D volumes and stacks of 2-D images, the '.mrcs' extension MUST be used for stack files, and the '.mrc' extension MUST be used for non-stack volume data.

Datatype can be: EM_CHAR, EM_FLOAT, EM_INT, EM_UINT, EM_USHORT, EM_DOUBLE, EM_FLOAT_COMPLEX, EM_SHORT, EM_UCHAR, EM_USHORT_COMPLEX, EM_SHORT_COMPLEX. While SHORT_COMPLEX types are defined, they should never actually be used. FLOAT_COMPLEX is only really usable for HDF files. We strongly suggest not reading/writing complex images; simply recompute the FFT instead. Not all file formats support all data types!

If no image#_list is specified to read_images, then ALL images in the file will be read in.

from EMAN2 import *

# Create a new EMData object and initialize it with the first image in "myimage.hdf".
# This will work with any supported file format, not just HDF
img=EMData("myimage.hdf")

# Replace the data in EMData object 'img' with the 3rd image from "myimage.hdf" (the first is #0)
img.read_image("myimage.hdf",2)

# Write an EMData object to disk as the 3rd image in "image.hdf"
img.write_image("image.hdf",2)

# Read all of the images from the SPIDER stack file (also works with single image files) "test.spi"
# lst will become a list of EMData objects
lst=EMData.read_images("test.spi")

# Count the number of images available in a stack file
n=EMUtil.get_image_count("myimage.hdf")

# Create a new EMData object with ONLY HEADER INFORMATION from the 5th image
# in the "myimage.hdf" stack file. Any image processing operations on this object
# will cause EMAN2 to crash, because it doesn't have data loaded for the actual image.
# This can be useful when all you need is the header information from a bunch of images.
hdr=EMData("myimage.hdf",4,True)

Region I/O

Region I/O permits reading or writing sub-images/volumes from within a file. It is not supported for all file formats. This is useful when processing huge files (like full 4k tomograms) on machines with limited RAM. For region reading, it is possible to specify a Region extending outside the actual image dimensions, though this generally isn't a good idea. For region writing, the region must be completely inside image bounds.

from EMAN2 import *

# Read a subimage with origin (1,1,1) and size 8x8x8
img = EMData()
region = Region(1,1,1,8,8,8)
img.read_image("3dimage.hdf",0,False,region)
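
When processing a large volume piece by piece, it helps to generate the regions up front and check them against the bounds rule described above. The sketch below does this with plain tuples so it runs without EMAN2; `region_fits` and `z_slabs` are hypothetical helper names, and each (origin, size) pair would be passed on as Region(x0,y0,z0,xsize,ysize,zsize).

```python
def region_fits(origin, size, dims):
    """True if the region lies completely inside an image of the given dimensions."""
    return all(o >= 0 and s > 0 and o + s <= d
               for o, s, d in zip(origin, size, dims))


def z_slabs(dims, thickness):
    """Yield (origin, size) pairs covering a volume in slabs along z."""
    nx, ny, nz = dims
    for z0 in range(0, nz, thickness):
        # The last slab may be thinner so that it stays inside the volume
        yield (0, 0, z0), (nx, ny, min(thickness, nz - z0))


dims = (4, 4, 10)   # toy volume size
for origin, size in z_slabs(dims, 4):
    print(origin, size, region_fits(origin, size, dims))
```

Checking `region_fits` before writing avoids requesting a region that extends past the image bounds, which is not allowed for region writing.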

Storage type

Internally EMAN2 stores all images as 32-bit (single precision) floating point. Many file formats also support other storage modes. The various formats are defined in a dictionary imported from EMAN2.py: file_mode_map. There is also a file_mode_range dictionary which contains the numeric limits for each type. If you set the header values render_min and render_max in each image before writing, this will control how the float data is scaled to the specified mode, i.e., if render_min is 0 and render_max is 1.0, then the 0-1 range in the internal image will be mapped to the full available scale of (integer mode) output formats. Note also that not all file formats support all modes.

Here are some examples of how to write in alternative formats:

from EMAN2 import *

img = EMData(128,128)
img.write_image('float-image.mrc')  # by default, the image is written as float
img.write_image('short-image.mrc', 0, IMAGE_MRC, False, None, EM_SHORT)  # write MRC file as short (16-bit)
img.write_image('byte-image.mrc', 0, IMAGE_MRC, False, None, EM_UCHAR)  # write MRC file as byte (8-bit)
img.write_image('float-image.spi', 0, IMAGE_UNKNOWN, False, None, EM_FLOAT)  # write SPIDER file as float; file type is taken from the extension
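
The render_min/render_max scaling described in this section amounts to a linear mapping onto the range of the integer mode. The sketch below shows one plausible form of that mapping in plain Python; it is not taken from the EMAN2 source. The function name `scale_to_mode` is hypothetical, the mode limits are passed in by hand rather than looked up in file_mode_range, and clamping out-of-range values is an assumption.

```python
def scale_to_mode(values, render_min, render_max, mode_min, mode_max):
    """Linearly map [render_min, render_max] onto [mode_min, mode_max]."""
    span = render_max - render_min
    out = []
    for v in values:
        frac = (v - render_min) / span
        # Assumption: values outside the render range are clamped
        frac = min(max(frac, 0.0), 1.0)
        out.append(int(round(mode_min + frac * (mode_max - mode_min))))
    return out


# 0-1 float data mapped onto the full unsigned 8-bit range
print(scale_to_mode([0.0, 0.25, 1.0, 2.0, -1.0], 0.0, 1.0, 0, 255))
# [0, 64, 255, 255, 0]
```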

EMAN2/ImageFormats (last edited 2024-02-01 04:08:01 by SteveLudtke)