Diff for "EMAN2/Eman2Metadata"

Differences between revisions 42 and 54 (spanning 12 versions)

Parameters/Metadata stored in EMData Objects

The EMData object, and its representation on disk in the BDB local database, in XML files, and when serialized in Python using 'pickle', supports arbitrary header parameters, also known as metadata. These metadata are key/value pairs. The keys are always simple ASCII text, and the values may be virtually anything that can be represented as an EMObject, including simple ints, floats and strings, as well as more complicated objects such as Transform classes. If an EMData object is stored by any means other than the three mechanisms above, some loss of metadata is almost guaranteed. The file I/O objects will try to preserve some of the basic metadata, but most of the cryoEM formats simply don't support arbitrary header data. The EMAN2 convention is to use BDB for most internal purposes, and to use HDF5 for data transfer/exchange.
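
To make the key/value idea concrete, here is a minimal sketch that uses a plain Python dict as a stand-in for an EMData header. It is not the EMAN2 API: the helper set_param and the sample values are hypothetical, and the ASCII check simply mirrors the rule stated above.

```python
import pickle


def set_param(header, key, value):
    """Store one metadata pair, insisting on a plain-ASCII text key.

    'header' is an ordinary dict standing in for an EMData header
    (an assumption made for illustration only).
    """
    if not isinstance(key, str) or not key.isascii():
        raise KeyError(f"metadata keys must be ASCII text: {key!r}")
    header[key] = value


header = {}
set_param(header, "apix_x", 1.8)                    # float value
set_param(header, "ptcl_repr", 12)                  # int value
set_param(header, "ptcl_source_coord", (104, 220))  # tuple value

# A pickle round trip keeps every key/value pair of the stand-in header.
restored = pickle.loads(pickle.dumps(header))
```

Formats that cannot hold arbitrary pairs would drop some of these keys on write, which is the metadata loss described above.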

This page will serve as a repository for the officially supported parameter key/value pairs. While you are free to set any metadata keys/values you like in an EMData object, the names listed here may be interpreted in specific ways by specific modules within EMAN, so it would be unwise to abuse them. Also, if you make up your own name for some purpose, it wouldn't hurt to register it here in the 'unofficial' section, to avoid it being used by someone else for a different purpose.

Special tags (read-only for getting image info)

These values are cached and only recomputed if the image changes

nx,ny,nz	int	Dimensions of the image, also available as get_xsize(), etc. Note: assignments such as e["nx"] = 30 are a (non-preferred) alternative to set_size(nx,ny,nz)
minimum	float	Smallest value in the image
maximum	float	Largest value in the image
mean	float	The average pixel value in the image
sigma	float	The standard deviation of the pixel values in the image
square_sum	float	Sum of the squares of the pixel values
mean_nonzero	float	The mean value of all nonzero pixels
sigma_nonzero	float	The standard deviation of the pixels ignoring pixels which are zero
is_complex	int	Flag indicating that the image is complex (R/I or A/P pairs)
is_complex_ri	int	Flag indicating that a complex image is R/I not A/P
changecount	int	An integer which is incremented every time the image is marked as changed
data_path	string	Used only in BDB files, to indicate that the binary data for an image should be read from an alternate location. Data cannot be written back to such objects.
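
The caching behaviour described for these tags can be sketched in pure Python. The class below is a hypothetical stand-in, not EMData itself: it keeps a change counter and recomputes the statistics only when the counter has moved since the last computation. It assumes the population form of the standard deviation for sigma, which may differ from what EMAN2 actually computes.

```python
import math


class ImageStats:
    """Hypothetical stand-in for an image: flat pixel list plus change counter."""

    def __init__(self, pixels):
        self.pixels = list(pixels)
        if not self.pixels:
            raise ValueError("need at least one pixel")
        self.changecount = 0    # incremented on every modification
        self._cached_at = None  # changecount at the last recomputation
        self._cache = {}
        self.recomputes = 0     # bookkeeping for this demonstration only

    def set_pixel(self, index, value):
        self.pixels[index] = value
        self.changecount += 1   # mark the image as changed

    def stats(self):
        # Recompute only if the image changed since the cached values were made.
        if self._cached_at != self.changecount:
            self._cache = self._compute()
            self._cached_at = self.changecount
            self.recomputes += 1
        return self._cache

    def _compute(self):
        px = self.pixels
        n = len(px)
        mean = sum(px) / n
        square_sum = sum(v * v for v in px)
        # Population standard deviation (an assumption, see lead-in).
        sigma = math.sqrt(max(square_sum / n - mean * mean, 0.0))
        nz = [v for v in px if v != 0]
        if nz:
            mean_nz = sum(nz) / len(nz)
            var_nz = sum(v * v for v in nz) / len(nz) - mean_nz * mean_nz
            sigma_nz = math.sqrt(max(var_nz, 0.0))
        else:
            mean_nz = sigma_nz = 0.0
        return {
            "minimum": min(px), "maximum": max(px),
            "mean": mean, "sigma": sigma, "square_sum": square_sum,
            "mean_nonzero": mean_nz, "sigma_nonzero": sigma_nz,
        }
```

Calling stats() twice in a row computes once; calling set_pixel() in between forces a second computation.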

These values are computed on the fly

skewness	float	Skewness of the pixel values
kurtosis	float	Kurtosis of the pixel values
median	float	Median value of the pixel values
nonzero_median	float	Median value of nonzero pixels

Official tags (used in EMAN2/SPARX as distributed):

apix_x,y,z	float	Angstroms per pixel on the x-axis (also _y and _z). If _y or _z are 0 or not present it will be assumed that they are the same as _x. Note that CTF objects have an independent value for A/pix
class_id	int	Set by classification routines to indicate which class number the particle is in
class_ptcl_src	string	In a class-average, this is the file containing the raw images used to create the average
class_ptcl_idxs	tuple	In a class-average, this is a list of particle numbers used in the final average (see class_ptcl_src and exc_class_ptcl_idxs)
ctf	Ctf subclass	A subclass of Ctf containing all CTF parameters
ctf_phase_flipped	bool	Set to true if the CTF phases have been flipped
ctf_wiener_filtered	bool	Set to true if a Wiener filter has been applied
ctf_snr_total	float list	Set in class-averages by some averagers indicating the total estimated radial SNR of the average
data_path	string	Used for virtual stacks. References binary file and location for image data as file*location
data_source	string	Used in virtual stacks. This is a reference back to the source image from which this image was derived
data_n	int	Used in virtual stacks. This is the image number
eigval	float	Eigenvalue, only set for images which represent Eigenvectors
exc_class_ptcl_idxs	tuple	In a class-average, this is a list of particle numbers provided to the averager, but excluded from the final average (see class_ptcl_src)
match_n	int	Used to represent the number of the reference particle this particle best matched
match_qual	float	Used to represent the quality associated with match_n; smaller is a better match
microscope_voltage	float	Voltage of the microscope in kV
microscope_cs	float	Cs of the microscope in mm
model_id	int	In a projection during multi-model refinement, this is the index of the model for the current projection. Always 0 for single-model refinements
projection_image	string	In a class-average, this represents the image file which was used for initial alignment references
projection_image_idx	string	In a class-average, this represents the specific image number in projection_image
ptcl_repr	int	If an image/volume represents the combination of one or more other images, this is the count of the number of particles that went into the average
ptcl_helix_coords	tuple	The two endpoints and a box width that defines a helix box (x1, y1, x2, y2, box_width)
ptcl_source_coord	tuple	The central coordinate of a boxed particle in terms of its source image, normally (x,y), may be (x,y,z) for subtomograms
ptcl_source_image	string	The name of the image from which the particle was extracted. Full path, may be in bdb syntax
reconstruct_norm	float	Normalization factor applied to a single projection/class-average during reconstruction
reconstruct_qual	float	Quality of a single projection/class-average relative to others during reconstruction. Unlike with comparators, larger values are better.
reconstruct_preproc	bool	Set if the image has been preprocessed for use with a reconstructor
render_min,max	float	Used when rendering an image to 8/16 bit integers. These are the values representing the minimum and maximum integer values
segment_centers	float list	Used when a volume has been segmented into regions. Set of 3*nregions floats in x1,y1,z1,x2,y2,z2,... order, indicating the center of each region as defined by the specific algorithm
source_path	string	When an image is read from a file, this is set to the filename
source_n	string	When an image is read from a file, this is set to the image number
subvolume_x0,y0,z0	int	Used when the EMData stores only a portion of a larger image in certain contexts (notably direct Fourier inversion). This represents the location of the origin of 'this' in the larger virtual volume
subvolume_full_nx,ny,nz	int	Used with subvolume_x0,... Specifies the size of the virtual volume that 'this' is a part of
threed_ptcl_src	string	In a 3-D map, this is the file containing the raw images used to create the reconstruction
threed_ptcl_idxs	tuple	In a 3-D map, this is a list of particle numbers used in the final reconstruction (see threed_ptcl_src)
threed_excl_ptcl_idxs	tuple	In a 3-D map, this is a list of particle numbers excluded from the final map (see threed_ptcl_src)
timestamp	string	When data for an image is being written this is updated with the current time. It is not updated for metadata changes, only when the image data is written
xform.projection	Transform	A Transform object used by Projectors. It is applied to a 3-D model prior to projecting by summing along Z. The inverse of this Transform is used by Reconstructors
xform.align2d	Transform	A Transform object representing a 2-D transformation used to align this EMData object to a reference in 2-D
xform.align3d	Transform	A Transform object representing a 3-D transformation used to align this (3-D) EMData object to a (3-D) reference
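
A few of the conventions in this table can be written out as small helpers. This is a sketch over a plain dict standing in for the header, not EMAN2 code; the function names are hypothetical. It shows the apix_y/apix_z fallback to apix_x, the x1,y1,z1,x2,y2,z2,... layout of segment_centers, and the (x1, y1, x2, y2, box_width) layout of ptcl_helix_coords.

```python
import math


def resolve_apix(header):
    """Return (apix_x, apix_y, apix_z), using apix_x where _y or _z is 0 or absent."""
    ax = header["apix_x"]
    ay = header.get("apix_y", 0) or ax
    az = header.get("apix_z", 0) or ax
    return ax, ay, az


def segment_center_triples(flat):
    """Split the flat 3*nregions list of segment_centers into (x, y, z) tuples."""
    if len(flat) % 3 != 0:
        raise ValueError("segment_centers must hold 3 floats per region")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def helix_length(ptcl_helix_coords):
    """Distance in pixels between the two endpoints of a helix box."""
    x1, y1, x2, y2, _box_width = ptcl_helix_coords
    return math.hypot(x2 - x1, y2 - y1)
```

For example, a header holding only apix_x = 1.5 resolves to 1.5 on all three axes.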

Proposed new Official tags (comments welcome):

apix_scan	float	Scan pixel size in Angstroms	(There are already parameters for apix_x/y/z above. Is this different in some way? You could provide this as supplementary information, but apix_x/y/z remain the 'official' values. Is this what you want?)
box_location	int_array	4 values, x0, y0, xsize, ysize representing the location of the particle in the original (box_source) image	(Pawel suggested this be changed to the reduced image, because normally we first reduce the micrograph, then window it.) (No, the box location should be in the coordinates of the referenced 'parent' image. If you want to reference a reduced image you can, but I think it makes a lot more sense to provide the capability of refining the coordinates from the reduced image when you return to the original image.) Pawel: The coordinates in boxer refer to the image from which windowing was done. This is (or can be) a reduced micrograph. There is more confusion here: the name apix_scan suggests this is the pixel size of the scan, NOT THE REDUCED MICROGRAPH FROM WHICH PARTICLES WERE WINDOWED. Incidentally, both pixel sizes are needed and have to be in the header. Steve: This differs from the philosophy of e2boxer, which is that you always provide e2boxer with the original image; it may internally downscale it for boxing, but the final box positions are in terms of the original image. I don't understand why you would want to externally downscale and then have to rescale the coordinates again later. This is very messy. Anyway, any single image has a particular A/pix value associated with it. I have no objections to something like apix_original, but what exactly is the point? You aren't likely to rescale reduced images back to their original size...
box_score	float	A value representing the relative quality (meaning may vary) of this particle compared to others
box_source	string	Filename (not full path) of the image this particle was extracted from
box_source_id	string	Other (database) identifier of the image the raw data came from if available
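
The coordinate question debated above comes down to a single scale factor. As a rough sketch (hypothetical helper names, integer shrink factor assumed, sub-pixel origin conventions ignored), a box picked on a reduced micrograph maps back to the original image like this, and the pixel size scales the opposite way:

```python
def box_to_original(box_location, shrink):
    """Map (x0, y0, xsize, ysize) from a micrograph reduced by 'shrink'
    back to the pixel coordinates of the original micrograph."""
    x0, y0, xsize, ysize = box_location
    return (x0 * shrink, y0 * shrink, xsize * shrink, ysize * shrink)


def apix_after_shrink(apix_original, shrink):
    """A/pix of the reduced micrograph, given the A/pix of the original."""
    return apix_original * shrink
```

Storing the box in original-image coordinates means only the first function is ever needed, and only when boxing was done on a reduced copy.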

Unofficial tags (To prevent reuse, used by someone in their own code or for testing):

Tags used to store header information derived from other file formats:

Flags come from MRC file

datatype	int	pixel storage data type in EMAN format: EM_UCHAR, EM_SHORT, EM_USHORT, EM_SHORT_COMPLEX, EM_FLOAT, EM_FLOAT_COMPLEX
apix_x,y,z	float	Angstroms per pixel on the x-axis (also _y and _z). If _y or _z are 0 or not present it will be assumed that they are the same as _x. Note that CTF objects have an independent value for A/pix
MRC.minimum	float	Minimum density value
MRC.maximum	float	Maximum density value
MRC.mean	float	Mean density value
origin_x, y, z	float	image origin for x, y, z axis
MRC.nxstart	int	No. of first column in map
MRC.nystart	int	No. of first row in map
MRC.nzstart	int	No. of first section in map
MRC.mx	int	Number of intervals along X
MRC.my	int	Number of intervals along Y
MRC.mz	int	Number of intervals along Z
MRC.nx	int	number of columns
MRC.ny	int	number of rows
MRC.nz	int	number of sections
MRC.xlen	float	Cell dimensions (Angstroms)
MRC.ylen	float	Cell dimensions (Angstroms)
MRC.zlen	float	Cell dimensions (Angstroms)
MRC.alpha	float	Cell angles (Degrees)
MRC.beta	float	Cell angles (Degrees)
MRC.gamma	float	Cell angles (Degrees)
MRC.mapc	int	Which axis corresponds to Columns
MRC.mapr	int	Which axis corresponds to Rows
MRC.maps	int	Which axis corresponds to Sections
MRC.ispg	int	Space group number (0 for images)
MRC.nsymbt	int	Number of chars used for storing symmetry operators
MRC.machinestamp	int	machine stamp in CCP4 convention: big endian=0x11110000 little endian=0x44440000
MRC.rms	float	rms deviation of map from mean density
MRC.nlabels	int	Number of labels being used
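
Two of the MRC tags lend themselves to a short illustration. The sketch below works on a plain dict of the tags listed above (not on an EMData object) and uses hypothetical function names. The stamp values are the ones given in the table; the sampling calculation assumes the usual MRC convention that the cell length divided by the number of intervals gives Angstroms per pixel.

```python
BIG_ENDIAN_STAMP = 0x11110000     # value listed above for big endian
LITTLE_ENDIAN_STAMP = 0x44440000  # value listed above for little endian


def byte_order_from_stamp(machinestamp):
    """Translate MRC.machinestamp into 'big', 'little' or 'unknown'."""
    if machinestamp == BIG_ENDIAN_STAMP:
        return "big"
    if machinestamp == LITTLE_ENDIAN_STAMP:
        return "little"
    return "unknown"


def sampling_from_cell(header):
    """Angstroms per pixel on x, y, z as cell length / number of intervals."""
    return tuple(
        header["MRC." + axis + "len"] / header["MRC.m" + axis]
        for axis in "xyz"
    )
```

A header with MRC.xlen = 200.0 and MRC.mx = 100 gives 2.0 A/pix on x under this assumption.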

Flags come from IMAGIC file

datatype	int	pixel storage data type in EMAN format: EM_UCHAR, EM_USHORT, EM_FLOAT, EM_FLOAT_COMPLEX
IMAGIC.imgnum	int	image number, index from 1 to n
IMAGIC.count	int	total number of images - 1
IMAGIC.error	int	Error code for this image
IMAGIC.headrec	int	number of header records/image (always 1)
IMAGIC.mday	int	image creation date
IMAGIC.month	int	image creation month
IMAGIC.year	int	image creation year
IMAGIC.hour	int	image creation hour
IMAGIC.minute	int	image creation minute
IMAGIC.sec	int	image creation second
IMAGIC.reals	int	image size in reals
IMAGIC.pixels	int	image size in pixels
IMAGIC.type	char(4)	PACK, INTG, REAL, COMP, RECO
IMAGIC.ixold	int	Top left X-coord. in image before windowing
IMAGIC.iyold	int	Top left Y-coord. in image before windowing
IMAGIC.oldav	float	old average density
IMAGIC.label	char(80)	image id string
ptcl_repr	int	raw images represented by this image. Note: non-standard use
xform.projection	Transform	particle orientation, set from the orientation flags(alt, az, phi) in the IMAGIC header
xform.align3d	Transform	particle orientation for 3D image, set from the orientation flags(alt, az, phi) in the IMAGIC header
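
The IMAGIC counting conventions are easy to get off by one: imgnum starts at 1 and count is the total minus 1. A small sketch over a plain dict of the tags above (hypothetical helper names, and assuming the date fields hold a full calendar date) shows the conversions:

```python
from datetime import datetime


def imagic_total_images(header):
    """IMAGIC.count stores the total number of images minus 1."""
    return header["IMAGIC.count"] + 1


def imagic_zero_based_index(header):
    """IMAGIC.imgnum runs from 1 to n; convert it to a 0-based index."""
    return header["IMAGIC.imgnum"] - 1


def imagic_created(header):
    """Combine the separate date and time tags into one datetime."""
    return datetime(
        header["IMAGIC.year"], header["IMAGIC.month"], header["IMAGIC.mday"],
        header["IMAGIC.hour"], header["IMAGIC.minute"], header["IMAGIC.sec"],
    )
```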

Flags come from SPIDER file

datatype	int	pixel storage data type in EMAN format: EM_FLOAT
SPIDER.nslice	int	number of slices in volume; 1 for a 2D image
SPIDER.type	int	file type
SPIDER.irec	float	total number of records in the file (unused)
SPIDER.angvalid	int	1 if tilt angles have been computed
SPIDER.phi	float	tilt angle phi
SPIDER.theta	float	tilt angle theta
SPIDER.gamma	float	tilt angle gamma
SPIDER.headrec	int	number of records in header
SPIDER.headlen	int	header length in bytes
SPIDER.reclen	int	record length in bytes
SPIDER.dx	float	x translation
SPIDER.dy	float	y translation
SPIDER.dz	float	z translation
SPIDER.istack	int	0 for simple 2D or 3D (non-stack) files. For stacked images, istack=2 in the overall header and istack=-1 in the following individual images.
SPIDER.maxim	int	maxim is only used in the overall header for a stacked image file. It is the number of the highest image currently used in the stack. The number is updated, if necessary, when an image is added or deleted from the stack.
SPIDER.imgnum	int	imgnum is only used in a stacked image header. It is the number of the current image or zero if the image is unused.
SPIDER.Kangle	int	Flag indicating that additional angles are present in the header. 1 = one additional rotation is present, 2 = an additional rotation that precedes the rotation stored in words 15..20.
SPIDER.phi1	float
SPIDER.theta1	float
SPIDER.psi1	float
SPIDER.phi2	float
SPIDER.theta2	float
SPIDER.psi2	float
SPIDER.date	char(11)	creation date
SPIDER.time	char(8)	creation time
SPIDER.title	char(160)	title
SPIDER.scale	float	scale factor
xform.projection	Transform	particle orientation, set from the orientation flags(phi, theta, psi, tx, ty, tz, scale) in the SPIDER header
xform.align3d	Transform	particle orientation for 3D image, set from the orientation flags(phi, theta, psi, tx, ty, tz, scale) in the SPIDER header
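
The SPIDER.istack values described in this table can be read as a three-way switch. The function below is only a sketch of that description (the name and the returned labels are hypothetical), and it reports anything outside the three documented values as unknown rather than guessing:

```python
def spider_header_role(istack):
    """Classify a SPIDER header by its istack value, as described above."""
    if istack == 0:
        return "simple"         # plain 2D or 3D (non-stack) file
    if istack == 2:
        return "stack_overall"  # overall header of a stacked image file
    if istack == -1:
        return "stack_member"   # header of an individual image in a stack
    return "unknown"
```

SPIDER.maxim is meaningful only for "stack_overall" headers, and SPIDER.imgnum only for "stack_member" headers.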

Flags come from PGM file

PGM.max_gray	int	maximum value for grey level
PGM.min_gray	int	minimum value for grey level

Flags come from SAL file

datatype	int	pixel storage data type in EMAN format: EM_SHORT
SAL.pixel	float	pixel size

Flags come from TIFF file

datatype	int	pixel storage data type in EMAN format: EM_UCHAR, EM_USHORT, EM_FLOAT
TIFF.bitspersample	unsigned short	bits per pixel sample
TIFF.resolution_x	float	x dimension resolution
TIFF.resolution_y	float	y dimension resolution

Flags from Gatan DM3 files

DM3.acq_date	string	Acquisition date
DM3.acq_time	string	Acquisition time
DM3.actual_mag	float	Calibrated magnification
DM3.antiblooming	int
DM3.binning_x	int	Binning (X)
DM3.binning_y	int	Binning (Y)
DM3.camera_x	int	Camera size (X)
DM3.camera_y	int	Camera size (Y)
DM3.cs	float	Microscope Cs
DM3.exposure_number	int	Camera exposure number
DM3.exposure_time	float	Exposure time
DM3.frame_type	string	Frame type
DM3.indicated_mag	float	Indicated magnification
DM3.name	string	Filename
DM3.pixel_size	float	Pixel size (microns)
DM3.source	string	Camera name
DM3.voltage	float	Microscope voltage
DM3.zoom	float
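
As a rough illustration of how the DM3 tags might be combined, the sketch below estimates Angstroms per pixel at the specimen. It rests on two assumptions that this page does not confirm: that DM3.pixel_size is the physical detector pixel size in microns with any binning already included, and that DM3.actual_mag is the magnification at the detector. The function name is hypothetical.

```python
ANGSTROMS_PER_MICRON = 1.0e4


def apix_from_dm3(header):
    """Estimate A/pix at the specimen from DM3 tags (see assumptions above)."""
    pixel_um = header["DM3.pixel_size"]
    mag = header["DM3.actual_mag"]
    if mag <= 0:
        raise ValueError("magnification must be positive")
    # Detector pixel size in Angstroms, divided by the magnification.
    return pixel_um * ANGSTROMS_PER_MICRON / mag
```

Any value derived this way should be checked against a calibrated apix_x before use.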

Flags come from XPLOR file

apix_x,y,z	float	Angstroms per pixel on the x-axis (also _y and _z). If _y or _z are 0 or not present it will be assumed that they are the same as _x. Note that CTF objects have an independent value for A/pix
XPLOR.alpha	float	alpha angle of the cell
XPLOR.beta	float	beta angle of the cell
XPLOR.gamma	float	gamma angle of the cell

-  ⇤ ← Revision 42 as of 2011-01-21 18:39:12 → 
  Size: 12864
  Editor: gtang
  Comment:
+   ← Revision 54 as of 2011-10-21 06:01:43 → ⇥
  Size: 17773
  Editor: IanRees
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 33:
-||class_ptcl_idxs ||tuple ||In a class-average, this is a list of particle numbers used in the final average (see class_ptcl_src) ||
+||class_ptcl_idxs ||tuple ||In a class-average, this is a list of particle numbers used in the final average (see class_ptcl_src and exc_class_ptcl_idxs) ||
 Line 45:
+||microscope_voltage||float||Voltage of the microscope in kV||
||microscope_cs||float||Cs of the microscope in mm||
||model_id||int||in a projection during multi-model refinement, this is the index of the model for the current projection. For single model refinements always 0||
-Line 63:
+Line 66:
+||timestamp||string||When data for an image is being written this is updated with the current time. It is not updated for metadata changes, only when the image data is written||
-Line 89:
+Line 93:
+||datatype ||int ||pixel storage data type in EMAN format: EM_UCHAR, EM_SHORT, EM_USHORT, EM_SHORT_COMPLEX, EM_FLOAT, EM_FLOAT_COMPLEX ||
||apix_x,y,z ||float ||Angstroms per pixel on the x-axis (also _y and _z). If _y or _z are 0 or not present it will be assumed that they are the same as _x. Note that CTF objects have an independent value for A/pix ||
-Line 92:
+Line 98:
+||origin_x, y, z ||float ||image origin for x, y, z axis ||
-Line 117:
+Line 124:
+||datatype ||int ||pixel storage data type in EMAN format: EM_UCHAR, EM_USHORT, EM_FLOAT, EM_FLOAT_COMPLEX ||
-Line 134:
+Line 142:
-||ptcl_repr ||int ||raw images represented by this image ||
||orientation_convention ||string ||orientation convention ||
||euler_alt ||float ||euler angle alt ||
||euler_az ||float ||euler angle az ||
||euler_phi ||float ||euler angle phi ||

==== Flags come from EMIM file ====
||pixel ||float ||pixel/voxel size in angstrom ||
||micrograph_id ||int ||micrograph number or CCD frame ||
+||ptcl_repr ||int ||raw images represented by this image. Note: non-standard use ||
||xform.projection ||Transform ||particle orientation, set from the orientation flags(alt, az, phi) in the IMAGIC header ||
||xform.align3d ||Transform ||particle orientation for 3D image, set from the orientation flags(alt, az, phi) in the IMAGIC header ||
