Differences between revisions 1 and 8 (spanning 7 versions)
Revision 1 as of 2010-04-05 19:21:18 (size: 5705, editor: root)
Revision 8 as of 2010-12-07 10:55:17 (size: 1603, editor: root)
EMEN2 Backups

An EMEN2 environment normally lives in the EMEN2DBHOME directory and contains three groups of files:

BerkeleyDB files:

  • __db.* (BDB backing files)
  • home/ (BDB registration)
  • log/ (BDB log files)
  • data/ (EMEN2 databases)

EMEN2-managed file attachments:

  • emen2data/ (file storage)
  • tiles/ (thumbnails and other derived data)
  • tmp/ (temporary files)

Configuration and application logs:

  • DB_CONFIG
  • config.json
  • applog/ (EMEN2 application logs)
  • ssl/ (encryption keys)
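The grouping above decides how each entry can be backed up: BerkeleyDB files need the cold/hot procedures below, while the rest are ordinary files. As a minimal sketch, assuming the entries sit at the top level of EMEN2DBHOME, the list can be turned into a lookup table. GROUPS, classify and copyable_while_running are hypothetical names for illustration, not part of EMEN2.

```python
import fnmatch

# Hypothetical lookup table built from the list above: top-level names in
# EMEN2DBHOME, grouped the same way this page groups them.
GROUPS = {
    "berkeleydb": ["__db.*", "home", "log", "data"],
    "attachments": ["emen2data", "tiles", "tmp"],
    "config": ["DB_CONFIG", "config.json", "applog", "ssl"],
}


def classify(name):
    """Return the group of a top-level EMEN2DBHOME entry, or None if unknown."""
    for group, patterns in GROUPS.items():
        # fnmatchcase keeps the match case-sensitive on every platform.
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
            return group
    return None


def copyable_while_running(name):
    """True only for known non-BerkeleyDB entries; unknown entries are
    treated conservatively and left for a cold backup."""
    return classify(name) in ("attachments", "config")
```

Unknown entries deliberately return False, so anything not on this page is handled the safe way (with all processes stopped).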

Cold Backups

The most "foolproof" way to back up EMEN2 is to stop all EMEN2 processes and archive the entire EMEN2DBHOME directory. Once the processes are stopped, everything can be backed up as ordinary files without any special consideration.

If you have specified paths outside EMEN2DBHOME, e.g. for binary attachment storage, you will also need to archive these directories.
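A cold backup is therefore just an archive of directories. The sketch below uses Python's standard tarfile module and assumes every EMEN2 process has already been stopped; the function name cold_backup and the emen2-cold-<timestamp>.tar.gz naming scheme are illustrative assumptions, not an EMEN2 tool.

```python
import os
import tarfile
import time


def cold_backup(dbhome, dest_dir, extra_paths=()):
    """Archive EMEN2DBHOME plus any extra paths (e.g. attachment storage
    kept outside EMEN2DBHOME) into one timestamped .tar.gz file.

    Assumes all EMEN2 processes are stopped, and that dest_dir is outside
    dbhome so the archive does not try to include itself."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(dest_dir, "emen2-cold-%s.tar.gz" % stamp)
    with tarfile.open(target, "w:gz") as tar:
        for path in (dbhome,) + tuple(extra_paths):
            # Store each tree under its own base name, e.g. "db/data/...".
            tar.add(path, arcname=os.path.basename(os.path.normpath(path)))
    return target
```

For example, cold_backup("/home/emen2/db", "/backup", ["/data/emen2data"]) would archive both trees into a single file under /backup (paths hypothetical).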

Hot Backups

If the EMEN2 environment is currently open, the BerkeleyDB files cannot simply be copied, because they are likely to change while they are being copied. The mechanism I recommend for creating incremental backups is to first create a cold backup, then copy the updated BerkeleyDB log files using:

emen2control.py --log_archive

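This command copies the EMEN2DBHOME/log/log.* files to the directory specified in the configuration. To bring a cold backup up to date, place these log files in the cold backup's log/ directory and run BerkeleyDB catastrophic recovery on it with "db_recover -c -h <cold backup directory>".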
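The update step can be sketched in Python as follows. This assumes the archived log files are named log.* as above, that the cold backup keeps its logs in a log/ subdirectory, and that the BerkeleyDB recovery utility is on PATH as db_recover (some distributions install it under a versioned name). copy_new_logs and recover are hypothetical helper names.

```python
import glob
import os
import shutil
import subprocess


def copy_new_logs(src_log_dir, backup_log_dir):
    """Copy BerkeleyDB log files that are missing from the backup, or whose
    size differs from the backup copy. Log files are append-only, so a size
    change means new records were written since the last copy."""
    os.makedirs(backup_log_dir, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(os.path.join(src_log_dir, "log.*"))):
        dst = os.path.join(backup_log_dir, os.path.basename(src))
        if not os.path.exists(dst) or os.path.getsize(dst) != os.path.getsize(src):
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied.append(os.path.basename(src))
    return copied


def recover(backup_home):
    """Run catastrophic recovery (-c) on the backup environment (-h)."""
    subprocess.check_call(["db_recover", "-c", "-h", backup_home])
```

A typical run would call copy_new_logs(archive_dir, os.path.join(backup_home, "log")) and then recover(backup_home), before the backup is copied into place as a live environment.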

Non-BerkeleyDB Files

Files that are not part of the BerkeleyDB environment (emen2data, tiles, config, etc.) can be copied at any time using normal backup procedures; "rsync" is probably the most appropriate tool.
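A minimal sketch of the rsync step, assuming the non-BerkeleyDB entries sit directly in EMEN2DBHOME. PLAIN_ENTRIES, rsync_command and sync_plain_files are hypothetical names; the sketch also skips tmp/ on the assumption that temporary files need not be kept. Only rsync's standard -a (archive) option is used.

```python
import os
import subprocess

# Non-BerkeleyDB entries from the list at the top of this page (tmp/ omitted).
PLAIN_ENTRIES = ["emen2data", "tiles", "DB_CONFIG", "config.json", "applog", "ssl"]


def rsync_command(dbhome, dest):
    """Build an rsync command for the entries that actually exist.
    No trailing slash on sources, so each lands as dest/<name>."""
    sources = [os.path.join(dbhome, name) for name in PLAIN_ENTRIES
               if os.path.exists(os.path.join(dbhome, name))]
    return ["rsync", "-a"] + sources + [dest]


def sync_plain_files(dbhome, dest):
    cmd = rsync_command(dbhome, dest)
    if len(cmd) > 3:  # at least one source besides "rsync", "-a" and dest
        subprocess.check_call(cmd)
```

Because these files are not managed by BerkeleyDB, this can run on a schedule while EMEN2 is up.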

EMEN2/Backups (last edited 2013-04-18 06:47:18 by IanRees)