Size: 5705
Comment:
|
Size: 1856
Comment:
|
Line 1: | Line 1: |
== Backup: Long Answer == | = EMEN2 Backups = |
Line 3: | Line 3: |
An EMEN2 database environment contains three types of files: database files, log files, and region files. | An EMEN2 environment contains a number of things: |
Line 5: | Line 5: |
Database files contain key/value pairs that comprise all the records in the database, as well as a number of database files used for indexes. Database files are contained in $DB_HOME/data and subdirectories. Log files contain data from all committed transactions, and are stored in $DB_HOME/log as log.XX, where XX are consecutive integers starting from 1. | BerkeleyDB files: * _db.* (environment backing files) * data/ (databases) * journal/ (transaction journal) |
Line 7: | Line 10: |
To provide guarantees about transaction atomicity and durability, changes are first written to log files on stable storage before a transaction is marked as committed. The database files are not updated until this has been completed. In the event of a crash or hardware failure, the database files can be checked against the log files to correct any errors or missing data. | EMEN2-managed file attachments: * binary/ (file storage) * preview/ (thumbnails and other derived data) * tmp/ (temporary files) |
Line 9: | Line 15: |
Because a cold backup copies the database files, the database must be stopped so they are not changed while the backup is in progress. Once a cold backup is made, it can be updated with a hot backup. A hot backup only copies new log files, which are append-only, and does not require the database files to be stable during the backup. | Configuration and application logs: * DB_CONFIG * config.json * log/ (EMEN2 application logs) * ssl/ (SSL certificates) |
Line 11: | Line 21: |
= Cold Backups = | |
Line 12: | Line 23: |
== backup.py == | The most "foolproof" way to backup EMEN2 is to stop all emen2 processes, and archive the entire EMEN2DBHOME directory. At this point, everything can be backed up as normal files without any special consideration. |
Line 14: | Line 25: |
EMEN2's database core includes some methods to help manage database environment backups. These methods (db.archivelogs, db.coldbackup, db.hotbackup) can be called from the cmdlineutils/backup.py script. | = Hot Backups = If the database is currently open, the BerkeleyDB files cannot simply be copied, because they are likely to change during the operation. The mechanism I recommend for creating incremental backups is to first create a cold backup, then checkpoint the environment and copy updated BerkeleyDB log files using: |
Line 17: | Line 30: |
Usage: backup.py [options]
Options:
  --help                                  Print help message
  -h HOME, --home=HOME                    DB_HOME
  -c CONFIGFILE, --configfile=CONFIGFILE
  --archive                               archive log files
  --cold                                  cold backup
  --hot                                   hot backup
  --force                                 Force overwrite of existing backup |
emen2ctl archive -h <EMEN2DBHOME> |
Line 29: | Line 33: |
This command will copy the journal/log.* files to the configuration-specified directory, by default, EMEN2DBHOME/journal_archive. These can be copied to the journal directory of the cold backup, and replayed using the BerkeleyDB recover command, "db_recover -c -h <backup directory>". Please email me if you have any questions or concerns about this operation. | |
Line 30: | Line 35: |
== Cold Backup == | = Non-BerkeleyDB Files = |
Line 32: | Line 37: |
To create a cold backup, shut down any open database processes (see [[EMEN2/emen2control.py|emen2control.py]]). | Files that are not part of the BerkeleyDB environment (binary, preview, config, etc.) can be copied at any time using normal backup procedures; rsync is probably the most appropriate tool. |
Line 34: | Line 39: |
Once all processes are stopped, you can either copy or tar the DB_HOME environment, or use the EMEN2 backup utility with the "--cold" option:
{{{
[emen2@ncmidb ~]# python cmdlineutils/backup.py --cold
}}}
This will run a database checkpoint and create a cold backup in the path specified by [[EMEN2/config.yml|BACKUPPATH]]. The database files, highest numbered log file, and configuration files will be copied. To prevent overwriting an existing cold backup, the script will not run if the target directory exists. You can rename/remove the existing cold backup first, or specify the "--force" option to backup.py.
Example:
{{{
[emen2@ncmidb ~]# python cmdlineutils/backup.py --cold
... snip: startup ...
Opening Database Environment: /home/emen2/db/
Cold Backup: Checkpoint
Cold Backup: Copying data: /home/emen2/db/data -> /home/emen2/db_backup/data
Cold Backup: Copying config: /home/emen2/db/config.yml -> /home/emen2/db_backup/config.yml
Cold Backup: Copying config: /home/emen2/db/DB_CONFIG -> /home/emen2/db_backup/DB_CONFIG
Cold Backup: Copying log: /home/emen2/db/log/log.0000000311 -> /home/emen2/db_backup/log/log.0000000311
}}}
Once you have created a cold backup, it can be updated by running a hot backup. It is safe to copy hot/cold backups because they are not active database environments.
== Hot Backup ==
A hot backup copies new log files to an existing cold backup and uses them to bring it up to date with the current state of the main database environment.
{{{
backup.py --hot
}}}
Example:
{{{
[emen2@ncmidb ~]# python cmdlineutils/backup.py --hot
... snip: startup ...
Opening Database Environment: /home/emen2/db/
Hot Backup: Log archive
Log Archive: Checkpoint
Log Archive: /home/emen2/db/log/log.0000000303 -> /home/emen2/log_archive/log.0000000303
Log Archive: /home/emen2/db/log/log.0000000304 -> /home/emen2/log_archive/log.0000000304
Log Archive: /home/emen2/db/log/log.0000000305 -> /home/emen2/log_archive/log.0000000305
Hot Backup: Copying log: /home/emen2/db/log/log.0000000303 -> /home/emen2/db_backup/log/log.0000000303
Hot Backup: Copying log: /home/emen2/db/log/log.0000000304 -> /home/emen2/db_backup/log/log.0000000304
Hot Backup: Copying log: /home/emen2/db/log/log.0000000305 -> /home/emen2/db_backup/log/log.0000000305
Hot Backup: Copying log: /home/emen2/db/log/log.0000000306 -> /home/emen2/db_backup/log/log.0000000306
Log Archive: Checkpoint
Log Archive: Removing /home/emen2/db/log/log.0000000303
Log Archive: Removing /home/emen2/db/log/log.0000000304
Log Archive: Removing /home/emen2/db/log/log.0000000305
}}}
== Log Archive ==
This is normally done automatically as part of the normal hot backup process, but can be invoked manually if necessary (e.g. when running out of disk space on the DB_HOME partition).
{{{
[emen2@ncmidb ~]# python cmdlineutils/backup.py --archive
... snip: startup ...
Opening Database Environment: /home/emen2/db/
Log Archive: Checkpoint
Log Archive: /home/emen2/db/log/log.0000000303 -> /home/emen2/log_archive/log.0000000303
Log Archive: /home/emen2/db/log/log.0000000304 -> /home/emen2/log_archive/log.0000000304
Log Archive: /home/emen2/db/log/log.0000000305 -> /home/emen2/log_archive/log.0000000305
}}}
== Recovery ==
To prepare a cold/hot backup environment for use, run db_recover with the "-c" and "-h" flags. You should then copy the environment to the location specified by $DB_HOME.
Example:
{{{
[emen2@ncmidb ~]# db_recover -c -h db_backup
[emen2@ncmidb ~]# mv db db.crashed
[emen2@ncmidb ~]# cp -vr db_backup db
}}}
== Additional Help ==
If you have any questions about how best to back up your EMEN2 environment, or how to recover from a crash, please contact Ian Rees. |
If you have changed your configuration to use directories outside of EMEN2DBHOME (most commonly, to place binary storage on different disk) make sure you back these up as well! Again, rsync is fine. |
EMEN2 Backups
An EMEN2 environment contains a number of things:
BerkeleyDB files:
- _db.* (environment backing files)
- data/ (databases)
- journal/ (transaction journal)
EMEN2-managed file attachments:
- binary/ (file storage)
- preview/ (thumbnails and other derived data)
- tmp/ (temporary files)
Configuration and application logs:
- DB_CONFIG
- config.json
- log/ (EMEN2 application logs)
- ssl/ (SSL certificates)
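As a rough illustration of the grouping above, the following sketch sorts top-level entries of an environment into the three groups. The group labels and the function names (classify, safe_to_copy_while_running) are made up for this example and are not part of EMEN2; unknown entries are treated conservatively.

```python
from fnmatch import fnmatchcase

# Entry names come from the list above; the group labels are illustrative only.
# BerkeleyDB region files are commonly named __db.001, __db.002, ...,
# so both spellings of the pattern are accepted here.
GROUPS = {
    "berkeleydb": ["_db.*", "__db.*", "data", "journal"],
    "attachments": ["binary", "preview", "tmp"],
    "config_and_logs": ["DB_CONFIG", "config.json", "log", "ssl"],
}


def classify(entry_name):
    """Return the group a top-level environment entry belongs to."""
    for group, patterns in GROUPS.items():
        if any(fnmatchcase(entry_name, pattern) for pattern in patterns):
            return group
    return "unknown"


def safe_to_copy_while_running(entry_name):
    """True only for entries known to be outside the BerkeleyDB group."""
    return classify(entry_name) in ("attachments", "config_and_logs")
```

Anything the sketch does not recognize is reported as "unknown" and is not considered safe to copy from a running environment.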
Cold Backups
The most "foolproof" way to back up EMEN2 is to stop all emen2 processes and archive the entire EMEN2DBHOME directory. Once the processes are stopped, everything can be backed up as normal files without any special consideration.
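A minimal sketch of the archiving step, using only the Python standard library. It assumes all emen2 processes have already been stopped (it does not check), and the function name cold_backup and the choice of a gzip-compressed tarball are illustrative rather than anything EMEN2 provides.

```python
import tarfile
from pathlib import Path


def cold_backup(db_home, archive_path):
    """Archive the whole environment directory into a .tar.gz file.

    Assumes all emen2 processes are stopped; nothing here verifies that.
    """
    db_home = Path(db_home).resolve()
    archive_path = Path(archive_path).resolve()
    if not db_home.is_dir():
        raise NotADirectoryError(f"{db_home} is not a directory")
    if archive_path.exists():
        raise FileExistsError(f"refusing to overwrite {archive_path}")
    if db_home == archive_path.parent or db_home in archive_path.parents:
        # Writing the tarball inside the tree being archived would include it in itself.
        raise ValueError("archive_path must be outside the environment directory")
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store members under the directory's own name rather than its absolute path.
        tar.add(db_home, arcname=db_home.name)
    return archive_path
```

For example, cold_backup("/home/emen2/db", "/backups/db-cold.tar.gz") would produce a tarball whose members all sit under db/; both paths here are placeholders.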
Hot Backups
If the database is currently open, the BerkeleyDB files cannot simply be copied, because they are likely to change during the operation. The mechanism I recommend for creating incremental backups is to first create a cold backup, then checkpoint the environment and copy updated BerkeleyDB log files using:
emen2ctl archive -h <EMEN2DBHOME>
This command copies the journal/log.* files to the directory specified in the configuration (by default, EMEN2DBHOME/journal_archive). The archived files can then be copied to the journal directory of the cold backup and replayed using the BerkeleyDB recover command, "db_recover -c -h <backup directory>". Please email me if you have any questions or concerns about this operation.
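The copy-and-replay step might be scripted roughly as follows. This is a sketch under assumptions: the helper names (archive_command, apply_archived_logs) are hypothetical, the commands are only built as argument lists and not executed, and the only command-line flags used are the ones quoted in the text above.

```python
import shutil
from pathlib import Path


def archive_command(db_home):
    """Build the archive command quoted above for the live environment."""
    return ["emen2ctl", "archive", "-h", str(db_home)]


def apply_archived_logs(archive_dir, backup_home):
    """Copy archived log.* files into the cold backup's journal/ directory.

    Returns the names copied and the db_recover command that would replay
    them; the command is returned, not run.
    """
    archive_dir = Path(archive_dir)
    journal = Path(backup_home) / "journal"
    journal.mkdir(parents=True, exist_ok=True)
    copied = []
    for log_file in sorted(archive_dir.glob("log.*")):
        shutil.copy2(log_file, journal / log_file.name)  # keeps timestamps
        copied.append(log_file.name)
    recover_cmd = ["db_recover", "-c", "-h", str(backup_home)]
    return copied, recover_cmd
```

Each returned list can be passed to subprocess.run(cmd, check=True) once you are satisfied with the paths. Keeping command construction separate from execution makes it easy to inspect what would run against the backup before anything touches it.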
Non-BerkeleyDB Files
Files that are not part of the BerkeleyDB environment (binary, preview, config, etc.) can be copied at any time using normal backup procedures; rsync is probably the most appropriate tool.
If you have changed your configuration to use directories outside of EMEN2DBHOME (most commonly, to place binary storage on a different disk), make sure you back these up as well! Again, rsync is fine.
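One way to assemble such an rsync call is sketched below. The function name rsync_command, the entry list, and the decision to skip tmp/ are assumptions made for this example; the command is built as an argument list and not executed.

```python
from pathlib import Path

# Entries outside the BerkeleyDB environment, taken from the list earlier on this page.
# tmp/ is left out on the assumption that temporary files do not need a backup.
NON_BDB_ENTRIES = ["binary", "preview", "DB_CONFIG", "config.json", "log", "ssl"]


def rsync_command(db_home, dest, extra_dirs=()):
    """Build an rsync command for the non-BerkeleyDB files.

    extra_dirs lists any directories configured outside EMEN2DBHOME,
    for example binary storage on another disk.
    """
    db_home = Path(db_home)
    sources = [
        str(db_home / name)
        for name in NON_BDB_ENTRIES
        if (db_home / name).exists()
    ]
    # str(Path(...)) drops trailing slashes, so each source keeps its own name under dest.
    sources += [str(Path(extra)) for extra in extra_dirs]
    if not sources:
        raise ValueError("nothing to back up")
    # -a is rsync's archive mode: recursive, preserving permissions and times.
    return ["rsync", "-a", *sources, str(dest).rstrip("/") + "/"]
```

Note that two sources with the same final path component would land in the same place under the destination, so external directories may need separate destinations.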