SDP-302 | Parallelized checkpoint processing to reduce duration. Enable parallel checkpoints, and include test suite coverage for same.

Excerpt of email from Alan Kwan:
---
I've framed out a pseudo-code implementation of how it could behave as backup_functions in SDP:

dump_checkpoint_parallel()
- get the list of db files (this can be optimized to sort by largest or smallest to keep the work queues as saturated as possible)
- get a p4_var variable set to the # of worker threads, else use logic to determine a just-in-time value:
  - figure out the CPU core count
  - check the system's active load value
  - define the # of threads as core count minus active load, minus 1 (if the result is less than 1, set it to 1, i.e. not parallel)
- define the work queue (ls -1 /p4/1/offline_db/)
- insert code to execute against the work queue, based on http://hackthology.com/a-job-queue-in-bash.html, and while under the thread limit, keep working until the queue is empty
- checkpoint files are named /p4/1/checkpoints/p4_1.ckp.db.have.number.gz (along with their MD5)
- rewrite the offline_ restore; the parallel version would implement something similar:
  - get the list of compressed checkpoint files, throw them in a work queue, and run p4d -jr -z on each one into the same offline_db folder until they're all done

Augment remove_old_checkpoints_and_journals to incorporate these sorts of checkpoints.

Excerpt of email from Robert Cowham:
---
An alternative step along the way is to use pigz or similar for parallel compression, which is where a lot of the time is spent. Typically the focus should be on the 3-7 or so files which comprise the vast majority of the data (db.have / db.rev and friends / db.integed / db.label, depending).

I would also be tempted to tar the result into one file, after zipping / before unzipping, for ease of management.
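
To make Alan's dump_checkpoint_parallel pseudocode concrete, here is a minimal Bash sketch of the bounded work queue. It uses xargs -P rather than the FIFO-based queue from the hackthology link, which behaves equivalently for this purpose. The P4D_PARALLEL_THREADS variable and the function/file names are hypothetical placeholders, not existing SDP settings; the per-table dump relies on p4d -jd accepting table names after the checkpoint file, and the paths follow the /p4/1/... layout from the ticket.

#!/bin/bash
# dump_checkpoint_parallel.sh -- hypothetical sketch, not existing SDP code.
set -u

OFFLINE_DB=/p4/1/offline_db
CKP_DIR=/p4/1/checkpoints
CKP_NUM=${1:?usage: dump_checkpoint_parallel.sh <checkpoint_number>}

# Worker count: honour an explicit setting, otherwise core count minus
# current load minus 1, floored at 1 (i.e. fall back to non-parallel).
threads=${P4D_PARALLEL_THREADS:-0}
if [ "$threads" -lt 1 ]; then
    cores=$(getconf _NPROCESSORS_ONLN)
    load=$(awk '{printf "%d", $1}' /proc/loadavg)
    threads=$(( cores - load - 1 ))
    [ "$threads" -lt 1 ] && threads=1
fi

dump_one_table () {
    local table=$1
    local ckp="$CKP_DIR/p4_1.ckp.${table}.${CKP_NUM}"
    # Dump just this table from the offline copy, compress it, and record
    # an MD5 alongside it, giving p4_1.ckp.db.have.<number>.gz style names.
    p4d -r "$OFFLINE_DB" -jd "$ckp" "$table" &&
        gzip -f "$ckp" &&
        md5sum "${ckp}.gz" > "${ckp}.gz.md5"
}
export -f dump_one_table
export OFFLINE_DB CKP_DIR CKP_NUM

# Work queue: the db.* tables, largest first so the big ones (db.have,
# db.rev, db.integed, ...) start earliest; xargs -P caps concurrency.
ls -1S "$OFFLINE_DB"/db.* | xargs -n 1 basename |
    xargs -P "$threads" -n 1 bash -c 'dump_one_table "$1"' _

Robert's suggestions slot in at the compression step: pigz could replace gzip inside dump_one_table, and the per-table .gz files could be tarred into a single archive once the queue drains.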
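
The restore side could mirror the same queue, as Alan outlines: list the compressed per-table checkpoints and replay each with p4d -jr -z into the same offline_db root. The sketch below reuses the hypothetical P4D_PARALLEL_THREADS variable from the dump sketch; whether concurrent replays into one root behave correctly is exactly what the requested test suite coverage would need to establish.

#!/bin/bash
# restore_checkpoint_parallel.sh -- hypothetical counterpart to the dump sketch.
set -u

OFFLINE_DB=/p4/1/offline_db
CKP_DIR=/p4/1/checkpoints
CKP_NUM=${1:?usage: restore_checkpoint_parallel.sh <checkpoint_number>}
threads=${P4D_PARALLEL_THREADS:-$(getconf _NPROCESSORS_ONLN)}

replay_one () {
    # Replay one compressed per-table checkpoint into the offline_db root;
    # -z tells p4d the file is gzipped, and each file holds a single table.
    p4d -r "$OFFLINE_DB" -jr -z "$1"
}
export -f replay_one
export OFFLINE_DB

# Same bounded work queue as the dump: largest checkpoints first.
ls -1S "$CKP_DIR"/p4_1.ckp.db.*."$CKP_NUM".gz |
    xargs -P "$threads" -n 1 bash -c 'replay_one "$1"' _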