SDP-302 #11

# The form data below was edited by tom_tyler
# Perforce Workshop Jobs
#
#  Job:           The job name. 'new' generates a sequenced job number.
#
#  Status:        Job status; required field.  There is no enforced or
#                 promoted workflow for transition of jobs from one
#                 status to another, just a set of job status values
#                 for users to apply as they see fit.  Possible values:
#
#                 open - Issue is available to be worked on.
#
#                 inprogress - Active development is in progress.
#
#                 blocked - Issue cannot be implemented for some reason.
#
#                 fixed - Fixed, optional status to use before closed.
#                 
#                 closed - Issue has been dealt with definitively.
#
#                 punted - Decision made not to address the issue,
#                    possibly not ever.
#
#                 suspended - Decision made not to address the issue
#                    in the immediate future, but noting that it may
#                    have some merit and may be revisited later.
#
#                 duplicate - Duplicate of another issue.
#
#                 obsolete - The need behind the request has been
#                    overcome by events.
#
#  Project:       The project this job is for. Required.
#
#  Severity:      [A/B/C] (A is highest)  Required.
#
#  ReportedBy:    The user who created the job. Can be changed.
#
#  ReportedDate:  The date the job was created.  Automatic.
#
#  ModifiedBy:    The user who last modified this job. Automatic.
#
#  ModifiedDate:  The date this job was last modified. Automatic.
#
#  OwnedBy:       The owner, responsible for doing the job. Optional.
#
#  Description:   Description of the job.  Required.
#
#  DevNotes:      Developer's comments.  Optional.  Can be used to
#                 explain a status, e.g. for blocked, punted,
#                 obsolete or duplicate jobs.  May also provide
#                 additional information such as the earliest release
#                 in which a bug is known to exist.
#
#  Component:     Projects may use this optional field to indicate
#                 which component of the project a given job is
#                 associated with.
#
#                 For the SDP, the list of components is defined in:
#                 //guest/perforce_software/sdp/tools/components.txt
#
#  Type:          Type of job [Bug/Doc/Feature/Problem].  Required.
#                 
#                 Bug: A problem that is fairly well understood,
#                 e.g. one for which there is a reproduction or clear
#                 articulation of the problem.
#                 
#                 Doc: A Documentation fix.
#                 
#                 Feature: An enhancement request, perhaps adding
#                 a new product feature, improving maintainability,
#                 essentially any new software improvement other than
#                 a fix to something broken.
#                 
#                 Problem: A suspected bug, or one without a clear
#                 understanding of exactly what is broken.
#
#  Release:       Release in which job is intended to be fixed.

Job:	SDP-302

Status:	closed

Project:	perforce-software-sdp

Severity:	C

ReportedBy:	akwan

ReportedDate:	2018/02/21 18:57:59

ModifiedBy:	tom_tyler

ModifiedDate:	2023/04/13 08:10:32

OwnedBy:	tom_tyler

Description:
	Parallelized checkpoint processing to reduce duration.
	
	Enable parallel checkpoints, and include test suite coverage for
	same.
	
	Excerpt of email from Alan Kwan:
	---
	I've framed out a pseudo-code implementation of how it could behave
	as backup_functions in the SDP:
	
	dump_checkpoint_parallel()
	
	- get list of db files (this can be optimized to sort by largest or
	smallest to keep work queues as saturated as possible)
	- get p4_var variable set to # of worker threads, else use logic to
	determine a just in time value:
	- figure out cpu core count
	- check system active load value
	- define # of threads equal to core count minus active load, minus 1
	(if result is less than 1, set to 1 - not parallel)
	- define work queue (ls -1 /p4/1/offline_db/)
	- insert code to execute against work queue based on (
	http://hackthology.com/a-job-queue-in-bash.html ), and while limit,
	keep working until end - checkpoint files are named
	/p4/1/checkpoints/p4_1.ckp.db.have.number.gz (along with their MD5)
		- rewrite the offline_
	
	restore parallel would implement something similar - get the list of
	compressed checkpoint files, throw them in a work queue, and run
	p4d -jr -z on each one into the same offline_db folder until they're
	all done.
	
	Augment remove_old_checkpoints_and_journals to incorporate these
	sorts of checkpoints.
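	The work-queue idea sketched in the email above could look something
	like the following. This is a hypothetical illustration only: the
	paths are demo paths, and plain gzip stands in for the per-table
	checkpoint processing; it is not the SDP implementation.

```shell
#!/bin/sh
# Hypothetical sketch of the parallel work-queue idea described above.
# Demo paths; gzip stands in for per-table checkpoint processing.
set -eu
OFFLINE_DB=/tmp/demo_offline_db
CKP_DIR=/tmp/demo_checkpoints
mkdir -p "$OFFLINE_DB" "$CKP_DIR"

# Create a few stand-in "db" files for the demo.
for t in db.have db.rev db.integed; do
    head -c 1024 /dev/zero > "$OFFLINE_DB/$t"
done

# Worker count: core count minus 1-minute load minus 1, floor of 1.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
load=$(cut -d. -f1 /proc/loadavg 2>/dev/null || echo 0)
workers=$(( cores - load - 1 ))
if [ "$workers" -lt 1 ]; then workers=1; fi

# Work queue: largest files first (ls -S) to keep workers saturated;
# xargs -P runs up to $workers compressions at once.
ls -S "$OFFLINE_DB"/db.* | xargs -P "$workers" -n 1 \
    sh -c 'gzip -c "$1" > "$0/$(basename "$1").gz"' "$CKP_DIR"

echo "Wrote $(ls "$CKP_DIR" | wc -l) compressed files with $workers worker(s)."
```

	The same queue-then-drain shape works for the restore direction by
	feeding the compressed files back through a parallel decompress.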
	
	Excerpt of email from Robert Cowham:
	---
	
	An alternative step along the way also is to use pigz or similar for
	parallel compression which is where a lot of time is spent.
	
	Typically the focus should be on the 3-7 or so files which comprise
	the vast majority of the data (db.have/db.rev and friends/db.integed/
	db.label depending)
	
	I would also be tempted to tar the result into one file after
	zipping/before unzipping, for ease of management.
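	Robert's suggestion could be sketched as a single tar stream piped
	through a parallel compressor. This is a hypothetical sketch with
	demo paths, falling back to plain gzip where pigz is not installed:

```shell
#!/bin/sh
# Hypothetical sketch: tar the big db files into one stream and compress
# it in parallel with pigz (falling back to gzip if pigz is absent).
set -eu
SRC=/tmp/demo_db_files
OUT=/tmp/demo_ckp.tar.gz
mkdir -p "$SRC"

# Stand-ins for the handful of large tables that dominate the data.
for t in db.have db.rev db.integed db.label; do
    head -c 2048 /dev/urandom > "$SRC/$t"
done

# pigz parallelizes the compression across cores; gzip is the fallback.
GZ=$(command -v pigz || command -v gzip)

# One tar stream -> one compressed archive: easier to manage than many
# per-table .gz files.
tar -C "$SRC" -cf - . | "$GZ" > "$OUT"

# Round-trip check: list the archive contents.
"$GZ" -dc "$OUT" | tar -tf -
```

	Because pigz produces standard gzip output, the archive remains
	readable with plain gzip on hosts where pigz is unavailable.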

DevNotes:
	[2023/04/13 tom_tyler]: This job is now closed.  Parallel checkpoints
	are now fully supported. The needed p4d features to ensure reliable
	processing have been released, and the SDP now takes advantage of
	them.  See notes about DO_PARALLEL_CHECKPOINTS in the Instance Vars
	file (e.g. /p4/common/config/p4_1.vars) for more info.
	
	[2021/07/06 tom_tyler]: This job has been suspended.  Turns out some
	needed p4d support (a command to get a list of checkpointed tables)
	isn't available. Also, there is hope that a future release of p4d will
	provide this capability without the need for scripting.
	
	While there are implementations of the parallel checkpoint mechanism
	that have been made to work (by checkpointing all tables whether they
	need it or not), this is the sort of thing that must never fail.  We
	decided this feature, while it would be valuable, is best done as a
	p4d feature rather than an SDP feature.  When the needed functionality
	is added to p4d, this job will be re-opened.
	
	[2020/08/18 tom_tyler]: Re-opening this job to re-add this feature,
	with full test suite coverage.
	
	Older Notes:
	
	This can be done reliably, but will be sophisticated.  We may want
	to add an optional new setting in instance_vars.template, e.g.
	PARALLEL_CHECKPOINTS with a default value of 0.
	
	Then either dump_checkpoint() or dump_checkpoint_parallel() would be
	called depending on whether that new var is set to 1 or not.  So
	by default it would still do single-threaded checkpoints, and
	would do parallel checkpoints if explicitly enabled.
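	The toggle described above might be dispatched as follows. The
	function bodies here are placeholders, not the real SDP
	implementations:

```shell
#!/bin/sh
# Hypothetical sketch of the settings-based dispatch described above.
# Function bodies are placeholders, not the real SDP implementations.
PARALLEL_CHECKPOINTS="${PARALLEL_CHECKPOINTS:-0}"   # default: single-threaded

dump_checkpoint() { echo "single-threaded checkpoint"; }
dump_checkpoint_parallel() { echo "parallel checkpoint"; }

if [ "$PARALLEL_CHECKPOINTS" = 1 ]; then
    dump_checkpoint_parallel
else
    dump_checkpoint
fi
```

	With the default of 0 the existing single-threaded path runs, so
	enabling parallelism is an explicit opt-in per instance.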

Component:	core-unix

Type:	Feature