SDP-28 #1

# The form data below was edited by super-tom_tyler
# Perforce Workshop Jobs
#
#  Job:           The job name. 'new' generates a sequenced job number.
#  Status:        Job status; [open/closed/suspended].  Required
#  Project:       The project this job is for. Required.
#  Severity:      [A/B/C] (A is highest)  Required.
#  ReportedBy:    The user who created the job. Can be changed.
#  ReportedDate:  The date the job was created.  Automatic.
#  ModifiedBy:    The user who last modified this job. Automatic.
#  ModifiedDate:  The date this job was last modified. Automatic.
#  OwnedBy:       The owner, responsible for doing the job. Optional.
#  Description:   Description of the job.  Required.
#  DevNotes:      Developer's comments.  Optional.
#  Type:          Type of job; [Bug/Feature].  Required.

Job:	SDP-28

Status:	suspended

Project:	perforce-software-sdp

Severity:	C

ReportedBy:	tom_tyler

ReportedDate:	2017/04/02 17:24:00

ModifiedBy:	super-tom_tyler

ModifiedDate:	2017/04/02 17:24:00

Description:
	Add a state engine to the SDP.
	
	THE IDEA:
	
	The SDP currently has no awareness of state.  This leads to
	undesirable behavior when scheduled jobs run while the system
	is not in a good state, e.g. after prior failures.  Making it
	state-aware could also prevent issues such as two admins both
	trying to run a live checkpoint at the same time.
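	
	As a rough illustration of the idea, here is a minimal sketch of a
	state guard.  It is not the earlier custom implementation.  The state
	names (RUNNING, FAILED), the file layout, and the helper names
	read_state, write_state, and run_guarded are all assumptions made up
	for this example.  It refuses to start an operation unless the recorded
	state is healthy, and uses an exclusively created lock file so that
	only one caller can proceed at a time.

```python
import os

# Hypothetical state names, for illustration only; not the SDP_States.txt list.
OK_STATES = {"RUNNING"}


def read_state(state_file):
    """Return the recorded state, or 'UNKNOWN' if no state file exists."""
    try:
        with open(state_file) as f:
            return f.read().strip() or "UNKNOWN"
    except FileNotFoundError:
        return "UNKNOWN"


def write_state(state_file, state):
    """Replace the state file atomically so readers never see a partial write."""
    tmp = state_file + ".tmp"
    with open(tmp, "w") as f:
        f.write(state + "\n")
    os.replace(tmp, state_file)


def run_guarded(state_file, busy_state, operation):
    """Run operation only if state is healthy and no other run holds the lock."""
    lock_file = state_file + ".lock"
    try:
        # O_CREAT | O_EXCL fails if the lock already exists, so only one caller wins.
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return "refused: another operation is in progress"
    try:
        state = read_state(state_file)
        if state not in OK_STATES:
            return f"refused: state is {state}"
        write_state(state_file, busy_state)
        try:
            operation()
        except Exception:
            # Leave a failure marker so later scheduled jobs do not run blindly.
            write_state(state_file, "FAILED")
            raise
        write_state(state_file, "RUNNING")
        return "ok"
    finally:
        os.close(fd)
        os.remove(lock_file)
```

	A second caller that arrives while the lock is held is refused, and a
	failed operation leaves the state at FAILED until someone resets it.
	This sketch does not handle a stale lock left behind by a crashed
	process; a real design would need a policy for that.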
	
	RELEVANT HISTORY:
	
	A custom "state aware" SDP was developed and deployed at an SDP customer
	as part of a larger effort to develop an automated (though human-initiated)
	custom failover solution.  Following deep discussions about how the
	SDP worked and should work in a variety of failure scenarios, it was
	determined that a state engine was necessary to simplify failover
	procedures.  Without one, failover procedures would involve manual work
	to determine the initial state of things before starting a failover,
	requiring manual review of logs and insight into the inner workings
	of the SDP.
	
	A list of SDP states is here:
	https://swarm.workshop.perforce.com/files/guest/tom_tyler/sw/main/p4failover/src/SDP_States.txt
	
	The custom implementation required extensive modifications to the
	stock SDP that were not merged back to the SDP mainline.  These
	customizations included the addition of a state engine and the
	development of a failover solution.
	
	This job is only for the state engine component.  A new state engine
	implementation can live independently of a failover solution, though a
	failover solution would benefit from (require?) a state engine, as
	you would only want to initiate failover if you knew that what you
	were failing over to was happy and healthy.
	
	Key considerations regarding merging the earlier State Engine:
	* The state engine was a v1.0 solution, which made the SDP require
	more specialized knowledge to configure and operate.  (This is
	true for any reliable automated failover solution).  That was
	fine for the initial customer, which had a team of crack Perforce
	admins willing and interested in learning the details of the SDP,
	and also willing to understand the state engine itself.  But as
	implemented, it "raises the bar" on knowledge required to
	recover the SDP from failure scenarios (though it also handles
	many failure scenarios better).
	* It doesn't support Windows, being heavily based on bash
	shell scripts.  Thus far we've maintained the SDP with
	mostly equivalent functionality, albeit different
	implementations, across platforms.  A bash implementation of
	the state engine would be a big divergence from that.
	(Folks who have wanted to rewrite the SDP from the ground up
	in Ruby, here's your chance!)
	* It's a big change, and shouldn't go back in without a massive
	update to the automated SDP regression testing.  The custom
	variant itself was tested extensively in one environment.
	* It was written for 2010.2 specifically, the "1.0" release
	for Perforce replication technology.  Thus, it's based on
	old-fashioned SDP journal-replay methods, with no smarts for,
	or reliance on, p4d replication.  Nowadays replication is far
	more polished.
	* The customizations are not available on The Workshop; they were
	done on a private server.  Merging is therefore not possible, and
	given the extent of change, starting from scratch with a new
	design is a better approach anyway.

DevNotes:
	This job, SDP-28, was originally named job000323.
	
	2012-03-09 giles_rainy_brown: Having had a look through the scripts,
	it would be nice to have something about the journal number
	held in the state file; this would make it easier (and therefore
	quicker) for Tech Support to work out where things may have gone
	wrong.
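	
	One way to picture the suggestion above is a state file that carries
	the journal number alongside the state.  The following is only a
	sketch: the key=value layout and the function names
	format_state_record and parse_state_record are assumptions for
	illustration, not an existing SDP format.  How the journal number is
	obtained is left to the caller.

```python
import time


def format_state_record(state, journal_number, timestamp=None):
    """Render a state record as key=value lines (a hypothetical layout)."""
    if timestamp is None:
        timestamp = time.strftime("%Y/%m/%d %H:%M:%S")
    return f"state={state}\njournal={journal_number}\nupdated={timestamp}\n"


def parse_state_record(text):
    """Parse key=value lines back into a dict, converting journal to an int."""
    record = dict(line.split("=", 1) for line in text.splitlines() if "=" in line)
    record["journal"] = int(record["journal"])
    return record
```

	With a record like this, someone reading the state file can see at a
	glance which journal was current when the state last changed.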

Type:	Feature