# The form data below was edited by super-tom_tyler
# Perforce Workshop Jobs
#
# Job: The job name. 'new' generates a sequenced job number.
# Status: Job status; [open/closed/suspended]. Required.
# Project: The project this job is for. Required.
# Severity: [A/B/C] (A is highest) Required.
# ReportedBy: The user who created the job. Can be changed.
# ReportedDate: The date the job was created. Automatic.
# ModifiedBy: The user who last modified this job. Automatic.
# ModifiedDate: The date this job was last modified. Automatic.
# OwnedBy: The owner, responsible for doing the job. Optional.
# Description: Description of the job. Required.
# DevNotes: Developer's comments. Optional.
# Type: Type of job; [Bug/Feature]. Required.
Job: SDP-28
Status: suspended
Project: perforce-software-sdp
Severity: C
ReportedBy: tom_tyler
ReportedDate: 2017/04/02 17:24:00
ModifiedBy: super-tom_tyler
ModifiedDate: 2017/04/02 17:24:00
Description:
Add a state engine to the SDP.
THE IDEA:
The SDP currently has no awareness of state. This leads to some
undesirable behaviors when scheduled jobs run while the system
is not in a good state, e.g. after prior failures. Making it
state-aware could also prevent issues with, for example, two
admins both trying to run a live checkpoint at the same time.
RELEVANT HISTORY:
A custom "state aware" SDP was developed and deployed for an SDP customer
as part of a larger effort to develop an automated (though human-initiated)
custom failover solution. Following deep discussions about how the
SDP worked and should work in a variety of failure scenarios, it was
determined that a state engine was necessary to simplify failover
procedures. With a state engine, failover procedures would no longer
involve manual work to determine the initial state of things before
starting a failover. Without one, that determination requires manual
review of logs and insight into the inner workings of the SDP.
A list of SDP states is here:
https://swarm.workshop.perforce.com/files/guest/tom_tyler/sw/main/p4failover/src/SDP_States.txt
The custom implementation required extensive modifications to the
stock SDP that were not merged back to the SDP mainline. These
customizations included the addition of a state engine and the
development of a failover solution.
This job is only for the state engine component. A new state engine
implementation can live independently of a failover solution, though a
failover solution would benefit from (require?) a state engine, as
you would only want to initiate failover if you knew what you were
failing over to was happy and healthy.
Key considerations regarding merging the earlier State Engine:
* The state engine was a v1.0 solution, which made the SDP require
more specialized knowledge to configure and operate. (This is
true for any reliable automated failover solution.) That was
fine for the initial customer, which had a team of crack Perforce
admins willing and interested in learning the details of the SDP,
and also willing to understand the state engine itself. But as
implemented, it "raises the bar" on the knowledge required to
recover the SDP from failure scenarios (though it also handles
many failure scenarios better).
* It doesn't support Windows, being heavily based on bash
shell scripts. Thus far we've maintained the SDP with
mostly equivalent functionality, albeit different
implementations, across platforms. This bash implementation
of the state engine would be a big divergence from that.
(Folks who have wanted to rewrite the SDP
from the ground up in Ruby, here's your chance!)
* It's a big change, and shouldn't go back in without a massive
update to the automated SDP regression testing. The custom
variant itself was tested extensively in one environment.
* It was written specifically for 2010.2, the "1.0" release
for Perforce replication technology. Thus, it's based on
old-fashioned SDP journal-replay methods, with no smarts for,
or reliance on, p4d replication. Nowadays replication is far
more polished.
* The customizations are not available on The Workshop; they were
done on a private server, so merging is not possible. In any case,
given the extent of the change, starting from scratch with a new
design is a better approach.
DevNotes:
This job, SDP-28, was originally named job000323.
2012-03-09 giles_rainy_brown: Having had a look through the scripts,
it would be nice to have something about the journal number
held in the state file; this would make it easier (and therefore
quicker) for Tech Support to work out where things may have gone
wrong.
Type: Feature