# The form data below was edited by super-tom_tyler
# Perforce Workshop Jobs
#
#  Job:           The job name. 'new' generates a sequenced job number.
#  Status:        Job status; [open/closed/suspended]. Required.
#  Project:       The project this job is for. Required.
#  Severity:      [A/B/C] (A is highest). Required.
#  ReportedBy:    The user who created the job. Can be changed.
#  ReportedDate:  The date the job was created. Automatic.
#  ModifiedBy:    The user who last modified this job. Automatic.
#  ModifiedDate:  The date this job was last modified. Automatic.
#  OwnedBy:       The owner, responsible for doing the job. Optional.
#  Description:   Description of the job. Required.
#  DevNotes:      Developer's comments. Optional.
#  Type:          Type of job; [Bug/Feature]. Required.

Job:	SDP-28

Status:	suspended

Project:	perforce-software-sdp

Severity:	C

ReportedBy:	tom_tyler

ReportedDate:	2017/04/02 17:24:00

ModifiedBy:	super-tom_tyler

ModifiedDate:	2017/04/02 17:24:00

Description:
	Add a state engine to the SDP.

	THE IDEA:
	The SDP currently has no awareness of state. This leads to some
	undesirable behaviors when scheduled jobs run while the system is
	not in a good state, e.g. after prior failures. Making it
	state-aware could also prevent issues with, for example, two admins
	both trying to run a live checkpoint at the same time.

	RELEVANT HISTORY:
	A custom "state aware" SDP was developed and deployed at an SDP
	customer as part of a larger effort to develop an automated (though
	human-initiated) custom failover solution. Following deep
	discussions about how the SDP worked, and should work, in a variety
	of failure scenarios, it was determined that a state engine was
	necessary to simplify failover procedures. Without one, failover
	procedures would involve manual work to determine the initial state
	of things before starting a failover, requiring manual review of
	logs and insight into the inner workings of the SDP.

	A list of SDP states is here:
	https://swarm.workshop.perforce.com/files/guest/tom_tyler/sw/main/p4failover/src/SDP_States.txt

	The custom implementation required extensive modifications to the
	stock SDP that were not merged back to the SDP mainline. The
	customizations included the addition of a state engine and the
	development of a failover solution. This job is only for the state
	engine component. A new state engine implementation can live
	independently of a failover solution, though a failover solution
	would benefit from (require?) a state engine, as you would only want
	to initiate failover if you knew that what you were failing over to
	was happy and healthy.

	Key considerations regarding merging the earlier State Engine:

	* The state engine was a v1.0 solution, which made the SDP require
	  more specialized knowledge to configure and operate. (This is true
	  for any reliable automated failover solution.) That was fine for
	  the initial customer, which had a team of crack Perforce admins
	  willing and interested in learning the details of the SDP, and
	  also willing to understand the state engine itself. But as
	  implemented, it "raises the bar" on the knowledge required to
	  recover the SDP from failure scenarios (though it also handles
	  many failure scenarios better).

	* It doesn't support Windows, being heavily based on bash shell
	  scripts. Thus far we've maintained the SDP with mostly equivalent
	  functionality, albeit different implementations, across
	  platforms. This bash implementation of the state engine would be
	  a big divergence from that. (Folks who have wanted to rewrite the
	  SDP from the ground up in Ruby, here's your chance!)

	* It's a big change, and shouldn't go back in without a massive
	  update to the automated SDP regression testing. The custom variant
	  itself was tested extensively in one environment.

	* It was written for 2010.2 specifically, the "1.0" release for
	  Perforce replication technology.
	  Thus, it's based on old-fashioned SDP journal-replay methods, with
	  no smarts for, or reliance on, p4d replication. Nowadays
	  replication is far more polished.

	* The customizations are not available on The Workshop; they were
	  done on a private server. Merging is not possible, and given the
	  extent of change, starting from scratch with a new design is a
	  better approach anyway.

DevNotes:
	This job, SDP-28, was originally named job000323.

	2012-03-09 giles_rainy_brown:
	Having had a look through the scripts, it would be nice to have
	something about the journal number held in the state file; this
	would make it easier (and therefore quicker) for Tech Support to
	work out where things may have gone wrong.

Type:	Feature