$ cat /p4/1/logs/checkpoint.log
Fri Oct 30 05:07:00 UTC 2020 /p4/common/bin/daily_checkpoint.sh: Start p4_1 Checkpoint
Fri Oct 30 05:07:01 UTC 2020 /p4/common/bin/daily_checkpoint.sh: Offline journal number is: 22
Fri Oct 30 05:07:01 UTC 2020 /p4/common/bin/daily_checkpoint.sh: Skipping call to truncate_journal() on edge or replica server.
Fri Oct 30 05:07:01 UTC 2020 /p4/common/bin/daily_checkpoint.sh: Replay any unreplayed journals to the offline database.
Fri Oct 30 05:07:01 UTC 2020 /p4/common/bin/daily_checkpoint.sh: Replay journal /depotdata/p4/1/checkpoints.replica.1/p4_1.replica.1.jnl.22 to offline db.
Recovering from /depotdata/p4/1/checkpoints.replica.1/p4_1.replica.1.jnl.22...
real 0m0.027s
user 0m0.004s
sys 0m0.013s
Fri Oct 30 05:07:01 UTC 2020 /p4/common/bin/daily_checkpoint.sh: Replay journal /depotdata/p4/1/checkpoints.replica.1/p4_1.replica.1.jnl.23 to offline db.
Recovering from /depotdata/p4/1/checkpoints.replica.1/p4_1.replica.1.jnl.23...
real 0m0.023s
user 0m0.004s
sys 0m0.012s
Fri Oct 30 05:07:01 UTC 2020 /p4/common/bin/daily_checkpoint.sh: Replay journal /depotdata/p4/1/checkpoints.replica.1/p4_1.replica.1.jnl.24 to offline db.
Recovering from /depotdata/p4/1/checkpoints.replica.1/p4_1.replica.1.jnl.24...
Perforce server error:
open for read: /depotdata/p4/1/checkpoints.replica.1/p4_1.replica.1.jnl.24: No such file or directory
open for read: /depotdata/p4/1/checkpoints.replica.1/p4_1.replica.1.jnl.24: No such file or directory
real 0m0.007s
user 0m0.003s
sys 0m0.004s
Fri Oct 30 05:07:01 UTC 2020 /p4/common/bin/daily_checkpoint.sh: ERROR!!! - replica p4_1 /p4/common/bin/daily_checkpoint.sh: Offline journal replay failed. Abort!
$ ls -al /p4/1/checkpoints.replica.1/
total 40
drwx------. 2 p4 p4 267 Oct 30 05:08 .
drwx------. 6 p4 p4 81 Oct 30 03:44 ..
-r--r-----. 1 p4 p4 2838 Oct 30 04:54 p4_1.replica.1.jnl.15
-r--r-----. 1 p4 p4 4126 Oct 30 04:55 p4_1.replica.1.jnl.16
-r--r-----. 1 p4 p4 1940 Oct 30 04:55 p4_1.replica.1.jnl.17
-r--r-----. 1 p4 p4 1940 Oct 30 04:55 p4_1.replica.1.jnl.18
-r--r-----. 1 p4 p4 1940 Oct 30 04:55 p4_1.replica.1.jnl.19
-r--r-----. 1 p4 p4 2838 Oct 30 04:56 p4_1.replica.1.jnl.20
-r--r-----. 1 p4 p4 2838 Oct 30 04:57 p4_1.replica.1.jnl.21
-r--r-----. 1 p4 p4 3228 Oct 30 05:06 p4_1.replica.1.jnl.22
-r--r-----. 1 p4 p4 1940 Oct 30 05:06 p4_1.replica.1.jnl.23
The daily_checkpoint.sh script isn't intended to work on a forwarding replica; it should only be run on a master or an edge server. I set the Component field of this job to 'doc' to add the needed clarification. Here are some bits:
First, none of the SDP scripts interfere with 'p4d pull' real-time replication. The SDP sets standards and conventions for how replication is set up, but none of the scripts do anything to a replica once it is running. (Well, except for load_checkpoint.sh, which blasts and reseeds.)
For forwarding replicas, there are a few scripts you might choose, depending on your goals. You may want to use either sync_replica.sh or replica_cleanup.sh, and optionally request_replica_checkpoint.sh.
sync_replica.sh: This keeps the offline_db on the replica in sync with the master by rsyncing checkpoints from the master and replaying them into the offline_db, as well as doing various and sundry tasks like log rotation, compression, and cleanup. Since it pulls checkpoints taken on the master, it is only appropriate if the replica is not filtered in any way. As an alternative, the replica_cleanup.sh script skips the rsync and the offline_db maintenance, and just does the cleanup.
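To illustrate the idea, here is a rough sketch of what sync_replica.sh does (a hedged approximation, not the actual script; 'master' and the journal number NNN are placeholders, and paths assume a standard SDP instance 1 layout):

# Pull checkpoints from the master, then rebuild offline_db from the latest one.
rsync -a master:/p4/1/checkpoints/ /p4/1/checkpoints/
# offline_db must be empty before replaying a checkpoint into it:
rm -f /p4/1/offline_db/db.*
p4d_1 -r /p4/1/offline_db -z -jr /p4/1/checkpoints/p4_1.ckp.NNN.gz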
There is also request_replica_checkpoint.sh. That's pretty much just a wrapper around the 'p4 admin checkpoint -Z' command, which causes the replica to execute a checkpoint on the next journal rotation detected from the replica's P4TARGET server. This is ideal for forwarding replicas that are filtered, and it can also be used for the moral equivalent of a live checkpoint on an edge server (when combined with rotate_journal.sh run on the master, to trigger the checkpoint on the edge when you want it).
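For example (a hedged sketch; assuming the SDP scripts take the instance number as their argument, here for instance 1):

# On the replica, schedule a checkpoint for the next journal rotation:
/p4/common/bin/request_replica_checkpoint.sh 1
# ...which is essentially a wrapper for running this against the replica:
p4 admin checkpoint -Z
# On the master, rotate the journal so the replica detects the rotation and checkpoints:
/p4/common/bin/rotate_journal.sh 1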
Note that taking checkpoints of unfiltered forwarding replicas is NOT recommended. If you want to offload checkpoint creation from the master, use a standby replica instead. Even that isn't the first choice, though: checkpoints taken with daily_checkpoint.sh against the master's offline_db use the simplest possible mechanism, and are thus the most reliable. That said, taking checkpoints on standby replicas is a viable alternative. Also note that the 'p4 failover' command only supports failing over to replicas of type 'standby' or 'forwarding-standby'.
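For reference, failing over to a standby looks something like this (a hedged example; assumes a 2019.1+ p4d, with 'standby-host:1666' as a placeholder address):

# Dry run first -- without -y, 'p4 failover' only reports what it would do:
p4 -p standby-host:1666 failover
# Then perform the actual failover:
p4 -p standby-host:1666 failover -y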
Unfiltered replicas (with a Services value of 'replica' or 'forwarding-replica') don't really need checkpoints. Filtered replicas or filtered forwarding replicas may be worth checkpointing, as they have a data set different from the master's, albeit a mere subset of it.
I'll also make a change so that daily_checkpoint.sh generates an error immediately if run on a server type that it wasn't intended to run on.
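Something along these lines (a hypothetical sketch of the guard, not the actual SDP change; it assumes $P4ROOT/server.id holds the ServerID, per SDP convention):

# Abort unless this server's Services type is one the script supports.
SERVERID=$(cat "$P4ROOT/server.id")
SERVICES=$(p4 -ztag -F %Services% server -o "$SERVERID")
case "$SERVICES" in
    standard|commit-server|edge-server) ;;  # master or edge: OK to proceed
    *) echo "Error: daily_checkpoint.sh is not supported on a '$SERVICES' server. Abort!"
       exit 1 ;;
esac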
Thanks for the detailed explanation.
So would the checkpoint created by using request_replica_checkpoint.sh be different from the checkpoint generated by adding support to the SDP script to replay the filtered replica's journals to its offline_db and creating a checkpoint from that?
The drawback I see with requesting a checkpoint using 'p4 admin checkpoint -Z' is that if we have a large filtered replica server, replaying journals and creating the checkpoint could take a significant amount of time. During this time users would not be able to use the server until the process finishes, right?
That's correct! And if your forwarding replica is filtered and actively used, I can see why you'd want daily_checkpoint.sh to run on it -- to have local offline checkpoints of the filtered replica's data set.
I can think of two things that might help:
1. Come up with a variation on the sync_replica.sh theme that works on filtered replicas.
2. Make daily_checkpoint.sh "just work" on a filtered replica (per the original request).
Since a filtered replica is a strict subset of the master, you could always create a new seed checkpoint from the master's data set, using something like:
p4d_1 -r /p4/1/offline_db -J off -z -P FilteredReplicaServerID -jd /p4/1/checkpoints/p4_1.ckp.FilteredReplicaServerID.NNN.gz
where NNN is found by something like:
p4d_1 -r /p4/1/offline_db -k db.counters -jd - | grep '@db.counters@ @journal@' | cut -d '@' -f 8
So, Option 1 would be based on that. The advantage is that a new seed checkpoint taken from the master's offline_db would be the most reliable.
Option 2 would be replaying local (filtered) journals into the local offline_db and generating local offline checkpoints. That should work, but it's a copy-of-a-copy thing, and might suffer some fidelity loss in certain (rare) situations.
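In rough terms, Option 2 would boil down to something like this on the replica (a hedged sketch with placeholder journal/checkpoint number NNN, reusing the paths from the log above):

# Replay a rotated replica journal into the local offline_db:
p4d_1 -r /p4/1/offline_db -jr /p4/1/checkpoints.replica.1/p4_1.replica.1.jnl.NNN
# Dump a compressed checkpoint of the filtered data set from the offline_db:
p4d_1 -r /p4/1/offline_db -J off -z -jd /p4/1/checkpoints.replica.1/p4_1.replica.1.ckp.NNN.gz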
So, I changed this component from 'doc' back to 'core-unix' (though with implied doc changes needed), since this would take a code change to implement, now that I understand the use case more fully. I also changed it from 'Bug' to 'Feature': the current script intentionally does not work for filtered replicas; adding support for that is something new.
Sounds good. I went ahead and updated backup_functions.sh to support replicas. This has been tested on several servers and works as expected.
https://swarm.workshop.perforce.com/reviews/26875
Thanks!