SDP-341 | Critical recreate_db_checkpoint.sh bug with shared /hxdepots.

This bug will not affect many customers, because it requires an unlikely sequence of events. But the impact is high when it hits, so addressing the issue is critical.

== Background ==

This bug was introduced in SDP 2016.2.21193 (December 2, 2016) and affects all versions from 2016.2.21193 through 2018.1.23583 (February 8, 2018), inclusive. Older and newer versions are unaffected, and the Windows SDP is unaffected.

The issue is in the script recreate_db_checkpoint.sh. This script replaces live databases in P4ROOT with fresh databases, regenerated from a checkpoint, from the offline_db tree maintained by the SDP. The default SDP crontab calls the script only twice per year, on the first Saturday in January and July at 6:05 PM in the master server's time zone. Because the script is optional, it is sometimes disabled entirely.

The issue occurs only when all of the following are true:

* The storage volume used for archive files is shared (e.g. via NFS or SAN) across a master and its HA server.
* A failover from the master server to the HA replica has been done.
* The recreate_db_checkpoint.sh script runs accidentally on the out-of-commission master (e.g. via a cron job that everyone forgot about and that is still running on the old master).

The negative impact occurs after a failover-then-failback situation, when the script runs on the old master but, due to the shared storage, rotates database symlinks on the new master server machine. The bug is unlikely to hit many customers, but when it does, the result is an outage and the need to recover from a checkpoint and journal. (Luckily, those are always available with the SDP.)
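One of the conditions above is a forgotten cron entry on an out-of-commission master. The following is a minimal sketch for spotting such entries, not part of the SDP: the function name list_recreate_db_cron_entries is hypothetical, and it assumes the crontab text is supplied on standard input.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with the SDP): print the active crontab
# lines that call either of the affected scripts.
# Input: crontab text on stdin. Output: matching lines on stdout.
list_recreate_db_cron_entries() {
    # Drop commented-out lines first, then keep lines naming either script.
    grep -v '^[[:space:]]*#' | grep -E 'recreate_db_(checkpoint|sync_replica)\.sh'
}
```

A possible use, run as the OS account under which Perforce runs on each server machine, is `crontab -l | list_recreate_db_cron_entries`. Any line it prints is an active entry that still calls one of the scripts; no output suggests that this account's crontab does not call them.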
=== A QUICK FIX ===

Customers should DELETE these two scripts from the installation:

/p4/common/bin/recreate_db_checkpoint.sh
/p4/common/bin/recreate_db_sync_replica.sh

Then remove any calls to these two scripts from the crontab of the OS account under which Perforce runs, on any and all Perforce server machines. This OS account is typically 'perforce' or 'p4admin'.

If you are not comfortable with the SDP, this is a fast, safe, easy fix. It can be applied immediately by anyone with login access to the Perforce server machine and OS file permissions sufficient to delete the scripts. It does NOT require an SDP update.

After making this change, the HA replica that shares archives with its master server must be reseeded from the latest checkpoint on the master.

This quick fix removes the capability to occasionally replace live databases with fresh ones regenerated from a checkpoint. That functionality is non-critical to most customers.

=== THE QUICK SDP PATCH ===

A quick SDP patch has been released that simply deletes this script and the references to it in the crontab and documentation. (A fixed version of the script will likely reappear in a future release.)

=== A BETTER, MORE SOPHISTICATED FIX ===

For customers who want to preserve the capability to routinely replace live databases with fresh ones regenerated from a checkpoint, a workaround is to change the SDP structure rather than delete the recreate_db_checkpoint.sh script. The solution outlined below has been proven to work. If you are comfortable with the SDP, this is the best fix.

Details: Since the early days of the SDP in 2007, it has been structured so that the /p4 directory was on the root volume (/), and the individual SDP instance-specific directories, e.g. /p4/1, were on the storage volume used for archive files (often named /hxdepots or /depotdata, but the name can differ at any given customer site).
The instance-specific directories contained a mix of regular directories (for things stored on the archive files volume) and symlinks. To fix this issue, restructure the layout so that the /p4 directory and the instance-specific directories like /p4/1 are ALL on the root volume (/). In this structure, the instance-specific directories contain only symlinks and the .p4tickets/.p4trust files.

This fix can be applied manually and does not require an SDP upgrade. Further, it will work with future versions of the SDP: this change to the symlink and directory structure of the /p4 and /p4/N directories was already on track to be included in a future SDP release for performance reasons before this bug was detected. (The performance benefit is that access to the latency-sensitive /p4/N/root does not pay a high latency tax by going through a /p4/N symlink on a shared storage volume.)

=== FUTURE FIX ===

A future SDP release will provide a fix that preserves the capability to routinely replace live databases with fresh ones regenerated from a checkpoint. Customers will need to update to the latest SDP to get the new version when it is available. | |
SDP-366 | Optimize display of Support messaging. | |