p4verify.py - Functional Overview and CLI Parameters
====================================================

Purpose
-------
p4verify.py is a Perforce (Helix Core) verification driver intended to run on a
Perforce server host configured with the Server Deployment Package (SDP). It:

  1) Verifies shelved changelists first (shelves verification).
  2) Verifies depots and/or the whole server in changelist “windows” (ranges).
  3) Can gate/pace verification launches to avoid overloading replication, based on:
       - Pull queue size (p4 pull -ls), with optional limit auto-tuning
       - Replica journal lag (p4 pull -ljv), to ensure the replica is not too far behind
  4) Optionally auto-tunes local verify concurrency based on observed throughput.
  5) Produces separate logs for summary, full verify output, errors, shelves output,
     and tuning events.

High-level Workflow
-------------------
1. SDP Environment Loading
   - Uses /p4/common/bin/p4_vars to load P4PORT/P4USER/etc into the process environment.
   - Validates required SDP helper scripts are present (backup_functions.sh).

2. Logging Setup
   - Requires the LOGS environment variable to be set. Log files are created under $LOGS.
   - Prior log files from previous runs are removed at startup.

3. Shelves Verification (first)
   - Finds shelved changelists using p4 changes -s shelved
     (tries -Mj first, then -ztag, then text parsing).
   - For each shelved changelist N, runs:
       p4 -Ztrack -s verify --only MISSING -q[t] -S @=N
     where the optional t (transfer) is dropped when transfer is disabled.
   - Shelves output is written ONLY to p4verify-shelves.log while shelves are running.
   - After all shelves are completed, p4verify-shelves.log is appended ONCE into
     verify_output.log between “BEGIN/END SHELVES” markers.

4. Main Verify (depots/server)
   - If --shelves-only is used: skips all depot/server verification.
   - If --depot is used: verifies only that depot.
       * spec depots: verifies //DEPOT/... (no changelist windowing)
       * unload depots: runs verify -U (only MISSING) to validate archive content
       * other depots: verifies submitted changelists in that depot using window ranges
   - If no --depot is provided: performs a server-wide windowed verify across
     //...@lo,hi, then verifies all spec depots and unload depots as appropriate.

5. Launch Gating (pacing)
   Before launching each verify process (including shelves), the script can pause
   launching NEW verify work if either condition is not satisfied:

   A) Journal-lag gating (p4 pull -ljv)
      - Reads master/replica journal state and computes:
          behind = master_journal - replica_journal
      - If behind > --journal-behind-max (default 1), the script pauses launching new
        verify processes until behind <= threshold.
      - While paused, it still reaps any completed verify jobs so logs continue to update.
      - If parsing fails or p4 pull -ljv fails, journal gating is disabled for the run
        (to avoid deadlocks).

   B) Pull-queue gating (p4 -Ztrack -Mj pull -ls)
      - Reads replicaTransfersTotal (or parses File transfers: ... total).
      - If pull_total > --pull-queue-limit (default 1000), pauses launching new verify
        processes until pull_total <= limit.
      - Optional auto-tuning adjusts the pull limit based on the “lapse” time measured
        by -Ztrack:
          * Fast response -> raise to max (3000)
          * Slow response -> reduce by 10% (down to min 500)
      - If it cannot read the pull queue total, pull gating is disabled for the run.

6. Window Sizing (optional adaptive)
   - The script verifies changelists in windows, e.g. //...@1,10 then //...@11,20 etc.
   - If adaptive window sizing is enabled (default), it estimates seconds-per-changelist
     using an EMA (exponential moving average) and chooses the next window size to
     target --target-seconds runtime per job (bounded by --min-window and --max-window).

7. Local Concurrency (optional auto-tuning)
   - Verification jobs run as multiple subprocesses controlled by a local throttle.
   - If auto-thread tuning is enabled (default), the script periodically evaluates
     throughput (changelists per second) and nudges concurrency up/down to improve
     throughput, bounded by --max-threads.

8. Completion and Findings Scan
   - All verify command stdout/stderr is captured into verify_output.log as
     “BEGIN/END” blocks.
   - A post-pass scans verify_output.log for “BAD!”, “MISSING!”, or “p4 help max”
     and reports those lines in the summary log.
   - Exit codes:
       0: OK
       1: Verify command failure(s) occurred
       2: Verify findings detected (BAD/MISSING/max), but commands may still have succeeded

Outputs / Log Files
-------------------
All logs are written beneath $LOGS:

- verify_output.log
    Full output from verify commands, including BEGIN/END markers per job.
    Includes shelves section appended once after shelves completion.

- p4verify.log
    Summary log: high-level progress, gating pauses/resumes, tuning adjustments,
    final status.

- p4verify_errors.log
    Errors log: failed command invocations, stderr details, and tails of failing jobs.

- p4verify-shelves.log
    Shelves-only output log while shelves verification is running.
    Later appended into verify_output.log once.

- p4verify_autotune.csv
    CSV log of tuning events (START, END, PULL_LIMIT changes, CONCURRENCY changes, etc.)

Important Environment / SDP Dependencies
----------------------------------------
- LOGS must be set in the environment (SDP sets this in normal operation).
- /p4/common/bin/p4_vars must exist.
- P4PORT and P4USER must be set after loading p4_vars.
- Optionally runs $P4CBIN/p4login if present (best-effort).
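
The dependency checks listed above could be expressed as a small preflight
function. This is only an illustrative sketch, not the script's actual code:
the function name preflight_problems and its arguments are hypothetical, and
it assumes the environment mapping passed in already reflects the values
loaded from p4_vars.

```python
import os

# Variables the overview says must be present (after p4_vars has been loaded).
REQUIRED_VARS = ("LOGS", "P4PORT", "P4USER")


def preflight_problems(env, p4_vars_path="/p4/common/bin/p4_vars"):
    """Return a list of problems found; an empty list means the checks passed.

    `env` is any mapping of environment variables (hypothetical helper;
    the real script may structure these checks differently).
    """
    problems = []
    if not os.path.isfile(p4_vars_path):
        problems.append(f"missing SDP environment file: {p4_vars_path}")
    for name in REQUIRED_VARS:
        # Treat an empty string the same as an unset variable.
        if not env.get(name):
            problems.append(f"{name} is not set")
    return problems
```

A caller might pass os.environ and abort before any verify work starts if the
returned list is non-empty. Collecting every problem, rather than stopping at
the first, lets an operator fix the environment in one pass.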

CLI Usage
---------
Typical usage:
  ./p4verify.py

Examples:
  # Verify shelves only:
  ./p4verify.py myinst --shelves-only

  # Verify only a depot:
  ./p4verify.py myinst --depot depotname

  # Verify server-wide starting at changelist 500000:
  ./p4verify.py myinst --min-changelist 500000

  # Disable journal gating and pull gating:
  ./p4verify.py myinst --journal-behind-max -1 --pull-queue-limit 0

  # Increase initial concurrency and allow tuner up to 300:
  ./p4verify.py myinst -P 180 --max-threads 300

Parameters (CLI)
----------------
Positional:
  instance
      SDP instance name. If omitted, defaults to $SDP_INSTANCE.
      Required unless $SDP_INSTANCE is set.

General:
  -P, --concurrency INT (default: 120)
      Initial max number of verify subprocesses launched concurrently.
      This is also the starting point for auto-thread tuning.

  -D, --debug (flag)
      Enables debug logging (includes LAUNCH lines and additional command traces).

Shelves / Selection:
  -s, --shelves-only (flag)
      Run shelves verification only; skip depot/server verification.

  -c, --min-changelist NUM (optional)
      Lower bound changelist for windowed verification.
      - For server-wide verify: verifies //...@min,head
      - For depot verify: uses min as the starting changelist for depot windowing.

  --depot DEPOTNAME (optional)
      Verify only this depot. Behavior depends on depot type:
      - spec depot: verifies //DEPOT/... (no windowing)
      - unload depot: uses verify -U (only missing) on //DEPOT/...
      - other depots: verifies submitted changelists with windowing over
        //DEPOT/...@lo,hi

  --no-transfer (flag)
      Forces verify without -t (transfer disabled).
      This mirrors SDP behavior when SHAREDDATA is true.

Windowing / Adaptive window sizing:
  --window-size INT (default: 10)
      Initial window size for changelist windowed verification.

  --no-adaptive-window (flag)
      Disables adaptive window sizing. Window size remains fixed at --window-size.

  --target-seconds INT (default: 180)
      Target runtime per verify job (used by adaptive window sizing).

  --min-window INT (default: 10)
      Minimum window size when adaptive windowing is enabled.

  --max-window INT (default: 24)
      Maximum window size when adaptive windowing is enabled.

Concurrency auto-tuning:
  --auto-threads (flag; default: enabled)
      Enables auto-tuning of verify concurrency based on throughput.

  --no-auto-threads (flag)
      Disables auto-tuning of verify concurrency.

  --max-threads INT (default: 200)
      Upper bound for concurrency when auto-thread tuning is enabled.

Pull-queue gating (replication transfer queue):
  --pull-queue-limit INT (default: 1000)
      Pauses launching new verify jobs when pull_total exceeds this limit, using:
        p4 -Ztrack -Mj pull -ls
      Set to 0 to disable pull queue gating and its auto-tuning.

  --pull-check-interval INT (default: 60)
      Interval in seconds between pull queue checks
      (and also the sleep period while paused).

Journal-lag gating (replication journal catch-up):
  --journal-behind-max INT (default: 1)
      Pauses launching new verify jobs if the replica is more than this many
      journals behind the master, based on:
        p4 pull -ljv
      Set to -1 to disable journal-lag gating.

  --journal-check-interval INT (default: 30)
      Interval in seconds between journal-lag checks
      (and also the sleep period while paused).

Notes / Operational Considerations
----------------------------------
- “Pausing verify launches” means the script stops launching new verify subprocesses;
  it does not kill already-running verify processes.
- If either gating mechanism fails to read or parse its signal (pull queue total or
  journal state), that gating mechanism is disabled for the remainder of the run to
  avoid deadlock.
- SHAREDDATA=true (in the environment) will automatically disable transfer (-t) even
  if --no-transfer is not set.
- The script deliberately avoids verifying edge servers (SERVER_TYPE p4d_edge or
  p4d_edgerep), because those should run in cache mode and are typically not verified
  in this workflow.
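
The gating rules described above (pause on a high pull queue or journal lag,
and disable a gate for the rest of the run once its signal cannot be read) can
be sketched roughly as follows. This is an assumption-laden illustration, not
the script's implementation: the Gates class, its field names, and the
convention of passing None for an unreadable signal are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gates:
    """Hypothetical holder for the two launch gates and their limits."""
    pull_queue_limit: int = 1000   # --pull-queue-limit; 0 disables the gate
    journal_behind_max: int = 1    # --journal-behind-max; -1 disables the gate
    pull_enabled: bool = True
    journal_enabled: bool = True

    def __post_init__(self):
        # Honor the documented "disable" values for each option.
        if self.pull_queue_limit == 0:
            self.pull_enabled = False
        if self.journal_behind_max == -1:
            self.journal_enabled = False

    def should_pause(self, pull_total, journals_behind):
        """Return True if launching NEW verify jobs should be paused.

        Pass None for a signal that could not be read or parsed; that gate
        is then switched off for the remainder of the run.
        """
        if self.pull_enabled and pull_total is None:
            self.pull_enabled = False
        if self.journal_enabled and journals_behind is None:
            self.journal_enabled = False
        pull_block = self.pull_enabled and pull_total > self.pull_queue_limit
        lag_block = self.journal_enabled and journals_behind > self.journal_behind_max
        return pull_block or lag_block
```

In a launcher loop, should_pause would be consulted before starting each new
verify subprocess; already-running subprocesses are left alone and can still
be reaped while the launcher sleeps for the relevant check interval.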