# Session Log — 2026-06-16

Work on `trim_excess_metadata.sh`. Bumped from **v2.2.0** to **v3.0.0**.

## Ask

1. Review ToDo.md and implement all feasible items.
2. Set up a realistic lab test environment (full simulation of post-divestiture standalone server).
3. Run the script live against the test environment and verify results.
4. Add `p4 snap` phase to resolve lazy-copy archive leakage.

## Key Decisions

- **Version**: 3.0.0 — major change warranting major version bump.
- **Shelf delete failures**: Keep as errors (not journal patch). User may handle via journal patch manually later.
- **Server spec to keep**: `p4d_ffr_gf` (the filtered forwarding replica spec).
- **Style**: All spec-piping operations use temp files (`mktemp`) so the spec content is visible in error messages.
- **p4 snap**: New Phase 17. Without `-snap` flag, propose commands; with `-snap`, execute. Always run snap BEFORE applying the journal patch.

## Lab Environment

### Topology (Battle School Workshop — 5 servers, same subnet)

| Host | Role | ServerID | Port |
|------|------|----------|------|
| p4c-bos-01 | Commit (master) | commit.p4demo.1 | 1999 |
| p4c-bos-02 | HA standby | p4d_ha_bos | 1999 |
| p4c-nyc-03 | DR standby → sacrificed for test | p4d_fs_nyc → p4d_ffr_gf → p4d_commit_gf | 1999 |
| p4c-syd-04 | Edge | p4d_edge_syd | 1999 |
| p4c-syd-05 | Edge HA | p4d_ha_edge_syd | 1999 |

All SSH accessible as `perforce` OS user without password.

### Test Setup Procedure (Lab 0 → trim test environment on nyc-03)

**Starting state**: Standard Battle School Lab 0 reset.

**Step 1**: Add `gf` site tag.

```bash
if ! grep -q ^gf /p4/common/config/SiteTags.cfg 2>&1; then
    echo 'gf: GF - Filtered Forwarding Replica test site' >> /p4/common/config/SiteTags.cfg
fi
```

**Step 2**: Run mkrep.sh (from bos-01, as perforce):

```bash
mkrep.sh -t ffr -s gf -r p4c-nyc-03 -i 1
```

Creates: server spec `p4d_ffr_gf`, service user `svc_p4d_ffr_gf`, all configurables.

**Step 3**: Add RevisionDataFilter and ArchiveDataFilter to `p4d_ffr_gf` server spec:

```
RevisionDataFilter:
    //jam/...
    //pb/...
ArchiveDataFilter:
    //jam/...
    //pb/...
```

**Step 4**: Rotate journal, create filtered seed checkpoint, stop p4d_fs_nyc, reset ServerID to p4d_ffr_gf, load checkpoint.

```bash
# On bos-01:
rotate_journal.sh 1
p4d_1 -r /p4/1/offline_db -P p4d_ffr_gf -J off -Z \
    -jd /p4/1/checkpoints.ffr_gf/p4_1.ckp.ffr_gf.1.gz

# On nyc-03:
sudo systemctl stop p4d_1
mkdir -p /p4/1/checkpoints.ffr_gf
echo 'p4d_ffr_gf' > /p4/1/root/server.id
load_checkpoint.sh 1 /p4/1/checkpoints.ffr_gf/p4_1.ckp.ffr_gf.1.gz
sudo systemctl start p4d_1
```

**Fix service user password expiry** (dm.user.resetpassword=1 triggers on new users):

```bash
# On bos-01:
p4 passwd svc_p4d_ffr_gf   # set a known password

# On nyc-03 — log the service user in:
p4 -p 1999 -u svc_p4d_ffr_gf login   # enter password set above
```

**Step 5**: Blast depot archives on nyc-03:

```bash
ssh perforce@p4c-nyc-03 'rm -rf /p4/1/depots/*'
```

**Step 6**: Pull filtered archives via p4verify:

```bash
ssh perforce@p4c-nyc-03 'p4verify.sh 1'
```

Expected: only //jam/... and //pb/... archive files pulled. But note: lazy-copy archives may land in /p4/1/depots/depot/ even though //depot/... is filtered — this is correct behaviour (see Lazy Copy section below).
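
One quick way to check that expectation is to count the archive files under each depot directory. The sketch below is illustrative only: `count_archives` is a hypothetical helper (it is not part of the SDP or of `trim_excess_metadata.sh`), and it assumes that every regular file under a depot directory counts as one archive file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<depot-dir><TAB><file count>" for each
# top-level directory under a depot root such as /p4/1/depots.
count_archives() {
    local depot_root=$1 dir count
    for dir in "$depot_root"/*/; do
        [ -d "$dir" ] || continue                 # glob matched nothing
        count=$(find "$dir" -type f | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$(basename "$dir")" "$count"
    done
}

# Usage on nyc-03 (path taken from this log):
#   count_archives /p4/1/depots
```

Running it before and after a snap run gives the same kind of per-depot counts that this log quotes for `depot/`, `jam/` and `pb/`.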

**Step 7**: Promote to standalone commit:

```bash
# On nyc-03:
sudo systemctl stop p4d_1
echo 'p4d_commit_gf' > /p4/1/root/server.id   # new ID — avoids inheriting db.replication=readonly
sudo systemctl start p4d_1

# Clear auth.id so the standalone handles its own auth (no longer points to central auth cluster):
P4PORT=localhost:1999 p4 configure unset auth.id

# Do NOT touch run.users.authorize — just ensure you are logged in:
P4PORT=localhost:1999 p4 login   # enter admin password
```

Verify: `p4 -p p4c-nyc-03:1999 info` → `ServerID: p4d_commit_gf, Services: standard`

**Step 8**: Create test config files (on bos-01, in /home/perforce/tem/):

```bash
# File contents as listed under "Files in /home/perforce/tem/" later in this log.
cat > .p4config.gf <<EOF
P4PORT=p4c-nyc-03:1999
P4USER=perforce
P4TICKETS=/home/perforce/tem/.p4tickets
EOF
echo "perforce" > keep_users.gf.txt
echo "testers" > keep_groups.gf.txt   # Randall_Scott is sole member → tests last-group-member fix
```

**Step 9**: Run the script:

```bash
# Dry run first:
bash trim_excess_metadata.sh gf
# Then live:
bash trim_excess_metadata.sh gf -y
# Then with snap:
bash trim_excess_metadata.sh gf -y -snap
```

## Lazy Copy / Archive Leakage (CRITICAL)

### Background

Perforce uses a **lazy copy** mechanism for branching: instead of physically copying archive files, db.rev records in the new path point to the *existing* archive in the original path via db.storage. No physical file copy happens at branch time.

### Consequence for filtered replication

When a RevisionDataFilter keeps `//jam/...` but filters out `//depot/...`, p4d during `p4verify.sh` will pull **all archives needed to make //jam/... fully accessible** — including archives physically stored in `/p4/1/depots/depot/` that serve as the backing storage for files in `//jam/...` via lazy copy. This is correct and expected.

**After the filtered replica is promoted to standalone:**

- `//depot/...` has 0 db.rev records (filtered) → appears empty
- `//depot/...` db.storage still has entries for lazy-copy-source archives
- Physical archives exist in `/p4/1/depots/depot/` that `//jam/...` depends on
- `p4 depot -df depot` → fails with "isn't empty of archive contents"

**This is observed in our lab:**

- `/p4/1/depots/depot/`: 78 archive files (Jam MAIN src) — needed by `//jam/...`
- `p4 snap -n //jam/...` confirms: 50+ files in `//jam/...` are lazy copies from `//depot/Jam/MAIN/src/`

### Resolution: p4 snap + journal patch

1. **`p4 snap //jam/... //pb/...`** — physically copies archives into their target depot directories, breaks lazy-copy chains. After snap: jam/ has 348 archive files, pb/ has 441 (up from 4 and 9 respectively).
2. **Journal patch** — removes depot spec entries from db.domain via `p4d -jr ` while offline.
3. **After snap AND journal patch**: archives in `/p4/1/depots/depot/` are no longer referenced by any db.storage record for kept paths → can be safely deleted.

### DO NOT do before snapping:

- `rm -rf /p4/1/depots/depot/` — would corrupt `//jam/...` by removing its lazy-copy backing archives.

## Changes Made to trim_excess_metadata.sh (v2.2.0 → v3.0.0)

### Bug fixes

- **Phase 7 last-group-member retry**: when `p4 user -df` fails with "last member of group X", script extracts group name, adds `p4admin` as Owner via temp file, retries user deletion.
- **Duplicate Phase 2 block**: removed duplicate client message block.
- **3× typo** `"excepot"` → `"except"` in Phases 2/3/4.
- **`p4 fix -d` syntax**: was `p4 fix -d -c CL -j Job` (invalid) → `p4 fix -d -c CL Job` (positional).
- **`p4 server -d` syntax**: was `p4 server -f -d serverID` (invalid -f) → `p4 server -d serverID`.
- **Shelved CL failure counting**: was `msg` (not counted) → `errmsg` (counted in ErrorCount).
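
The last fix depends on the difference between the two message helpers. The script's real definitions are not shown in this log, so the following is only a sketch, assuming that `errmsg` increments `ErrorCount` and `msg` does not. The function name `report_shelf_failure` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: assumed shapes of the msg/errmsg helpers.
declare -i ErrorCount=0

# Informational message: printed, never counted.
msg() { printf '%s\n' "$*"; }

# Error message: printed and counted in ErrorCount.
errmsg() { msg "Error: ${1:-Unknown Error}"; ErrorCount+=1; }

# Hypothetical call site: a shelved CL that cannot be deleted is
# reported with errmsg so that it shows up in the final error count.
report_shelf_failure() {
    errmsg "Could not delete shelved changelist $1."
}
```

With helpers shaped like this, four undeletable shelved changelists would leave `ErrorCount` at 4, which matches an exit code reported as a count of failures.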

### New phases implemented

- **Phase 9**: Job/fix cleanup — deletes fixes first (`p4 fix -d -c CL Job`), then jobs (`p4 job -df`); resets jobspec from `default_jobspec.p4s`.
- **Phase 11 (enhanced)**: Failed front-door depot deletions appended to journal patch file (`trim_excess_metadata..jnl`); summary notes path and apply command.
- **Phase 12**: Server spec cleanup — deletes all except `p4d_ffr_gf`.
- **Phase 13**: Remote spec cleanup — deletes all.
- **Phase 14**: Typemap reset — writes empty `Typemap:\n` via temp file.
- **Phase 15**: Triggers reset — copies `default_triggers.p4s` via temp file.
- **Phase 16**: Protections cleanup — rebuilds table keeping lines for kept users/groups; appends super entries for p4admin and perforce.
- **Phase 17**: Snap lazy copies — proposes `p4 snap ///...` for each non-empty local/stream depot. Executes with `-snap` flag.

### Operator Tips added to -man

1. Dry run first.
2. Space for journal bloat.
3. Run snap (with `-snap`) BEFORE applying journal patch.
4. After snap, filtered-out depot archive dirs can be safely deleted.
5. Apply journal patch while p4d is offline.

## Live Test Results (on p4c-nyc-03)

### Run 1 (bugs present)

- Exit 39 = 33 fix failures (wrong -j syntax) + 6 server spec failures (wrong -f flag)
- Journal patch created with 5 depot entries (HR, depot, gwt, gwt-streams, system) ✅
- Phases 1-10, 13-16 all worked correctly

### Run 2 (after syntax fixes)

- Exit 0, no errors, 1 warning (Stream phase not implemented) ✅
- Jobs: 17 deleted (plus 24 in run 1), Fixes: 33 deleted, Server specs: 6 deleted ✅

### Run 3 (shelve error counting fix)

- Exit 4 = 4 shelved CL failures (CLs in filtered depots, no shelved content to delete) ✅
- These are expected and intentionally kept as errors per user decision.

### Run 4 (with -snap)

- `p4 snap //jam/...` → 50+ lazy copies resolved from //depot/Jam/MAIN/src/
- `p4 snap //pb/...` → lazy copies resolved
- jam/: 4 → 348 archive files; pb/: 9 → 441 archive files ✅
- depot/ archives now orphaned (safe to delete after journal patch applied)

### Post-run manual steps

- Deleted orphaned archives: `ssh perforce@p4c-nyc-03 "rm -rf /p4/1/depots/depot/"` ✅
  (HR, gwt, gwt-streams, system dirs were already absent — never pulled by p4verify)
- Applied journal patch to delete 5 depot specs from db.domain (see below) ✅
- Final live depot list: Perforce (remote), jam, pb, spec, unload ✅

## Journal Patch Format — CRITICAL FINDINGS

### Perforce journal verb glossary (from Tom, confirmed by testing)

| Verb | Meaning | Notes |
|--------|--------------------|-------|
| `@rv@` | replace version | Write a full DB record (live journal format) |
| `@dv@` | **delete value** | Delete a DB record — requires ALL fields |
| `@pv@` | put version | Write a full DB record (checkpoint format) |
| `@dl@` | delete library | Delete a versioned file/archive — NOT for DB records |
| `@ex@` | execute/commit | Controls when buffered data is flushed to DB during live replication — **NEVER use in patches** |
| `@vv@` | verify value | Triggers journal sequence check against db.counters — **NEVER use in patches** |

### Correct format for depot spec deletion

```
@dv@ 8 @db.domain@ @@ @@ @@ @@ @@ @@ @@ @@ @@
```

- All fields must be present, even if they are empty placeholders (`@@` for strings, `0` for ints).
- `8` = db.domain table schema version.
- Get the exact record via: `p4d -r -jd - db.domain | grep " @db.domain@ @@ "`
- Replace `@pv@` with `@dv@` on the line extracted from `p4d -jd` output.
- The script now does this automatically (fetches full record at journal patch generation time).

### Multi-record patch files work correctly

Multiple `@dv@` records in a single `.jnl` file are processed correctly by `p4d -jr` as long as each record has all required fields.
Early testing failures were caused by key-only truncated records: the second line was being consumed as continuation fields of the first record.

### Apply command

```
p4d -r -jr .jnl
```

P4ROOT is shown in the script summary output (from `p4 -ztag -F %serverRoot% info -s`).

### What does NOT work (documented for reference)

- `@dl@ 8 @db.domain@ @HR@` — "Bad transaction marker!" (`@dl@` = delete library, not DB record)
- `@dv@ @db.domain@ @HR@` (no version #) — "Table HR not known" (parses key as table name)
- `@dv@ 8 @db.domain@ @HR@` with `-jrF` — "Bad opcode 'db.domain'" (wrong flag for journal format)
- Key-only `@dv@` (missing trailing fields) in a multi-record file — second record silently consumed as fields of first
- `@ex@` between records — `@ex@` controls live-replication buffer flushing; it is bad news in patches and stops further processing

## p4d Notes

- `dm.user.resetpassword=1`: causes newly created service users to require password reset. Fix: `p4 passwd ` then `p4 login ` from replica.
- `p4 -ztag -F %fileCount% sizes -sah` returns `""` (empty string, not "0") for unreachable remote depots. Code guards against this using `== 0` comparison (empty string ≠ "0") — keeps remote depot as "non-empty", which is safe (we don't delete it).
- `p4 depot -df` checks db.storage (not just db.rev) for archive content. A depot with 0 db.rev records but db.storage entries will refuse deletion. This is the correct behaviour; use journal patch instead.
- Promotion from ffr → standalone: change `server.id` to a NEW ID that has no scoped configurables. Reusing `p4d_ffr_gf` would inherit `db.replication=readonly`.
- `run.users.authorize=1`: do NOT remove this configurable. It is a security control. Instead, ensure the operator running trim_excess_metadata.sh is logged in (valid ticket) before running the script. Use `p4login -v` or `p4 login` manually first. The script calls `p4 users` and other commands that require auth — a valid ticket is sufficient.
- `auth.id`: when promoting a replica to standalone, the `auth.id` configurable (pointing to the central auth cluster) must be cleared (`p4 configure unset auth.id`) so the standalone server handles its own authentication. This is a one-time setup step, not something the trim script does.

## Files in /home/perforce/tem/ (not in P4 depot)

- `.p4config.gf` — P4PORT=p4c-nyc-03:1999, P4USER=perforce, P4TICKETS=/home/perforce/tem/.p4tickets
- `keep_users.gf.txt` — contains only `perforce`
- `keep_groups.gf.txt` — contains only `testers` (tests last-group-member scenario)

## Bugs Found and Fixed (Session Continuation)

### Bug 1: SSH stdin consumes while-loop input (Phase 11 loop terminates early)

- **Symptom**: Phase 11 only examined 1 depot (the first) regardless of how many were present.
- **Root cause**: The `while read -r DepotData; done < "$TmpFile"` loop passes its stdin file descriptor to the `ssh` command (run to do `p4d -jd` dump on remote host). SSH reads from stdin, consuming all remaining depot lines from TmpFile. The while loop then exits early.
- **Fix**: Added `ssh -n` (redirects SSH stdin from /dev/null) and added `< /dev/null` on the local `p4d -jd` invocation.
- **Impact**: Previously, only the first empty depot got a journal patch entry. All subsequent depots were silently skipped.

### Bug 2: `grep -m1` truncates multi-line db.domain records

- **Symptom**: Journal patch replay failed: "End of input in middle of word! Bad quoting in journal file at line 2!"
- **Root cause**: Some depot descriptions contain embedded newlines. In `p4d -jd` output, the `@text@` field starts on one line and its closing `@` is on the next. `grep -m1` only captured the first line, leaving an unclosed `@` field.
- **Fix**: Replaced `grep -m1` with an awk that collects the full logical record — from `@pv@`/`@rv@` start through the next `@pv@`/`@rv@` or EOF.
- **Affected depots in this lab**: HR, gwt-streams, system (descriptions have trailing newline).
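
The record-collection idea in the Bug 2 fix can be sketched as follows. This is not the script's actual awk; it is a minimal illustration, and the function name `extract_domain_dv_record` is hypothetical. It assumes that every logical record starts with `@pv@` or `@rv@` at the beginning of a line and that continuation lines never do.

```bash
#!/usr/bin/env bash
# Sketch: pull one complete (possibly multi-line) db.domain record out of
# `p4d -jd` style text on stdin and turn it into a @dv@ delete record.
extract_domain_dv_record() {
    local depot=$1
    awk -v key="@db.domain@ @${depot}@ " '
        /^@(pv|rv)@ / {                  # start of a new logical record
            if (collecting) exit         # reached the next record: stop
            if (index($0, key)) collecting = 1
        }
        collecting { print }             # first line plus continuation lines
    ' | sed '1s/^@[pr]v@ /@dv@ /'        # swap the verb on the first line only
}

# Usage (ROOT is a placeholder for the server root):
#   p4d -r "$ROOT" -jd - db.domain < /dev/null | extract_domain_dv_record HR
```

Because the collector stops at the next record start rather than after one line, a description whose closing `@` sits on a second line stays attached to its record.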

### Bug 3: awk `exit` triggers END block (double-printing records)

- **Symptom**: Each depot appeared twice in the journal patch: once as `@dv@` and once as `@pv@`.
- **Root cause**: In awk, calling `exit` from within a rule still executes the END block. The record was printed once in the rule (on `exit`) and again in the END block.
- **Fix**: Added a `found=1` flag set before `exit`, and guarded the END block with `!found`.

### Resulting journal patch format (correct)

Multi-line records now produce valid multi-line `@dv@` entries, e.g.:

```
@dv@ 8 @db.domain@ @HR@ 100 @@ @@ @@ @@ @bruno@ 1297219747 1297219747 0 @Stream depot for Doc review
@ @@ @@ 0
```

### Successful end-to-end test result

- Phase 11: All 5 empty depot specs (HR, depot, gwt, gwt-streams, system) added to journal patch ✅
- Phase 17: `p4 snap //jam/... //pb/...` resolved all lazy copies ✅ (jam: 4→348, pb: 9→441 archive files)
- Journal patch applied: `p4d -r /p4/1/root -jr ` → exit 0 ✅
- Post-patch `p4 depots`: only jam, pb, Perforce (remote), spec, unload remain ✅
- `p4 verify //jam/... //pb/...`: clean ✅
- Note: `/p4/1/depots/depot/` still has 78 archive files — these are now safe to delete since snap resolved all lazy-copy references.

### Note: `p4d -jr NONEXISTENT_FILE` is harmless

Tom confirmed: a `p4d -jr` call with a non-existent file simply exits with an error and does NOT affect the running database or require a restart. The "Recovering from..." message is normal p4d startup output.

### Note: Journal patch file must be on the p4d server's filesystem

The `p4d -r ROOT -jr FILE` command reads FILE from the LOCAL filesystem of the machine running p4d. If p4d is on a remote host (nyc-03) and the script runs on bos-01, the operator must SCP or otherwise transfer the .jnl file to the remote host before applying.

## p4 snap — Context: "Deep Rename" Operation

Tom provided useful context on where `p4 snap` fits in the broader SDP toolbox.
A **deep rename** (making it look like a file always had its new path, including all historical revisions) uses a trio of commands:

```bash
p4 duplicate //depot/old/path/... //depot/new/path/...   # copy history to new path
p4 snap //depot/new/path/... //depot/old/path/...        # break lazy copy: give new path its own archives
p4 obliterate //depot/old/path/...                       # remove the source path entirely
```

The `snap` step is what severs the lazy-copy link — after snap, `//depot/new/path/...` has its own physical archive files and no longer depends on the old path's archives. This makes the subsequent `obliterate` safe (nothing left pointing into the old archives).

In our divestiture handling:

- We skip `duplicate` (the kept depots already existed with full history)
- We run `snap //jam/... //pb/...` to give those depots their own physical archives (severing lazy-copy links into the filtered-out `//depot/...` archives)
- We skip `obliterate` and instead remove the depot spec via journal patch, then `rm -rf` the orphaned archive dirs (Phase 18)

This makes Phase 17+18 the moral equivalent of step 2+3 of a deep rename. The `duplicate` step (step 1) already happened implicitly when the filtered forwarding replica was populated via `p4verify`.

**Note**: Deep renames involving streams or top-level depot renames are more complex admin operations; the above describes the basic non-stream case.

## Phase 17b Redesign — Eliminate p4d -jd Back-Door (Change 55)

### Motivation

The original Phase 11 used `p4d -jd db.domain` (back-door) to dump the database and extract verbatim journal records for constructing `@dv@` delete entries. This requires knowing P4ROOT, having p4d in PATH, and ideally being on the p4d server host — none of which can be assumed in non-SDP customer environments.

### New architecture

**Primary path (p4 storage -d, p4d 2021.1+):**

1. Phase 11: Try `p4 depot -df` — if fails, add to `FailedDepots[]` (don't journal-patch yet)
2. Phase 17: `p4 snap` for kept depots (resolves lazy-copy chains)
3. Phase 17b: For each failed depot:
   - `p4 storage -d -y ///...` — removes orphaned db.storage entries
   - Retry `p4 depot -df` — should now succeed (db.storage is clean)
   - No offline window, no p4d access, no journal patch

**Fallback path (journal patch via p4 dbschema):**

If `p4 storage -d` fails (unavailable or non-zero exit):

- Call `p4 dbschema db.domain` — returns authoritative schema version + field types
- Build `@dv@` record dynamically: `int*/intv` fields → `0`, `text/key` fields → `@@`
- Only the depot name (key field) is set to its real value
- Hardcoded schema v8 field-type list as final fallback if `p4 dbschema` also fails
- Still requires operator to apply patch offline: `p4d -r -jr .jnl`

### p4 dbschema db.domain output (p4d 2025.2)

Schema version 8, 14 attrs (1 key + 13 non-key):

| Field | Name | Type | Format | Placeholder |
|-------|----------|------|--------|-------------|
| 0 | DOname | key | string | (key — real value) |
| 1 | DOtype | int8 | string | 0 |
| 2 | DOextra | text | string | @@ |
| 3 | DOmount | text | string | @@ |
| 4 | DOmount2 | text | string | @@ |
| 5 | DOmount3 | text | string | @@ |
| 6 | DOowner | key | string | @@ |
| 7 | DOupdate | intv | string | 0 |
| 8 | DOaccess | intv | string | 0 |
| 9 | DOoptions | int | integer | 0 |
| 10 | DOdesc | text | string | @@ |
| 11 | DOstream | key | string | @@ |
| 12 | DOserverid | key | string | @@ |
| 13 | DOcontents | int | string | 0 |

Generated `@dv@` record format:

```
@dv@ 8 @db.domain@ @@ 0 @@ @@ @@ @@ @@ 0 0 0 @@ @@ @@ 0
```

### Variables removed

- `P4PortHost`, `P4ServerAddr`, `DomainDumpFile` — no longer needed

### Variables added

- `FailedDepots[]`, `FailedDepotMaps[]`, `FailedDepotTypes[]` — deferred from Phase 11
- `DepotsStorageCleaned` — counter for Phase 17b successes
- `DbDomainSchema`, `DbDomainSchemaVer`, `DbDomainFieldTypes[]` — fetched once in Phase 17b

### UNTESTED — requires fresh lab run

The two new
paths have NOT yet been tested against a live p4d:

1. Does `p4 storage -d -y ///...` succeed when db.storage entries have non-zero lbrRefCount (which may be stale after RevisionDataFilter)?
2. Does `p4d -jr` accept `@dv@` with DOtype=0 (placeholder)? The correct stored value is 100 for all observed depot types.

## Test Setup Automation (Change 54)

Created `test/` directory with two scripts and a README:

- `test/setup_lab.sh` — full Lab 0 → trim-ready setup (12 phases, idempotent)
- `test/run_trim_test.sh` — dry run → live run → snap run; prints journal patch commands
- `test/README.md` — full documentation including Battle School dependency

## Current State of Changes

| Change | Description |
|--------|-------------|
| 52 | Bugs 1-3 fixed (SSH stdin, grep -m1 multi-line, awk exit+END) |
| 53 | Phase 18 added (orphaned archive cleanup, P4DepotRoot) |
| 54 | test/ directory: setup_lab.sh + run_trim_test.sh + README.md |
| 55 | Phase 17b: p4 storage -d primary + p4 dbschema fallback (removes p4d -jd) |
| 56 | Session close-out: keep files, resume notes |
| 57 | Session log update: Phase 17b design notes |
| 58 | Tom: Added .p4ignore sample file |

---

## 2026-06-17 — Lab Reset + setup_lab.sh Debug Session

### Lab State at Session Start

Lab was reset to Lab 0 baseline (journal counter at 43 due to overnight daily_checkpoint.sh cron — harmless). nyc-03 returned to `p4d_fs_nyc` as expected.

### First Run of setup_lab.sh — Bugs Found

Ran `bash test/setup_lab.sh`. Several bugs found and fixed:

#### Bug A: P4CONFIG overrides SDP shell environment

**Symptom:** Phase 3 (`p4 server -o p4d_ffr_gf`) failed:

```
Access for user 'tom_tyler' has not been enabled by 'p4 protect'.
```

**Cause:** The calling shell had `P4CONFIG` pointing to `.p4config.local`, which sets `P4USER=tom_tyler`. The script's own `p4` calls inherited this override — bypassing the SDP shell's `P4USER=perforce`.

**Fix:** Added `unset P4CONFIG` near the top of `setup_lab.sh` (before any `p4` calls).
Unset is the right approach; setting `export P4USER=perforce` would also work but is less robust. With P4CONFIG unset, the SDP shell environment (`P4USER=perforce`, `P4PORT=1999`) takes effect naturally.

#### Bug B: load_checkpoint.sh argument order wrong

**Symptom:** Phase 6 failed:

```
Error: Specified checkpoint does not exist: 1
```

**Cause:** Script called `load_checkpoint.sh ${SDP_INSTANCE} '${FILTERED_CKP}'` (instance first), but the correct SDP calling convention is checkpoint-file first:

```
load_checkpoint.sh -i -y
```

The `-y` flag is also required to suppress interactive confirmation prompts.

**Fix:** Changed to `load_checkpoint.sh '${FILTERED_CKP}' -i ${SDP_INSTANCE} -y`.

#### Bug C: Phase 6 idempotency checked server.id file, not live p4d

**Cause:** Phase 6 would skip the checkpoint load if `server.id` already said `p4d_ffr_gf`. But the server.id file is written *before* `load_checkpoint.sh` runs — so a failed load left the file in place, causing the phase to be incorrectly skipped on re-run.

**Fix:** Changed idempotency check to connect to p4d on nyc-03 and verify it responds with the expected ServerID. If p4d is down or returns a different ID, the full setup runs.

#### Bug D: Phase 10 same server.id idempotency problem

Same as Bug C but for the promotion phase. Changed to check live p4d responds as `p4d_commit_gf` with `services=standard`.

#### Bug E: Phase 10 — p4 configure unset auth.id before p4login

**Cause:** `p4 configure unset auth.id` requires an authenticated connection. The original order was: configure first, then p4login. This would fail silently (had `|| true`).

**Fix:** Swapped order — `p4login -v 1` first, then `p4 configure unset auth.id`.

#### Bug F: Phase 9 p4verify.sh ran without perforce user ticket on FFR

**Cause:** Phase 7 only logged in the *service user* (`svc_p4d_ffr_gf`). Phase 9 runs `p4verify.sh` as the perforce OS user, which requires a valid ticket for the FFR.
With `run.users.authorize=1` and `security=4`, this would fail or fall back to anonymous access.

**Fix:** Added `p4login -v 1` call on nyc-03 in Phase 7, after the service user login. With `rpl.forward.login=1` on the FFR, this login is forwarded to the commit server and produces a ticket that the FFR also accepts.

### All Bugs Fixed in Change 59 (pending)

All six bugs fixed in `test/setup_lab.sh`. Script now needs a clean lab run to verify. Second lab reset requested before session close.

### Phase 17b still untested

The primary path (`p4 storage -d -y`) and the `@dv@` fallback path in Phase 17b have still not been tested against a live p4d. This remains the top priority for the next session.

---

## 2026-06-17 — Continued: setup_lab.sh Debug + Phase 17b Live Validation

### Additional Bug Found: Phase 2 Idempotency (Bug G)

`p4 server -o ` always returns a template with `ServerID: ` in it (p4d behavior, not a bug). The idempotency check `grep "^ServerID:"` therefore always fired, and mkrep.sh was always skipped. Fix: use `p4 server --exists -o ` which errors if the spec does not exist.

### Additional Bug Found: Phase 6 Missing MD5 File (Bug H)

`load_checkpoint.sh` requires an accompanying `.md5` file. The scp only copied the `.gz` file. Fix: also copy `${FILTERED_CKP}.md5`. Also: the existing read-only checkpoint file on nyc-03 caused scp to fail on re-run — fixed by `rm -f` before scp.

### Additional Bug Found: `services` vs `serverServices` ztag field (Bug I)

`p4 -ztag info -s` returns the field as `serverServices`, not `services`. This caused the verification check in Phase 12 of setup_lab.sh and the pre-flight check in run_trim_test.sh to always show a warning. Fixed in both scripts.

### Additional Bug Found: Stale Auth Ticket After Promotion (Bug J)

After promoting nyc-03 to standalone and unsetting `auth.id`, the ticket in `.p4tickets` (issued via the FFR's `rpl.forward.login=1`) was no longer valid.
`p4login` (SDP script) targets bos-01 and does not help here. Fix: use raw `p4 login -a < .p4passwd.p4_1.admin` with `P4CONFIG` set to `.p4config.gf` at end of Phase 11 in setup_lab.sh, and similarly in run_trim_test.sh pre-flight.

### setup_lab.sh Now Runs to Completion ✅

All 12 phases completed successfully. Final state verified:

- ServerID=p4d_commit_gf, Services=standard ✅
- Archive counts: 78 files in depot/, 4 in jam/, 9 in pb/ ✅

### trim test run_trim_test.sh ✅ — All 3 Passes

**Pass 1 (dry run): exit 0** ✅

**Pass 2 (live run): exit 4** ✅ (4 shelved CLs expected to fail)

**Pass 3 (snap run): exit 4** ✅

### Phase 17b — p4 storage -d Behavior (CONFIRMED)

`p4 storage -d` only removes entries where `lbrRefCount = 0` (truly orphaned). In our test scenario, all 5 filtered-out depots had storage entries with non-zero lbrRefCount (e.g., //HR/draft/401k.rtf had lbrRefCount 3). These are NOT orphaned from p4d's perspective — they are referenced by storagesx or other tables.

Result: `p4 storage -d -y` ran without error ("Storage entries removed"), but `p4 depot -df` still refused to delete the depot because non-zero-refcount entries remain. Phase 17b correctly fell back to journal patch for all 5 depots.

Key insight: `p4 storage -d` helps when there are ZERO-refcount orphans (which can occur after snap resolves lazy copies). In this scenario there were none (the storage table was populated from the full checkpoint, not from lazy copies).

### Journal Patch Format — CONFIRMED WORKING ✅

Generated patch:

```
@dv@ 8 @db.domain@ @HR@ 0 @@ @@ @@ @@ @@ 0 0 0 @@ @@ @@ 0
@dv@ 8 @db.domain@ @depot@ 0 @@ @@ @@ @@ @@ 0 0 0 @@ @@ @@ 0
@dv@ 8 @db.domain@ @gwt@ 0 @@ @@ @@ @@ @@ 0 0 0 @@ @@ @@ 0
@dv@ 8 @db.domain@ @gwt-streams@ 0 @@ @@ @@ @@ @@ 0 0 0 @@ @@ @@ 0
@dv@ 8 @db.domain@ @system@ 0 @@ @@ @@ @@ @@ 0 0 0 @@ @@ @@ 0
```

Applied with: `p4d -r /p4/1/root -jr ` — exit 0 ✅

DOtype=0 (placeholder, real value is 100) was accepted without error.
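
The placeholder rule behind these records (integer-like field types become `0`, text and key fields become `@@`, only the depot name is real) can be sketched as below. This is an illustration, not the script's code: `build_dv_record` and `DOMAIN_V8_TYPES` are hypothetical names, the type list is copied from the schema v8 table recorded earlier in this log, and the sketch assumes depot names contain no `@` characters.

```bash
#!/usr/bin/env bash
# Sketch: build a @dv@ record for db.domain from a list of non-key field types.
build_dv_record() {
    local depot_name=$1 schema_ver=$2
    shift 2
    local record="@dv@ ${schema_ver} @db.domain@ @${depot_name}@"
    local ftype
    for ftype in "$@"; do
        case "$ftype" in
            int*) record+=" 0"  ;;   # int, int8, intv -> numeric placeholder
            *)    record+=" @@" ;;   # text, key      -> empty-string placeholder
        esac
    done
    printf '%s\n' "$record"
}

# Non-key field types for db.domain schema v8 (fields 1..13 in the table).
DOMAIN_V8_TYPES=(int8 text text text text key intv intv int text key key int)

# Usage:
#   build_dv_record HR 8 "${DOMAIN_V8_TYPES[@]}"
```

For `HR` this yields the same line as the first record of the generated patch shown above.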

Post-patch depot list: Perforce (remote), jam, pb, spec, unload — exactly expected ✅

### p4 verify //jam/... //pb/... — CLEAN ✅

348 files in //jam/..., 441 files in //pb/... — zero MISSING or BADDIGEST.

Orphaned archive dirs removed: HR, depot, gwt, gwt-streams, system.

Remaining: jam/, pb/, spec/ ✅

### Change Summary for This Session

| Bug | Fix |
|-----|-----|
| G | Phase 2 idempotency: `p4 server --exists -o ` |
| H | Phase 6: copy .md5 alongside .gz; rm -f existing files before scp |
| I | `serverServices` not `services` in p4 ztag info output (setup_lab.sh + run_trim_test.sh) |
| J | Fresh ticket in .p4tickets via raw `p4 login -a` after standalone promotion |

### Script Readiness Assessment (v3.0.0)

trim_excess_metadata.sh v3.0.0 has been tested end-to-end:

- All phases run correctly
- Phase 17b correctly handles the case where p4 storage -d is insufficient
- Journal patch (@dv@ format) is confirmed valid
- p4 verify clean after full trim + patch application

The script is functionally complete and ready for customer shipment. Remaining concern: the customer's data is much larger; the journal patch approach does not require any back-door access and scales linearly with number of depots to delete.

---

## Change 61: Phase 16b — Spec Depot Obliterate/Delete (v3.1.0)

### Problem

After a trim run, the `spec` depot (singleton depot type "spec") had content in it. Root cause: Phases 14 (typemap), 15 (triggers), 16 (protections) each write new versioned entries to the spec depot — writing typemap.p4s, (no triggers in test), and protect.p4s history. If Phase 11 obliterated the spec depot content, Phase 16 would write new entries after the obliterate, leaving the depot non-empty and non-deletable.

### Fix

Added **Phase 16b**: runs AFTER Phase 16 (protections), obliterates spec depot content, then tries front-door delete → storage cleanup → journal patch (same cascade as other depots).

Also handled in Phase 11 restructuring:

- `remote` depot type: `p4 depot -d` (no `-f` needed — metadata only, no archives)
- `spec` depot type: record name/map to `SpecDepots[]`/`SpecDepotMaps[]` arrays for deferred Phase 16b
- `unload` depot type: `p4 depot -df` directly (no snap needed; content from pre-filter period)
- `local`/`stream`: existing flow (check empty, keep or attempt delete)

### Test Results (nyc-03, 2026-06-17)

**Non-snap live run:**

- Phase 11: spec depot recorded for Phase 16b
- Phase 16b: `p4 obliterate -y //spec/...` → purged 2 revisions (protect.p4s#1, #2)
- `p4 depot -df spec` → failed (orphaned db.storage entries from filtered checkpoint)
- `p4 storage -d` → no-op (lbrRefCount=1 entries, not zero; from filtered checkpoint)
- spec added to FailedDepots for Phase 17b

**Snap run (after above live run):**

- Phase 16b: obliterate (1 more revision from Phase 16 rerun) → depot delete fails again
- Phase 17b: `p4 storage -d -y //spec/...` → no-op (still lbrRefCount=1)
- Journal patch generated: `@dv@ 8 @db.domain@ @spec@ 0 @@ @@ @@ @@ @@ 0 0 0 @@ @@ @@ 0`
- Patch applied via `p4d -r /p4/1/root -jr spec_patch.jnl`
- `p4 depots` → only jam, pb remain ✅
- `p4 verify //...` → 0 MISSING ✅
- Phase 18: `rm -rf /p4/1/depots/spec` applied manually → clean

### Why orphaned lbrRefCount=1 entries persist

The filtered checkpoint includes db.storage entries for spec depot files (branch/*.p4s, client/*.p4s, etc.) with lbrRefCount=1, but NO corresponding db.rev records were replicated (filtered). So:

- `p4 obliterate` can't purge them (no db.rev to target)
- `p4 storage -d` won't remove them (lbrRefCount != 0)
- Only journal patch (remove db.domain) + rm -rf (physical files) resolves this

This is the same root cause as all other "foreign" depots (HR, depot, gwt, etc.). The db.storage orphan entries remain in the database, but are harmless — they reference a depot that no longer exists. p4d does not enforce db.storage integrity against deleted db.domain entries.
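
The three-stage cascade that leads to this outcome (front-door delete, storage cleanup, journal patch) can be sketched as a single function. This is an assumption-laden illustration rather than the script's implementation: `delete_depot_cascade` and the `P4_CMD` override are hypothetical, and the sketch only reports which stage resolved the depot.

```bash
#!/usr/bin/env bash
# Sketch of the delete cascade described in this log. P4_CMD is a
# hypothetical override so the function can be exercised without a server.
P4_CMD=${P4_CMD:-p4}

# Prints which stage resolved the depot: front-door, storage-cleanup,
# or journal-patch (meaning an offline @dv@ patch is still required).
delete_depot_cascade() {
    local depot=$1
    if "$P4_CMD" depot -df "$depot" >/dev/null 2>&1; then
        echo "front-door"; return 0
    fi
    # Only zero-refcount db.storage entries are removed; may be a no-op.
    "$P4_CMD" storage -d -y "//${depot}/..." >/dev/null 2>&1
    if "$P4_CMD" depot -df "$depot" >/dev/null 2>&1; then
        echo "storage-cleanup"; return 0
    fi
    echo "journal-patch"; return 1
}
```

In the lab runs recorded above, every filtered-out depot, including `spec`, would end at the `journal-patch` stage, because the remaining db.storage entries all had a non-zero lbrRefCount.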
### Version bump

v3.0.0 → v3.1.0 for Phase 16b addition and Phase 11 case restructuring.

---

## Session Close — 2026-06-17 19:16

Session closing for lab reset. Changes submitted:
- Change 61: Phase 16b (spec depot deferred obliterate/delete)

### State at close

- `trim_excess_metadata.sh` v3.1.0: change 61 submitted, DVCS clean
- `test/setup_lab.sh`: working end-to-end (all bugs A-J fixed)
- `test/run_trim_test.sh`: working end-to-end (all 3 passes validated)

### What has NOT been tested yet (next session)

A full end-to-end cycle from Lab 0 after v3.1.0 changes, specifically:
- The `remote` depot type (Phase 11 `p4 depot -d` branch)
- The `unload` depot type (Phase 11 `p4 depot -df` branch)
- The `spec` depot type (Phase 16b): tested in isolation but not from Lab 0
- All of the above in a single run_trim_test.sh invocation

### Next session procedure

```bash
cd /home/perforce/tem
P4CONFIG=.p4config.local p4 fetch   # pull change 61
bash test/setup_lab.sh              # Lab 0 → trim-ready (all 12 phases)
export P4CONFIG=/home/perforce/tem/.p4config.gf
bash test/run_trim_test.sh          # dry + live + snap
```

---

## Change 63: Full Reset Validation — setup_lab.sh Bug K + v3.1.0 Fix (2026-06-17)

### Bug K: Phase 3 RevisionDataFilter idempotency (setup_lab.sh)

**Symptom**: After lab reset and re-run, `p4 verify //jam/...` showed MISSING files; `p4 snap` failed with "open for read: ...yyacc,v: No such file or directory".

**Root cause**: Phase 3 idempotency check `grep -q "^RevisionDataFilter:"` matched the EMPTY field that p4 always outputs in server spec templates. So Phase 3 ALWAYS skipped, leaving RevisionDataFilter empty. The unfiltered checkpoint included ALL metadata; p4verify only pulled jam/pb archives (6 pb files) but not the depot/ backing files for lazy copies.

**Additional root cause**: Even with the correct filter, `p4 verify -q //depot/...` on the FFR returns "no such file(s)" because //depot/... has no revision data in the FFR's RevisionDataFilter.
The depot/ backing archives ARE pulled by p4verify.sh (SDP) when it verifies //jam/... and //pb/..., because the FFR's db.storage entries reference depot/ archive paths, and the FFR fetches them from master to satisfy verify.

**Fix A**: Phase 3 idempotency check changed to detect a non-empty value:

```bash
# Skip Phase 3 only if an indented //path line follows the RevisionDataFilter: label.
if grep -qE "^[[:space:]]+//" "$SPEC_TMP" && grep -A5 "^RevisionDataFilter:" "$SPEC_TMP" | grep -q "^[[:space:]]//"; then
```

**Fix B**: Phase 3 filter insertion changed from append (broken) to sed in-place:

```bash
# Replace each empty field label with the label plus tab-indented filter paths.
sed -i "s|^RevisionDataFilter:\$|RevisionDataFilter:\n\t//jam/...\n\t//pb/...|" "$SPEC_TMP"
sed -i "s|^ArchiveDataFilter:\$|ArchiveDataFilter:\n\t//jam/...\n\t//pb/...|" "$SPEC_TMP"
```

**Fix C**: Phase 9, depot/ backing archives missing from nyc-03

The `p4 verify -q //depot/...` step was added but found to be a no-op (FFR has no //depot/... revision data). Investigation showed: with correct filter, p4verify.sh DOES pull the depot/ backing archives (78 files in depot/, including yyacc,v) because the FFR's db.storage references those paths.

Archive counts after correct setup:
- /p4/1/depots/depot/: 78 files (backing archives for lazy copies)
- /p4/1/depots/jam/: 4 files (directly-modified jam revisions)
- /p4/1/depots/pb/: 9 files (directly-modified pb revisions)

Zero MISSING files confirmed before trim test.

### v3.1.0 Fix: Phase 16b error count correction

**Symptom**: Snap run exit code was 5 instead of expected 4.

**Root cause**: Phase 16b called `errmsg` when the spec depot fell through to the journal patch fallback, whereas Phase 17b uses `msg` for the same scenario on other depots. The inconsistency inflated the error count by one.

**Fix**: Changed Phase 16b spec depot journal patch message from `errmsg` to `msg`.

### Full End-to-End Test Results (v3.1.0, 2026-06-17)

**Setup**: setup_lab.sh ran to completion (all 12 phases clean). Archive counts correct.

**Trim test** (3 passes):

| Pass | Exit | Status |
|------|------|--------|
| Dry run | 0 | ✅ |
| Live run | 4 | ✅ |
| Snap run | 4 | ✅ |

**Depot type handling (all branches exercised)**:
- `remote` (Perforce depot): `p4 depot -d` → deleted ✅
- `unload` (unload depot): `p4 depot -df` → deleted ✅
- `spec` (spec depot): Phase 16b obliterate → journal patch fallback ✅
- `local`/`stream` (HR, depot, gwt, gwt-streams, system): snap → storage cleanup → journal patch ✅

**Snap**: 2 depots snapped (jam, pb), no failures ✅

**Journal patch**: 6 entries (HR, depot, gwt, gwt-streams, system, spec) ✅

**Post-patch**: `p4 depots` → jam + pb only; `p4 verify //jam/... //pb/...` → 0 MISSING ✅

### Script Readiness Assessment (v3.1.0)

trim_excess_metadata.sh v3.1.0 is **fully validated end-to-end**:
- All depot types handled correctly
- snap works (backing archives confirmed present after correct filter setup)
- Journal patch confirmed valid and accepted by p4d -jr
- p4 verify clean after full trim + patch + rm -rf
- Exit codes consistent and documented

**Ready for customer shipment.**

---

## Session Continuation — 2026-06-17 Evening (Changes 64–66, Ship)

### Changes

| Change | Description |
|--------|-------------|
| 64 | Document streams limitation and Phase 10 empty-identifier noise in -man |
| 65 | Clarify "Empty identifier not allowed" root cause (p4d bug job101555/P4-19364) |
| 66 | Add BACKGROUND section to -man: full divestiture process overview |

### Change 64: Document known limitations

Added **KNOWN LIMITATIONS** section to `-man` output covering:

1. **Phase 8 (Stream cleanup)**: Deferred to future version. Stream spec deletion requires careful ordering of parent/child relationships, mainline vs. virtual streams, and stream history. No impact for environments using only classic/local depots (the current production use case). Manual workaround: `p4 stream -df ` after confirming no clients are mapped.
2. **Phase 10 "Empty identifier not allowed"**: Benign noise from submitted CLs with empty description fields. CLs not deleted; does not affect trim or data integrity.

Also updated Phase 8 runtime message from alarming `**NOT IMPLEMENTED**` to clearer `(deferred)`.

### Change 65: Clarify empty-identifier root cause

Expanded KNOWN LIMITATIONS entry for Phase 10 errors to reference the specific p4d defect:
- **job101555 / P4-19364**: "Unable to delete empty submitted changelist"
- Those CLs cannot be deleted until a p4d fix is available
- Cleanup deferred until p4d bug is resolved; no impact on trim operation

### Change 66: BACKGROUND section

Added new **BACKGROUND** section to `-man` output before DESCRIPTION, covering the full divestiture workflow for operators unfamiliar with the process:

1. Configure the FFR with `RevisionDataFilter` scoped to divested depot paths
2. Wait for replication to stabilize; run `p4 verify` to confirm 0 MISSING archives
3. Promote FFR to standalone commit server (filtered checkpoint load, unset auth.id, services=standard)
4. Run this script to trim excess spec-level metadata
5. Post-trim validation (verify archives, apply journal patch, remove orphaned dirs, manual steps)

References `test/setup_lab.sh` and `test/run_trim_test.sh` as a concrete end-to-end example.

### Final State: v3.1.0 Shipped

All validation complete. `p4 push` executed (see below).

| Check | Result |
|-------|--------|
| All known bugs fixed (A–K) | ✅ |
| All depot types handled | ✅ |
| Dry/live/snap exit codes: 0/4/4 | ✅ |
| p4 verify 0 MISSING post-trim | ✅ |
| -man BACKGROUND + KNOWN LIMITATIONS | ✅ |
| DVCS submitted, pushed to public depot | ✅ |

### Open Issues (deferred to future versions)

| Issue | Tracking | Priority |
|-------|----------|----------|
| Phase 8: Stream spec cleanup | Future version | Low (no streams in current prod data) |
| Phase 10: Cannot delete empty-description CLs | p4d bug job101555/P4-19364 | Deferred until p4d fix |
| Phase 10: Suppress per-CL error noise (184 lines) | Internal improvement | Low |
| Extensions depot obliterate + cert cleanup | Manual step; EXTRA MANUAL STEPS documents this | Low |

---

## Session Wrap-Up — 2026-06-17 Late Evening (Changes 68–69)

### Changes

| Change | Description |
|--------|-------------|
| 68 | v3.1.1: fix DepotsFailed decrement bug (DepotsFailed-=1 → (( DepotsFailed-- ))); regenerate command_summary.txt |
| 69 | v3.1.2: prefer explicit arithmetic form DepotsFailed=$((DepotsFailed-1)); regenerate command_summary.txt |

### Bug: DepotsFailed decrement (found via shellcheck SC2276)

`DepotsFailed-=1` is not valid bash syntax: bash has no `-=` assignment operator outside an arithmetic context such as `(( ))`. Shellcheck reported SC2276 (error): "This is interpreted as a command name containing '='."

The line was effectively a no-op (a failed command invocation, with no `set -e` to abort the script), so `DepotsFailed` was never decremented after a successful Phase 17b storage cleanup. The summary line "Pending (journal patch required)" would show an inflated count. No effect on exit code or correctness of depot deletion, snap, or journal patch.

- Fix (v3.1.1): `(( DepotsFailed-- ))`
- Revised (v3.1.2): `DepotsFailed=$((DepotsFailed-1))`, which is more explicit.

Both yield the same value; the v3.1.2 form is preferred for readability.

### Final shipped version: v3.1.2

shellcheck passes cleanly. command_summary.txt regenerated and reflects v3.1.2.
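
The three decrement forms discussed above can be compared in isolation. This is a minimal sketch assuming bash; only the variable name `DepotsFailed` comes from the script, and `BrokenStatus`, `AfterBroken`, `AfterV311`, and `AfterV312` are hypothetical names used for illustration.

```bash
#!/usr/bin/env bash
# Standalone demo of the three forms; not taken from trim_excess_metadata.sh.
DepotsFailed=3

# Invalid form: "DepotsFailed-" is not a valid variable name, so bash treats
# the whole word as a command name and fails to find it.
BrokenStatus=0
DepotsFailed-=1 2>/dev/null || BrokenStatus=$?
AfterBroken=$DepotsFailed            # counter is unchanged

# v3.1.1 form: arithmetic command with post-decrement.
(( DepotsFailed-- ))
AfterV311=$DepotsFailed

# v3.1.2 form: explicit arithmetic expansion and assignment.
DepotsFailed=$((DepotsFailed-1))
AfterV312=$DepotsFailed

echo "broken form: status=$BrokenStatus value=$AfterBroken"
echo "v3.1.1 form: value=$AfterV311"
echo "v3.1.2 form: value=$AfterV312"
```

One difference that would matter if `set -e` were ever enabled: `(( DepotsFailed-- ))` returns a non-zero status when the value before the decrement is 0, whereas the plain assignment form always returns 0.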