# Known Incident Patterns

TODO: Document P4 server incident signatures as they are identified and validated.

## Pattern: Database Wedge

**Description:** Write lock on a db.* table (commonly db.rev, db.have, db.working) causes cascading pile-up of read-waiting processes.

**Log signature:**
- One process with high `write held` time on a table
- Multiple processes with high `read wait` time on the same table
- `p4 monitor show -ale` shows many W (waiting) processes

**Detection:** `WedgeDetector` in `detector.py`

**Corrective actions (in tier order):**
- Tier 1: Alert on-call with diagnosis and table name
- Tier 4 (if pre-authorized): `p4 monitor terminate <pid>` on lock-holder

## Pattern: Slow Submit

**Description:** Single `p4 submit` with long compute/lock time, no pile-up.

**Detection:** `SlowCommandDetector`

## Pattern: Checkpoint Under Load

**Description:** `p4 admin checkpoint` holding broad locks while users are active.

**Detection:** `WedgeDetector` (checkpoint holds write locks on multiple tables)
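
As a rough illustration of the Database Wedge log signature, the sketch below flags a table when one process holds a long write lock and several others are stuck in read wait on that same table. It is not the actual `WedgeDetector` code: the `LockSample` record, the `find_wedges` function, and all threshold values are hypothetical, and it assumes lock timings have already been parsed out of the server log.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LockSample:
    """Hypothetical pre-parsed lock timing for one process on one table."""
    pid: int
    table: str              # e.g. "db.rev"
    write_held_ms: int = 0  # time this process held the write lock
    read_wait_ms: int = 0   # time this process waited for a read lock


def find_wedges(samples, held_ms=30_000, wait_ms=10_000, min_waiters=5):
    """Return {table: (holder_pid, [waiter_pids])} for tables that look wedged.

    The thresholds are illustrative placeholders, not validated values.
    """
    by_table = defaultdict(list)
    for sample in samples:
        by_table[sample.table].append(sample)

    wedges = {}
    for table, rows in by_table.items():
        # Candidate lock-holder: the process with the longest write-held time.
        holder = max(rows, key=lambda r: r.write_held_ms)
        if holder.write_held_ms < held_ms:
            continue
        # Pile-up: other processes with long read waits on the same table.
        waiters = sorted(
            r.pid for r in rows
            if r.pid != holder.pid and r.read_wait_ms >= wait_ms
        )
        if len(waiters) >= min_waiters:
            wedges[table] = (holder.pid, waiters)
    return wedges
```

Under these assumptions, a long write hold with no waiters (the Slow Submit shape) is not reported as a wedge, and the returned table name and holder pid are the details a Tier 1 alert would carry.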
| # | Change | User | Description | Committed |
|---|---|---|---|---|
| #1 | 32636 | bot_Claude_Anthropic | Scaffold p4-rca-agent repo: directory structure, data models, layer stubs, test fixtures, config, docs. Covers briefing tasks 2 and 3. #review-32637 @robert_cowham @tom_tyler | |