known-patterns.md #1

# Known Incident Patterns

TODO: Document P4 server incident signatures as they are identified and validated.

## Pattern: Database Wedge

**Description:** A long-held write lock on a `db.*` table (commonly `db.rev`, `db.have`, or `db.working`)
blocks readers of that table, causing a cascading pile-up of read-waiting processes.

**Log signature:**
- One process with high `write held` time on a table
- Multiple processes with high `read wait` time on the same table
- `p4 monitor show -ale` shows many W (waiting) processes
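
The `write held` and `read wait` figures above can be pulled out of server log track output. The following is a minimal sketch, assuming the log contains per-table blocks of the form `--- db.rev` followed by a `total lock wait+held read/write` line; the function name `parse_lock_times` is hypothetical, and the regular expressions should be checked against real log output before use.

```python
import re

# Assumed track-output shape; verify against your own server log.
TABLE_RE = re.compile(r"^--- (db\.\w+)\s*$")
LOCK_RE = re.compile(
    r"^---\s+total lock wait\+held read/write "
    r"(\d+)ms\+(\d+)ms/(\d+)ms\+(\d+)ms"
)


def parse_lock_times(lines):
    """Return {table: {read_wait, read_held, write_wait, write_held}} in ms."""
    result = {}
    table = None
    for line in lines:
        m = TABLE_RE.match(line)
        if m:
            # Remember which table the following lock line belongs to.
            table = m.group(1)
            continue
        m = LOCK_RE.match(line)
        if m and table is not None:
            read_wait, read_held, write_wait, write_held = map(int, m.groups())
            result[table] = {
                "read_wait": read_wait,
                "read_held": read_held,
                "write_wait": write_wait,
                "write_held": write_held,
            }
    return result
```

A caller would feed it the lines belonging to one command's track block and compare `write_held` and `read_wait` against site-specific thresholds.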

**Detection:** `WedgeDetector` in `detector.py`
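
The signature above can be checked mechanically. This is an illustrative sketch only, not the actual `WedgeDetector` code in `detector.py`: the `LockSample` type, the `find_wedges` function, and all threshold defaults are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class LockSample:
    """One process's lock timings on one table (hypothetical data model)."""
    pid: int
    table: str
    read_wait_ms: int = 0
    write_held_ms: int = 0


def find_wedges(samples, held_ms=10_000, wait_ms=5_000, min_waiters=3):
    """Return [(table, holder_pid, waiter_pids)] matching the wedge signature."""
    by_table = {}
    for s in samples:
        by_table.setdefault(s.table, []).append(s)

    wedges = []
    for table, group in by_table.items():
        holders = [s for s in group if s.write_held_ms >= held_ms]
        if len(holders) != 1:
            continue  # the signature expects exactly one long write holder
        holder_pid = holders[0].pid
        waiters = sorted(
            s.pid for s in group
            if s.read_wait_ms >= wait_ms and s.pid != holder_pid
        )
        if len(waiters) >= min_waiters:
            wedges.append((table, holder_pid, waiters))
    return wedges
```

Requiring exactly one holder keeps the sketch close to the signature as written; a production detector may need to handle several long writers on the same table.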

**Corrective actions (in tier order):**
- Tier 1: Alert on-call with diagnosis and table name
- Tier 4 (if pre-authorized): `p4 monitor terminate <pid>` on lock-holder
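
The tier ordering can be expressed as a plan that is built but not executed. A minimal sketch, assuming a hypothetical `plan_actions` helper and a boolean `pre_authorized` flag supplied by configuration; it only returns the `p4 monitor terminate` argument list and leaves running it to the caller.

```python
def plan_actions(table, holder_pid, pre_authorized=False):
    """Return corrective actions in tier order as (tier, payload) tuples."""
    actions = [
        # Tier 1 always applies: tell on-call what was found and where.
        ("tier1", f"Write-lock wedge on {table}; lock holder pid {holder_pid}"),
    ]
    if pre_authorized:
        # Tier 4 is included only when explicitly pre-authorized.
        actions.append(("tier4", ["p4", "monitor", "terminate", str(holder_pid)]))
    return actions
```

Returning the command as a list rather than running it keeps the destructive step visible and auditable before anything is terminated.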

## Pattern: Slow Submit

**Description:** A single `p4 submit` with a long compute or lock-held time that does not cause a pile-up of waiting processes.

**Detection:** `SlowCommandDetector`
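
What separates this pattern from a wedge is the absence of waiters. The check below is a sketch under stated assumptions, not the real `SlowCommandDetector`: the function name, the `user-submit` command label, and the 60-second default are all illustrative.

```python
def is_slow_submit(command, lapse_s, waiter_count, lapse_threshold_s=60.0):
    """True for a long-running submit that has no processes waiting behind it."""
    return (
        command == "user-submit"          # assumed log label for p4 submit
        and lapse_s >= lapse_threshold_s  # long compute or lock time
        and waiter_count == 0             # no pile-up, so not a wedge
    )
```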

## Pattern: Checkpoint Under Load

**Description:** A `p4 admin checkpoint` run holding locks across many tables while users are active.

**Detection:** `WedgeDetector` (checkpoint holds write locks on multiple tables)
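
Following the description above, a checkpoint differs from a single-table wedge in that one process holds long write locks on several tables at once. A minimal sketch of that grouping, assuming input tuples of `(pid, table, write_held_ms)`; the function name `broad_lock_holders` and the thresholds are hypothetical.

```python
def broad_lock_holders(samples, held_ms=10_000, min_tables=2):
    """Map each pid to the tables it holds long write locks on.

    Only pids holding at least `min_tables` distinct tables are returned.
    """
    tables_by_pid = {}
    for pid, table, write_held_ms in samples:
        if write_held_ms >= held_ms:
            tables_by_pid.setdefault(pid, set()).add(table)
    return {
        pid: sorted(tables)
        for pid, tables in tables_by_pid.items()
        if len(tables) >= min_tables
    }
```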

| # | Change | User | Description | Committed |
|---|--------|------|-------------|-----------|
| #1 | 32636 | bot_Claude_Anthropic | Scaffold p4-rca-agent repo: directory structure, data models, layer stubs, test fixtures, config, docs. Covers briefing tasks 2 and 3. #review-32637 @robert_cowham @tom_tyler | |