
# P4 RCA Agent — Project Briefing & Design Kickoff

**Document type:** Session notes + agent tasking brief  
**Prepared for:** Claude Code CLI agent  
**Domain expert:** C. Thomas Tyler (Perforce Consulting, SDP maintainer)  
**Date:** 2026-04-25  
**Status:** Pre-implementation — design phase


## 1. Project Vision

Build a locally-deployable, continuously-running AI agent for Perforce Helix Core (p4d) servers that:

1. **Monitors** p4d server health in real time by consuming the P4LOG, system commands, and OS-level data
2. **Detects** anomalies and known problem signatures (e.g., database wedge scenarios, runaway processes, lock pile-ups)
3. **Performs Root Cause Analysis (RCA)** using an on-premise Small Language Model (SLM) — no cloud API calls, no metered token costs
4. **Recommends or takes corrective action** according to a governed, safety-tiered policy

The agent must be safe to run on live production P4 servers. It is explicitly **not** an LLM-cloud solution. Everything runs locally on the server machine.


## 2. Background & Constraints

### 2.1 Environment

- Target platform: Linux servers (Ubuntu, Rocky, SLES) running Perforce Helix Core (`p4d`)
- Deployed under the SDP (Server Deployment Package) conventions where applicable
- Must be compatible with root-level access for some operations (`lsof`, `lslocks`, core dump inspection, process management)
- Some target servers may have no GPU — CPU-only inference must be viable
- No continuous cloud API usage — cost and privacy prohibit it

### 2.2 Why SLM, Not LLM

| Concern | LLM (cloud) | SLM (local) |
|---|---|---|
| Token cost at continuous operation | Prohibitive | None after deployment |
| Data privacy (customer server logs) | Logs leave site | Air-gapped |
| Latency for triggered analysis | Network RTT | Local (acceptable) |
| Customization via fine-tuning | Expensive | Feasible on single GPU |
| Reasoning quality | Excellent | Good-to-very-good (2024+ models) |

### 2.3 Key Domain Knowledge: The Database Wedge

One of the primary target scenarios is the **database wedge**: a write lock on a specific P4 database table (`db.*`) causes a cascading pile-up of processes waiting behind the lock holder. This is:

- Poorly understood — contributing factors are complex combinations of user behavior and load
- Difficult to reproduce in sandbox environments (concurrency is lost when replaying journal data)
- A real-world scenario best validated on production systems
- Detectable from log patterns + `p4 monitor show -ale` + `lsof`/`lslocks` cross-reference

## 3. Existing Ecosystem — Do Not Reinvent These Wheels

### 3.1 `go-libp4dlog` (Robert Cowham, MIT License)

**Repo:** https://github.com/rcowham/go-libp4dlog

The authoritative Go library for parsing Perforce p4d text log files. This is the foundation everything else builds on. **Import this library rather than writing a P4LOG parser from scratch.**

Key tools bundled with it:

- **`log2sql`** — parses P4LOG → SQLite database + VictoriaMetrics historical metrics. Supports JSON and SQL output modes. Can process gzipped logs, multiple files, live tailing.
- **`p4locks`** — parses P4LOG → HTML timeline of table lock contention (read/write, wait/held per PID), with threshold filtering and table exclusion regex. This is directly relevant to wedge detection.
- **`p4dpending`** — surfaces commands still in-flight (no completion record yet). Useful for detecting hung operations.

Output formats: SQLite `.db`, JSON, SQL INSERT statements, Graphite/VictoriaMetrics `.metrics` files.

**Recommended integration approach:** The library exposes a Go API that emits structured command records. Consider a Go component that imports `go-libp4dlog` and streams parsed records over a local socket or writes to SQLite, feeding a Python analysis daemon. This avoids re-implementing the P4LOG parser and inherits all its edge-case handling and test coverage.
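On the Python side, consumption of parsed records can be prototyped before the Go component exists by shelling out to `log2sql` in its JSON output mode (referenced again in §10, Task 5). A minimal sketch, assuming `log2sql --json` emits one JSON object per line on stdout — verify the exact framing against the installed binary:

```python
import json
import subprocess
from typing import Iterable, Iterator


def parse_json_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse newline-delimited JSON records, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def stream_log2sql_records(logfile: str) -> Iterator[dict]:
    """Stream parsed command records from log2sql's JSON output mode.

    The one-object-per-line framing is an assumption; confirm it before
    building on this.
    """
    proc = subprocess.Popen(
        ["log2sql", "--json", logfile],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    yield from parse_json_records(proc.stdout)
```

Keeping `parse_json_records` separate from the subprocess plumbing lets the parsing path be unit-tested without the binary installed.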

### 3.2 `p4prometheus` (Perforce org, MIT License)

**Repo:** https://github.com/perforce/p4prometheus

Consumes `go-libp4dlog` for real-time log tailing and writes Prometheus metrics. This is the existing real-time monitoring layer. Pairs with Grafana for dashboards and alerting rules.

The gap this project fills: `p4prometheus` exposes metrics and alerts, but has no RCA reasoning or corrective action layer. The new agent would either run alongside `p4prometheus` (using it as a signal source via Prometheus queries) or replicate its log-tailing approach and add the intelligence layer directly.

### 3.3 `p4dbeat` (rcowham, MIT License)

**Repo:** https://github.com/rcowham/p4dbeat

Custom Elastic Beat — sends parsed log records to Elasticsearch. Alternative to the Prometheus path if the target deployment uses the Elastic stack.

### 3.4 Log Analysis KB Articles

- https://portal.perforce.com/s/article/2514 — Basic P4 Server Log analysis
- https://portal.perforce.com/s/article/3088 — Structured logs overview (which logs exist, how to enable, tradeoffs)
- https://community.perforce.com/s/article/5470 — Additional background

### 3.5 P4LOG Configuration Prerequisites

For maximum log richness (required for meaningful analysis):

```
p4 configure set server=3   # Track all command events
p4 configure set track=1    # Enable database tracking records
```

Log rotation must be configured — `server=3` + `track=1` produces large logs quickly.

The P4LOG (a.k.a. "unstructured log") is always present and is the primary input. Structured logs (enabled individually via configurables) can supplement it but are off by default. Enable structured logs only where there's net value over the duplication cost.


## 4. Architecture

### 4.1 Layered Design

```
┌─────────────────────────────────────────────────────────────┐
│  Layer 4: Action Execution                                   │
│  Governed tool calls: alert, block user, restart, checkpoint │
│  Validates against policy; writes audit log; enforces        │
│  confidence thresholds                                       │
├─────────────────────────────────────────────────────────────┤
│  Layer 3: SLM Reasoning (Ollama / local inference)           │
│  Receives structured context bundle → produces RCA narrative │
│  + action recommendation. Activated by Layer 2 trigger only. │
│  Model: Phi-4 (MIT) or Mistral 7B (Apache 2.0) recommended  │
├─────────────────────────────────────────────────────────────┤
│  Layer 2: Anomaly Detection & Trigger                        │
│  Rule-based + statistical thresholds on structured signals   │
│  Decides WHEN SLM activates. Prevents continuous inference   │
│  on raw log noise.                                           │
├─────────────────────────────────────────────────────────────┤
│  Layer 1: Structured Signal Extraction                       │
│  go-libp4dlog (log parsing) + p4 monitor show -ale           │
│  + lsof + lslocks + syslog + p4d structured logs            │
│  Outputs: rolling SQLite window (last N minutes of events)   │
└─────────────────────────────────────────────────────────────┘
```

**Key design principle:** The SLM is NOT a streaming log consumer. It is an on-demand reasoner activated by the trigger layer. The signal extraction and anomaly detection layers run continuously and cheaply; the SLM runs only when there's something meaningful to reason about.

### 4.2 Context Bundle

When Layer 2 fires a trigger, Layer 3 receives a **context bundle** — a structured JSON document assembled from:

- Relevant P4LOG window (parsed records for the anomaly time window, not raw text)
- `p4 monitor show -ale` snapshot (current running commands, PIDs, users, duration)
- `lsof` output filtered to the p4d process
- `lslocks` output filtered to p4d-relevant paths
- Lock contention summary from `p4locks`-style analysis of the window
- Relevant syslog entries from the same window
- Any matching patterns from the RAG knowledge base (see §4.3)
- Server metadata: p4d version, configurable settings, SDP instance if applicable
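The bundle shape above can be sketched as a dataclass; field names here are illustrative placeholders, not a finalized schema:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContextBundle:
    """Illustrative Layer 2 → Layer 3 handoff document (fields are placeholders)."""
    trigger_type: str                # e.g. "wedge", "slow_command"
    window_start: str                # ISO 8601 timestamps for the anomaly window
    window_end: str
    log_records: list = field(default_factory=list)       # parsed P4LOG records
    monitor_snapshot: list = field(default_factory=list)  # p4 monitor show -ale rows
    lsof: list = field(default_factory=list)
    lslocks: list = field(default_factory=list)
    lock_summary: dict = field(default_factory=dict)      # p4locks-style analysis
    syslog: list = field(default_factory=list)
    kb_matches: list = field(default_factory=list)        # RAG hits (§4.3)
    server_meta: dict = field(default_factory=dict)       # p4d version, configurables

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```

Freezing this contract early (per §10, Task 3) is what lets the detection and reasoning layers evolve independently.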

### 4.3 RAG Knowledge Base

Before considering fine-tuning, build a RAG corpus over:

- P4 admin guide sections on locking, performance, configurables
- KB articles on known wedge patterns and lock scenarios
- SDP documentation and runbooks
- Any accumulated incident history (annotated log segments)
- `p4 help` output for commands commonly involved in incidents

This is the fastest path to domain-specific knowledge without touching model weights.

### 4.4 SLM Selection

**Recommended baseline: Phi-4 (14B, MIT license)**

- Best-in-class reasoning for an SLM as of late 2024/early 2025
- MIT license — no redistribution restrictions, clean for internal or product use
- Runs on CPU-only server hardware at acceptable speed for triggered (not continuous) inference
- Q4 quantization: ~8GB RAM, adequate for most server machines

**Fallback for memory-constrained servers: Phi-3.5-mini (3.8B, MIT)**

- ~2GB RAM at Q4
- Weaker reasoning but viable for classification/triage tasks

**Runtime: Ollama (MIT)**

- Standard local SLM deployment on Linux
- OpenAI-compatible API at `localhost:11434`
- Model management, quantization, serving all handled
- Install: `curl -fsSL https://ollama.com/install.sh | sh && ollama pull phi4`
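Invoking the local daemon needs nothing beyond the standard library. A minimal sketch against Ollama's native `/api/generate` endpoint (non-streaming); the long timeout reflects triggered, CPU-only inference:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str) -> bytes:
    """Build a non-streaming request body for Ollama's generate API."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()


def ask_slm(prompt: str, model: str = "phi4", timeout: float = 300.0) -> str:
    """POST a prompt to the local Ollama daemon and return the completion text.

    Triggered inference can take minutes on CPU-only hosts, hence the
    generous timeout. Raises on network errors — callers should catch and
    log rather than crash the daemon.
    """
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Keeping payload construction separate from the network call makes the request format testable without a running Ollama instance.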

## 5. Evaluation Harness Strategy

### 5.1 Annotation Over Simulation

Simulating real-world wedge scenarios in a sandbox is not reliably achievable (concurrency is lost in journal replay). The evaluation harness should instead be built around **annotation of real log segments**:

1. Collect P4LOG segments from production incidents (with customer permission, sanitized as needed)
2. Domain experts annotate: "this is a wedge scenario; root cause is X; correct action is Y; resolved at timestamp Z"
3. Annotated segments become both eval test cases and eventual fine-tuning data

This is the standard approach for niche-domain NLP tasks where simulation is impractical.

### 5.2 Synthetic Log Fixtures for Unit Testing

For testing the parsing and detection layers, synthetic log fixtures are viable and appropriate. You know the log signature of a wedge — construct minimal log excerpts that exhibit those signatures and verify the detection layer fires correctly. These are not "simulations of user load"; they are unit tests of the signal extraction and trigger logic.

The `go-libp4dlog` project already has a test suite with log fixtures (`p4dlog_test.go`). Study those fixtures as examples of how to construct synthetic test inputs.

### 5.3 Journal Replay for State Seeding (Not Concurrency)

The domain expert's idea: seed a sandbox with a baseline checkpoint, then replay production journals to bring the server to a known-good state representative of a real production environment. This is valuable for:

- Testing p4d version upgrade impact on performance
- Testing hardware configuration changes (RAM, storage)
- Establishing a realistic data state before applying synthetic scripted load

**Limitation:** journal replay is sequential — it does not reproduce the concurrent lock contention of the original production event. Pair journal-seeded state with load scripts (e.g., parallel `p4 -x` commands) that approximate the command pattern seen in production logs during the incident window.
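The parallel-load driver itself is simple. A minimal sketch — the actual `p4` argument lists would come from the incident window's observed command mix, and are not specified here:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands: list[list[str]], workers: int = 8) -> list[int]:
    """Run commands concurrently and return exit codes in input order.

    For wedge-approximation load, each entry would be a p4 invocation
    mirroring the production command pattern; any command list works,
    so the driver can be smoke-tested without a p4 server.
    """
    def run_one(cmd: list[str]) -> int:
        return subprocess.run(cmd, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, commands))
```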


## 6. Safety & Production Deployment Model

### 6.1 Corrective Action Ladder

Deploy autonomy incrementally. Never start at a high autonomy tier.

| Tier | Behavior | When to advance |
|---|---|---|
| 0. Observe | Agent logs diagnosis + what it would have done. No output to humans. | Baseline data collection |
| 1. Alert | Agent sends alert/page with diagnosis and recommended action. Human acts. | After diagnosis quality is validated |
| 2. Recommend with approval | Agent presents proposed action in terminal or via alert; admin confirms with one keypress (timeout = no action). | After recommendations are consistently sound |
| 3. Act with timeout | Agent acts after N minutes if no human response; conservative action set only (alert, never kill/restart). | After approval-mode track record established |
| 4. Autonomous within policy | Specific high-confidence, well-understood scenarios (confirmed wedge + known-safe corrective action) get autonomous execution. Audit log always written. | After extensive production validation |
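The tier gate in Layer 4 reduces to a simple fail-closed check. A sketch with a hypothetical action-to-minimum-tier mapping (the action names are placeholders, not a finalized policy):

```python
from enum import IntEnum


class ActionTier(IntEnum):
    OBSERVE = 0
    ALERT = 1
    RECOMMEND_WITH_APPROVAL = 2
    ACT_WITH_TIMEOUT = 3
    AUTONOMOUS_WITHIN_POLICY = 4


# Hypothetical mapping: the minimum configured tier that permits each action.
ACTION_MIN_TIER = {
    "log_diagnosis": ActionTier.OBSERVE,
    "send_alert": ActionTier.ALERT,
    "propose_action": ActionTier.RECOMMEND_WITH_APPROVAL,
    "safe_auto_action": ActionTier.ACT_WITH_TIMEOUT,
}


def action_permitted(action: str, configured_tier: ActionTier) -> bool:
    """An action runs only if the configured tier meets its minimum tier.

    Unknown actions are denied — the gate fails closed.
    """
    min_tier = ACTION_MIN_TIER.get(action)
    return min_tier is not None and configured_tier >= min_tier
```

Making the tier a single configuration value checked at one choke point is what lets §6.3's "never exceed the configured tier" invariant be enforced mechanically.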

### 6.2 Audit Log

Every agent action (including recommended-but-not-taken actions) must be written to a structured audit log. Minimum fields:

- Timestamp
- Trigger type and raw signals
- SLM diagnosis (full text)
- Action recommended
- Action taken (if any)
- Operator response (if applicable)
- Outcome (populated later if trackable)

### 6.3 Safety Invariants (Hard Rules)

The agent must never:

- Modify depot data or metadata
- Delete or truncate logs
- Execute actions that could cause data loss
- Operate without a written audit trail
- Exceed its configured action policy tier without explicit reconfiguration

## 7. Implementation Phases

### Phase 1: Signal Extraction & Detection (No SLM)

**Goal:** Build the structured data pipeline that feeds the eventual SLM layer. Get detection logic working and validated before adding AI.

Deliverables:

1. **`p4log-tailer`** — Python daemon (or Go component importing `go-libp4dlog`) that:
   - Tails P4LOG in real time
   - Emits parsed command records as JSON to a local SQLite rolling window (configurable retention, default 15 minutes)
   - Handles log rotation gracefully
2. **`p4-monitor-collector`** — Python script run on a configurable interval (default 30s) that:
   - Runs `p4 monitor show -ale` (requires appropriate permissions)
   - Runs `lsof -p $(pgrep p4d)` for the p4d process
   - Runs `lslocks` filtered to p4d-relevant paths
   - Stores snapshots in the rolling SQLite database
3. **`p4-anomaly-detector`** — Python module with pluggable detector classes:
   - `WedgeDetector`: lock wait time on specific tables exceeding threshold, combined with process pile-up count from monitor output
   - `SlowCommandDetector`: individual command compute or lock time exceeding configurable thresholds
   - `ConnectionSpikeDetector`: connection count anomaly (configurable σ threshold)
   - Each detector emits a structured `CandidateIncident` event if triggered
4. Test suite with synthetic log fixtures covering:
   - Normal operation (no trigger)
   - Classic wedge signature
   - Slow `p4 submit` without wedge
   - Checkpoint-during-load scenario

**Tech stack:** Python 3.10+, SQLite (stdlib), `watchdog` for log tailing, `subprocess` for system commands. Bash wrappers for systemd integration.

SDP conventions to follow:

- Logs to `$LOGS` directory
- Config in `/p4/common/config/` or instance-level equivalent
- Scripts owned by `perforce` user; root-requiring operations via sudo with narrow sudoers rules
- ShellCheck compliance for any bash components
- `set -u` and defensive error handling throughout

### Phase 2: SLM Integration

**Goal:** Wire the SLM reasoning layer onto the detection output from Phase 1.

Deliverables:

1. **Ollama setup and model management scripts** — install, pull, and health-check automation appropriate for server environments (non-interactive, with fallback handling)
2. **`p4-context-builder`** — Python module that, given a `CandidateIncident` event, assembles the full context bundle (log window, monitor snapshots, lock summary, metadata) into a structured prompt
3. **`p4-rca-agent`** — Python module that:
   - Submits the context bundle to the local Ollama API
   - Parses the SLM response into a structured `RCAResult` (diagnosis, confidence, recommended_action, reasoning_summary)
   - Writes the result to the audit log
   - Emits an alert (initially: writes to a structured incident log file; later: integrates with alerting)
4. **System prompt design for the SLM** — a well-crafted domain-specific system prompt that encodes:
   - P4LOG record format and field meanings
   - Known lock contention patterns and their names
   - Available corrective actions and their risk levels
   - Output format specification (structured JSON response for machine parsing)
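Parsing the SLM's structured reply deserves defensive handling, since small models do not always emit pure JSON even when the system prompt demands it. A sketch of the `RCAResult` parse step (field names taken from deliverable 3 above; the brace-extraction tolerance is a design assumption):

```python
import json
from dataclasses import dataclass


@dataclass
class RCAResult:
    diagnosis: str
    confidence: float
    recommended_action: str
    reasoning_summary: str


def parse_rca_response(text: str) -> RCAResult | None:
    """Parse the SLM's JSON reply into an RCAResult; None if malformed.

    Tolerates prose around the JSON by extracting the outermost braces.
    A None result should itself be audit-logged as a parse failure.
    """
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(text[start:end + 1])
        return RCAResult(
            diagnosis=str(data["diagnosis"]),
            confidence=float(data["confidence"]),
            recommended_action=str(data["recommended_action"]),
            reasoning_summary=str(data.get("reasoning_summary", "")),
        )
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None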

### Phase 3: RAG Knowledge Base

**Goal:** Improve SLM accuracy by giving it retrieval access to domain documentation.

Deliverables:

1. **Document corpus pipeline** — scripts to ingest and chunk:
   - P4 admin guide PDFs/HTML (from the Perforce docs site)
   - KB articles (from portal.perforce.com, curated list)
   - SDP documentation
   - Incident annotations (from Phase 4)
2. **Vector store** — `chromadb` (MIT, runs locally, no server required) with sentence-transformer embeddings. At the scale of P4 documentation this is trivially sized.
3. **RAG integration in `p4-context-builder`** — at context assembly time, embed the candidate incident summary, retrieve top-K relevant document chunks, inject into the SLM prompt.
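The chunking step of deliverable 1 is plain string handling and needs no dependencies. A sketch that splits on paragraph boundaries with overlap for context continuity — the size and overlap values are illustrative starting points, not tuned numbers:

```python
def chunk_text(text: str, max_chars: int = 1200, overlap: int = 200) -> list[str]:
    """Split a document into overlapping chunks on paragraph boundaries.

    Paragraphs are delimited by blank lines; when adding the next paragraph
    would exceed max_chars, the current chunk is emitted and its tail is
    carried forward as overlap so no retrieval hit loses its lead-in context.
    """
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for p in paras:
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = current[-overlap:]  # carry overlap into the next chunk
        current = (current + "\n\n" + p).strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then gets embedded and stored in `chromadb` with its source document ID as metadata, so retrieved hits can be cited back to the admin guide or KB article they came from.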


### Phase 4: Annotation & Eval Harness

**Goal:** Build the feedback loop that enables continuous quality measurement and eventual fine-tuning.

Deliverables:

1. **`p4-annotator` CLI tool** — given an incident ID from the audit log, present the full context bundle and SLM output, and allow the operator to:
   - Mark the diagnosis as correct/incorrect/partial
   - Enter the actual root cause (free text + structured category)
   - Mark the recommended action as appropriate/inappropriate/missing
   - Add notes
2. **Eval runner** — given a set of annotated incidents, run the current agent against the same context bundles and score:
   - Diagnosis accuracy (correct category / correct specific cause)
   - Action recommendation accuracy
   - False positive rate (triggered on non-incident)
   - False negative rate (missed real incident) — requires negative examples
3. **Fine-tuning dataset export** — when enough annotations accumulate (target: 200+ labeled examples), export in the format required for LoRA fine-tuning via Unsloth or Axolotl.


## 8. Repository Structure (Suggested)

```
p4-rca-agent/
├── CLAUDE.md                    # Agent governance file — read first
├── README.md
├── docs/
│   ├── architecture.md
│   ├── log-format-reference.md  # P4LOG field documentation
│   ├── known-patterns.md        # Documented incident signatures
│   └── safety-policy.md
├── p4rca/                       # Main Python package
│   ├── __init__.py
│   ├── tailer.py                # P4LOG tailing
│   ├── collector.py             # p4 monitor / lsof / lslocks collection
│   ├── detector.py              # Anomaly detection (pluggable detectors)
│   ├── context_builder.py       # Context bundle assembly
│   ├── rag.py                   # RAG retrieval
│   ├── agent.py                 # SLM invocation and response parsing
│   ├── actions.py               # Corrective action implementations
│   ├── audit.py                 # Audit log writer
│   └── models.py                # Data classes: CandidateIncident, RCAResult, etc.
├── scripts/
│   ├── install.sh               # Installation script (SDP-aware)
│   ├── p4rca-monitor.sh         # Wrapper for systemd service
│   └── setup-ollama.sh          # Ollama install + model pull
├── systemd/
│   └── p4rca-monitor.service
├── config/
│   └── p4rca.yaml.example       # Configuration template
├── tests/
│   ├── fixtures/                # Synthetic P4LOG excerpts for testing
│   │   ├── wedge_scenario.log
│   │   ├── slow_submit.log
│   │   ├── normal_operation.log
│   │   └── checkpoint_under_load.log
│   ├── test_detector.py
│   ├── test_tailer.py
│   └── test_context_builder.py
├── eval/
│   ├── annotator.py             # p4-annotator CLI
│   └── eval_runner.py
└── requirements.txt
```

## 9. CLAUDE.md for This Project

The agent should create a `CLAUDE.md` in the repo root with the following content as a starting point (expand as design decisions are made):

```markdown
# CLAUDE.md — p4-rca-agent

## Project Purpose
AI-assisted Perforce Helix Core server monitoring, RCA, and corrective action agent.
Runs locally on p4d server machines. No cloud API calls during operation.

## Core Constraints
- Python 3.10+ for all Python code
- `set -u` and ShellCheck compliance for all bash
- All root-requiring operations must be narrowly scoped via sudoers
- No modifications to depot data under any circumstances
- Every agent action written to audit log before execution
- SLM runs via local Ollama API only — no external API calls

## Safety Rules (Non-Negotiable)
- The agent NEVER modifies depot data or metadata
- The agent NEVER deletes or truncates logs
- The agent NEVER operates without a written audit trail
- Corrective action tier is a configuration value; default is Tier 1 (alert only)

## Code Style
- Type hints on all function signatures
- Docstrings on all public classes and functions
- Dataclasses or Pydantic models for all structured data (CandidateIncident, RCAResult, etc.)
- No bare `except` clauses — always catch specific exceptions
- Logging via stdlib `logging` (not print) — log at appropriate levels

## Key External Dependencies
- go-libp4dlog (Go, MIT): P4LOG parsing library — prefer this over custom parsing
- Ollama: local SLM inference runtime
- chromadb: local vector store for RAG
- watchdog: Python log file tailing

## Testing
- pytest for all Python tests
- Synthetic log fixtures in tests/fixtures/ for deterministic testing
- Run tests before committing: `pytest tests/`

## SDP Conventions
- Scripts that run as services: owned by `perforce` user
- Logs: to $LOGS (SDP convention) or /var/log/p4rca/ if SDP not present
- Config: /p4/common/config/p4rca.yaml (SDP) or /etc/p4rca/p4rca.yaml
```

## 10. Immediate First Tasks for the Agent

In priority order — start here:

1. **Read this document fully** before writing any code.
2. **Scaffold the repository structure** (§8) with empty files and stubs. Create `CLAUDE.md` (§9).
3. **Define data models in `p4rca/models.py`** — `CandidateIncident`, `RCAResult`, `MonitorSnapshot`, `LockEvent`. Use Python dataclasses or Pydantic. These are the contracts between layers; getting them right before writing layer implementations avoids rework.
4. **Implement and test `detector.py` first**, using synthetic fixture logs. This is the highest-leverage piece and the most testable without a live server. Implement `WedgeDetector` first — it's the primary use case.
5. **Implement `tailer.py`** — P4LOG file tailing. Evaluate whether to shell out to go-libp4dlog's `log2sql` binary (JSON output mode) or implement a Python parser. Given the complexity of P4LOG edge cases, strongly prefer using `log2sql --json` as a subprocess rather than writing a Python parser.
6. **Implement `collector.py`** — `p4 monitor show -ale`, `lsof`, `lslocks` collection. Handle permission errors gracefully (log, continue, don't crash). Make the collection interval configurable.
7. **Implement `audit.py`** — before any action layer exists, the audit log writer should be complete and tested. It's the safety net.
8. **Wire Phase 1 end-to-end** with a simple CLI harness that runs the tailer + collector + detector loop and prints `CandidateIncident` events to stdout. Validate against a real (or realistic captured) P4LOG before proceeding to Phase 2.
9. **Do not begin Phase 2** (SLM integration) until Phase 1 is fully tested and the context bundle format is stable. The SLM integration is easy; getting the signal extraction right is the hard part.


## 11. Open Questions for Design Review

These should be revisited with the domain expert before finalizing:

- What is the target set of p4d versions? (Affects which structured logs and configurables are available)
- What is the minimum hardware specification for target deployment servers? (Governs SLM size choice)
- Should the agent integrate with existing p4prometheus/Grafana deployments, or run fully standalone?
- Are there specific corrective actions that are pre-authorized for autonomous execution in any scenario, even in early tiers? (e.g., sending an internal alert is always safe; is there a `p4 monitor terminate` case that's ever acceptable at tier 1?)
- What is the data handling policy for log content that may appear in the RAG corpus or fine-tuning data? (Customer log sanitization requirements)
- Should the agent be SDP-aware (detect SDP layout and adapt paths) or require explicit configuration?

*End of briefing document. Agent: begin with §10, Task 1.*


The `go-libp4dlog` project already has a test suite with log fixtures (`p4dlog_test.go`). Study those fixtures as examples of how to construct synthetic test inputs.

### 5.3 Journal Replay for State Seeding (Not Concurrency)

The domain expert's idea: seed a sandbox with a baseline checkpoint, then replay production journals to bring the server to a known-good state representative of a real production environment. This is valuable for:
- Testing p4d version upgrade impact on performance
- Testing hardware configuration changes (RAM, storage)
- Establishing a realistic data state before applying synthetic scripted load

Limitation: journal replay is sequential — it does not reproduce the concurrent lock contention of the original production event. Pair journal-seeded state with load scripts (e.g., parallel `p4 -x` commands) that approximate the command pattern seen in production logs during the incident window.

---

## 6. Safety & Production Deployment Model

### 6.1 Corrective Action Ladder

Deploy autonomy incrementally. Never start at a high autonomy tier.

| Tier | Behavior | When to advance |
|---|---|---|
| 0. Observe | Agent logs its diagnosis and the action it would have taken; no alerts or pages are sent. | Baseline data collection |
| 1. Alert | Agent sends alert/page with diagnosis and recommended action. Human acts. | After diagnosis quality is validated |
| 2. Recommend with approval | Agent presents proposed action in terminal or via alert; admin confirms with one keypress (timeout = no action). | After recommendations are consistently sound |
| 3. Act with timeout | Agent acts after N minutes if no human response; conservative action set only (alert, never kill/restart). | After approval-mode track record established |
| 4. Autonomous within policy | Specific high-confidence, well-understood scenarios (confirmed wedge + known-safe corrective action) get autonomous execution. Audit log always written. | After extensive production validation |
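
The ladder lends itself to a simple policy gate: each corrective action carries a minimum tier, and the configured tier caps what the agent may do. The enum and action names below are hypothetical, intended only to show the shape of the check:

```python
from enum import IntEnum

class ActionTier(IntEnum):
    OBSERVE = 0
    ALERT = 1
    RECOMMEND = 2
    ACT_WITH_TIMEOUT = 3
    AUTONOMOUS = 4

# Hypothetical minimum tiers per action; real values come from safety review.
ACTION_MIN_TIER = {
    "log_only": ActionTier.OBSERVE,
    "send_alert": ActionTier.ALERT,
    "recommend_terminate": ActionTier.RECOMMEND,
    "terminate_pid": ActionTier.AUTONOMOUS,  # destructive: top tier only
}

def action_permitted(action: str, configured_tier: ActionTier) -> bool:
    """An action is allowed only when the configured tier reaches its minimum."""
    return configured_tier >= ACTION_MIN_TIER[action]
```

Because the tier is plain configuration, advancing up the ladder is an explicit, auditable change rather than a code deployment.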

### 6.2 Audit Log

Every agent action (including recommended-but-not-taken actions) must be written to a structured audit log. Minimum fields:
- Timestamp
- Trigger type and raw signals
- SLM diagnosis (full text)
- Action recommended
- Action taken (if any)
- Operator response (if applicable)
- Outcome (populated later if trackable)
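
A minimal sketch of an audit record mirroring the fields above, written as append-only JSON lines. The dataclass name and layout are assumptions pending the real `p4rca/models.py`:

```python
import json
import time
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class AuditRecord:
    trigger_type: str
    raw_signals: dict
    slm_diagnosis: str
    action_recommended: str
    action_taken: Optional[str] = None       # None if recommended-only
    operator_response: Optional[str] = None
    outcome: Optional[str] = None            # populated later if trackable
    timestamp: float = field(default_factory=time.time)

def write_audit(record: AuditRecord, path: str) -> None:
    # Append-only JSON lines; the record is written BEFORE any action runs.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

Append-only JSON lines keep the log trivially greppable and safe to tail while the agent is running.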

### 6.3 Safety Invariants (Hard Rules)

The agent must never:
- Modify depot data or metadata
- Delete or truncate logs
- Execute actions that could cause data loss
- Operate without a written audit trail
- Exceed its configured action policy tier without explicit reconfiguration

---

## 7. Implementation Phases

### Phase 1: Signal Extraction & Detection (No SLM)

**Goal:** Build the structured data pipeline that feeds the eventual SLM layer. Get detection logic working and validated before adding AI.

**Deliverables:**

1. **`p4log-tailer`** — Python daemon (or Go component importing `go-libp4dlog`) that:
   - Tails P4LOG in real time
   - Emits parsed command records as JSON to a local SQLite rolling window (configurable retention, default 15 minutes)
   - Handles log rotation gracefully

2. **`p4-monitor-collector`** — Python script run on a configurable interval (default 30s) that:
   - Runs `p4 monitor show -ale` (requires appropriate permissions)
   - Runs `lsof -p "$(pgrep -d, p4d)"` for the p4d process(es) — `pgrep -d,` emits a comma-separated PID list, which is the form `lsof -p` expects (plain newline-separated `pgrep` output breaks when multiple p4d processes are running)
   - Runs `lslocks` filtered to p4d-relevant paths
   - Stores snapshots in the rolling SQLite database

3. **`p4-anomaly-detector`** — Python module with pluggable detector classes:
   - `WedgeDetector`: lock wait time on specific tables exceeding threshold, combined with process pile-up count from monitor output
   - `SlowCommandDetector`: individual command compute or lock time exceeding configurable thresholds
   - `ConnectionSpikeDetector`: connection count anomaly (configurable σ threshold)
   - Each detector emits a structured `CandidateIncident` event if triggered

4. **Test suite** with synthetic log fixtures covering:
   - Normal operation (no trigger)
   - Classic wedge signature
   - Slow `p4 submit` without wedge
   - Checkpoint-during-load scenario
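
A sketch of the pluggable detector shape, using `WedgeDetector` as the example. Thresholds, signal names, and the `check` signature are illustrative assumptions, not tuned values:

```python
from typing import Optional
from dataclasses import dataclass

@dataclass
class CandidateIncident:
    detector: str
    summary: str
    signals: dict

class WedgeDetector:
    """Fires when long lock waits coincide with a process pile-up.

    Default thresholds here are placeholders pending tuning on real logs.
    """

    def __init__(self, max_lock_wait_s: float = 60.0, max_piled_procs: int = 10):
        self.max_lock_wait_s = max_lock_wait_s
        self.max_piled_procs = max_piled_procs

    def check(self, lock_waits: dict, running_procs: int) -> Optional[CandidateIncident]:
        # lock_waits: table name -> max observed lock wait in seconds
        hot = {t: w for t, w in lock_waits.items() if w > self.max_lock_wait_s}
        if hot and running_procs > self.max_piled_procs:
            return CandidateIncident(
                detector="WedgeDetector",
                summary=f"lock pile-up on {sorted(hot)} with {running_procs} procs",
                signals={"hot_tables": hot, "running_procs": running_procs},
            )
        return None
```

Requiring both signals (long waits *and* pile-up) is what keeps a single slow command from triggering SLM inference on its own.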

**Tech stack:** Python 3.10+, SQLite (stdlib), `watchdog` for log tailing, `subprocess` for system commands. Bash wrappers for systemd integration.
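
The rolling SQLite window shared by the tailer and collector could look like the sketch below: every insert also prunes anything older than the retention period, so the database stays bounded. The schema is illustrative, not a committed design:

```python
import sqlite3
import time
from typing import Optional

RETENTION_S = 15 * 60  # default 15-minute window

def open_window(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the rolling event window."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS events (
        ts REAL NOT NULL, kind TEXT NOT NULL, payload TEXT NOT NULL)""")
    db.execute("CREATE INDEX IF NOT EXISTS idx_events_ts ON events(ts)")
    return db

def record_event(db: sqlite3.Connection, kind: str, payload: str,
                 now: Optional[float] = None) -> None:
    """Insert one parsed event (as JSON text) and prune expired rows."""
    now = time.time() if now is None else now
    db.execute("INSERT INTO events VALUES (?, ?, ?)", (now, kind, payload))
    db.execute("DELETE FROM events WHERE ts < ?", (now - RETENTION_S,))
    db.commit()
```

Pruning on every write (rather than a separate cron) keeps the daemon simple and means a crash never leaves an unbounded database behind.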

**SDP conventions to follow:**
- Logs to `$LOGS` directory
- Config in `/p4/common/config/` or instance-level equivalent
- Scripts owned by `perforce` user; root-requiring operations via sudo with narrow sudoers rules
- ShellCheck compliance for any bash components
- `set -u` and defensive error handling throughout

---

### Phase 2: SLM Integration

**Goal:** Wire the SLM reasoning layer onto the detection output from Phase 1.

**Deliverables:**

1. **Ollama setup and model management scripts** — install, pull, and health-check automation appropriate for server environments (non-interactive, with fallback handling)

2. **`p4-context-builder`** — Python module that, given a `CandidateIncident` event, assembles the full context bundle (log window, monitor snapshots, lock summary, metadata) into a structured prompt

3. **`p4-rca-agent`** — Python module that:
   - Submits context bundle to local Ollama API
   - Parses SLM response into structured `RCAResult` (diagnosis, confidence, recommended_action, reasoning_summary)
   - Writes result to audit log
   - Emits alert (initially: writes to a structured incident log file; later: integrates with alerting)

4. **System prompt design** for the SLM — a well-crafted domain-specific system prompt that encodes:
   - P4LOG record format and field meanings
   - Known lock contention patterns and their names
   - Available corrective actions and their risk levels
   - Output format specification (structured JSON response for machine parsing)
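
The Ollama invocation itself is small. The sketch below builds a non-streaming request against the local `/api/generate` endpoint; the model name and prompt layout are assumptions to be settled during system-prompt design:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(system_prompt: str, context_bundle: dict) -> dict:
    """Assemble a non-streaming generate request for the local Ollama API."""
    return {
        "model": "phi4",
        "system": system_prompt,
        "prompt": json.dumps(context_bundle),
        "stream": False,
        "format": "json",  # request machine-parseable JSON output
    }

def query_ollama(payload: dict, timeout_s: float = 120.0) -> str:
    """POST the payload to the local Ollama server and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())["response"]
```

Using stdlib `urllib` keeps the dependency footprint minimal on production servers; the generous timeout reflects CPU-only inference on triggered (not continuous) requests.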

---

### Phase 3: RAG Knowledge Base

**Goal:** Improve SLM accuracy by giving it retrieval access to domain documentation.

**Deliverables:**

1. **Document corpus pipeline** — scripts to ingest and chunk:
   - P4 admin guide PDFs/HTML (from Perforce docs site)
   - KB articles (from portal.perforce.com, curated list)
   - SDP documentation
   - Incident annotations (from Phase 4)

2. **Vector store** — `chromadb` (MIT, runs locally, no server required) with sentence-transformer embeddings. At the scale of P4 documentation this is trivially sized.

3. **RAG integration in `p4-context-builder`** — at context assembly time, embed the candidate incident summary, retrieve top-K relevant document chunks, inject into the SLM prompt.
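
The retrieval step can be illustrated without chromadb: the stand-in below ranks document chunks against the incident summary using a stdlib bag-of-words cosine similarity in place of real sentence-transformer embeddings. Function names are assumptions:

```python
import math
from collections import Counter

def _vec(text: str) -> Counter:
    """Naive bag-of-words vector (stand-in for a real embedding)."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query: str, chunks: list, k: int = 3) -> list:
    """Return the k chunks most similar to the incident summary."""
    q = _vec(query)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _vec(c)), reverse=True)
    return ranked[:k]
```

The production version swaps `_vec`/`_cosine` for chromadb's embedding search, but the contract — incident summary in, top-K chunks out, injected into the prompt — stays the same.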

---

### Phase 4: Annotation & Eval Harness

**Goal:** Build the feedback loop that enables continuous quality measurement and eventual fine-tuning.

**Deliverables:**

1. **`p4-annotator`** CLI tool — given an incident ID from the audit log, it presents the full context bundle and SLM output and lets the operator:
   - Mark diagnosis as correct/incorrect/partial
   - Enter the actual root cause (free text + structured category)
   - Mark recommended action as appropriate/inappropriate/missing
   - Add notes

2. **Eval runner** — given a set of annotated incidents, run the current agent against the same context bundles and score:
   - Diagnosis accuracy (correct category / correct specific cause)
   - Action recommendation accuracy
   - False positive rate (triggered on non-incident)
   - False negative rate (missed real incident) — requires negative examples

3. **Fine-tuning dataset export** — when enough annotations accumulate (target: 200+ labeled examples), export in the format required for LoRA fine-tuning via Unsloth or Axolotl.
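
The scoring core of the eval runner can be sketched directly from the metrics above. The dataclass names and fields are assumptions standing in for the real annotation schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvalCase:
    is_incident: bool            # ground truth from annotation
    true_category: Optional[str] # annotated root-cause category, if any

@dataclass
class AgentResult:
    triggered: bool
    category: Optional[str]

def score(cases: list, results: list) -> dict:
    """Compute diagnosis accuracy plus false positive/negative counts."""
    tp = fp = fn = correct = 0
    for case, res in zip(cases, results):
        if case.is_incident and res.triggered:
            tp += 1
            if res.category == case.true_category:
                correct += 1
        elif not case.is_incident and res.triggered:
            fp += 1   # triggered on a non-incident
        elif case.is_incident and not res.triggered:
            fn += 1   # missed a real incident
    return {
        "diagnosis_accuracy": correct / tp if tp else 0.0,
        "false_positive": fp,
        "false_negative": fn,
    }
```

Measuring the false negative rate requires annotated negative examples (normal-operation windows), which is worth planning into the annotation workflow from the start.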

---

## 8. Repository Structure (Suggested)

```
p4-rca-agent/
├── CLAUDE.md                    # Agent governance file — read first
├── README.md
├── docs/
│   ├── architecture.md
│   ├── log-format-reference.md  # P4LOG field documentation
│   ├── known-patterns.md        # Documented incident signatures
│   └── safety-policy.md
├── p4rca/                       # Main Python package
│   ├── __init__.py
│   ├── tailer.py                # P4LOG tailing
│   ├── collector.py             # p4 monitor / lsof / lslocks collection
│   ├── detector.py              # Anomaly detection (pluggable detectors)
│   ├── context_builder.py       # Context bundle assembly
│   ├── rag.py                   # RAG retrieval
│   ├── agent.py                 # SLM invocation and response parsing
│   ├── actions.py               # Corrective action implementations
│   ├── audit.py                 # Audit log writer
│   └── models.py                # Data classes: CandidateIncident, RCAResult, etc.
├── scripts/
│   ├── install.sh               # Installation script (SDP-aware)
│   ├── p4rca-monitor.sh         # Wrapper for systemd service
│   └── setup-ollama.sh          # Ollama install + model pull
├── systemd/
│   └── p4rca-monitor.service
├── config/
│   └── p4rca.yaml.example       # Configuration template
├── tests/
│   ├── fixtures/                # Synthetic P4LOG excerpts for testing
│   │   ├── wedge_scenario.log
│   │   ├── slow_submit.log
│   │   ├── normal_operation.log
│   │   └── checkpoint_under_load.log
│   ├── test_detector.py
│   ├── test_tailer.py
│   └── test_context_builder.py
├── eval/
│   ├── annotator.py             # p4-annotator CLI
│   └── eval_runner.py
└── requirements.txt
```

---

## 9. CLAUDE.md for This Project

The agent should create a `CLAUDE.md` in the repo root with the following content as a starting point (expand as design decisions are made):

```markdown
# CLAUDE.md — p4-rca-agent

## Project Purpose
AI-assisted Perforce Helix Core server monitoring, RCA, and corrective action agent.
Runs locally on p4d server machines. No cloud API calls during operation.

## Core Constraints
- Python 3.10+ for all Python code
- `set -u` and ShellCheck compliance for all bash
- All root-requiring operations must be narrowly scoped via sudoers
- No modifications to depot data under any circumstances
- Every agent action written to audit log before execution
- SLM runs via local Ollama API only — no external API calls

## Safety Rules (Non-Negotiable)
- The agent NEVER modifies depot data or metadata
- The agent NEVER deletes or truncates logs
- The agent NEVER operates without a written audit trail
- Corrective action tier is a configuration value; default is Tier 1 (alert only)

## Code Style
- Type hints on all function signatures
- Docstrings on all public classes and functions
- Dataclasses or Pydantic models for all structured data (CandidateIncident, RCAResult, etc.)
- No bare `except` clauses — always catch specific exceptions
- Logging via stdlib `logging` (not print) — log at appropriate levels

## Key External Dependencies
- go-libp4dlog (Go, MIT): P4LOG parsing library — prefer this over custom parsing
- Ollama: local SLM inference runtime
- chromadb: local vector store for RAG
- watchdog: Python log file tailing

## Testing
- pytest for all Python tests
- Synthetic log fixtures in tests/fixtures/ for deterministic testing
- Run tests before committing: `pytest tests/`

## SDP Conventions
- Scripts that run as services: owned by `perforce` user
- Logs: to $LOGS (SDP convention) or /var/log/p4rca/ if SDP not present
- Config: /p4/common/config/p4rca.yaml (SDP) or /etc/p4rca/p4rca.yaml
```

---

## 10. Immediate First Tasks for the Agent

In priority order — start here:

1. **Read this document fully before writing any code.**

2. **Scaffold the repository structure** (§8) with empty files and stubs. Create `CLAUDE.md` (§9).

3. **Define data models** in `p4rca/models.py` — `CandidateIncident`, `RCAResult`, `MonitorSnapshot`, `LockEvent`. Use Python dataclasses or Pydantic. These are the contracts between layers; getting them right before writing layer implementations avoids rework.

4. **Implement and test `detector.py`** first, using synthetic fixture logs. This is the highest-leverage piece and the most testable without a live server. Implement `WedgeDetector` first — it's the primary use case.

5. **Implement `tailer.py`** — P4LOG file tailing. Evaluate whether to shell out to `go-libp4dlog`'s `log2sql` binary (JSON output mode) or implement a Python parser. Given the complexity of P4LOG edge cases, **strongly prefer using `log2sql --json` as a subprocess** rather than writing a Python parser.

6. **Implement `collector.py`** — `p4 monitor show -ale`, `lsof`, `lslocks` collection. Handle permission errors gracefully (log, continue, don't crash). Make the collection interval configurable.

7. **Implement `audit.py`** — before any action layer exists, the audit log writer should be complete and tested. It's the safety net.

8. **Wire Phase 1 end-to-end** with a simple CLI harness that runs the tailer + collector + detector loop and prints `CandidateIncident` events to stdout. Validate against a real (or realistic captured) P4LOG before proceeding to Phase 2.

9. **Do not begin Phase 2 (SLM integration) until Phase 1 is fully tested and the context bundle format is stable.** The SLM integration is easy; getting the signal extraction right is the hard part.
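
The end-to-end harness in Task 8 reduces to a small polling loop. The sketch below is a minimal version with injectable collaborators (names are assumptions; the real pieces live in `p4rca/`):

```python
import time

def run_loop(collect, detectors, interval_s: float = 30.0, iterations=None):
    """Poll the collectors, run each detector, and print candidate incidents.

    `collect` returns a dict of structured signals; each detector exposes
    check(signals) -> incident-or-None. `iterations=None` runs forever.
    """
    n = 0
    while iterations is None or n < iterations:
        signals = collect()
        for det in detectors:
            incident = det.check(signals)
            if incident is not None:
                print(f"CandidateIncident: {incident}")
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval_s)
```

Passing the collector and detectors in as arguments keeps the loop trivially unit-testable with fakes, which matters given how little of Phase 1 can be exercised against a live p4d.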

---

## 11. Open Questions for Design Review

These should be revisited with the domain expert before finalizing:

- What is the target set of p4d versions? (Affects which structured logs and configurables are available)
- What is the minimum hardware specification for target deployment servers? (Governs SLM size choice)
- Should the agent integrate with existing p4prometheus/Grafana deployments, or run fully standalone?
- Are there specific corrective actions that are pre-authorized for autonomous execution in any scenario, even in early tiers? (e.g., sending an internal alert is always safe; is there a `p4 monitor terminate` case that's ever acceptable at tier 1?)
- What is the data handling policy for log content that may appear in RAG corpus or fine-tuning data? (Customer log sanitization requirements)
- Should the agent be SDP-aware (detect SDP layout and adapt paths) or require explicit configuration?

---

*End of briefing document. Agent: begin with §10, Task 1.*