Foundations — Phase 1

R1: Prompt Cache Boundary

Tier 1 · Planned

Split ~4,000 tokens of static instructions from the per-turn dynamic context. The API applies cache_control at the boundary.

Cache Boundary Architecture

Static Prefix (cacheable)

Identity

Agent persona, name, version

Tool Policy

Edit protocol, bash restrictions

Risk Taxonomy

R4 decision tree (Phase 4)

Length Anchors

R8 numeric limits (Phase 1)

Safety Rules

OWASP, security constraints

Git Workflow

Commit, PR, branch patterns

Dynamic Suffix (per-turn)

<env> Block

CWD, platform, shell, OS

Workspace

Folder structure, git state

AGENTS.md

Project-specific instructions

Skills Context

Loaded skills + MCP servers

Memory (R5)

Persistent facts from prior sessions

Model Variant (R7)

Per-family prompt patches

static_prefix ~4,000 tokens — identical across all sessions

cache_control { type: "ephemeral" } at join point

anthropic_ttl 5 minutes — active coding stays warm

target_hit_rate >80% active (<2min turns), ~30-40% mixed

savings_per_10 ~$0.50–$0.60 on Opus (4K tokens × 90% discount × 9 hits)

return_type { staticPrefix: string, dynamicSuffix: string } (a pair of strings, not a single string with an embedded marker)

phase_0_spike verify cache_control via OpenRouter per provider

version_header  for A/B tracking
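A minimal sketch of the split above, assuming an Anthropic-style system array of text blocks. Only the return shape { staticPrefix, dynamicSuffix } and cache_control { type: "ephemeral" } come from this spec; buildPrompt, toSystemBlocks, and the section strings are hypothetical.

```typescript
// Hypothetical sketch: every name except staticPrefix, dynamicSuffix and
// cache_control is an illustrative assumption, not the real factory.ts API.
interface PromptParts {
  staticPrefix: string;  // identical across all sessions, cacheable
  dynamicSuffix: string; // rebuilt on every turn
}

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

function buildPrompt(staticSections: string[], dynamicSections: string[]): PromptParts {
  return {
    staticPrefix: staticSections.join("\n\n"),
    dynamicSuffix: dynamicSections.join("\n\n"),
  };
}

// The cache breakpoint sits on the static block, so everything up to and
// including it can be read from cache on later turns.
function toSystemBlocks(parts: PromptParts): SystemBlock[] {
  return [
    { type: "text", text: parts.staticPrefix, cache_control: { type: "ephemeral" } },
    { type: "text", text: parts.dynamicSuffix },
  ];
}

const parts = buildPrompt(
  ["Identity: persona", "Tool Policy: edits"],
  ["<env>cwd=/repo</env>", "Workspace: clean"],
);
const blocks = toSystemBlocks(parts);
```

Returning two strings keeps the join point explicit in the type, so a caller cannot lose the boundary by concatenating too early.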

Session data: Real logs show ~83K input tokens per system-reminder cycle. Reminder loops fire when no tracked action occurs, compounding waste across long sprints. Caching the static prefix alone saves roughly 90% of the input cost of its ~4K tokens on every cache hit. See Session Insights for the full analysis.
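The savings_per_10 figure can be reproduced with a short calculation. This is a sketch under one loud assumption: a base input price of $15 per million tokens, which this spec does not state. It also ignores any premium charged for the initial cache write.

```typescript
// Assumption: $15 per million input tokens. The spec only gives the
// 4K prefix size, the 90% discount, and the 9 hits.
const ASSUMED_USD_PER_MTOK = 15;

function cacheSavingsUsd(
  prefixTokens: number,
  discount: number,
  hits: number,
  usdPerMTok: number = ASSUMED_USD_PER_MTOK,
): number {
  const uncachedCost = (prefixTokens / 1e6) * usdPerMTok; // one full-price read of the prefix
  return uncachedCost * discount * hits;
}

// 10 turns: 1 cache write followed by 9 cache hits.
const savingsPer10 = cacheSavingsUsd(4000, 0.9, 9);
console.log(savingsPer10.toFixed(3));
```

Under that price assumption the result is just under $0.50, the low end of the range quoted above.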

R12: Feature Flag Infrastructure

Tier 3 · Exists

Already built in apps/cli/src/feature-flags/. Extended with 10 new flags that gate all Phase 2–6 features.

SUBQ_CODE_HOOKS SUBQ_CODE_SUBAGENTS SUBQ_CODE_TTSR SUBQ_CODE_MEMORY SUBQ_CODE_SESSION_TREE SUBQ_CODE_MODEL_ROLES SUBQ_CODE_CROSS_AGENT_RULES SUBQ_CODE_VERIFICATION SUBQ_CODE_MCP SUBQ_CODE_AUTONOMOUS

Security-critical flags (HOOKS, SUBAGENTS, AUTONOMOUS) require dual opt-in: both the remote flag and a local config entry in ~/.subq/settings.json. A remote flag alone cannot enable these features.
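The dual opt-in rule can be sketched as a single predicate. The flat flag-to-boolean shape used for both the remote payload and the local settings file is an assumption, as is the function name isFeatureEnabled.

```typescript
// Flags named as security-critical in this spec.
const SECURITY_CRITICAL = new Set<string>([
  "SUBQ_CODE_HOOKS",
  "SUBQ_CODE_SUBAGENTS",
  "SUBQ_CODE_AUTONOMOUS",
]);

// Assumption: remote flags and local settings are both flat maps of
// flag name to boolean. The real settings.json layout may differ.
type FlagMap = Record<string, boolean | undefined>;

function isFeatureEnabled(flag: string, remote: FlagMap, local: FlagMap): boolean {
  const remoteOn = remote[flag] === true;
  if (!SECURITY_CRITICAL.has(flag)) return remoteOn;
  // Security-critical: the local opt-in is required in addition to the remote flag.
  return remoteOn && local[flag] === true;
}

const remoteFlags: FlagMap = { SUBQ_CODE_HOOKS: true, SUBQ_CODE_MEMORY: true };
const localSettings: FlagMap = {};
const hooksEnabled = isFeatureEnabled("SUBQ_CODE_HOOKS", remoteFlags, localSettings);
const memoryEnabled = isFeatureEnabled("SUBQ_CODE_MEMORY", remoteFlags, localSettings);
```

Here hooks stay off because the local opt-in is missing, while the non-critical memory flag follows the remote value.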

R8: Numeric Length Anchors

Tier 3 · Promoted to Phase 1

A three-line change in factory.ts with zero risk and immediate, measurable savings.

inter_tool ≤40 words between tool calls (raised from 25, which proved too tight)

final_response ≤100 words unless task requires more detail

update_style one sentence per update is almost always enough

gpt_fallback qualitative: “one sentence between tool calls”

escape_clause unless user asked for explanation, analysis, or docs

target_savings ≥5% output token reduction (S7)
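A sketch of how the anchors above might be emitted as prompt lines. The model family names, the exact wording of each line, and attaching the escape clause to the final-response limit are assumptions; the numbers and the GPT fallback come from the list above.

```typescript
// Hypothetical family labels; the spec only says GPT gets a qualitative fallback.
type ModelFamily = "claude" | "gpt" | "other";

const ESCAPE_CLAUSE = "unless the user asked for explanation, analysis, or docs";

function lengthAnchors(family: ModelFamily): string[] {
  const interTool =
    family === "gpt"
      ? "Use one sentence between tool calls."
      : "Use at most 40 words between tool calls.";
  return [
    interTool,
    `Keep the final response to at most 100 words ${ESCAPE_CLAUSE}.`,
    "One sentence per update is almost always enough.",
  ];
}

const claudeAnchors = lengthAnchors("claude");
const gptAnchors = lengthAnchors("gpt");
```

The three returned lines would be appended to the static prefix, so they cost nothing extra per turn once cached.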

Research finding: Hard numeric constraints consistently outperform qualitative instructions across all models tested. “Be concise” produced a 47-sentence response; “Answer in 3 sentences max” produced a 3-sentence response. An 18-model benchmark confirms the pattern.

R2 → R18: Hooks → Extension API

Tier 1 · Supersession

Composable lifecycle hooks (R2) evolve into a full Extension API (R18) with 30+ events and tool/command registration.

R2: Shell hooks → R18: Extension API

event PreToolUse → pi.on("tool_call")

event PostToolUse → pi.on("tool_execution_end")

event SessionStart → pi.on("session_start")

event SessionEnd → pi.on("session_shutdown")

timeout 10s default — kill + warning on hang

conflict first block wins; systemMessages are concatenated; last updatedInput wins

parallel Promise.allSettled() within same event

debug ~/.subq/agent/hooks.log + subq hooks --dry-run

CRITICAL-1

Hook Command Injection

Unsanitized file paths can carry shell metacharacters into hook commands. Fix: pass context via stdin as JSON, allow user-level hooks only, and strip API keys from the child process environment.

Before Phase 2
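A sketch of the CRITICAL-1 fix as a pure function that prepares a hook invocation without ever placing tool input on a command line. HookInvocation, prepareHook, and the secret-name pattern are hypothetical; the actual spawn call (with the shell disabled) is left out.

```typescript
interface HookContext {
  event: string;
  toolName: string;
  filePath: string;
}

interface HookInvocation {
  command: string;             // hook executable from user-level config
  args: string[];              // fixed; never built from tool input
  stdin: string;               // JSON context, read by the hook from stdin
  env: Record<string, string>; // environment with secrets removed
  shell: false;                // no shell means no metacharacter expansion
}

// Assumption: which variable names count as secrets is not specified here.
const SECRET_ENV_PATTERN = /(API_KEY|TOKEN|SECRET)/i;

function stripSecrets(env: Record<string, string | undefined>): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const [name, value] of Object.entries(env)) {
    if (value !== undefined && !SECRET_ENV_PATTERN.test(name)) clean[name] = value;
  }
  return clean;
}

function prepareHook(
  command: string,
  context: HookContext,
  env: Record<string, string | undefined>,
): HookInvocation {
  return {
    command,
    args: [],
    stdin: JSON.stringify(context),
    env: stripSecrets(env),
    shell: false,
  };
}

const invocation = prepareHook(
  "/home/user/.subq/hooks/lint.sh",
  { event: "PreToolUse", toolName: "edit", filePath: "src/a; rm -rf ~ #.ts" },
  { PATH: "/usr/bin", OPENAI_API_KEY: "sk-test" },
);
```

The hostile file path travels only inside the JSON payload, where it is data rather than shell syntax.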

{ decision: "approve" }

{ decision: "approve", systemMessage }

{ decision: "approve", updatedInput }

{ decision: "block", reason }
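A sketch of how the conflict rule could combine these decision shapes when several hooks run for the same event. The type names and the MergedDecision shape are assumptions, "first" and "last" are taken to mean registration order, and a hook that fails or times out is simply ignored here rather than killed with a warning.

```typescript
type HookDecision =
  | { decision: "approve"; systemMessage?: string; updatedInput?: unknown }
  | { decision: "block"; reason: string };

interface MergedDecision {
  decision: "approve" | "block";
  reason?: string;
  systemMessages: string[];
  updatedInput?: unknown;
}

// first block wins; systemMessages are concatenated; last updatedInput wins
function mergeDecisions(results: HookDecision[]): MergedDecision {
  const merged: MergedDecision = { decision: "approve", systemMessages: [] };
  for (const r of results) {
    if (r.decision === "block") {
      if (merged.decision !== "block") {
        merged.decision = "block";
        merged.reason = r.reason;
      }
      continue;
    }
    if (r.systemMessage !== undefined) merged.systemMessages.push(r.systemMessage);
    if (r.updatedInput !== undefined) merged.updatedInput = r.updatedInput;
  }
  return merged;
}

const HOOK_TIMEOUT_MS = 10000; // 10s default from the spec

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`hook timed out after ${ms}ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Hooks for one event run in parallel; allSettled preserves input order,
// so the merge still sees results in registration order.
async function runHooks(
  hooks: Array<() => Promise<HookDecision>>,
  timeoutMs: number = HOOK_TIMEOUT_MS,
): Promise<MergedDecision> {
  const settled = await Promise.allSettled(hooks.map((h) => withTimeout(h(), timeoutMs)));
  const fulfilled = settled
    .filter((s): s is PromiseFulfilledResult<HookDecision> => s.status === "fulfilled")
    .map((s) => s.value);
  return mergeDecisions(fulfilled);
}

const mergedExample = mergeDecisions([
  { decision: "approve", systemMessage: "a", updatedInput: { x: 1 } },
  { decision: "block", reason: "first" },
  { decision: "approve", systemMessage: "b", updatedInput: { x: 2 } },
  { decision: "block", reason: "second" },
]);
```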

R17: Cross-Agent Rule Discovery

Tier 4 · Novel

Discover rules from 7 agent config formats, then normalize, deduplicate, and inject them by priority.

.subq/rules/

Native SubQ rules. Highest priority. First-wins dedup.

.claude/

CLAUDE.md + commands. Second priority.

.cursor/rules/

Cursor rule files. Third priority.

.codex/ .gemini/ .windsurf/ .cline/

Additional formats. Added based on demand.
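The priority order and first-wins dedup can be sketched as a sort followed by a seen-set. The DiscoveredRule shape and the use of a normalized rule name as the dedup key are assumptions; the ordering of the first three sources comes from the list above.

```typescript
interface DiscoveredRule {
  name: string;
  source: string; // directory the rule was discovered in
  body: string;
}

// Lower index means higher priority.
const SOURCE_PRIORITY = [".subq/rules/", ".claude/", ".cursor/rules/"];

function priorityOf(source: string): number {
  const i = SOURCE_PRIORITY.indexOf(source);
  return i === -1 ? SOURCE_PRIORITY.length : i; // other formats rank last
}

function dedupeRules(rules: DiscoveredRule[]): DiscoveredRule[] {
  const sorted = [...rules].sort((a, b) => priorityOf(a.source) - priorityOf(b.source));
  const seen = new Set<string>();
  const kept: DiscoveredRule[] = [];
  for (const rule of sorted) {
    const key = rule.name.trim().toLowerCase(); // assumption: dedup by normalized name
    if (seen.has(key)) continue; // a higher-priority source already supplied this rule
    seen.add(key);
    kept.push(rule);
  }
  return kept;
}

const deduped = dedupeRules([
  { name: "Style", source: ".cursor/rules/", body: "cursor style" },
  { name: "style", source: ".subq/rules/", body: "native style" },
  { name: "tests", source: ".claude/", body: "run tests" },
]);
```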

alwaysApply Rules

Injected into the dynamic prompt. Limits: 2KB per rule, 10KB total.

Glob-Scoped Rules

Injected only when agent works on matching files.

TTSR-Triggered Rules

Rules with ttsrTrigger registered as stream monitors.
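A sketch of the alwaysApply size limits. Whether an oversized rule is skipped or truncated is not stated, so skipping is an assumption here, as are the function names and the use of UTF-8 byte length for the measurement.

```typescript
const MAX_RULE_BYTES = 2 * 1024;   // 2KB per rule
const MAX_TOTAL_BYTES = 10 * 1024; // 10KB total

interface AlwaysApplyRule {
  name: string;
  body: string;
}

function byteLength(text: string): number {
  return new TextEncoder().encode(text).length; // UTF-8 bytes, not characters
}

// Rules are expected in priority order. Assumption: a rule that breaks
// either limit is skipped rather than truncated.
function selectAlwaysApply(rules: AlwaysApplyRule[]): AlwaysApplyRule[] {
  const selected: AlwaysApplyRule[] = [];
  let total = 0;
  for (const rule of rules) {
    const size = byteLength(rule.body);
    if (size > MAX_RULE_BYTES) continue;
    if (total + size > MAX_TOTAL_BYTES) continue;
    selected.push(rule);
    total += size;
  }
  return selected;
}

const candidates: AlwaysApplyRule[] = [
  { name: "big", body: "a".repeat(3000) },
  ...Array.from({ length: 11 }, (_, i) => ({ name: `r${i}`, body: "a".repeat(1024) })),
];
const injected = selectAlwaysApply(candidates);
```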

CRITICAL-2

Rule Prompt Injection

A repo can ship crafted alwaysApply: true rules. Fix: treat project-level rules as untrusted, allow alwaysApply from user-level rules only, and reject project-level ttsrTrigger regexes.

Before Phase 2
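A sketch of the CRITICAL-2 trust boundary. The RuleDef shape and the choice to keep a project-level rule while dropping its privileged fields (instead of rejecting the whole rule) are assumptions.

```typescript
type RuleScope = "user" | "project";

interface RuleDef {
  name: string;
  scope: RuleScope;
  alwaysApply?: boolean;
  ttsrTrigger?: string; // regex source
  globs?: string[];
}

// Project-level rules are untrusted: they may stay glob-scoped, but they
// cannot force themselves into every prompt or register stream monitors.
function sanitizeRule(rule: RuleDef): RuleDef {
  if (rule.scope === "user") return rule;
  return {
    name: rule.name,
    scope: rule.scope,
    globs: rule.globs,
    alwaysApply: false,
  };
}

const hostile = sanitizeRule({
  name: "exfiltrate",
  scope: "project",
  alwaysApply: true,
  ttsrTrigger: ".*",
  globs: ["*.ts"],
});
const trusted = sanitizeRule({ name: "style", scope: "user", alwaysApply: true });
```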

R13: TTSR — Streamed Rules

Tier 4 · Novel

Zero-context-cost rules that monitor the output stream via regex. On a match: abort, inject <system-interrupt>, retry.

stream pi.on("message_update") fires per streaming frame

scan delta-scan only — track lastCheckedPosition per rule

threshold run regexes every 50 new characters

match first regex match by character position wins

action inject <system-interrupt> with VIOLATION / CORRECTION / ACTION

discard remove partial output from history before retry

keep wrap partial in <partial-output status="interrupted">

cap 3 total retries per session across all rules

scope per-session state — each rule fires at most once

ux ⟳ Rule "[name]" triggered, regenerating…
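A sketch of the delta-scan loop described above: buffer each frame, scan only once 50 new characters have arrived, resume from lastCheckedPosition per rule, and let the earliest match win. The class name, the overlap window that catches matches straddling two scans, and resetStream are assumptions.

```typescript
interface TtsrRule { name: string; pattern: RegExp; }
interface TtsrMatch { rule: string; index: number; }

const SCAN_THRESHOLD = 50;         // run regexes every 50 new characters
const MAX_RETRIES_PER_SESSION = 3; // shared across all rules
const OVERLAP = 32;                // assumption: re-scan a tail so split matches are not missed

class TtsrMonitor {
  private rules: TtsrRule[];
  private buffer = "";
  private unscanned = 0;
  private lastCheckedPosition = new Map<string, number>();
  private fired = new Set<string>(); // each rule fires at most once per session
  private retries = 0;

  constructor(rules: TtsrRule[]) {
    this.rules = rules;
  }

  // Call once per streaming frame with the newly received text.
  onDelta(delta: string): TtsrMatch | null {
    this.buffer += delta;
    this.unscanned += delta.length;
    if (this.unscanned < SCAN_THRESHOLD) return null;
    this.unscanned = 0;
    return this.scan();
  }

  private scan(): TtsrMatch | null {
    if (this.retries >= MAX_RETRIES_PER_SESSION) return null;
    let best: TtsrMatch | null = null;
    for (const rule of this.rules) {
      if (this.fired.has(rule.name)) continue;
      const from = this.lastCheckedPosition.get(rule.name) ?? 0;
      // Fresh non-global copy so lastIndex state never leaks between scans.
      const re = new RegExp(rule.pattern.source, rule.pattern.flags.replace(/[gy]/g, ""));
      const m = re.exec(this.buffer.slice(from));
      this.lastCheckedPosition.set(rule.name, Math.max(0, this.buffer.length - OVERLAP));
      if (m && (best === null || from + m.index < best.index)) {
        best = { rule: rule.name, index: from + m.index };
      }
    }
    if (best) {
      this.fired.add(best.rule);
      this.retries += 1;
    }
    return best;
  }

  // After an abort the model starts over, so stream state is cleared.
  // Fired rules and the retry count persist for the session.
  resetStream(): void {
    this.buffer = "";
    this.unscanned = 0;
    this.lastCheckedPosition.clear();
  }
}

const monitor = new TtsrMonitor([
  { name: "r1", pattern: /foo/ },
  { name: "r2", pattern: /bar/ },
  { name: "r3", pattern: /baz/ },
  { name: "r4", pattern: /qux/ },
]);
```

In an Extension API handler, onDelta would be called from the message_update event, and a non-null result would trigger the abort, interrupt, and retry sequence.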

Zero Context Cost

Stream-Level Enforcement

TTSR rules consume no prompt tokens until triggered. Rules exist only in the stream monitor and are never injected into context unless a violation occurs.

Safety Guard

Correction Content

Correction content must come from the rule definition, not from the triggering context. All corrections are logged for audit.
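A sketch of the safety guard: the interrupt builder accepts only the rule definition, so text produced by the model cannot flow into the correction. The CorrectionRule field names, the exact layout inside <system-interrupt>, and the in-memory audit log are assumptions.

```typescript
interface CorrectionRule {
  name: string;
  violation: string;  // static description from the rule file
  correction: string; // static guidance from the rule file
  action: string;     // static instruction from the rule file
}

// Stand-in for a persistent audit log.
const auditLog: string[] = [];

// Deliberately takes no matched text or stream content as input.
function buildInterrupt(rule: CorrectionRule): string {
  const body = [
    `VIOLATION: ${rule.violation}`,
    `CORRECTION: ${rule.correction}`,
    `ACTION: ${rule.action}`,
  ].join("\n");
  auditLog.push(`${rule.name}: ${rule.correction}`);
  return `<system-interrupt>\n${body}\n</system-interrupt>`;
}

const interrupt = buildInterrupt({
  name: "no-any",
  violation: "Used the any type.",
  correction: "Use a concrete type or unknown.",
  action: "Regenerate the response.",
});
```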