Phase 1–2 Deep Dive

Foundations

Prompt caching, feature flags, length anchors, lifecycle hooks, cross-agent rules, and TTSR. The platform layer everything else builds on.

R1: Prompt Cache Boundary

Tier 1 Planned Split the ~4,000 tokens of static instructions from the per-turn dynamic context. The API applies cache_control at the boundary between them.

Cache Boundary Architecture
Static Prefix (cacheable)

Identity

Agent persona, name, version

Tool Policy

Edit protocol, bash restrictions

Risk Taxonomy

R4 decision tree (Phase 4)

Length Anchors

R8 numeric limits (Phase 1)

Safety Rules

OWASP, security constraints

Git Workflow

Commit, PR, branch patterns

Dynamic Suffix (per-turn)

<env> Block

CWD, platform, shell, OS

Workspace

Folder structure, git state

AGENTS.md

Project-specific instructions

Skills Context

Loaded skills + MCP servers

Memory (R5)

Persistent facts from prior sessions

Model Variant (R7)

Per-family prompt patches

cache economics
static_prefix ~4,000 tokens — identical across all sessions
cache_control { type: "ephemeral" } at join point
anthropic_ttl 5 minutes — active coding stays warm
target_hit_rate >80% active (<2min turns), ~30-40% mixed
savings_per_10 ~$0.50–$0.60 on Opus (4K tokens × 90% discount × 9 hits)
return_type { staticPrefix: string, dynamicSuffix: string } (two-field object, not an in-string marker)
phase_0_spike verify cache_control via OpenRouter per provider
version_header <!-- subq-prompt-v1.0.0 --> for A/B tracking
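The split described above can be sketched as follows. This is a minimal illustration, not the project's factory code: the function names buildPrompt and toSystemBlocks and the SystemBlock interface are hypothetical, and the block shape assumes an Anthropic-style system array where cache_control is attached to the last cacheable block.

```typescript
// Hypothetical shapes. Only staticPrefix, dynamicSuffix, cache_control and the
// version header string come from the notes above.
interface PromptParts {
  staticPrefix: string;
  dynamicSuffix: string;
}

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

const VERSION_HEADER = "<!-- subq-prompt-v1.0.0 -->";

// Static sections must not contain anything that varies per session or turn,
// otherwise the cached prefix stops matching.
function buildPrompt(staticSections: string[], dynamicSections: string[]): PromptParts {
  return {
    staticPrefix: [VERSION_HEADER, ...staticSections].join("\n\n"),
    dynamicSuffix: dynamicSections.join("\n\n"),
  };
}

// Marks the end of the static prefix as the cache breakpoint; the dynamic
// suffix follows as an uncached block.
function toSystemBlocks(parts: PromptParts): SystemBlock[] {
  return [
    { type: "text", text: parts.staticPrefix, cache_control: { type: "ephemeral" } },
    { type: "text", text: parts.dynamicSuffix },
  ];
}
```

Returning two strings rather than one string with an embedded marker keeps the boundary explicit, so a caller cannot accidentally place dynamic text ahead of the breakpoint.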
Session data: Real logs show ~83K input tokens per system-reminder cycle. Reminder loops fire when no tracked action occurs, compounding waste across long sprints. Caching the static prefix alone saves about 90% of the input cost of its ~4K tokens on every cache hit. See Session Insights for the full analysis.
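The savings_per_10 figure can be checked with a few lines of arithmetic. The sketch below assumes an input price of $15 per million tokens, which is an illustrative value and not taken from the notes, and it ignores any surcharge on the initial cache write.

```typescript
interface SavingsInput {
  prefixTokens: number;  // size of the cached static prefix
  pricePerMTok: number;  // ASSUMED input price in USD per million tokens
  readDiscount: number;  // fraction saved on a cache hit (0.9 = 90%)
  hits: number;          // cache hits in the window (turns after the first)
}

// Savings = full price of the prefix x discount x number of hits.
function cacheSavings(i: SavingsInput): number {
  const fullCostPerTurn = (i.prefixTokens / 1_000_000) * i.pricePerMTok;
  return fullCostPerTurn * i.readDiscount * i.hits;
}

// 10 turns: 1 cache write followed by 9 hits.
const savedPer10 = cacheSavings({ prefixTokens: 4000, pricePerMTok: 15, readDiscount: 0.9, hits: 9 });
console.log(savedPer10.toFixed(3)); // about 0.486
```

Under these assumptions the result is roughly $0.49 per ten turns, close to the lower end of the quoted range; a different price or hit count moves it proportionally.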

R12: Feature Flag Infrastructure

Tier 3 Exists Already built in apps/cli/src/feature-flags/. Extended with 10 new flags gating all Phase 2–6 features.

Feature Flags (10 new gates)
SUBQ_CODE_HOOKS SUBQ_CODE_SUBAGENTS SUBQ_CODE_TTSR SUBQ_CODE_MEMORY SUBQ_CODE_SESSION_TREE SUBQ_CODE_MODEL_ROLES SUBQ_CODE_CROSS_AGENT_RULES SUBQ_CODE_VERIFICATION SUBQ_CODE_MCP SUBQ_CODE_AUTONOMOUS
Security-critical flags (HOOKS, SUBAGENTS, AUTONOMOUS) require dual opt-in: both the remote flag and a local config entry in ~/.subq/settings.json. A remote flag alone cannot enable these features.
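A minimal sketch of the dual opt-in check, assuming both the remote flag set and the local settings file reduce to a flag-name-to-boolean map. The function name isFlagEnabled and that map shape are hypothetical; the actual schema of ~/.subq/settings.json is not given here.

```typescript
// The three security-critical flags named above.
const SECURITY_CRITICAL: ReadonlySet<string> = new Set([
  "SUBQ_CODE_HOOKS",
  "SUBQ_CODE_SUBAGENTS",
  "SUBQ_CODE_AUTONOMOUS",
]);

type FlagMap = Record<string, boolean | undefined>;

// remote: flags delivered by the remote flag service.
// local:  flags read from the user's local settings (ASSUMED shape).
function isFlagEnabled(flag: string, remote: FlagMap, local: FlagMap): boolean {
  const remoteOn = remote[flag] === true;
  if (!SECURITY_CRITICAL.has(flag)) {
    return remoteOn; // ordinary flags need only the remote gate
  }
  return remoteOn && local[flag] === true; // critical flags need both
}
```

With this shape, a compromised or misconfigured remote flag service cannot switch on hooks, subagents, or autonomous mode without a matching local entry.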

R8: Numeric Length Anchors

Tier 3 Promoted to Phase 1 Three-line change in factory.ts with zero risk and immediate measurable savings.

length anchors
inter_tool ≤40 words between tool calls (raised from 25 — 25 too tight)
final_response ≤100 words unless task requires more detail
update_style one sentence per update is almost always enough
gpt_fallback qualitative: “one sentence between tool calls”
escape_clause unless user asked for explanation, analysis, or docs
target_savings ≥5% output token reduction (S7)
Research finding: Hard numeric constraints consistently outperform qualitative ones across all models. “Be concise” → 47-sentence response. “Answer in 3 sentences max” → 3-sentence response. An 18-model benchmark confirms this.
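The anchors listed above could be emitted as prompt lines along these lines. This is a sketch only: the function name lengthAnchors, the ModelFamily type, and the exact wording of each line are hypothetical, and the real change in factory.ts may look different.

```typescript
type ModelFamily = "gpt" | "other";

// Returns the length-anchor lines for the static prompt.
// Numeric limits are used by default; the GPT fallback is qualitative.
function lengthAnchors(family: ModelFamily): string[] {
  const escapeClause =
    "These limits do not apply when the user asked for explanation, analysis, or docs.";
  if (family === "gpt") {
    return ["Write one sentence between tool calls.", escapeClause];
  }
  return [
    "Use at most 40 words between tool calls.",
    "Keep the final response to at most 100 words unless the task requires more detail.",
    "One sentence per update is almost always enough.",
    escapeClause,
  ];
}
```

Because the numeric lines are identical for every session of a given model family, they can sit in the cacheable static prefix.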

R2 → R18: Hooks → Extension API

Tier 1 Supersession Composable lifecycle hooks (R2) evolve into the full Extension API (R18), with 30+ events and tool/command registration.

R2: Shell hooks R18: Extension API
hook lifecycle
event PreToolUse → pi.on("tool_call")
event PostToolUse → pi.on("tool_execution_end")
event SessionStart → pi.on("session_start")
event SessionEnd → pi.on("session_shutdown")
timeout 10s default — kill + warning on hang
conflict first block wins, systemMessages concat, last updatedInput wins
parallel Promise.allSettled() within same event
debug ~/.subq/agent/hooks.log + subq hooks --dry-run
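The timeout and parallel rows above can be sketched as a small runner. The names Hook, withTimeout, and runHooks are hypothetical, and aborting an AbortController stands in for killing a real hook process.

```typescript
// A hook is modelled as an async function that can observe cancellation.
type Hook<T> = (signal: AbortSignal) => Promise<T>;

// Rejects if the hook has not settled within `ms` milliseconds.
function withTimeout<T>(hook: Hook<T>, ms: number): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort(); // stand-in for killing the hook's child process
      reject(new Error(`hook timed out after ${ms}ms`));
    }, ms);
    hook(controller.signal).then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Runs all hooks for one event in parallel. A failing or hanging hook
// produces a warning instead of failing the whole event.
async function runHooks<T>(
  hooks: Hook<T>[],
  timeoutMs = 10_000,
): Promise<{ results: T[]; warnings: string[] }> {
  const settled = await Promise.allSettled(hooks.map((h) => withTimeout(h, timeoutMs)));
  const results: T[] = [];
  const warnings: string[] = [];
  for (const s of settled) {
    if (s.status === "fulfilled") {
      results.push(s.value);
    } else {
      warnings.push(s.reason instanceof Error ? s.reason.message : String(s.reason));
    }
  }
  return { results, warnings };
}
```

Promise.allSettled preserves input order, so the results array stays in hook registration order, which matters for any order-dependent merge applied afterwards.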

CRITICAL-1

Hook Command Injection

Unsanitized file paths can contain shell metacharacters that the shell executes when the path is interpolated into a hook command. Fix: pass context via stdin as JSON, allow user-level hooks only, and strip API keys from the child process env.

Before Phase 2
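A minimal sketch of the CRITICAL-1 fix: the hook receives its context as a JSON string on stdin, and secrets are filtered out of the environment it inherits. The names HookContext, buildInvocation, and SECRET_PATTERN are hypothetical, and the pattern used to recognise secret variables is an assumption.

```typescript
interface HookContext {
  event: string;
  toolName: string;
  filePath: string;
}

interface HookInvocation {
  command: string;
  args: string[];               // intentionally empty: no context on the command line
  stdin: string;                // JSON payload written to the child's stdin
  env: Record<string, string>;  // scrubbed copy of the parent environment
}

// ASSUMPTION: variables whose names contain these fragments hold secrets.
const SECRET_PATTERN = /(API_KEY|TOKEN|SECRET)/i;

function buildInvocation(
  hookPath: string,
  ctx: HookContext,
  parentEnv: Record<string, string | undefined>,
): HookInvocation {
  const env: Record<string, string> = {};
  for (const [key, value] of Object.entries(parentEnv)) {
    if (value !== undefined && !SECRET_PATTERN.test(key)) {
      env[key] = value;
    }
  }
  // The file path travels as JSON data, so metacharacters are never parsed by a shell.
  return { command: hookPath, args: [], stdin: JSON.stringify(ctx), env };
}
```

The result is meant to be handed to a process launcher that does not go through a shell, for example Node's child_process.spawn(command, args, { env }) without the shell option, with stdin written to the child's input stream.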
Hook Response (Discriminated Union)
{ decision: "approve" } { decision: "approve", systemMessage } { decision: "approve", updatedInput } { decision: "block", reason }
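The four response shapes map naturally onto a TypeScript discriminated union, and the conflict rule from the hook lifecycle list (first block wins, systemMessages concat, last updatedInput wins) becomes a short merge function. The type and function names, the unknown type for updatedInput, and the newline separator are assumptions.

```typescript
type HookResponse =
  | { decision: "approve" }
  | { decision: "approve"; systemMessage: string }
  | { decision: "approve"; updatedInput: unknown }
  | { decision: "block"; reason: string };

type MergedResponse =
  | { decision: "block"; reason: string }
  | { decision: "approve"; systemMessage: string; updatedInput?: unknown };

// Responses are expected in hook registration order.
function mergeResponses(responses: HookResponse[]): MergedResponse {
  const messages: string[] = [];
  let updatedInput: unknown;
  let hasUpdate = false;

  for (const r of responses) {
    if (r.decision === "block") {
      return { decision: "block", reason: r.reason }; // first block wins
    }
    if ("systemMessage" in r) {
      messages.push(r.systemMessage); // concatenated below
    }
    if ("updatedInput" in r) {
      updatedInput = r.updatedInput; // later values overwrite earlier ones
      hasUpdate = true;
    }
  }

  const merged: MergedResponse = { decision: "approve", systemMessage: messages.join("\n") };
  if (hasUpdate) merged.updatedInput = updatedInput;
  return merged;
}
```

Checking decision first lets the compiler narrow each branch, so reason is only reachable on a block and the optional fields only on an approve.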

R17: Cross-Agent Rule Discovery

Tier 4 Novel Discover rules from 7 agent config formats, then normalize, deduplicate, and inject them by priority.

.subq/rules/

Native SubQ rules. Highest priority. First-wins dedup.

.claude/

CLAUDE.md + commands. Second priority.

.cursor/rules/

Cursor rule files. Third priority.

.codex/ .gemini/ .windsurf/ .cline/

Additional formats. Added based on demand.
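The priority order and first-wins dedup described above might look like this. The DiscoveredRule shape, the function name dedupeByPriority, and the choice to treat two rules as duplicates when their normalized names match are all assumptions; the notes do not say what the dedup key is.

```typescript
interface DiscoveredRule {
  name: string;
  source: string; // directory the rule was found in
  body: string;
}

// Lower index means higher priority. Only the first three ranks are stated
// in the notes; every other source sorts after them.
const SOURCE_PRIORITY = [".subq/rules/", ".claude/", ".cursor/rules/"];

function sourceRank(source: string): number {
  const index = SOURCE_PRIORITY.indexOf(source);
  return index === -1 ? SOURCE_PRIORITY.length : index;
}

function dedupeByPriority(rules: DiscoveredRule[]): DiscoveredRule[] {
  // Array.prototype.sort is stable, so discovery order is kept within a source.
  const sorted = [...rules].sort((a, b) => sourceRank(a.source) - sourceRank(b.source));
  const seen = new Set<string>();
  const kept: DiscoveredRule[] = [];
  for (const rule of sorted) {
    const key = rule.name.trim().toLowerCase(); // ASSUMED dedup key
    if (seen.has(key)) continue; // a higher-priority copy already won
    seen.add(key);
    kept.push(rule);
  }
  return kept;
}
```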

alwaysApply Rules

Injected into dynamic prompt. 2KB per rule, 10KB total limit.

Glob-Scoped Rules

Injected only when agent works on matching files.

TTSR-Triggered Rules

Rules with ttsrTrigger registered as stream monitors.
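The size limits for alwaysApply rules can be enforced with a simple running budget. This sketch assumes 1 KB means 1,024 bytes of UTF-8 and that an over-limit rule is skipped rather than truncated; neither detail is specified above, and the function name is hypothetical.

```typescript
const PER_RULE_LIMIT = 2 * 1024; // 2KB per rule
const TOTAL_LIMIT = 10 * 1024;   // 10KB across all alwaysApply rules

const utf8Length = (text: string): number => new TextEncoder().encode(text).length;

interface SizedRule {
  name: string;
  body: string;
}

// Rules are expected in priority order, highest first.
function selectAlwaysApply(rules: SizedRule[]): {
  included: string[];
  skipped: string[];
  totalBytes: number;
} {
  const included: string[] = [];
  const skipped: string[] = [];
  let totalBytes = 0;
  for (const rule of rules) {
    const size = utf8Length(rule.body);
    if (size > PER_RULE_LIMIT || totalBytes + size > TOTAL_LIMIT) {
      skipped.push(rule.name); // ASSUMPTION: skip instead of truncating
      continue;
    }
    totalBytes += size;
    included.push(rule.name);
  }
  return { included, skipped, totalBytes };
}
```

Measuring bytes rather than string length keeps the limit meaningful for rules that contain non-ASCII text.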

CRITICAL-2

Rule Prompt Injection

A malicious repo can ship crafted alwaysApply: true rules that are injected into every prompt. Fix: treat project-level rules as untrusted, honor alwaysApply only for user-level rules, and reject project-level ttsrTrigger regexes.

Before Phase 2
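One way to read the CRITICAL-2 fix is as a trust filter applied to every discovered rule before injection. In this sketch a project-level rule keeps its body but loses alwaysApply and any ttsrTrigger; dropping the whole rule instead would be an equally valid reading. The TrustedRule shape and the function name are hypothetical.

```typescript
type RuleOrigin = "user" | "project";

interface TrustedRule {
  name: string;
  origin: RuleOrigin;
  body: string;
  alwaysApply?: boolean;
  ttsrTrigger?: string; // regex source text
}

// Returns a sanitized copy of the rule plus warnings for anything removed.
function applyTrustPolicy(rule: TrustedRule): { rule: TrustedRule; warnings: string[] } {
  if (rule.origin === "user") {
    return { rule, warnings: [] }; // user-level rules are trusted as written
  }
  const warnings: string[] = [];
  const { ttsrTrigger, ...rest } = rule;
  if (ttsrTrigger !== undefined) {
    warnings.push(`rule "${rule.name}": project-level ttsrTrigger rejected`);
  }
  if (rule.alwaysApply === true) {
    warnings.push(`rule "${rule.name}": alwaysApply ignored for project-level rule`);
  }
  return { rule: { ...rest, alwaysApply: false }, warnings };
}
```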

R13: TTSR — Streamed Rules

Tier 4 Novel Zero-context-cost rules that monitor the output stream via regex. On a match: abort, inject <system-interrupt>, retry.

ttsr flow
stream pi.on("message_update") fires per streaming frame
scan delta-scan only — track lastCheckedPosition per rule
threshold run regexes every 50 new characters
match first regex match by character position wins
action inject <system-interrupt> with VIOLATION / CORRECTION / ACTION
discard remove partial output from history before retry
keep wrap partial in <partial-output status="interrupted">
cap 3 total retries per session across all rules
scope per-session state — each rule fires at most once
ux ⟳ Rule "[name]" triggered, regenerating…
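The scanning part of the flow above can be sketched as a small stream monitor. The class name, method names, and rule shape are hypothetical. One known limitation of a pure delta scan, kept here for simplicity, is that a match straddling two scan windows is missed; overlapping the windows by the longest expected match length would be one way to address that.

```typescript
interface TtsrRule {
  name: string;
  pattern: RegExp;
}

interface Violation {
  rule: string;
  position: number; // character offset in the streamed output
}

class StreamMonitor {
  private readonly rules: TtsrRule[];
  private readonly threshold: number;
  private readonly maxRetries: number;
  private buffer = "";
  private lengthAtLastScan = 0;
  private readonly lastCheckedPosition = new Map<string, number>();
  private readonly fired = new Set<string>(); // per-session: each rule fires once
  private retries = 0;                        // per-session: shared by all rules

  constructor(rules: TtsrRule[], threshold = 50, maxRetries = 3) {
    this.rules = rules;
    this.threshold = threshold;
    this.maxRetries = maxRetries;
  }

  // Called for every streaming frame. Returns the winning violation, if any.
  push(delta: string): Violation | null {
    this.buffer += delta;
    if (this.buffer.length - this.lengthAtLastScan < this.threshold) return null;
    this.lengthAtLastScan = this.buffer.length;
    if (this.retries >= this.maxRetries) return null;

    let best: Violation | null = null;
    for (const rule of this.rules) {
      if (this.fired.has(rule.name)) continue;
      const from = this.lastCheckedPosition.get(rule.name) ?? 0;
      const index = this.buffer.slice(from).search(rule.pattern); // delta only
      this.lastCheckedPosition.set(rule.name, this.buffer.length);
      if (index === -1) continue;
      const position = from + index;
      if (best === null || position < best.position) {
        best = { rule: rule.name, position }; // earliest match wins
      }
    }
    if (best !== null) {
      this.fired.add(best.rule);
      this.retries += 1;
    }
    return best;
  }

  // Call before a retry: clears stream state, keeps session state.
  resetStream(): void {
    this.buffer = "";
    this.lengthAtLastScan = 0;
    this.lastCheckedPosition.clear();
  }
}
```

A caller would abort generation when push returns a violation, inject the interrupt, call resetStream, and start the retry.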

Zero Context Cost

Stream-Level Enforcement

TTSR rules consume no prompt tokens until triggered. They live only in the stream monitor and are never injected into context unless a violation occurs.

Safety Guard

Correction Content

Correction content must come from the rule definition, not from triggering context. All corrections logged for audit.
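The safety guard can be expressed structurally: the function that builds the interrupt takes only the rule definition, so text from the triggering output has no path into the correction. The field names, the exact layout inside <system-interrupt>, and the in-memory audit log below are assumptions for illustration.

```typescript
interface RuleDefinition {
  name: string;
  violation: string;  // what the rule forbids
  correction: string; // what to do instead
  action: string;     // instruction for the retry
}

const auditLog: { rule: string; at: string }[] = [];

// Deliberately has no parameter for the matched text or partial output.
function buildInterrupt(rule: RuleDefinition): string {
  auditLog.push({ rule: rule.name, at: new Date().toISOString() });
  return [
    "<system-interrupt>",
    `VIOLATION: ${rule.violation}`,
    `CORRECTION: ${rule.correction}`,
    `ACTION: ${rule.action}`,
    "</system-interrupt>",
  ].join("\n");
}

// Used when the partial output is kept in history rather than discarded.
function wrapPartial(partial: string): string {
  return `<partial-output status="interrupted">\n${partial}\n</partial-output>`;
}
```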