Cost & ROI Analysis

Economics

Cost optimization ranking, cache economics, model role savings, success metrics, and resource requirements. The business case for every requirement.

Cost Optimization Ranking

Ranked by ROI—implement in this order for maximum cost impact.

Prerequisite: All session logs report cost: {total: 0}. No cost optimization can be measured until instrumentation is fixed. No current requirement covers cost telemetry. See Session Insights.
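As a starting point for closing that gap, a per-turn cost accumulator might look like the sketch below. The pricing table, model names, and field layout are illustrative assumptions, not the project's actual log schema; only the Anthropic-style rates ($15/MTok input, 90% discount on cache reads) come from this document.

```python
# Minimal cost-telemetry sketch so session logs report a real total
# instead of cost: {total: 0}. Pricing and field names are assumptions.

# $/MTok rates; cached input reads billed at a 90% discount.
PRICING = {
    "opus": {"input": 15.00, "cached_input": 1.50, "output": 75.00},
    "smol": {"input": 0.25, "cached_input": 0.025, "output": 1.25},
}

def turn_cost(model: str, uncached_in: int, cached_in: int, out: int) -> float:
    """Cost of one turn in dollars, given token counts."""
    p = PRICING[model]
    return (uncached_in * p["input"]
            + cached_in * p["cached_input"]
            + out * p["output"]) / 1_000_000

class SessionCost:
    """Accumulates per-turn costs for the session log."""
    def __init__(self) -> None:
        self.total = 0.0

    def record(self, model: str, uncached_in: int, cached_in: int, out: int) -> None:
        self.total += turn_cost(model, uncached_in, cached_in, out)

session = SessionCost()
session.record("opus", 2_000, 40_000, 800)   # mostly cache-hit turn
session.record("smol", 1_000, 0, 200)        # cheap exploration turn
print(round(session.total, 4))
```

With instrumentation of this shape in place, every row in the ranking below becomes measurable instead of estimated.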
| # | Feature | Savings | Effort | Phase |
|---|---------|---------|--------|-------|
| 1 | R8: Numeric Length Anchors | 5–10% | Trivial | P1 |
| 2 | R1: Prompt Caching | 10–15% | Low | P1 |
| 3 | R16: Model Roles | 30% | Medium | P3 |
| 4 | R7: Model Variants | 3–5% | Low | P4 |
| 5 | R6: Verification Agent | Negative | Medium | P6a |
R6 adds cost but prevents costly mistakes. Verification doubles the token spend on checked work, but catching a bug during verification is orders of magnitude cheaper than catching it in production.

Cache Economics

Prefix-based caching across all three major providers. Universal rule: maximize identical byte prefix length.

- $0.50–$0.60: saved per 10-turn Opus session
- 90%: cache read discount (Anthropic)
- 93.9%: cost savings vs Sonnet 4.6 (DeepSeek case study)

| Provider / Factor | Detail |
|---|---|
| Anthropic | 5-minute TTL, $15/MTok input, 90% discount on cache reads |
| OpenAI | prefix-based, automatic, ~50% discount on cache reads |
| DeepSeek | prefix-based, ~85% hit rate achievable (Reasonix case study) |
| Cache busters | dynamic dates, nondeterministic tool schemas, per-repo paths |
| Fix | sort tool schemas alphabetically by name |
| Fix | deterministic serialization across requests |
| Fix | stable content first, append-only history |
| Variants | ~100 uncached tokens/turn beats 4 separate cached prefixes |
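The three fixes can be sketched as a cache-stable request builder. `build_request` and the schema shapes here are hypothetical; only the ordering rules (sorted tool schemas, deterministic serialization, stable content first) come from the list above.

```python
import json

def canonical_tools(tools: list[dict]) -> list[dict]:
    """Sort tool schemas alphabetically by name so the serialized
    byte prefix is identical across requests."""
    return sorted(tools, key=lambda t: t["name"])

def build_request(system: str, tools: list[dict], history: list[dict]) -> str:
    """Stable content first (system prompt, tools), then append-only
    history. sort_keys gives deterministic serialization regardless of
    dict insertion order; no dynamic dates or per-repo paths up front."""
    payload = {
        "system": system,                  # static: no "Today is ..." lines
        "tools": canonical_tools(tools),
        "messages": history,               # append-only; earlier turns never mutate
    }
    return json.dumps(payload, sort_keys=True)

# Same logical request, tools registered in a different order:
a = build_request("You are an agent.", [{"name": "read"}, {"name": "grep"}], [])
b = build_request("You are an agent.", [{"name": "grep"}, {"name": "read"}], [])
print(a == b)  # identical bytes, so the provider's prefix cache can hit
```

The point of the round-trip comparison is that any byte-level difference, however semantically harmless, truncates the cacheable prefix at that point.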

Model Role Savings

Route read-only operations to cheaper models: with ~60% of turns (exploration work) routed to smol, total session cost drops by roughly 30%.

≥30%: session cost reduction with model roles

| Role | Operations | Cost Tier | % of Turns |
|------|------------|-----------|------------|
| smol | grep, find, read, ls, context_pack | Low | ~60% |
| default | edit, write, bash, implementation | Standard | ~30% |
| slow | architecture, complex debugging | Premium | ~5% |
| commit | commit messages, changelogs | Low | ~5% |
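A role router matching the table might look like the following sketch. The operation-to-role mapping mirrors the table; the relative cost weights (default tier = 1.0) are illustrative assumptions, used only to show how a ~60% smol turn mix lands above the 30% saving target.

```python
# Role routing sketch. Operation names come from the table above;
# the cost weights are assumptions, normalized to default = 1.0.
ROLE_OF_OP = {
    "grep": "smol", "find": "smol", "read": "smol", "ls": "smol",
    "context_pack": "smol",
    "edit": "default", "write": "default", "bash": "default",
    "architecture": "slow", "debugging": "slow",
    "commit": "commit", "changelog": "commit",
}

COST_WEIGHT = {"smol": 0.33, "default": 1.0, "slow": 3.0, "commit": 0.33}

def route(operation: str) -> str:
    # Unknown operations fall back to the standard tier.
    return ROLE_OF_OP.get(operation, "default")

# Rough saving estimate from the turn mix in the table:
# 60% smol, 30% default, 5% slow, 5% commit vs. everything on default.
mix = {"smol": 0.60, "default": 0.30, "slow": 0.05, "commit": 0.05}
routed = sum(share * COST_WEIGHT[role] for role, share in mix.items())
print(f"relative cost {routed:.2f} -> saving {1 - routed:.0%}")
```

Under these weights the blended session costs about two-thirds of an all-default session, which is consistent with the ≥30% target even if the true premium/discount ratios differ.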

Success Metrics

13 measurable targets across all 6 phases. Each metric has a defined measurement method.

| ID | Metric | Target | Phase | Measurement |
|----|--------|--------|-------|-------------|
| S1 | Prompt cache hit rate (5+ turn sessions) | >60% | P1 | OpenRouter response headers |
| S2 | Hook-based policy enforcement without forking | Works | P2 | Integration test |
| S3 | Parallel subagent research while chatting | Works | P3 | E2E test |
| S4 | Risk self-assessment without deny-lists | Works | P4 | Behavioral audit |
| S5 | Session cold-start context reduction | ≥40% | P5 | Context-gathering tool calls in first 3 turns |
| S6 | Independent verification for non-trivial work | Works | P6a | Verification pass rate audit |
| S7 | Output token spend reduction | ≥5% | P4 | Token logging in telemetry |
| S8 | TTSR false-positive rate | <5% | P2 | Rule trigger audit log |
| S9 | Memory extraction accuracy | ≥3/10 | P5 | Manual audit of extracted facts |
| S10 | Session fork/resume round-trip | Works | P5 | Fork + resume integration test |
| S11 | Model role cost reduction (exploration) | ≥30% | P3 | Cost tracking per session |
| S12 | Cross-agent rule discovery formats | ≥3 | P2 | Format coverage test |
| S13 | Parallel isolated tasks with correct merge | ≥5 | P3 | Worktree + merge integration test |

Resource Requirements

~10 weeks total with 1–2 developers. Phase 6 is parallelizable across 2 developers.

Phase 1 — 1 week

1 developer. Prompt refactoring (R1), feature flag config (R12), length anchors (R8). Low risk, immediate value.

Phase 2 — 2 weeks

1–2 developers. Hook system (R2/R18) and TTSR (R13) require streaming expertise. Cross-agent rules (R17) parallelizable.

Phase 3 — 2 weeks

1–2 developers. Subagent orchestration (R3/R19) is the most complex phase. Model roles (R16) parallelizable.

Phase 4 — 1 week

1 developer. Prompt-only changes: risk taxonomy (R4), model variants (R7). A/B testing methodology needed.

Phase 5 — 2 weeks

1–2 developers. Memory pipeline (R5/R14) + session tree (R15). LLM extraction quality requires iteration.

Phase 6 — 2 weeks

2 developers. Split into 6a (R6+R10, verification+budget) and 6b (R9+R11+R20, MCP+autonomy+commit). Fully parallel.