You build a documentation assistant. Users ask questions; your app passes a 15,000-token system prompt (the full docs plus a few-shot persona) plus the question to Claude Sonnet 4.6 and returns the answer. At 500 API calls a day, the bill lands at $797/month. That number feels wrong, so you open the Anthropic console and it is right.
Here is how to get it to $127/month with two configuration changes and no architecture rewrite.
The baseline
Scenario: 500 API calls/day on Sonnet 4.6 ($3/1M input, $15/1M output), each call sending:
- 15,000 tokens: static system prompt (docs, persona, examples)
- 200 tokens: the user's question
- 500 tokens output
| Token type | Daily volume | Rate | Daily cost |
|---|---|---|---|
| Input (all) | 7,600,000 | $3/1M | $22.80 |
| Output | 250,000 | $15/1M | $3.75 |
| Total | $26.55/day |
Monthly: $796.50. That 15,000-token system prompt is billed on every single request. It never changes, yet you pay full price for it 500 times a day.
Optimization 1: 5-minute prompt caching
Claude's caching works by tagging the static portion of your prompt with cache_control. Anthropic stores it for five minutes; any request arriving within that window pays $0.30/1M for those cached tokens instead of $3/1M. That is a 90% reduction on every token that repeats.
Writing the cache costs 1.25x the standard input rate ($3.75/1M for Sonnet 4.6). As the detailed break-even math shows, that write cost recovers after just 1.3 cache reads. Two requests sharing the same cache entry and you are already ahead.
At 500 calls/day spread across business hours, you average about one request every three minutes. That sits inside the five-minute TTL for most of the day, but overnight gaps and burst patterns burn more writes. A conservative estimate: 80% cache hit rate.
| Token type | Daily volume | Rate | Daily cost |
|---|---|---|---|
| Cache writes (20%) | 1,500,000 | $3.75/1M | $5.63 |
| Cache reads (80%) | 6,000,000 | $0.30/1M | $1.80 |
| Non-cached input | 100,000 | $3/1M | $0.30 |
| Output | 250,000 | $15/1M | $3.75 |
| Total | $11.48/day |
Monthly: $344.40, a 57% reduction. But you can do better.
Optimization 2: switch to the 1-hour cache
Anthropic shortened the default cache TTL from 60 minutes to 5 minutes in early 2026. That change quietly raised effective costs for any app with longer idle periods between requests, including overnight gaps.
The 1-hour TTL is still available as a paid option: cache writes cost 2x the standard input rate ($6/1M for Sonnet 4.6). More expensive per write, but you write far less often. If your traffic is steady during business hours, the 1-hour cache is worth the premium.
At 95% cache hit rate (achievable with consistent daytime usage patterns):
| Token type | Daily volume | Rate | Daily cost |
|---|---|---|---|
| Cache writes (5%) | 375,000 | $6/1M | $2.25 |
| Cache reads (95%) | 7,125,000 | $0.30/1M | $2.14 |
| Non-cached input | 100,000 | $3/1M | $0.30 |
| Output | 250,000 | $15/1M | $3.75 |
| Total | $8.44/day |
Monthly: $253.20, 68% less than the baseline. One cache write every 20 calls instead of every five.
Optimization 3: the batch API
If same-hour delivery is acceptable (internal tools, scheduled pipelines, overnight report generation), Anthropic's Message Batches API cuts all token costs by 50%. Most batches complete in under an hour. The 50% discount applies to cache writes, cache reads, and output tokens alike.
Stacked on the 1-hour cache at 95% hit rate:
| Token type | Daily volume | Rate | Daily cost |
|---|---|---|---|
| Cache writes (5%) | 375,000 | $3/1M | $1.13 |
| Cache reads (95%) | 7,125,000 | $0.15/1M | $1.07 |
| Non-cached input | 100,000 | $1.50/1M | $0.15 |
| Output | 250,000 | $7.50/1M | $1.88 |
| Total | $4.23/day |
Monthly: $126.90, an 84% reduction from $796.50.
Same 500 calls. Same prompts. Same output quality. The only tradeoff: responses queue and return within the hour rather than in real time.
The full picture
| Approach | Monthly cost | vs. baseline |
|---|---|---|
| No optimization | $796.50 | baseline |
| 5-min cache, 80% hit rate | $344.40 | -57% |
| 1-hour cache, 95% hit rate | $253.20 | -68% |
| 1-hour cache + batch | $126.90 | -84% |
The root cause of the $797 baseline is the same hidden cost that drives most inflated LLM bills: prompt bloat. A 15,000-token static prefix repeated on every request is the problem. Caching is the exact countermeasure.
Subscription vs API
Claude's subscription plans cover claude.ai and Claude Code usage — not raw API calls to your application. If you are building a product, you are on the pay-per-token API regardless of which subscription plan you hold.
| Plan | Price | Who it suits |
|---|---|---|
| Claude Free | $0/mo | Occasional personal use, message-limited |
| Claude Pro | $20/mo | ~45 Sonnet messages per 5-hour window; personal writing and coding |
| Claude Max 5x | $100/mo | ~225 messages/5-hour window; daily heavy Claude Code sessions |
| Claude Max 20x | $200/mo | ~900 messages/5-hour window; all-day agentic coding workflows |
| API direct | Pay-per-token | Building products, batch workloads, custom integrations |
Break-even for solo developers using Claude Code. At Max 5x ($100/month), you get roughly 225 Sonnet-equivalent messages per five-hour window. Running those through the API directly at Sonnet 4.6 rates costs about $0.04 to $0.08 per typical coding exchange (a few thousand input tokens, a few hundred output). At $0.06 average, 225 messages costs roughly $13 per five-hour window. Max 5x becomes cost-effective once you are consistently hitting your Pro limit and your personal coding workflow would otherwise cost over $100/month via the API.
For product builders shipping to real users: no subscription covers production API calls. You need the API, and optimizing it is the only lever you have.
See Anthropic's pricing page for current plan details and any changes to included usage limits.
One note on Opus 4.7
If you are on Opus 4.7 for higher output quality, its new tokenizer produces roughly 35% more tokens from the same input text compared to earlier models. The per-token rate is $5/$25 (input/output) vs $3/$15 for Sonnet 4.6, so the effective cost gap is larger than the rate difference alone suggests. The documentation assistant scenario above would start at roughly $4,200/month on Opus 4.7 with no optimization. Caching matters even more at that baseline.
Start with Sonnet 4.6 and test whether Opus-level output quality is actually needed for your use case. The LLMTest proxy gives you real cost-per-call data across both models so the decision is based on your actual prompts, not estimates.