LLM API Pricing in 2026: Live Numbers from Every Provider
What every major LLM costs in 2026 and what it can actually do, in one table. GPT-5.5, Claude Opus 4.7, Gemini 2.5 Pro, Llama 4, DeepSeek, Grok, plus the smaller variants worth shipping. Pricing syncs daily from provider APIs. We're LLMTest, the AI proxy that benchmarks all of them on real tasks so you don't have to.
Verified 2026-05-02 · Pricing synced daily · Need capability data? See the LLM capabilities matrix.
Don't pick the perfect model. Ship it rough.
LLMTest is an AI proxy. On every call, we auto-pick the cheapest model that hits your quality bar. We also rewrite weak prompts, handle fallbacks when an API goes down, and run weekly benchmarks across 340+ models so we know what's actually working right now. Drop it in once. Ship features instead of comparing pricing tables.
Start optimizing

| Model | Tier | Provider · Slug | Input $/M | Output $/M | Context | Vision | Tools | JSON | Cache | Max Out |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.5 | Flagship | OpenAI · openai/gpt-5.5 | $5.00 | $30.00 | 1.1M | ✓ | ✓ | ✓ | ✓ | 16K |
| GPT-5 | Flagship | OpenAI · openai/gpt-5 | $1.25 | $10.00 | 400K | ✓ | ✓ | ✓ | ✓ | 16K |
| GPT-4.1 | Flagship | OpenAI · openai/gpt-4.1 | $2.00 | $8.00 | 1M | ✓ | ✓ | ✓ | ✓ | 16K |
| Claude Opus 4.7 | Flagship | Anthropic · anthropic/claude-opus-4.7 | $5.00 | $25.00 | 1M | ✓ | ✓ | ✓ | ✓ | 8K |
| Claude Opus 4 | Flagship | Anthropic · anthropic/claude-opus-4 | $15.00 | $75.00 | 200K | ✓ | ✓ | ✓ | ✓ | 8K |
| Gemini 2.5 Pro | Flagship | Google · google/gemini-2.5-pro | $1.25 | $10.00 | 1M | ✓ | ✓ | ✓ | ✓ | 66K |
| Grok 4 | Flagship | xAI · x-ai/grok-4 | $3.00 | $15.00 | 256K | ✓ | ✓ | ✓ | — | 8K |
| Sonar Pro | Flagship | Perplexity · perplexity/sonar-pro | $3.00 | $15.00 | 200K | — | — | ✓ | — | 8K |
| o3 | Reasoning | OpenAI · openai/o3 | $2.00 | $8.00 | 200K | ✓ | ✓ | ✓ | ✓ | 100K |
| o3-mini | Reasoning | OpenAI · openai/o3-mini | $1.10 | $4.40 | 200K | — | ✓ | ✓ | ✓ | 100K |
| GPT-5 Mini | Mid | OpenAI · openai/gpt-5-mini | $0.25 | $2.00 | 400K | ✓ | ✓ | ✓ | ✓ | 8K |
| GPT-4o | Mid | OpenAI · openai/gpt-4o | $2.50 | $10.00 | 128K | ✓ | ✓ | ✓ | ✓ | 16K |
| Claude Sonnet 4.6 | Mid | Anthropic · anthropic/claude-sonnet-4.6 | $3.00 | $15.00 | 1M | ✓ | ✓ | ✓ | ✓ | 8K |
| Claude Sonnet 4 | Mid | Anthropic · anthropic/claude-sonnet-4 | $3.00 | $15.00 | 1M | ✓ | ✓ | ✓ | ✓ | 8K |
| Gemini 2.5 Flash | Mid | Google · google/gemini-2.5-flash | $0.30 | $2.50 | 1M | ✓ | ✓ | ✓ | ✓ | 66K |
| Mistral Medium 3 | Mid | Mistral · mistralai/mistral-medium-3 | $0.40 | $2.00 | 131K | — | ✓ | ✓ | — | 8K |
| GPT-5 Nano | Small | OpenAI · openai/gpt-5-nano | $0.05 | $0.40 | 400K | — | ✓ | ✓ | ✓ | 4K |
| GPT-4o Mini | Small | OpenAI · openai/gpt-4o-mini | $0.15 | $0.60 | 128K | ✓ | ✓ | ✓ | ✓ | 16K |
| Claude Haiku 4.5 | Small | Anthropic · anthropic/claude-haiku-4.5 | $1.00 | $5.00 | 200K | ✓ | ✓ | ✓ | ✓ | 8K |
| Gemini 2.5 Flash Lite | Small | Google · google/gemini-2.5-flash-lite | $0.10 | $0.40 | 1M | ✓ | ✓ | ✓ | ✓ | 8K |
Prices in USD per million tokens. Audio input, batch API, and knowledge cutoff aren't shown here. Check provider docs for those.
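Reading the table into a real dollar figure takes one line of arithmetic. A minimal sketch, using the GPT-5 rates from the table above ($1.25/M input, $10.00/M output); the token counts are made-up example values:

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_per_m: float, output_per_m: float) -> float:
    """Cost of one API call in USD, given per-million-token rates."""
    return (input_tokens / 1e6) * input_per_m + (output_tokens / 1e6) * output_per_m

# GPT-5 from the table: $1.25/M input, $10.00/M output.
# A call with a 3,000-token prompt and a 500-token completion:
cost = call_cost(3_000, 500, 1.25, 10.00)
print(f"${cost:.5f}")  # → $0.00875
```

Note that output tokens dominate for most models: at an 8× input/output price gap, a verbose completion costs more than a long prompt.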
How to choose: price is half the story
The cheapest model usually isn't the right one. If a model costs half as much but only gets things right 60% of the time, you're spending the savings on retries plus the humans who clean up its mess. Real-world cost is cost per correct answer, not cost per million tokens.
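The retry-and-cleanup math above can be made concrete. A sketch with hypothetical numbers (the per-call prices, accuracies, and review cost below are illustrative, not benchmark results):

```python
def cost_per_correct(price_per_call: float, accuracy: float,
                     review_cost_per_failure: float = 0.0) -> float:
    """Effective cost per correct answer: the raw price divided by the
    success rate, plus the expected human-review cost for each failure."""
    failures_per_success = (1 - accuracy) / accuracy
    return price_per_call / accuracy + failures_per_success * review_cost_per_failure

# A cheap model at $0.001/call with 60% accuracy vs. a pricier model at
# $0.004/call with 95% accuracy, with $0.05 of human review per failure:
cheap = cost_per_correct(0.001, 0.60, 0.05)
strong = cost_per_correct(0.004, 0.95, 0.05)
# cheap ≈ $0.0350, strong ≈ $0.0068 — the pricier model wins per correct answer.
```

Once cleanup labor enters the equation, the 4× sticker-price gap inverts into a 5× effective-cost advantage for the stronger model.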
A few things the table above won't tell you:
- Match the model to the task. A model that's flagship-grade at coding can be mid-tier at translation. Our SQL benchmark caught a $0.15/M model producing output 5× cleaner than a $5/M model's.
- Use prompt caching if your system prompt is stable across calls. It cuts production costs 60% to 80%. That's a bigger lever than picking a cheaper base model.
- Cascade for cost. Send each call to a small model first, escalate to a flagship only when the small one can't hack it. Almost no one does this. Why fallback chains beat single-model setups.
- Test on your data. No leaderboard predicts how a model behaves on your specific prompts. Not even ours. Pick the top 2 or 3 candidates and run 50 to 100 representative samples through each.
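The caching lever above is easy to quantify. A sketch assuming cached input tokens bill at 10% of the normal input rate (discounts vary by provider, so check your provider's actual cached-input price); the token counts and call volume are illustrative:

```python
def monthly_input_cost(system_tokens: int, user_tokens: int, calls: int,
                       input_per_m: float, cache_discount: float = 0.10):
    """Input spend with and without prompt caching, in USD.

    Assumes cached tokens bill at `cache_discount` × the normal input rate."""
    per_call_plain = (system_tokens + user_tokens) / 1e6 * input_per_m
    per_call_cached = (system_tokens * cache_discount + user_tokens) / 1e6 * input_per_m
    return per_call_plain * calls, per_call_cached * calls

# An 8K-token stable system prompt, 500-token user turns, 1M calls/month,
# at $1.25/M input (GPT-5's rate from the table):
plain, cached = monthly_input_cost(8_000, 500, 1_000_000, 1.25)
# plain = $10,625, cached = $1,625 — roughly an 85% cut in input spend.
```

The longer and more stable your system prompt relative to the user turn, the closer savings approach the full cache discount.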
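The cascade pattern is a few lines of routing logic. A minimal sketch: `call_small`, `call_flagship`, and the quality gate are placeholders for your actual client calls and acceptance check, not any real API:

```python
from typing import Callable

def cascade(prompt: str,
            call_small: Callable[[str], str],
            call_flagship: Callable[[str], str],
            is_good_enough: Callable[[str], bool]) -> tuple[str, str]:
    """Try the cheap model first; escalate only when its answer fails the gate."""
    answer = call_small(prompt)
    if is_good_enough(answer):
        return answer, "small"
    return call_flagship(prompt), "flagship"

# Toy usage with stub models and a gate that rejects empty answers:
result, used = cascade(
    "Translate 'bonjour' to English.",
    call_small=lambda p: "hello",
    call_flagship=lambda p: "Hello.",
    is_good_enough=lambda a: bool(a.strip()),
)
# → ("hello", "small"): the flagship is never called, and never billed.
```

The gate is the hard part in practice: a length or format check is cheap, while a grader model adds cost but catches subtler failures. Either way, most traffic stays on the small model's prices.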
For the full framework, read How to choose an LLM in 2026: the definitive guide.