GPT-5 in production 2026: $2.13–$40 per 1,000 requests

"How much does GPT-5 cost?" has at least five correct answers right now. The sticker price is $1.25 per million input tokens and $10 per million output, but that number is the floor, not the bill. Your actual cost depends on which GPT-5 generation you're using, how long your prompts are, and whether you're in a position to use the batch API. A chat bot and a document extraction pipeline can be on identical pricing tiers and diverge 5x on cost per call.

This post does the workload math and shows what the real monthly numbers look like.

GPT-5 vs GPT-5.5: the decision that locks in your ceiling

OpenAI's GPT-5 family now spans several generations. The two you're realistically choosing between in mid-2026:

Model	Input	Output	Batch API
GPT-5	$1.25/M tokens	$10.00/M tokens	50% off
GPT-5.5	$5.00/M tokens	$30.00/M tokens	50% off

GPT-5.5 arrived in April 2026 at 4x the input cost and 3x the output cost of the original GPT-5. The quality gap is real on multi-step reasoning, complex code generation, and long-context extraction. It's not marketing. The question is whether your workload is actually bottlenecked on those capabilities.

For most extraction and summarization tasks, GPT-5 closes the gap once you add prompt caching. For frontier coding agents or tasks where output quality converts directly to user retention, GPT-5.5 earns its cost. For everything else, GPT-5 at $1.25/M input is the default choice.

Three workloads, exact numbers

These token counts represent realistic production call shapes. Adjust the input/output sizes if your calls are different; the per-token math scales linearly.

Chat (support bots, Q&A assistants)

Typical call: 500 input tokens (100-token system prompt, 200-token conversation history, 200-token user message) and 150 output tokens.

Model	Per call	Per 1,000 calls
GPT-5	$0.002125	$2.13
GPT-5.5	$0.007000	$7.00

Chat is input-light and output-light. At this token shape, output tokens account for 71% of your GPT-5 bill ($0.0015 of $0.002125) despite being only 23% of the tokens. The 8x output-to-input price ratio is significant even at low volumes. If your chat answers are getting longer (GPT-5.5 tends toward verbose, structured responses), that ratio worsens fast.

The prompt caching break-even analysis covers what caching does to the math here: a stable system prompt drops the effective input cost by 90% on repeated calls.

Document extraction (invoices, receipts, contracts to JSON)

Typical call: 2,000 input tokens (OCR'd document text), 200 output tokens (structured JSON response).

Model	Per call	Per 1,000 calls
GPT-5	$0.004500	$4.50
GPT-5.5	$0.016000	$16.00

Extraction workloads are input-heavy. The document dominates costs; the compact JSON output is a fraction of the bill. If you're not already on the batch API for extraction, you're leaving 50% on the table. Most extraction pipelines run on overnight batches or async queues, not real-time user requests.

At $2.25/1k calls on GPT-5 with batch, document extraction becomes cheaper than real-time chat on either model.

Summarization (long reports, transcripts, support ticket threads)

Typical call: 5,000 input tokens (a 4-page document), 500 output tokens.

Model	Per call	Per 1,000 calls
GPT-5	$0.011250	$11.25
GPT-5.5	$0.040000	$40.00

Summarization is the most expensive per-call shape, and where the model choice makes the biggest dollar difference. On GPT-5.5, $40/1k calls is a real number: 10,000 summaries per day runs to $12,000/month before any optimization. GPT-5 at $11.25/1k is dramatically more approachable, and the quality delta on summarization is smaller than on code or reasoning tasks.

Worked example: an invoice extraction product

You run a SaaS that extracts line items from supplier invoices. You're processing 5,000 invoices per day. Each invoice averages 1,500 input tokens and produces 180 output tokens of structured JSON.

Baseline, GPT-5 real-time API:

Token type	Daily volume	Rate	Daily cost
Input	7,500,000	$1.25/M	$9.38
Output	900,000	$10/M	$9.00
Total			$18.38

Monthly: $551

Your customers upload invoices through a portal; processing happens in a queue and results appear within minutes. No user is watching a spinner. Batch API is a natural fit.

With batch API (50% off, results within 24 hours):

Token type	Daily volume	Rate	Daily cost
Input	7,500,000	$0.625/M	$4.69
Output	900,000	$5.00/M	$4.50
Total			$9.19

Monthly: $276

One configuration change (submitting requests asynchronously instead of synchronously) cuts the bill from $551 to $276. That $275/month delta is $3,300 annually, and it doesn't touch your model version, prompt structure, or output quality. The three LLM costs nobody talks about covers what tends to push bills back up once you've made the obvious cuts.

If you ever need real-time extraction (a user clicks "process now" and waits), you pay the $551 rate for those calls and batch the rest. Most extraction workloads are 80% batch-eligible.

Subscription vs API

ChatGPT subscriptions and the API are separate products that serve different buyers:

Tier	Monthly	Access	Break-even vs GPT-5 API (chat)
ChatGPT Plus	$20	GPT-5.5 default, ~160 msg/3h, Deep Research (10/mo), Codex	~9,400 calls/mo
ChatGPT Pro	$100	5x quota, GPT-5.5 Thinking, 50 Deep Research sessions	~47,000 calls/mo
ChatGPT Pro (Max)	$200	Near-unlimited GPT-5.5, 1M context window, Sora priority	~94,000 calls/mo
API	pay-as-you-go	Programmatic access, full control, rate limits apply	Wins for production APIs

The key distinction: subscriptions cover your own usage on chatgpt.com, not API calls your users make. If you're building a feature that serves other people, every call goes through the API at standard rates. No subscription tier changes that.

Where Plus wins: if you use ChatGPT heavily for your own dev work (writing, debugging, research), $20/month for GPT-5.5 access is cheaper than roughly 9,400 equivalent API calls. That's $20 / $0.002125 per call. For a solo developer running 300+ personal queries per day, Plus pays for itself. For anyone shipping a product to users, API pricing is the only number that matters. Verify current pricing on ChatGPT's pricing page; rates have shifted frequently in 2026.

Where the bill surprises people

Output tokens cost 8x what input tokens cost. On GPT-5, $10/M output vs $1.25/M input. A model that gives longer answers is charging you disproportionately more. GPT-5.5 in particular tends toward detailed, structured responses even when the task didn't ask for them. If you're seeing higher-than-expected bills and haven't looked at your average output token count lately, that's where to start.

Batch doesn't always help. The batch API applies to requests submitted as a file and processed asynchronously within 24 hours. For real-time features (chat, instant answers, anything with a user waiting), batch isn't an option. Know which parts of your product are user-facing and which are pipeline jobs before you model your costs.

Model versioning creates surprise bill changes. If you're pinned to gpt-5 (the version identifier, not a floating alias), you stay on original GPT-5 pricing. If you're using a floating alias like gpt-5-latest, OpenAI may route you to a newer, pricier model tier. Check your API calls — the LLMTest proxy tracks this per-request so you can see version-level cost breakdowns without parsing provider logs manually.

Picking the right model for your budget

For chat under a few million calls per day: GPT-5 at $2.13/1k, with prompt caching on your system prompt and a spending cap set in your provider dashboard. The fallback chain pattern keeps costs bounded if a bug triggers runaway retries.

For extraction: GPT-5 plus batch API at $2.25/1k. Only upgrade to GPT-5.5 if extraction accuracy is actually the constraint; run a small quality eval first.

For summarization: GPT-5 at $11.25/1k real-time, or $5.63/1k on batch. GPT-5.5 at $40/1k is a hard sell unless you've confirmed the quality gap matters for your specific documents.

If you want to see these numbers against your actual traffic instead of estimates, routing through LLMTest gives you per-call breakdowns by model, token type, and endpoint, with no code changes beyond swapping the base URL.

Ship LLM features without burning your budget.

Related articles