
AI Model API Pricing

Per-million-token API pricing across OpenAI, Anthropic, Google, and Mistral. Input cost, output cost, context window — sourced directly from each provider's pricing page.

Verified: 2026-05-06 · Per-million-token prices and context windows taken from each provider's official API pricing page, re-checked April–May 2026.

Monthly cost calculator

Enter expected monthly token volume — see what each model would cost. All math runs in your browser.

Quick presets:

  • Cheapest: Gemini 1.5 Flash-8B — $3.19 / mo
  • Most expensive: Claude 3 Opus — $1,500 / mo (471× the cheapest)
| Model | Provider | Input | Output | Total / mo |
| --- | --- | --- | --- | --- |
| Gemini 1.5 Flash-8B | Google | $0.938 | $2.25 | $3.19 |
| Gemini 1.5 Flash | Google | $1.88 | $4.50 | $6.38 |
| GPT-4o mini | OpenAI | $3.75 | $9.00 | $12.8 |
| Mistral Small | Mistral | $5.00 | $9.00 | $14.0 |
| Codestral | Mistral | $7.50 | $13.5 | $21.0 |
| Claude 3 Haiku | Anthropic | $6.25 | $18.8 | $25.0 |
| Claude 3.5 Haiku | Anthropic | $20.0 | $60.0 | $80.0 |
| Gemini 1.5 Pro | Google | $31.3 | $75.0 | $106 |
| Mistral Large | Mistral | $50.0 | $90.0 | $140 |
| GPT-4o | OpenAI | $62.5 | $150 | $213 |
| o1-mini | OpenAI | $75.0 | $180 | $255 |
| Claude 3.5 Sonnet | Anthropic | $75.0 | $225 | $300 |
| GPT-4 Turbo | OpenAI | $250 | $450 | $700 |
| o1 | OpenAI | $375 | $900 | $1,275 |
| Claude 3 Opus | Anthropic | $375 | $1,125 | $1,500 |

Note: standard real-time pricing only. Batch and cached-input discounts (typically 50% off) are not applied. Prices verified 2026-05-06.
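
The calculator's arithmetic can be sketched in a few lines. This is a minimal illustration, not the site's actual code; the rates are the per-million-token list prices from the provider tables below, and the example volume (25M input + 15M output tokens per month) is consistent with the totals in the table above.

```python
# Minimal sketch of the monthly-cost calculator's arithmetic.
# Rates are USD per million tokens (verified 2026-05-06).
RATES = {  # model: (input $/1M tokens, output $/1M tokens)
    "GPT-4o mini": (0.150, 0.600),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Gemini 1.5 Flash": (0.075, 0.300),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """USD per month; token volumes are given in millions."""
    inp, out = RATES[model]
    return input_m * inp + output_m * out

print(monthly_cost("Claude 3.5 Sonnet", 25, 15))  # 300.0
print(monthly_cost("GPT-4o", 25, 15))             # 212.5
```

Swap in any model's rates from the tables below to extend the comparison.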

OpenAI

openai.com/api/pricing
| Model | Context | Input $/1M | Output $/1M | Notes |
| --- | --- | --- | --- | --- |
| GPT-4o | 128K | $2.50 | $10.00 | Flagship multimodal; image input supported |
| GPT-4o mini | 128K | $0.150 | $0.600 | Fast and cheap; recommended for most production traffic |
| GPT-4 Turbo | 128K | $10.00 | $30.00 | Older flagship; consider GPT-4o instead at lower cost |
| o1 | 200K | $15.00 | $60.00 | Reasoning model; internal reasoning tokens are billed as output |
| o1-mini | 128K | $3.00 | $12.00 | Cheaper reasoning |

Anthropic

anthropic.com/pricing#api
| Model | Context | Input $/1M | Output $/1M | Notes |
| --- | --- | --- | --- | --- |
| Claude 3.5 Sonnet | 200K | $3.00 | $15.00 | Best general-purpose; default choice for most coding/analysis |
| Claude 3.5 Haiku | 200K | $0.800 | $4.00 | Fast and cheap; close to Sonnet quality on simpler tasks |
| Claude 3 Opus | 200K | $15.00 | $75.00 | Older flagship; mostly superseded by 3.5 Sonnet |
| Claude 3 Haiku | 200K | $0.250 | $1.25 | Cheapest option; fine for classification/extraction |

Google

ai.google.dev/pricing
| Model | Context | Input $/1M | Output $/1M | Notes |
| --- | --- | --- | --- | --- |
| Gemini 1.5 Pro | 2,000K | $1.25 | $5.00 | 2M context — largest in this table; native multimodal (video, audio, image) |
| Gemini 1.5 Flash | 1,000K | $0.075 | $0.300 | Cheapest mainstream model; 1M context |
| Gemini 1.5 Flash-8B | 1,000K | $0.037 | $0.150 | Smallest Gemini variant; sub-$0.05/1M input |

Mistral

mistral.ai/technology
| Model | Context | Input $/1M | Output $/1M | Notes |
| --- | --- | --- | --- | --- |
| Mistral Large | 128K | $2.00 | $6.00 | Mistral flagship; competitive vs. GPT-4o |
| Mistral Small | 128K | $0.200 | $0.600 | Mid-tier |
| Codestral | 32K | $0.300 | $0.900 | Code-specialized |

How to read your result

The cost calculator at the top lets you punch in expected monthly token volume and compare what every model would cost for that exact workload. The bar chart sorts cheapest-to-most-expensive automatically. Use the presets (hobby chat, small SaaS, mid SaaS, heavy enterprise) as starting points if you don't yet know your usage.

Each row in the per-provider tables below is one model. The two cost columns ('Input $/1M' and 'Output $/1M') are the per-million-token prices. Output is almost always more expensive than input — typically 3–5× across this table — because the model has to run inference sequentially to produce each output token, while input tokens are processed in a single batched pass.

'Context window' is the maximum number of tokens (input + output combined, roughly) the model can hold in a single request. 128K tokens ≈ 96,000 words ≈ a 350-page book. Gemini 1.5 Pro's 2M context is enough for entire codebases or multi-hour transcripts in a single prompt.
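
The rule-of-thumb conversions above can be made explicit. A quick sketch, assuming ~0.75 English words per token and ~275 words per printed page (both approximations, not provider specifications):

```python
# Rough token-to-prose conversions (assumptions: ~0.75 words/token,
# ~275 words per printed page).
def tokens_to_words(tokens: int) -> int:
    return int(tokens * 0.75)

def tokens_to_pages(tokens: int) -> int:
    return round(tokens * 0.75 / 275)

print(tokens_to_words(128_000))  # 96000
print(tokens_to_pages(128_000))  # 349  (~a 350-page book)
```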

Cost-of-use math: for typical chat (≈500 input + 300 output tokens per turn), GPT-4o costs ~$0.0043 per turn, GPT-4o mini ~$0.00026 per turn, Gemini 1.5 Flash ~$0.00013 per turn — a roughly 33× spread between the priciest and cheapest. For high-volume API workloads, model choice often matters more than prompt optimization.
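
The per-turn arithmetic can be sketched directly (a minimal illustration; the helper name is hypothetical, and rates come from the tables above):

```python
# Cost of one chat turn at given per-million-token rates.
def cost_per_turn(input_rate: float, output_rate: float,
                  in_tok: int = 500, out_tok: int = 300) -> float:
    """USD for one turn; rates are $ per million tokens."""
    return (in_tok * input_rate + out_tok * output_rate) / 1_000_000

gpt4o = cost_per_turn(2.50, 10.00)   # 0.00425
mini  = cost_per_turn(0.15, 0.60)    # 0.000255
flash = cost_per_turn(0.075, 0.30)   # 0.0001275
print(f"spread: {gpt4o / flash:.0f}x")  # spread: 33x
```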

When to use this tool

  • Estimating monthly API spend before committing to a model. Multiply expected (input + output) tokens per month by the per-million rate.
  • Comparing models on the cost-per-call dimension when capability is roughly equivalent. Claude Haiku vs. GPT-4o mini vs. Gemini Flash all cluster around the cheap end with similar capability for classification/extraction tasks.
  • Choosing the right context window for a workload. If you need 500K-token inputs (long documents, full codebases), Gemini 1.5 is the only mainstream choice; everyone else maxes at 128–200K.
  • Auditing API usage. Pull your last bill, divide by per-million rate, and verify the token count matches what you expect from your traffic.
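
The auditing step in the last bullet is a one-line division. A hedged sketch (the function name and the $450 line item are illustrative, not from any real bill):

```python
# Back out the implied token volume from a billed amount.
def implied_tokens_m(line_item_usd: float, rate_per_m: float) -> float:
    """Millions of tokens implied by a billed amount at a given rate."""
    return line_item_usd / rate_per_m

# e.g. a $450 output line at Claude 3.5 Sonnet's $15/1M output rate
print(implied_tokens_m(450, 15.00))  # 30.0  -> 30M output tokens
```

If the implied volume is far from your traffic estimates, check for retries, long system prompts, or (for reasoning models) internal tokens billed as output.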

Methodology

All prices are per million tokens, in USD, taken from each provider's official API pricing page (linked under each provider's heading). Some providers also offer batch processing or cached-input pricing at a 50–80% discount; we list the standard real-time rate.

Context window is published in the same vendor pricing/spec page. We list it in thousands of tokens (K) for compact display. 1K tokens ≈ 750 English words.

API prices change every few months and tend to fall (per-token inference prices have declined steadily since 2023). The verified date on this page reflects the most recent re-check; we re-verify on a 60-day rolling cadence.

Limits we acknowledge: this table covers mainstream public APIs. We do not list provisioned-throughput pricing (used by Azure OpenAI, Bedrock, etc., where you pay for reserved capacity instead of per-token). For enterprise procurement comparisons, check the provider's dedicated-capacity pricing page directly.

Site-wide methodology framework: /methodology/ · Pre-publication standards: /editorial-standards/

FAQ

Why is output more expensive than input?

Because output tokens are computed sequentially — each one requires a forward pass through the model — while input tokens can be processed in a single batched pass. That asymmetry compounds with model size: for the largest models, generating 1,000 output tokens costs roughly 3–4× more compute than processing 1,000 input tokens. Pricing reflects that compute asymmetry.

Is GPT-4o mini really 40× cheaper than GPT-4o?

Not quite — on per-token list price it's about 17× (input: $0.15 vs $2.50/1M; output: $0.60 vs $10/1M). The actual cost ratio per task depends on whether mini gets the answer in fewer tokens or whether it requires more retries. For classification, extraction, simple chat, and summarization, mini is usually the right default. For complex reasoning or code generation, full GPT-4o pays for itself in fewer retries.

What does '2000K context' mean for Gemini 1.5 Pro in practice?

It means you can put roughly 1,500,000 words of input into a single prompt and the model can attend to all of it. Use cases that benefit: full codebase Q&A, multi-document research, long-form transcript analysis. Practical caveat: most workflows don't need anywhere near that — 32K context handles 90% of chat-and-analysis tasks. Don't pay the Pro premium for context you won't use; Flash at 1M context is a fraction of the cost.

How do I estimate monthly cost from my expected usage?

Sum your expected monthly input tokens × input rate + expected output tokens × output rate, all in millions. Example: 100M input + 30M output per month on Claude 3.5 Sonnet = 100 × $3 + 30 × $15 = $300 + $450 = $750/month. The /tools/ai-subscription-tco/ calculator covers the seat-cost side; this reference covers the API-cost side.
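
The worked example above, as a quick sanity check (rates from the Anthropic table):

```python
# 100M input + 30M output tokens per month on Claude 3.5 Sonnet
# ($3.00 in / $15.00 out per 1M tokens).
input_m, output_m = 100, 30
cost = input_m * 3.00 + output_m * 15.00
print(f"${cost:,.0f}/month")  # $750/month
```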

Why aren't AWS Bedrock or Azure OpenAI prices in the table?

Because those providers re-host the same underlying models at their own pricing — sometimes identical to the original vendor, sometimes with markup, sometimes with reserved-capacity pricing instead of per-token. The table covers the source-of-truth provider for each model. If you're on AWS or Azure, check their respective price page; the model capability is the same as the source vendor.

Why are there no DeepSeek, xAI Grok, or other recent models?

We list mainstream API options that have stable pricing pages and 6+ months of public availability. New entrants get added once their pricing is stable; the table is meant as a reliable reference, not a leaderboard of the latest releases. For the very latest, check each provider's blog or release notes — the relative cost ranks update slowly, so the cheap-cheaper-cheapest categories above are usually still right even when individual prices shift.


Disclaimer: API pricing changes frequently and is set entirely by each provider. The table reflects the most recent verification date listed above. Always re-check the source pricing page before committing to a long-term cost plan. Prices are USD-denominated list rates and exclude any annual-commit discounts, batch discounts, or cached-input discounts the provider may offer.