
AI Model API Pricing

Per-million-token API pricing across OpenAI, Anthropic, Google, and Mistral. Input cost, output cost, context window — sourced directly from each provider's pricing page.

Verified: 2026-05-06 · Per-million-token prices and context windows taken from each provider's official API pricing page, re-checked April–May 2026.

Monthly cost calculator

Enter expected monthly token volume — see what each model would cost. All math runs in your browser.

Quick presets:

  • Cheapest: Gemini 1.5 Flash-8B — $3.19 / mo
  • Most expensive: Claude 3 Opus — $1,500 / mo (471× the cheapest)
| Model | Provider | Input | Output | Total / mo |
| --- | --- | --- | --- | --- |
| Gemini 1.5 Flash-8B | Google | $0.938 | $2.25 | $3.19 |
| Gemini 1.5 Flash | Google | $1.88 | $4.50 | $6.38 |
| GPT-4o mini | OpenAI | $3.75 | $9.00 | $12.8 |
| Mistral Small | Mistral | $5.00 | $9.00 | $14.0 |
| Codestral | Mistral | $7.50 | $13.5 | $21.0 |
| Claude 3 Haiku | Anthropic | $6.25 | $18.8 | $25.0 |
| Claude 3.5 Haiku | Anthropic | $20.0 | $60.0 | $80.0 |
| Gemini 1.5 Pro | Google | $31.3 | $75.0 | $106 |
| Mistral Large | Mistral | $50.0 | $90.0 | $140 |
| GPT-4o | OpenAI | $62.5 | $150 | $213 |
| o1-mini | OpenAI | $75.0 | $180 | $255 |
| Claude 3.5 Sonnet | Anthropic | $75.0 | $225 | $300 |
| GPT-4 Turbo | OpenAI | $250 | $450 | $700 |
| o1 | OpenAI | $375 | $900 | $1,275 |
| Claude 3 Opus | Anthropic | $375 | $1,125 | $1,500 |

Note: standard real-time pricing only. Batch and cached-input discounts (typically 50% off) are not applied. Prices verified 2026-05-06.
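
The calculator's arithmetic can be sketched in a few lines. This is a minimal illustration, not the site's actual code; the rates are the per-million-token list prices from the provider tables below, and the example volume (25M input + 15M output tokens per month) is consistent with the totals in the table above.

```python
# Minimal sketch of the monthly-cost calculator's arithmetic.
# Rates are USD per million tokens (verified 2026-05-06).
RATES = {  # model: (input $/1M tokens, output $/1M tokens)
    "GPT-4o mini": (0.150, 0.600),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Gemini 1.5 Flash": (0.075, 0.300),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """USD per month; token volumes are given in millions."""
    inp, out = RATES[model]
    return input_m * inp + output_m * out

print(monthly_cost("Claude 3.5 Sonnet", 25, 15))  # 300.0
print(monthly_cost("GPT-4o", 25, 15))             # 212.5
```

Swap in any model's rates from the tables below to extend the comparison.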

OpenAI

openai.com/api/pricing
| Model | Context | Input $/1M | Output $/1M | Notes |
| --- | --- | --- | --- | --- |
| GPT-4o | 128K | $2.50 | $10.00 | Flagship multimodal; image input supported |
| GPT-4o mini | 128K | $0.150 | $0.600 | Fast and cheap; recommended for most production traffic |
| GPT-4 Turbo | 128K | $10.00 | $30.00 | Older flagship; consider GPT-4o instead at lower cost |
| o1 | 200K | $15.00 | $60.00 | Reasoning model; internal reasoning tokens are billed as output |
| o1-mini | 128K | $3.00 | $12.00 | Cheaper reasoning |

Anthropic

anthropic.com/pricing#api
| Model | Context | Input $/1M | Output $/1M | Notes |
| --- | --- | --- | --- | --- |
| Claude 3.5 Sonnet | 200K | $3.00 | $15.00 | Best general-purpose; default choice for most coding/analysis |
| Claude 3.5 Haiku | 200K | $0.800 | $4.00 | Fast and cheap; close to Sonnet quality on simpler tasks |
| Claude 3 Opus | 200K | $15.00 | $75.00 | Older flagship; mostly superseded by 3.5 Sonnet |
| Claude 3 Haiku | 200K | $0.250 | $1.25 | Cheapest option; fine for classification/extraction |

Google

ai.google.dev/pricing
| Model | Context | Input $/1M | Output $/1M | Notes |
| --- | --- | --- | --- | --- |
| Gemini 1.5 Pro | 2,000K | $1.25 | $5.00 | 2M context — largest in this table; native multimodal (video, audio, image) |
| Gemini 1.5 Flash | 1,000K | $0.075 | $0.300 | Cheapest mainstream model; 1M context |
| Gemini 1.5 Flash-8B | 1,000K | $0.037 | $0.150 | Smallest Gemini variant; sub-$0.05/1M input |

Mistral

mistral.ai/technology
| Model | Context | Input $/1M | Output $/1M | Notes |
| --- | --- | --- | --- | --- |
| Mistral Large | 128K | $2.00 | $6.00 | Mistral flagship; competitive vs. GPT-4o |
| Mistral Small | 128K | $0.200 | $0.600 | Mid-tier |
| Codestral | 32K | $0.300 | $0.900 | Code-specialized |

How to read your result

The cost calculator at the top lets you punch in expected monthly token volume and compare what every model would cost for that exact workload. The bar chart sorts cheapest-to-most-expensive automatically. Use the presets (hobby chat, small SaaS, mid SaaS, heavy enterprise) as starting points if you don't yet know your usage.

Each row in the per-provider tables below is one model. The two cost columns ('Input $/1M' and 'Output $/1M') are the per-million-token prices. Output is almost always more expensive than input — typically 3–5× across this table — because the model has to run inference sequentially to produce each output token, while input tokens are processed in a single batched pass.

'Context window' is the maximum number of tokens (input + output combined, roughly) the model can hold in a single request. 128K tokens ≈ 96,000 words ≈ a 350-page book. Gemini 1.5 Pro's 2M context is enough for entire codebases or multi-hour transcripts in a single prompt.
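
The rule-of-thumb conversions above can be made explicit. A quick sketch, assuming ~0.75 English words per token and ~275 words per printed page (both approximations, not provider specifications):

```python
# Rough token-to-prose conversions (assumptions: ~0.75 words/token,
# ~275 words per printed page).
def tokens_to_words(tokens: int) -> int:
    return int(tokens * 0.75)

def tokens_to_pages(tokens: int) -> int:
    return round(tokens * 0.75 / 275)

print(tokens_to_words(128_000))  # 96000
print(tokens_to_pages(128_000))  # 349  (~a 350-page book)
```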

Cost-of-use math: for typical chat (≈500 input + 300 output tokens per turn), GPT-4o costs ~$0.0043 per turn, GPT-4o mini ~$0.00026 per turn, Gemini 1.5 Flash ~$0.00013 per turn — a roughly 33× spread between the priciest and cheapest. For high-volume API workloads, model choice often matters more than prompt optimization.
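
The per-turn arithmetic can be sketched directly (a minimal illustration; the helper name is hypothetical, and rates come from the tables above):

```python
# Cost of one chat turn at given per-million-token rates.
def cost_per_turn(input_rate: float, output_rate: float,
                  in_tok: int = 500, out_tok: int = 300) -> float:
    """USD for one turn; rates are $ per million tokens."""
    return (in_tok * input_rate + out_tok * output_rate) / 1_000_000

gpt4o = cost_per_turn(2.50, 10.00)   # 0.00425
mini  = cost_per_turn(0.15, 0.60)    # 0.000255
flash = cost_per_turn(0.075, 0.30)   # 0.0001275
print(f"spread: {gpt4o / flash:.0f}x")  # spread: 33x
```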

When to use this tool

  • Estimating monthly API spend before committing to a model. Multiply expected (input + output) tokens per month by the per-million rate.
  • Comparing models on the cost-per-call dimension when capability is roughly equivalent. Claude Haiku vs. GPT-4o mini vs. Gemini Flash all cluster around the cheap end with similar capability for classification/extraction tasks.
  • Choosing the right context window for a workload. If you need 500K-token inputs (long documents, full codebases), Gemini 1.5 is the only mainstream choice; everyone else maxes at 128–200K.
  • Auditing API usage. Pull your last bill, divide by per-million rate, and verify the token count matches what you expect from your traffic.
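
The auditing step in the last bullet is a one-line division. A hedged sketch (the function name and the $450 line item are illustrative, not from any real bill):

```python
# Back out the implied token volume from a billed amount.
def implied_tokens_m(line_item_usd: float, rate_per_m: float) -> float:
    """Millions of tokens implied by a billed amount at a given rate."""
    return line_item_usd / rate_per_m

# e.g. a $450 output line at Claude 3.5 Sonnet's $15/1M output rate
print(implied_tokens_m(450, 15.00))  # 30.0  -> 30M output tokens
```

If the implied volume is far from your traffic estimates, check for retries, long system prompts, or (for reasoning models) internal tokens billed as output.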

Methodology

All prices are per million tokens, in USD, taken from each provider's official API pricing page (linked under each provider's heading). Some providers also offer batch processing or cached-input pricing at a 50–80% discount; we list the standard real-time rate.

Context window is published in the same vendor pricing/spec page. We list it in thousands of tokens (K) for compact display. 1K tokens ≈ 750 English words.

API prices change every few months and tend to fall (per-token inference prices have declined steadily since 2023). The verified date on this page reflects the most recent re-check; we re-verify on a 60-day rolling cadence.

Limits we acknowledge: this table covers mainstream public APIs. We do not list provisioned-throughput pricing (used by Azure OpenAI, Bedrock, etc., where you pay for reserved capacity instead of per-token). For enterprise procurement comparisons, check the provider's dedicated-capacity pricing page directly.

Site-wide methodology framework: /methodology/ · Pre-publication standards: /editorial-standards/

FAQ

Why is output more expensive than input?

Because output tokens are computed sequentially — each one requires a forward pass through the model — while input tokens can be processed in a single batched pass. That asymmetry compounds with model size: for the largest models, generating 1,000 output tokens costs roughly 3–4× more compute than processing 1,000 input tokens. Pricing reflects that compute asymmetry.

Is GPT-4o mini really 40× cheaper than GPT-4o?

Not quite — on per-token list price it's about 17× (input: $0.15 vs $2.50/1M; output: $0.60 vs $10/1M). The actual cost ratio per task depends on whether mini gets the answer in fewer tokens or whether it requires more retries. For classification, extraction, simple chat, and summarization, mini is usually the right default. For complex reasoning or code generation, full GPT-4o pays for itself in fewer retries.

What does '2000K context' mean for Gemini 1.5 Pro in practice?

It means you can put roughly 1,500,000 words of input into a single prompt and the model can attend to all of it. Use cases that benefit: full codebase Q&A, multi-document research, long-form transcript analysis. Practical caveat: most workflows don't need anywhere near that — 32K context handles 90% of chat-and-analysis tasks. Don't pay the Pro premium for context you won't use; Flash at 1M context is a fraction of the cost.

How do I estimate monthly cost from my expected usage?

Sum your expected monthly input tokens × input rate + expected output tokens × output rate, all in millions. Example: 100M input + 30M output per month on Claude 3.5 Sonnet = 100 × $3 + 30 × $15 = $300 + $450 = $750/month. The /tools/ai-subscription-tco/ calculator covers the seat-cost side; this reference covers the API-cost side.
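
The worked example above, as a quick sanity check (rates from the Anthropic table):

```python
# 100M input + 30M output tokens per month on Claude 3.5 Sonnet
# ($3.00 in / $15.00 out per 1M tokens).
input_m, output_m = 100, 30
cost = input_m * 3.00 + output_m * 15.00
print(f"${cost:,.0f}/month")  # $750/month
```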

Why aren't AWS Bedrock or Azure OpenAI prices in the table?

Because those providers re-host the same underlying models at their own pricing — sometimes identical to the original vendor, sometimes with markup, sometimes with reserved-capacity pricing instead of per-token. The table covers the source-of-truth provider for each model. If you're on AWS or Azure, check their respective price page; the model capability is the same as the source vendor.

Why are there no DeepSeek, xAI Grok, or other recent models?

We list mainstream API options that have stable pricing pages and 6+ months of public availability. New entrants get added once their pricing is stable; the table is meant as a reliable reference, not a leaderboard of the latest releases. For the very latest, check each provider's blog or release notes — the relative cost ranks update slowly, so the cheap-cheaper-cheapest categories above are usually still right even when individual prices shift.


Disclaimer: API pricing changes frequently and is set entirely by each provider. The table reflects the most recent verification date listed above. Always re-check the source pricing page before committing to a long-term cost plan. Prices are USD-denominated list rates and exclude any annual-commit discounts, batch discounts, or cached-input discounts the provider may offer.