AI Model Release Timeline
When each major AI model shipped, who released it, and what was notable. From ChatGPT's November 2022 launch through the most recent releases.
Verified: 2026-05-06 · 17 releases
- Claude 3.5 Sonnet (v2) (Anthropic, Frontier): Improved reasoning + agentic tool use; fastest sub-$5/1M frontier-class model at release. Context: 200K tokens.
- GPT-4o (April refresh) (OpenAI, Frontier): Latency reduction + voice mode v2. Context: 128K tokens.
- Gemini 2.0 (Google, Frontier): Native multimodal (video + audio + image); 2M context generally available. Context: 2M tokens.
- o1 (OpenAI, Reasoning): First public reasoning model; counts internal 'thinking' tokens. Context: 200K tokens.
- Claude 3.5 Haiku (Anthropic, Cheap): Sub-$1/1M input; close to Sonnet quality on simpler tasks. Context: 200K tokens.
- GPT-4o mini (OpenAI, Cheap): $0.15 input / $0.60 output per 1M; replaces GPT-3.5 Turbo as the default cheap option. Context: 128K tokens.
- Mistral Large 2 (Mistral, Frontier): EU-based frontier alternative; improved code and math benchmarks. Context: 128K tokens.
- Claude 3.5 Sonnet (v1) (Anthropic, Frontier): First model to reliably top human-eval coding benchmarks. Context: 200K tokens.
- GPT-4o (OpenAI, Frontier): First mainstream multimodal (text + image + audio) frontier model; 50% cheaper than GPT-4 Turbo. Context: 128K tokens.
- Gemini 1.5 Pro (Google, Long-context): First widely available 2M-token context window. Context: 2M tokens.
- Claude 3 Opus / Sonnet / Haiku (Anthropic, Frontier): Three-tier release (flagship, mid, cheap) that set the pricing template most providers now follow. Context: 200K tokens.
- GPT-4 Turbo (OpenAI, Frontier): First mainstream 128K context; significantly cheaper than original GPT-4. Context: 128K tokens.
- Gemini 1.0 (Google, Frontier): Google's response to GPT-4; native multimodal training claim. Context: 32K tokens.
- GPT-4 Turbo (preview) (OpenAI, Frontier): Knowledge cutoff moved to April 2023; 3x cheaper than GPT-4 base. Context: 128K tokens.
- Claude 2 (Anthropic, Frontier): First mainstream 100K context window. Context: 100K tokens.
- GPT-4 (OpenAI, Frontier): Frontier capability jump; original 8K context, $30/1M input. Context: 8K tokens.
- ChatGPT (GPT-3.5) (OpenAI, Foundation): Public launch; the moment 'AI tools' became a consumer category. Context: 4K tokens.
How to read your result
The timeline reads top (newest) to bottom (oldest). Each entry is a model release; the vendor and category labels on each entry show who shipped it and what slot it filled (Frontier, Cheap, Reasoning, Long-context, Foundation).
Watch the cadence. From late 2022 through 2024, the gap between mainstream model releases was 4–6 months; from late 2024 through 2026, it has compressed to 6–10 weeks. The 'frontier' label moves between vendors roughly every release cycle.
Context window is the most legible single-number progression: 4K (ChatGPT launch) → 8K (GPT-4) → 100K (Claude 2) → 128K (GPT-4 Turbo) → 200K (Claude 3) → 1M (Gemini 1.5 Flash) → 2M (Gemini 1.5/2.0 Pro). Cost per million tokens has fallen roughly 10× over the same period.
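The cost side of that progression is easy to work out for any model on the timeline. A minimal sketch, using the per-1M prices quoted in the entries above (GPT-4o mini at $0.15/$0.60 and GPT-4 at $30 input; these are launch prices from the timeline, not live rates):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request at the given per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# A 10K-token prompt with a 1K-token reply on GPT-4o mini launch pricing:
cost = request_cost(10_000, 1_000, 0.15, 0.60)
print(f"${cost:.4f}")  # → $0.0021

# The same prompt's input alone at GPT-4's original $30/1M input price:
print(f"${request_cost(10_000, 0, 30.0, 0.0):.2f}")  # → $0.30
```

The two printed figures illustrate why the cheap tier changed what was economical to build, even before the frontier models themselves got cheaper.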
When to use this tool
- Establishing context for a model decision. Knowing 'GPT-4o is from May 2024' helps when comparing against alternatives released since.
- Tracking vendor cadence. If your stack depends on Claude, the release pattern (every 4–6 months historically) tells you when to plan an upgrade evaluation.
- Researching capability evolution. Earlier entries provide reference points for 'how much has changed' — a 2026 mid-tier model is roughly equivalent to the late-2024 frontier.
- Writing about AI history. The timeline is a fact-checked source for 'when did X model ship and what was new about it'.
Methodology
Each entry is taken from the vendor's official release announcement (blog post, model-card publication, or pricing-page launch). We list the public availability date, not the research preview date.
We cover only mainstream public APIs and consumer products. Open-weights releases (Llama, Mistral 7B, Qwen) are noted separately when capabilities cross commercially-significant thresholds, but research-only releases are excluded.
Context window and category are taken from the vendor's spec page at release time. When a vendor later raises the context (e.g., GPT-4 base 8K → GPT-4-32K → GPT-4 Turbo 128K), we list the original release alongside the major bumps as separate entries.
Limits we acknowledge: 'frontier' is a judgment call. We use the label for models that, at release, set or matched the public state of the art on standard benchmarks. Reasonable people may disagree on edge cases — the label is a heuristic for 'top tier at the time', not a precise ranking.
Site-wide methodology framework: /methodology/ · Pre-publication standards: /editorial-standards/
FAQ
Why isn't Llama / Mistral 7B / Qwen on the timeline?
We focus on mainstream public APIs and consumer products. Open-weights models matter and are influential, but they require separate evaluation (you self-host or use a cloud provider's hosted version, with different cost/latency/privacy profiles). A separate open-weights timeline could be added in the future.
How is 'context window' a useful comparison?
It's the practical ceiling on how much input a model can handle in a single request. 4K (ChatGPT launch) is roughly 3,000 words — enough for a chat but not a document. 128K is roughly 96,000 words — enough for a small book. 2M is roughly 1.5 million words — enough for a full codebase or a multi-hour transcript. Different workflows require different ceilings; the timeline shows when each ceiling became publicly available.
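The word estimates above follow from a common rule of thumb of roughly 0.75 English words per token. That ratio is an assumption; the actual figure varies by tokenizer, language, and text style, but it reproduces the numbers in the answer:

```python
# Rough heuristic: ~0.75 English words per token (an assumption; real
# ratios vary by tokenizer, language, and text style).
WORDS_PER_TOKEN = 0.75

def approx_words(context_tokens: int) -> int:
    """Approximate word capacity of a given context window."""
    return int(context_tokens * WORDS_PER_TOKEN)

for label, tokens in [("4K (ChatGPT launch)", 4_000),
                      ("128K (GPT-4 Turbo)", 128_000),
                      ("2M (Gemini 1.5/2.0 Pro)", 2_000_000)]:
    print(f"{label}: ~{approx_words(tokens):,} words")
```

Running this prints approximately 3,000, 96,000, and 1,500,000 words for the three ceilings, matching the figures above.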
What does the 'category' label mean?
Frontier = sets or matches state of the art at release. Cheap = significantly cheaper than the contemporary frontier, with usable but not flagship quality. Reasoning = explicitly trades latency for chain-of-thought (o1 family). Long-context = primary differentiator is handling much longer inputs (Gemini 1.5+). Foundation = pre-frontier-era foundational releases (ChatGPT, GPT-3.5).
How often is the timeline updated?
When a major release ships. We aim to add new entries within 1–2 weeks of a public-availability announcement. Refreshes to existing entries happen if a vendor later updates their spec (e.g., context window raise). The 'verified' date at the top of the page reflects the most recent re-check.
Why does the timeline start in November 2022?
Because that's when 'AI tools' became a consumer category — ChatGPT's public launch with GPT-3.5. There were many language models before (GPT-3, BERT, T5), but the November 2022 launch is the inflection point where mainstream usage of these tools began. Earlier model history is genuinely interesting but doesn't fit the consumer-tools framing of this site.