
AI Model Release Timeline

When each major AI model shipped, who released it, and what was notable. From ChatGPT's November 2022 launch through the most recent releases.

Verified: 2026-05-06 · 17 releases

  1. Anthropic · Frontier

    Claude 3.5 Sonnet (v2)

    Improved reasoning + agentic tool-use; fastest sub-$5/1M frontier-class model at release.

    Context: 200K tokens
  2. OpenAI · Frontier

    GPT-4o (April refresh)

    Latency reduction + voice mode v2.

    Context: 128K tokens
  3. Google · Frontier

    Gemini 2.0

    Native multimodal (video + audio + image); 2M context generally available.

    Context: 2M tokens
  4. OpenAI · Reasoning

    o1

    First public reasoning model; internal 'thinking' tokens are billed as output tokens.

    Context: 200K tokens
  5. Anthropic · Cheap

    Claude 3.5 Haiku

    Sub-$1/1M input; close to Sonnet quality on simpler tasks.

    Context: 200K tokens
  6. OpenAI · Cheap

    GPT-4o mini

    $0.15 input / $0.60 output per 1M; replaces GPT-3.5 Turbo as default cheap option.

    Context: 128K tokens
  7. Mistral · Frontier

    Mistral Large 2

    EU-based frontier alternative; improved code and math benchmarks.

    Context: 128K tokens
  8. Anthropic · Frontier

    Claude 3.5 Sonnet (v1)

    First model to reliably top coding benchmarks such as HumanEval.

    Context: 200K tokens
  9. OpenAI · Frontier

    GPT-4o

    First mainstream multimodal (text+image+audio) frontier model; 50% cheaper than GPT-4 Turbo.

    Context: 128K tokens
  10. Google · Long-context

    Gemini 1.5 Pro

    First widely-available 2M-token context window.

    Context: 2M tokens
  11. Anthropic · Frontier

    Claude 3 Opus / Sonnet / Haiku

    Three-tier release (flagship, mid, cheap) — set the pricing template most providers now follow.

    Context: 200K tokens
  12. OpenAI · Frontier

    GPT-4 Turbo

    First mainstream 128K context; significantly cheaper than original GPT-4.

    Context: 128K tokens
  13. Google · Frontier

    Gemini 1.0

    Google's response to GPT-4; native multimodal training claim.

    Context: 32K tokens
  14. OpenAI · Frontier

    GPT-4 Turbo (preview)

    Knowledge cutoff moved to April 2023; 3x cheaper than GPT-4 base.

    Context: 128K tokens
  15. Anthropic · Frontier

    Claude 2

    First mainstream 100K context window.

    Context: 100K tokens
  16. OpenAI · Frontier

    GPT-4

    Frontier capability jump; original 8K context, $30/1M input.

    Context: 8K tokens
  17. OpenAI · Foundation

    ChatGPT (GPT-3.5)

    Public launch — the moment 'AI tools' became a consumer category.

    Context: 4K tokens
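For readers who track releases programmatically, the entries above can be modeled as simple structured records. This is an illustrative sketch, not any vendor's API; the `Release` type, field names, and the subset of entries shown are our own invention.

```python
from dataclasses import dataclass

@dataclass
class Release:
    model: str
    vendor: str
    category: str        # Frontier, Cheap, Reasoning, Long-context, Foundation
    context_tokens: int  # advertised context window at release

# A few entries from the timeline above, newest first
TIMELINE = [
    Release("Claude 3.5 Sonnet (v2)", "Anthropic", "Frontier", 200_000),
    Release("GPT-4o mini", "OpenAI", "Cheap", 128_000),
    Release("Gemini 1.5 Pro", "Google", "Long-context", 2_000_000),
    Release("GPT-4", "OpenAI", "Frontier", 8_000),
    Release("ChatGPT (GPT-3.5)", "OpenAI", "Foundation", 4_000),
]

# Filter by vendor, as you might when tracking a single stack
anthropic = [r.model for r in TIMELINE if r.vendor == "Anthropic"]
```

The same shape extends naturally to all 17 entries if you want to chart context windows or vendor cadence yourself.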

How to read your result

The timeline reads top (newest) to bottom (oldest). Each entry is a model release; the vendor and category badges in the corner show who shipped it and what slot it filled (Frontier, Cheap, Reasoning, Long-context, Foundation).

Watch the cadence. From late 2022 through 2024, the gap between mainstream model releases was 4–6 months; from late 2024 through 2026, it's compressed to 6–10 weeks. The 'frontier' label moves between vendors roughly every release cycle.

Context window is the most legible single-number progression: 4K (ChatGPT launch) → 8K (GPT-4) → 100K (Claude 2) → 128K (GPT-4 Turbo) → 200K (Claude 3) → 1M (Gemini 1.5 Flash) → 2M (Gemini 1.5/2.0 Pro). Cost per million tokens has fallen roughly 10× over the same period.
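To make the cost trend concrete, here is a minimal sketch using input prices quoted in the timeline itself (GPT-4's $30/1M launch price, the sub-$5/1M frontier-class tier, and GPT-4o mini's $0.15/1M). The helper function is illustrative, not a vendor API.

```python
# Per-million-token input prices quoted in the timeline entries
GPT4_2023 = 30.00     # GPT-4 at launch, $/1M input tokens
FRONTIER_2024 = 5.00  # sub-$5/1M frontier-class ceiling at release
MINI_2024 = 0.15      # GPT-4o mini, $/1M input tokens

def input_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of sending `tokens` input tokens at a given rate."""
    return round(tokens / 1_000_000 * price_per_million, 4)

# A 100K-token prompt (roughly a small book) at each price point
print(input_cost(100_000, GPT4_2023))      # 3.0   -> $3.00 in 2023
print(input_cost(100_000, FRONTIER_2024))  # 0.5   -> $0.50 frontier-class
print(input_cost(100_000, MINI_2024))      # 0.015 -> about 1.5 cents, cheap tier
```

Output prices are typically higher than input prices, so real request costs depend on the input/output split as well.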

When to use this tool

  • Establishing context for a model decision. Knowing 'GPT-4o is from May 2024' helps when comparing against alternatives released since.
  • Tracking vendor cadence. If your stack depends on Claude, the release pattern (every 4–6 months historically) tells you when to plan an upgrade evaluation.
  • Researching capability evolution. Earlier entries provide reference points for 'how much has changed' — a 2026 mid-tier model is roughly equivalent to the late-2024 frontier.
  • Writing about AI history. The timeline is a fact-checked source for 'when did X model ship and what was new about it'.

Methodology

Each entry is taken from the vendor's official release announcement (blog post, model-card publication, or pricing-page launch). We list the public availability date, not the research preview date.

We cover only mainstream public APIs and consumer products. Open-weights releases (Llama, Mistral 7B, Qwen) are noted separately when capabilities cross commercially-significant thresholds, but research-only releases are excluded.

Context window and category are taken from the vendor's spec page at release time. When a vendor later raises the context (e.g., GPT-4 base 8K → GPT-4-32K → GPT-4 Turbo 128K), we list the original release alongside the major bumps as separate entries.

Limits we acknowledge: 'frontier' is a judgment call. We use the label for models that, at release, set or matched the public state of the art on standard benchmarks. Reasonable people may disagree on edge cases — the label is a heuristic for 'top tier at the time', not a precise ranking.

Site-wide methodology framework: /methodology/ · Pre-publication standards: /editorial-standards/

FAQ

Why isn't Llama / Mistral 7B / Qwen on the timeline?

We focus on mainstream public APIs and consumer products. Open-weights models matter and are influential, but they require separate evaluation (you self-host or use a cloud provider's hosted version, with different cost/latency/privacy profiles). A separate open-weights timeline could be added in the future.

How is 'context window' a useful comparison?

It's the practical ceiling on how much input a model can handle in a single request. 4K (ChatGPT launch) is roughly 3,000 words — enough for a chat but not a document. 128K is roughly 96,000 words — enough for a small book. 2M is roughly 1.5 million words — enough for a full codebase or a multi-hour transcript. Different workflows require different ceilings; the timeline shows when each ceiling became publicly available.
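The token-to-word arithmetic above can be sketched with a common rule of thumb of roughly 0.75 English words per token. The ratio is an assumption, not a spec: it varies by tokenizer, language, and content type.

```python
WORDS_PER_TOKEN = 0.75  # rough heuristic; varies by tokenizer and language

def approx_words(context_tokens: int) -> int:
    """Rough English-word capacity of a context window."""
    return int(context_tokens * WORDS_PER_TOKEN)

print(approx_words(4_000))      # 3000    -- a chat, not a document
print(approx_words(128_000))    # 96000   -- a small book
print(approx_words(2_000_000))  # 1500000 -- a codebase or multi-hour transcript
```

These match the word counts quoted in the answer above exactly because both use the same 0.75 heuristic.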

What does the 'category' label mean?

Frontier = sets or matches state of the art at release. Cheap = significantly cheaper than the contemporary frontier, with usable but not flagship quality. Reasoning = explicitly trades latency for chain-of-thought (o1 family). Long-context = primary differentiator is handling much longer inputs (Gemini 1.5+). Foundation = pre-frontier-era foundational releases (ChatGPT, GPT-3.5).

How often is the timeline updated?

When a major release ships. We aim to add new entries within 1–2 weeks of a public-availability announcement. Refreshes to existing entries happen if a vendor later updates their spec (e.g., a context-window increase). The 'verified' date at the top of the page reflects the most recent re-check.

Why does the timeline start in November 2022?

Because that's when 'AI tools' became a consumer category — ChatGPT's public launch with GPT-3.5. There were many language models before (GPT-3, BERT, T5), but the November 2022 launch is the inflection point where mainstream usage of these tools began. Earlier model history is genuinely interesting but doesn't fit the consumer-tools framing of this site.

Related Tools