AI Model Release Timeline
When each major AI model shipped, who released it, and what was notable. From ChatGPT's November 2022 launch through the most recent releases.
Verified: 2026-05-06 · 17 releases
- Claude 3.5 Sonnet (v2) (Anthropic, Frontier): Improved reasoning + agentic tool use; fastest sub-$5/1M frontier-class model at release. Context: 200K tokens.
- GPT-4o (April refresh) (OpenAI, Frontier): Latency reduction + voice mode v2. Context: 128K tokens.
- Gemini 2.0 (Google, Frontier): Native multimodal (video + audio + image); 2M context generally available. Context: 2M tokens.
- o1 (OpenAI, Reasoning): First public reasoning model; counts internal 'thinking' tokens. Context: 200K tokens.
- Claude 3.5 Haiku (Anthropic, Cheap): Sub-$1/1M input; close to Sonnet quality on simpler tasks. Context: 200K tokens.
- GPT-4o mini (OpenAI, Cheap): $0.15 input / $0.60 output per 1M; replaces GPT-3.5 Turbo as the default cheap option. Context: 128K tokens.
- Mistral Large 2 (Mistral, Frontier): EU-based frontier alternative; improved code and math benchmarks. Context: 128K tokens.
- Claude 3.5 Sonnet (v1) (Anthropic, Frontier): First model to reliably top human-eval coding benchmarks. Context: 200K tokens.
- GPT-4o (OpenAI, Frontier): First mainstream multimodal (text + image + audio) frontier model; 50% cheaper than GPT-4 Turbo. Context: 128K tokens.
- Gemini 1.5 Pro (Google, Long-context): First widely available 2M-token context window. Context: 2M tokens.
- Claude 3 Opus / Sonnet / Haiku (Anthropic, Frontier): Three-tier release (flagship, mid, cheap) that set the pricing template most providers now follow. Context: 200K tokens.
- GPT-4 Turbo (OpenAI, Frontier): First mainstream 128K context; significantly cheaper than original GPT-4. Context: 128K tokens.
- Gemini 1.0 (Google, Frontier): Google's response to GPT-4; native multimodal training claim. Context: 32K tokens.
- GPT-4 Turbo (preview) (OpenAI, Frontier): Knowledge cutoff moved to April 2023; 3x cheaper than GPT-4 base. Context: 128K tokens.
- Claude 2 (Anthropic, Frontier): First mainstream 100K context window. Context: 100K tokens.
- GPT-4 (OpenAI, Frontier): Frontier capability jump; original 8K context, $30/1M input. Context: 8K tokens.
- ChatGPT (GPT-3.5) (OpenAI, Foundation): Public launch; the moment 'AI tools' became a consumer category. Context: 4K tokens.
How to read your result
The timeline reads top (newest) to bottom (oldest). Each entry is a model release; the vendor and category labels on each entry show who shipped it and what slot it filled (Frontier, Cheap, Reasoning, Long-context, Foundation).
Watch the cadence. From late 2022 through 2024, the gap between mainstream model releases was 4–6 months; from late 2024 through 2026, it has compressed to 6–10 weeks. The 'frontier' label moves between vendors roughly every release cycle.
Context window is the most legible single-number progression: 4K (ChatGPT launch) → 8K (GPT-4) → 100K (Claude 2) → 128K (GPT-4 Turbo) → 200K (Claude 3) → 1M (Gemini 1.5 Flash) → 2M (Gemini 1.5/2.0 Pro). Cost per million tokens has fallen roughly 10× over the same period.
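The cost side of that progression is easy to work out for any model on the timeline. A minimal sketch, using the per-1M prices quoted in the entries above (GPT-4o mini at $0.15/$0.60 and GPT-4 at $30 input; these are launch prices from the timeline, not live rates):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request at the given per-1M-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# A 10K-token prompt with a 1K-token reply on GPT-4o mini launch pricing:
cost = request_cost(10_000, 1_000, 0.15, 0.60)
print(f"${cost:.4f}")  # → $0.0021

# The same prompt's input alone at GPT-4's original $30/1M input price:
print(f"${request_cost(10_000, 0, 30.0, 0.0):.2f}")  # → $0.30
```

The two printed figures illustrate why the cheap tier changed what was economical to build, even before the frontier models themselves got cheaper.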
When to use this tool
- Establishing context for a model decision. Knowing 'GPT-4o is from May 2024' helps when comparing against alternatives released since.
- Tracking vendor cadence. If your stack depends on Claude, the release pattern (every 4–6 months historically) tells you when to plan an upgrade evaluation.
- Researching capability evolution. Earlier entries provide reference points for 'how much has changed' — a 2026 mid-tier model is roughly equivalent to the late-2024 frontier.
- Writing about AI history. The timeline is a fact-checked source for 'when did X model ship and what was new about it'.
Methodology
Each entry is taken from the vendor's official release announcement (blog post, model-card publication, or pricing-page launch). We list the public availability date, not the research preview date.
We cover only mainstream public APIs and consumer products. Open-weights releases (Llama, Mistral 7B, Qwen) are noted separately when capabilities cross commercially-significant thresholds, but research-only releases are excluded.
Context window and category are taken from the vendor's spec page at release time. When a vendor later raises the context (e.g., GPT-4 base 8K → GPT-4-32K → GPT-4 Turbo 128K), we list the original release alongside the major bumps as separate entries.
Limits we acknowledge: 'frontier' is a judgment call. We use the label for models that, at release, set or matched the public state of the art on standard benchmarks. Reasonable people may disagree on edge cases — the label is a heuristic for 'top tier at the time', not a precise ranking.
Site-wide methodology framework: /methodology/ · Pre-publication standards: /editorial-standards/
FAQ
Why isn't Llama / Mistral 7B / Qwen on the timeline?
We focus on mainstream public APIs and consumer products. Open-weights models matter and are influential, but they require separate evaluation (you self-host or use a cloud provider's hosted version, with different cost/latency/privacy profiles). A separate open-weights timeline could be added in the future.
How is 'context window' a useful comparison?
It's the practical ceiling on how much input a model can handle in a single request. 4K (ChatGPT launch) is roughly 3,000 words — enough for a chat but not a document. 128K is roughly 96,000 words — enough for a small book. 2M is roughly 1.5 million words — enough for a full codebase or a multi-hour transcript. Different workflows require different ceilings; the timeline shows when each ceiling became publicly available.
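The word estimates above follow from a common rule of thumb of roughly 0.75 English words per token. That ratio is an assumption; the actual figure varies by tokenizer, language, and text style, but it reproduces the numbers in the answer:

```python
# Rough heuristic: ~0.75 English words per token (an assumption; real
# ratios vary by tokenizer, language, and text style).
WORDS_PER_TOKEN = 0.75

def approx_words(context_tokens: int) -> int:
    """Approximate word capacity of a given context window."""
    return int(context_tokens * WORDS_PER_TOKEN)

for label, tokens in [("4K (ChatGPT launch)", 4_000),
                      ("128K (GPT-4 Turbo)", 128_000),
                      ("2M (Gemini 1.5/2.0 Pro)", 2_000_000)]:
    print(f"{label}: ~{approx_words(tokens):,} words")
```

Running this prints approximately 3,000, 96,000, and 1,500,000 words for the three ceilings, matching the figures above.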
What does the 'category' label mean?
Frontier = sets or matches state of the art at release. Cheap = significantly cheaper than the contemporary frontier, with usable but not flagship quality. Reasoning = explicitly trades latency for chain-of-thought (o1 family). Long-context = primary differentiator is handling much longer inputs (Gemini 1.5+). Foundation = pre-frontier-era foundational releases (ChatGPT, GPT-3.5).
How often is the timeline updated?
When a major release ships. We aim to add new entries within 1–2 weeks of a public-availability announcement. Refreshes to existing entries happen if a vendor later updates their spec (e.g., context window raise). The 'verified' date at the top of the page reflects the most recent re-check.
Why does the timeline start in November 2022?
Because that's when 'AI tools' became a consumer category — ChatGPT's public launch with GPT-3.5. There were many language models before (GPT-3, BERT, T5), but the November 2022 launch is the inflection point where mainstream usage of these tools began. Earlier model history is genuinely interesting but doesn't fit the consumer-tools framing of this site.