Learn AI Concepts | Interactive Guide

This tier list reflects personal experience and testing across coding, reasoning, creative writing, and general chat. Your mileage may vary — every use case is different.

Disclaimer

This ranking only lists models I find relevant — not every model out there. The bottom tiers (D, F) are basically my wall of shame. Rankings are subjective and based on personal experience. Model quality changes rapidly, so this will evolve.

API only

Open Weight

Consumer GPU

Can your GPU run these models? Check the VRAM Calculator →

S-TierBest in class — exceptional across the board

GPT-5.3-Codex

API only

Smart, logical, and efficient. The best coding model available right now.

GPT-5.4

API only

1M context, 33% fewer hallucinations than 5.2, Tool Search system. Best for long-horizon professional work.

Claude Opus 4.6

API only

Expensive but worth it. Unmatched UI/UX design sense and creative instruction following.

Claude Sonnet 4.6

API only

Smart with remarkably low hallucination rates. Reliable everyday model.

A-TierExcellent — top picks for most tasks

Gemini 3 Flash

API only

Incredibly price efficient with excellent image and video understanding. Fast and smart.

Kimi K2.5

Open Weight

1T MoE → 32B active

Great visual understanding and swarm capabilities. Strong multimodal contender.

Qwen 3.5 35B-A3B

Consumer GPU

35B MoE → 3B active

MoE with only 3B active parameters — blazing fast inference with intelligence far above its compute cost. Self-hostable on 24GB GPUs.

Qwen 3.5 27B

Consumer GPU

27B dense

Dense model punching way above its weight. Benchmarks on par with DeepSeek 3.2 at a fraction of the size.

B-TierGood — solid performers with some trade-offs

DeepSeek 3.2

Open Weight

685B MoE → 37B active

Super cheap and smart, but slow. Great value for non-time-sensitive tasks.

MiniMax M2.5

Open Weight

230B MoE → 10B active

Outstanding value. Almost self-hostable — one of the best bang-for-buck models.

GLM 5

Open Weight

744B MoE → 40B active

Good agent capabilities with low hallucination rates. Solid all-rounder.

Grok 4.20

API only

Still unclear what it's actually good at. Questionable value proposition.

C-TierMediocre — notable flaws hold them back

Gemini 3.1 Flash Lite

API only

Very fast but significantly pricier than Flash Lite 2.5. Still capable, but hard to justify the cost increase.

D-TierDisappointing — overpromised, underdelivered

Gemini 3.1 Pro

API only

Benchmaxxed disappointment. Tops leaderboards but falls apart in real agent workflows. Google, how?

F-TierBottom of the barrel — just don't

Llama Maverick

Open Weight

400B MoE → 17B active

Meta's open model continues to underdeliver. Better alternatives exist everywhere.

Amazon Nova

API only

Whatever Amazon is doing here, it's not working. Just avoid.