This tier list reflects personal experience and testing across coding, reasoning, creative writing, and general chat. Your mileage may vary — every use case is different.
Disclaimer
This ranking only lists models I find relevant, not every model out there. The bottom tiers (D, F) are basically my wall of shame. Model quality changes rapidly, so this will evolve.
GPT-5.3-Codex
Smart, logical, and efficient. The best coding model available right now.
GPT-5.4
1M-token context, 33% fewer hallucinations than GPT-5.2, and a Tool Search system. Best for long-horizon professional work.
Claude Opus 4.6
Expensive but worth it. Unmatched UI/UX design sense and creative instruction following.
Claude Sonnet 4.6
Smart with remarkably low hallucination rates. Reliable everyday model.
Gemini 3 Flash
Incredibly price efficient with excellent image and video understanding. Fast and smart.
Kimi K2.5
1T MoE → 32B active
Great visual understanding and swarm capabilities. Strong multimodal contender.
Qwen 3.5 35B-A3B
35B MoE → 3B active
MoE with only 3B active parameters: blazing fast inference with intelligence far above its compute cost. Self-hostable on a 24GB GPU with quantized weights.
Qwen 3.5 27B
27B dense
Dense model punching way above its weight. Benchmarks on par with DeepSeek 3.2 at a fraction of the size.
DeepSeek 3.2
685B MoE → 37B active
Super cheap and smart, but slow. Great value for non-time-sensitive tasks.
MiniMax M2.5
230B MoE → 10B active
Outstanding value. Almost self-hostable — one of the best bang-for-buck models.
GLM 5
744B MoE → 40B active
Good agent capabilities with low hallucination rates. Solid all-rounder.
Grok 4.20
Still unclear what it's actually good at. Questionable value proposition.
Gemini 3.1 Flash Lite
Very fast but significantly pricier than Gemini 2.5 Flash Lite. Still capable, but hard to justify the cost increase.
Gemini 3.1 Pro
Benchmaxxed disappointment. Tops leaderboards but falls apart in real agent workflows. Google, how?
Llama Maverick
400B MoE → 17B active
Meta's open model continues to underdeliver. Better alternatives exist everywhere.
Amazon Nova
Whatever Amazon is doing here, it's not working. Just avoid.
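For anyone who wants to sanity-check the "total → active" specs and the self-hosting remarks above, here is a rough back-of-the-envelope sketch. The parameter counts are copied from the entries in this list (with "1T" taken as 1000B). The 4-bit and 16-bit weight widths and the helper names `active_fraction` and `weight_memory_gb` are my own assumptions for illustration, not something any vendor publishes. The estimate covers weights only, so real memory use will be higher once KV cache and runtime overhead are included.

```python
# Parameter counts in billions, copied from the entries above: (total, active).
# "1T" for Kimi K2.5 is treated as 1000B (assumption).
MOE_SPECS = {
    "Kimi K2.5": (1000, 32),
    "Qwen 3.5 35B-A3B": (35, 3),
    "DeepSeek 3.2": (685, 37),
    "MiniMax M2.5": (230, 10),
    "GLM 5": (744, 40),
    "Llama Maverick": (400, 17),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are active per token."""
    return active_b / total_b


def weight_memory_gb(total_b: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    All parameters must be resident, not just the active ones.
    Ignores KV cache, activations, and runtime overhead.
    """
    return total_b * bits_per_weight / 8


if __name__ == "__main__":
    for name, (total, active) in MOE_SPECS.items():
        share = active_fraction(total, active)
        mem_4bit = weight_memory_gb(total, 4)  # hypothetical 4-bit quantization
        print(f"{name:18s} {share:6.1%} active, ~{mem_4bit:.1f} GB of weights at 4 bits")
```

Under these assumptions, the 35B Qwen model needs about 17.5 GB for weights at 4 bits, which fits on a 24GB card with some headroom, while the same weights at 16 bits would need about 70 GB. MiniMax M2.5 at 230B comes out around 115 GB at 4 bits, which is roughly what "almost self-hostable" suggests.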