Logge's Favourite Models


Two models dropped on the same day. Both became instant favourites. Here's an honest, slightly obsessive breakdown.

Last updated: Feb 9, 2026

Disclaimer

This page is unapologetically biased. I use both of these models daily, I pay for both out of pocket, and I have strong opinions. Model releases move fast, so this will change. Side effects of reading may include API key generation and wallet anxiety.

Two Models, One Obsession

On February 5, 2026, both Anthropic and OpenAI dropped their latest flagships within hours of each other. I've been using both non-stop since. Here's the honest breakdown from someone who actually ships code with these things.

The Two Champions

Different philosophies, both excellent. Here's what each one brings to the table.

Claude Opus 4.6

Anthropic

The deep thinker that codes like a senior engineer

Model ID: claude-opus-4-6
Released: February 5, 2026
Context: 200K standard / 1M beta
Output: 128K tokens

Opus 4.6 is Anthropic's most capable model ever. It doubled the output limit to 128K, introduced a 1M token context window in beta, and brought two exclusive features: Adaptive Thinking (auto-adjusts reasoning depth) and Context Compaction (auto-summarizes old context for endless conversations). The coding improvements are massive — Terminal-Bench jumped from 59.8% to 65.4%, OSWorld from 66.3% to 72.7%, and ARC AGI 2 nearly doubled from 37.6% to 68.8%.

Adaptive Thinking

Dynamically adjusts reasoning depth based on task complexity. Four intensity levels: low, medium, high, and max. It decides when to think deeper without you asking.
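To make the four levels concrete, here's a minimal sketch of what a request with an explicit effort hint might look like. The `thinking` field shape and effort names are illustrative assumptions based on the description above, not a confirmed API schema — check Anthropic's docs for the real parameter.

```python
# Hypothetical request payload illustrating the four effort levels
# described above (low / medium / high / max). The "thinking" field
# shape is an assumption for illustration, not a confirmed API schema.
EFFORT_LEVELS = ("low", "medium", "high", "max")

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Assemble a chat request dict with an explicit reasoning-effort hint."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    return {
        "model": "claude-opus-4-6",          # model ID from this page
        "max_tokens": 128_000,               # the 128K output cap
        "thinking": {"effort": effort},      # hypothetical field name
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Refactor this module", effort="high")
print(req["thinking"])  # → {'effort': 'high'}
```

The point of the feature is that you usually *don't* set this — the model picks the depth itself — but an explicit override is handy when you know a task is trivial or gnarly.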

Agent Teams

Powers multi-agent coding in Claude Code — one agent on frontend, another on API, a third on migration — all coordinating autonomously.

1M Token Context

First Opus model with a million-token window. Feed it an entire codebase and it can reason across all of it.
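A quick back-of-envelope check for "will my codebase fit?" — this uses the common ~4 characters-per-token rule of thumb, which is an approximation (real tokenizers vary by language and content):

```python
# Rough feasibility check: will a codebase fit in a 1M-token window?
# Assumes the common ~4 chars-per-token heuristic -- an approximation only.
CHARS_PER_TOKEN = 4
CONTEXT_TOKENS = 1_000_000

def estimated_tokens(total_chars: int) -> int:
    """Estimate token count from total character count."""
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(total_chars: int, budget: int = CONTEXT_TOKENS) -> bool:
    """True if the estimated token count fits within the context budget."""
    return estimated_tokens(total_chars) <= budget

# A 3 MB codebase is roughly 750K tokens -- comfortably inside 1M.
print(fits_in_context(3_000_000))  # → True
```

For comparison, the same 3 MB codebase would blow well past the 200K standard window, which is why the 1M beta matters for whole-repo reasoning.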

128K Output

Doubled from 64K. It can generate entire files, full test suites, and multi-page documents in a single response.

Source: Anthropic Opus page

GPT-5.3-Codex

OpenAI

The fast pragmatist that helped build itself

Model ID: gpt-5.3-codex
Released: February 5, 2026
Context: ~400K tokens
Output: ~128K tokens

GPT-5.3-Codex is OpenAI's first 'self-developing' model — early versions were used to debug their own training run. It unifies frontier coding performance (from GPT-5.2-Codex) with professional reasoning (from GPT-5.2) into a single model. It's 25% faster than its predecessor, uses half the tokens for equivalent tasks, and absolutely dominates Terminal-Bench 2 at 77.3%.

Interactive Steering

You can interact with it while it's working — ask questions, discuss approaches, and steer toward solutions in real time. It gives frequent progress updates.

Self-Developing

First model that was instrumental in building itself. Used internally to debug training, manage deployment, and optimize its own evaluation harness.

Token Efficient

Accomplishes results with less than half the tokens of predecessors. Your context budget goes further.

Personality Modes

Choose between 'Pragmatic' (terse, to-the-point) and 'Friendly' (conversational). No capability difference — purely style.

Source: OpenAI GPT-5.3-Codex launch

The Benchmark Showdown

Numbers don't lie, but they don't tell the whole story either. Here's how they stack up on the benchmarks that actually matter for coding.

Benchmark                    Opus 4.6    Codex 5.3
SWE-bench (Verified / Pro)   80.8%       56.8%
Terminal-Bench 2             65.4%       77.3%
OSWorld                      72.7%       64.7%
GPQA Diamond                 91.3%       —
ARC AGI 2                    68.8%       —
Humanity's Last Exam         40.0%       —
Cybersecurity CTF            —           77.6%

Dash means the benchmark wasn't reported by the vendor. SWE-bench Verified and SWE-bench Pro use different test sets, so direct comparison isn't meaningful.

When I Actually Use Each One

Theory is nice. Here's my actual workflow after using both daily since launch.

I reach for Opus 4.6 when...

My tool: Claude Code (CLI)

Opus 4.6 lives in my terminal via Claude Code. Agent teams, multi-file edits, deep reasoning — all from the command line. This entire site was built with it.

  • I need deep architectural reasoning across a large codebase — the 1M context window is unmatched
  • Writing complex multi-file features where the model needs to hold a lot of state
  • Code review and refactoring — Adaptive Thinking makes it genuinely careful
  • Agent teams for ambitious multi-part projects
  • Anything that benefits from extended thinking and careful step-by-step reasoning

I reach for GPT-5.3-Codex when...

My tool: Codex (App + CLI)

GPT-5.3-Codex powers the Codex app and CLI. Interactive steering mid-task, personality modes, and blazing speed make it perfect for rapid iteration.

  • Quick iteration on terminal-heavy workflows — it's blazing fast and Terminal-Bench scores show why
  • Interactive pair programming where I want to steer mid-task
  • High-volume tasks where token efficiency matters for cost
  • The Codex CLI for rapid scripting and one-shot tasks
  • Anything where I want speed over depth — it's 25% faster and it feels like it

The Wallet Situation

Let's talk about the elephant in the room.

Opus 4.6

$5 input / $25 output per million tokens. Same price as Opus 4.5 but with massively improved capabilities. Batch API at 50% off. Still premium territory — a heavy coding session can run $5-15.

GPT-5.3-Codex

API pricing not final yet, but the GPT-5-Codex family runs ~$1.25 input / $10 output. That's roughly 4x cheaper than Opus on input and 2.5x cheaper on output. Plus it uses fewer tokens for equivalent tasks.

Honest take: If you're cost-conscious, Codex wins handily. If you need maximum reasoning depth and don't mind paying for it, Opus is worth every cent. I use both because different tasks have different economics.
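To put numbers on that, here's a tiny cost calculator using the per-million-token prices quoted above (the Codex figures are the expected ones from this post, not final pricing):

```python
# Per-request cost at the prices quoted above (USD per million tokens).
# Codex figures are the "expected" ones from this post, not final pricing.
PRICES = {
    "opus-4.6":      {"input": 5.00, "output": 25.00},
    "gpt-5.3-codex": {"input": 1.25, "output": 10.00},  # expected
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-MTok rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A session with 200K input and 20K output tokens:
opus = cost("opus-4.6", 200_000, 20_000)        # $1.00 + $0.50 = $1.50
codex = cost("gpt-5.3-codex", 200_000, 20_000)  # $0.25 + $0.20 = $0.45
print(f"Opus: ${opus:.2f}, Codex: ${codex:.2f}")
```

And that's before factoring in the token-efficiency claim — if Codex really uses about half the tokens for equivalent tasks, the effective gap widens further.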

What They Share

Despite coming from rival labs, these models have converged on some important traits.

Agentic Excellence

Both models are built for agents — tool use, multi-step planning, and autonomous task completion are first-class capabilities.

Computer Use

Both can operate GUIs, fill forms, navigate apps. OSWorld scores of 72.7% (Opus) and 64.7% (Codex) show real-world desktop proficiency.

Extended Output

~128K token output limits on both. Generate entire codebases, full documentation, multi-file changes in a single response.

Released the Same Day

February 5, 2026. Both labs dropped their flagships within hours. The AI coding wars are real, and we developers are the winners.

The Honest Verdict

I don't have one favourite model anymore — I have two. Opus 4.6 is the model I trust for deep, careful work. It thinks before it acts, catches things I miss, and handles massive codebases with grace. GPT-5.3-Codex is the model I reach for when I need speed and pragmatism. It's fast, efficient, and the interactive steering feels like genuine pair programming. Together, they cover every coding scenario I encounter. The fact that they launched on the same day feels symbolic — the frontier isn't one model anymore, it's a toolkit. Pick the right tool for the job. Or, like me, use both and enjoy the best era of AI-assisted development we've ever seen.

Quick Reference

                     Claude Opus 4.6                                        GPT-5.3-Codex
Maker                Anthropic                                              OpenAI
Context              200K standard / 1M beta                                ~400K tokens
Max Output           128K tokens                                            ~128K tokens
Pricing (per MTok)   $5 input / $25 output                                  ~$1.25 input / $10 output (expected)
Best For             Deep reasoning, code review, agent teams               Fast iteration, terminal tasks, cost efficiency
Platforms            Claude.ai, API, AWS Bedrock, Vertex AI, Azure Foundry  ChatGPT, Codex App, CLI, IDE Extension (API coming soon)