Disclaimer
This page is unapologetically biased. I use both of these models daily, pay for both out of pocket, and have strong opinions. Model releases move fast, so this will change. Side effects of reading may include API key generation and wallet anxiety.
Last updated: February 5, 2026
Two Models, One Obsession
On February 5, 2026, both Anthropic and OpenAI dropped their latest flagships within hours of each other. I've been using both non-stop since. Here's the honest breakdown from someone who actually ships code with these things.
The Two Champions
Different philosophies, both excellent. Here's what each one brings to the table.
Claude Opus 4.6
Anthropic
The deep thinker that codes like a senior engineer
claude-opus-4-6
Opus 4.6 is Anthropic's most capable model ever. It doubled the output limit to 128K, introduced a 1M token context window in beta, and brought two exclusive features: Adaptive Thinking (auto-adjusts reasoning depth) and Context Compaction (auto-summarizes old context for endless conversations). The coding improvements are massive — Terminal-Bench jumped from 59.8% to 65.4%, OSWorld from 66.3% to 72.7%, and ARC AGI 2 nearly doubled from 37.6% to 68.8%.
Adaptive Thinking
Dynamically adjusts reasoning depth based on task complexity. Four intensity levels: low, medium, high, and max. It decides when to think deeper without you asking.
Agent Teams
Powers multi-agent coding in Claude Code — one agent on frontend, another on API, a third on migration — all coordinating autonomously.
1M Token Context
First Opus model with a million-token window. Feed it an entire codebase and it can reason across all of it.
128K Output
Doubled from 64K. It can generate entire files, full test suites, and multi-page documents in a single response.
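To show how I think about the Adaptive Thinking levels in practice, here's a sketch of building a request payload. To be clear: this is not a published API schema. The `thinking` block and its `effort` field are my guesses, loosely shaped like Anthropic's Messages API, and should be treated as hypothetical until the real docs say otherwise.

```python
# Hypothetical request payload for Adaptive Thinking. The "thinking" field
# shape and its "effort" values are assumptions based on the four levels
# described above (low/medium/high/max), not confirmed API fields.
def build_request(prompt: str, effort: str = "medium") -> dict:
    levels = {"low", "medium", "high", "max"}
    if effort not in levels:
        raise ValueError(f"effort must be one of {sorted(levels)}")
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 128_000,  # the doubled output limit
        "thinking": {"type": "adaptive", "effort": effort},  # hypothetical field
        "messages": [{"role": "user", "content": prompt}],
    }

# Crank effort up for careful work like code review:
payload = build_request("Review this diff for race conditions", effort="high")
```

The point of the "adaptive" framing is that you'd mostly leave `effort` at the default and let the model decide when to think harder; the explicit level is there for when you know the task is gnarly.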
GPT-5.3-Codex
OpenAI
The fast pragmatist that helped build itself
gpt-5.3-codex
GPT-5.3-Codex is OpenAI's first 'self-developing' model — early versions were used to debug their own training run. It unifies frontier coding performance (from GPT-5.2-Codex) with professional reasoning (from GPT-5.2) into a single model. It's 25% faster than its predecessor, uses half the tokens for equivalent tasks, and absolutely dominates Terminal-Bench 2 at 77.3%.
Interactive Steering
You can interact with it while it's working — ask questions, discuss approaches, and steer toward solutions in real time. It gives frequent progress updates.
Self-Developing
First model that was instrumental in building itself. Used internally to debug training, manage deployment, and optimize its own evaluation harness.
Token Efficient
Delivers equivalent results with less than half the tokens of its predecessors. Your context budget goes further.
Personality Modes
Choose between 'Pragmatic' (terse, to-the-point) and 'Friendly' (conversational). No capability difference — purely style.
The Benchmark Showdown
Numbers don't lie, but they don't tell the whole story either. Here's how they stack up on the benchmarks that actually matter for coding.
| Benchmark | Opus 4.6 | Codex 5.3 |
|---|---|---|
| SWE-bench (Verified / Pro) | 80.8% | 56.8% |
| Terminal-Bench 2 | 65.4% | 77.3% |
| OSWorld | 72.7% | 64.7% |
| GPQA Diamond | 91.3% | — |
| ARC AGI 2 | 68.8% | — |
| Humanity's Last Exam | 40.0% | — |
| Cybersecurity CTF | — | 77.6% |
Dash means the benchmark wasn't reported by the vendor. SWE-bench Verified and SWE-bench Pro use different test sets, so direct comparison isn't meaningful.
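If you want the head-to-head in one place, here's a tiny script over the numbers from the table above. It only covers the benchmarks both vendors reported, and I've left SWE-bench out since, as noted, the test sets differ and a direct comparison isn't meaningful.

```python
# Head-to-head on the benchmarks both vendors reported (numbers from the
# table above). SWE-bench is excluded: different test sets, no fair compare.
scores = {
    "Terminal-Bench 2": {"Opus 4.6": 65.4, "Codex 5.3": 77.3},
    "OSWorld":          {"Opus 4.6": 72.7, "Codex 5.3": 64.7},
}

# For each benchmark, find the model with the higher score and the gap.
leaders = {bench: max(models, key=models.get) for bench, models in scores.items()}
for bench, leader in leaders.items():
    gap = abs(scores[bench]["Opus 4.6"] - scores[bench]["Codex 5.3"])
    print(f"{bench}: {leader} leads by {gap:.1f} points")
```

Which is the short version of the whole post: Codex owns the terminal, Opus owns the desktop (and everything the other lab didn't report).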
When I Actually Use Each One
Theory is nice. Here's my actual workflow after using both daily since launch.
I reach for Opus 4.6 when...
Opus 4.6 lives in my terminal via Claude Code. Agent teams, multi-file edits, deep reasoning — all from the command line. This entire site was built with it.
- I need deep architectural reasoning across a large codebase — the 1M context window is unmatched
- Writing complex multi-file features where the model needs to hold a lot of state
- Code review and refactoring — Adaptive Thinking makes it genuinely careful
- Agent teams for ambitious multi-part projects
- Anything that benefits from extended thinking and careful step-by-step reasoning
I reach for GPT-5.3-Codex when...
GPT-5.3-Codex powers the Codex app and CLI. Interactive steering mid-task, personality modes, and blazing speed make it perfect for rapid iteration.
- Quick iteration on terminal-heavy workflows — it's blazing fast and Terminal-Bench scores show why
- Interactive pair programming where I want to steer mid-task
- High-volume tasks where token efficiency matters for cost
- The Codex CLI for rapid scripting and one-shot tasks
- Anything where I want speed over depth — it's 25% faster and it feels like it
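For fun, here's the routing logic above written down as a toy function. The trait names are my own shorthand for how I triage tasks, not anything from either vendor, and the defaults just encode my personal bias toward speed.

```python
# A toy encoding of my personal routing heuristic from the lists above.
# Trait names are my own shorthand; this is not an official router.
def pick_model(*, large_codebase: bool = False, needs_deep_reasoning: bool = False,
               interactive: bool = False, cost_sensitive: bool = False) -> str:
    if large_codebase or needs_deep_reasoning:
        return "claude-opus-4-6"   # 1M context + Adaptive Thinking
    if interactive or cost_sensitive:
        return "gpt-5.3-codex"     # steering mid-task, cheaper tokens
    return "gpt-5.3-codex"         # default: speed wins for everything else
```

Deep reasoning trumps everything else in my triage, which is why it's checked first: if I need the model to hold a whole codebase in its head, cost stops mattering.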
The Wallet Situation
Let's talk about the elephant in the room.
Opus 4.6
$5 input / $25 output per million tokens. Same price as Opus 4.5 but with massively improved capabilities. Batch API at 50% off. Still premium territory — a heavy coding session can run $5-15.
GPT-5.3-Codex
API pricing not final yet, but the GPT-5-Codex family runs ~$1.25 input / $10 output. That's roughly 4x cheaper than Opus on input and 2.5x cheaper on output. Plus it uses fewer tokens for equivalent tasks.
Honest take: If you're cost-conscious, Codex wins handily. If you need maximum reasoning depth and don't mind paying for it, Opus is worth every cent. I use both because different tasks have different economics.
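To make the economics concrete, here's a back-of-envelope cost helper using the prices quoted above. Caveats: the Codex rate is the expected GPT-5-Codex family pricing, not a confirmed figure, and the "half the tokens" adjustment is OpenAI's claim, not my measurement.

```python
# Back-of-envelope task cost from the per-MTok prices quoted above.
# Codex API pricing isn't final; ~$1.25/$10 is the GPT-5-Codex family rate.
PRICES = {  # model: (input $/MTok, output $/MTok)
    "opus-4.6":      (5.00, 25.00),
    "gpt-5.3-codex": (1.25, 10.00),
}

def task_cost(model: str, input_tok: int, output_tok: int, batch: bool = False) -> float:
    inp, out = PRICES[model]
    cost = (input_tok * inp + output_tok * out) / 1_000_000
    # Opus Batch API is 50% off; no batch discount is quoted for Codex here.
    return cost / 2 if batch and model == "opus-4.6" else cost

# A heavy session: 2M tokens in, 200K out on Opus...
opus = task_cost("opus-4.6", 2_000_000, 200_000)
# ...vs Codex doing the same work with roughly half the tokens.
codex = task_cost("gpt-5.3-codex", 1_000_000, 100_000)
```

Run the numbers and the Opus session lands at $15.00 against $2.25 for Codex, which is exactly the "heavy coding session can run $5-15" territory above, and why I batch anything on Opus that isn't urgent.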
What They Share
Despite coming from rival labs, these models have converged on some important traits.
Agentic Excellence
Both models are built for agents — tool use, multi-step planning, and autonomous task completion are first-class capabilities.
Computer Use
Both can operate GUIs, fill forms, navigate apps. OSWorld scores of 72.7% (Opus) and 64.7% (Codex) show real-world desktop proficiency.
Extended Output
~128K token output limits on both. Generate entire codebases, full documentation, multi-file changes in a single response.
Released the Same Day
February 5, 2026. Both labs dropped their flagships within hours. The AI coding wars are real, and we developers are the winners.
The Honest Verdict
I don't have one favourite model anymore — I have two. Opus 4.6 is the model I trust for deep, careful work. It thinks before it acts, catches things I miss, and handles massive codebases with grace. GPT-5.3-Codex is the model I reach for when I need speed and pragmatism. It's fast, efficient, and the interactive steering feels like genuine pair programming. Together, they cover every coding scenario I encounter. The fact that they launched on the same day feels symbolic — the frontier isn't one model anymore, it's a toolkit. Pick the right tool for the job. Or, like me, use both and enjoy the best era of AI-assisted development we've ever seen.
Quick Reference
| | Claude Opus 4.6 | GPT-5.3-Codex |
|---|---|---|
| Maker | Anthropic | OpenAI |
| Context | 200K standard / 1M beta | ~400K tokens |
| Max Output | 128K tokens | ~128K tokens |
| Pricing (per MTok) | $5 in / $25 out | ~$1.25 in / $10 out (expected) |
| Best For | Deep reasoning, code review, agent teams | Fast iteration, terminal tasks, cost efficiency |
| Platforms | Claude.ai, API, AWS Bedrock, Vertex AI, Azure Foundry | ChatGPT, Codex App, CLI, IDE Extension (API coming soon) |