Claude Opus 4.8

Overview

Claude Opus 4.8, released on May 28, 2026, is Anthropic's flagship model in the Claude 4 family. Built on Opus 4.7, it improves agentic coding, multidisciplinary reasoning, and self-awareness, and is approximately 4x less likely than its predecessor to let code flaws pass unremarked.

Key Features

  • Stronger Agentic Coding: Achieves 69.2% on SWE-bench Pro (+4.9 points over Opus 4.7) and 88.6% on SWE-bench Verified, with improved end-to-end task completion on the Super-Agent Benchmark.
  • Improved Honesty and Self-Awareness: Around 4x less likely to let code flaws pass unremarked; proactively flags uncertainties, catches its own mistakes, and pushes back on unsound plans before executing.
  • Dynamic Workflows (Research Preview): Enables Claude Code to plan and execute large-scale tasks using hundreds of parallel subagents, supporting codebase-scale migrations across hundreds of thousands of lines of code.
  • Mid-Conversation System Messages: The Messages API now accepts system messages mid-conversation while preserving prompt cache, reducing costs in agentic loops.
  • Better Long-Context Performance: Improved compaction handling for sustained conversations, with fewer derailments after context compaction in long-running agentic tasks.

Best Use Cases

  • Professional Software Engineering: Scores 69.2% on SWE-bench Pro, ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%), which makes it well-suited for production-grade code generation, complex refactors, and autonomous bug resolution.
  • Large-Scale Agentic Workflows: Dynamic Workflows allow orchestrating hundreds of parallel subagents for tasks like codebase-wide migrations, making it effective for enterprise-scale automation.
  • High-Stakes Knowledge Work: Scores 1890 Elo on GDPval-AA and 57.9% on Humanity's Last Exam (with tools), which makes it suitable for financial analysis, legal document processing, and complex research synthesis.
  • Long-Running Autonomous Tasks: Improved compaction handling and self-monitoring make it reliable for extended agentic sessions that require sustained focus and consistency.

Capabilities and Limitations

Capability | Description
Reasoning | GPQA Diamond: 93.6%. Humanity's Last Exam: 57.9% (with tools), 49.8% (without). USAMO 2026: 96.7%.
Coding | SWE-bench Verified: 88.6%, SWE-bench Pro: 69.2%, SWE-bench Multilingual: 84.4%, Terminal-Bench 2.1: 74.6%.
Agentic | Completes every case end-to-end on the Super-Agent Benchmark. Finance Agent v2: 53.9%. Legal Agent Benchmark: record score.
Computer Use | Online-Mind2Web: 84%. Strong browser and desktop interaction capabilities.
Multimodal | Text and image input. Up to 16 megapixels on the long edge, up to 600 images or PDF pages per request.
Context Window | 1,000,000 tokens.
Max Output | 128,000 tokens.
Tool Use | Full function calling, code execution, MCP support, adaptive thinking, effort control, dynamic workflows.
Multilingual | Strong multilingual performance across major world languages.
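
The numeric limits in the table above can be turned into a simple pre-flight check. The following is a minimal sketch, not part of any SDK: `PlannedRequest` and `limit_violations` are hypothetical names, and the rule that input and output tokens share the context window is an assumption rather than something the table states.

```python
from dataclasses import dataclass

# Limits copied from the capability table.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 128_000
MAX_IMAGES_OR_PAGES = 600


@dataclass
class PlannedRequest:
    """Hypothetical description of a request; not an SDK type."""
    input_tokens: int
    requested_output_tokens: int
    images_or_pdf_pages: int = 0


def limit_violations(req: PlannedRequest) -> list[str]:
    """Return a human-readable list of documented limits the request exceeds."""
    problems = []
    if req.requested_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append("requested output exceeds max output")
    # Assumption: input and output tokens count against the same context window.
    if req.input_tokens + req.requested_output_tokens > CONTEXT_WINDOW_TOKENS:
        problems.append("input plus output exceeds context window")
    if req.images_or_pdf_pages > MAX_IMAGES_OR_PAGES:
        problems.append("too many images or PDF pages")
    return problems
```

Under these assumptions, a request with 900,000 input tokens and the full 128,000-token output budget would be flagged, because the two together exceed 1,000,000 tokens.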

Known Limitations

  • Terminal-Bench 2.1 trails GPT-5.5 (74.6% vs 78.2%).
  • GPQA Diamond marginally lower than Opus 4.7 (93.6% vs 94.2%).
  • New tokenizer (inherited from Opus 4.7) produces up to 35% more tokens for the same input text, meaning actual per-request costs may increase despite unchanged per-token pricing.
  • Image input only (no native audio or video input).
  • Dynamic Workflows remain in research preview and are limited to Claude Code.
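
The tokenizer limitation above is easiest to see as arithmetic. The sketch below assumes the input rate of 5.00 from the Pricing section (in that table's own unit) and treats 35% as the worst case; `inflated_input_cost` is a hypothetical helper, and real inflation depends on the text being tokenized.

```python
# Worst-case token growth stated in the limitations list.
MAX_TOKEN_INFLATION = 0.35
# Input rate from the Pricing section, in the pricing table's own unit.
INPUT_RATE = 5.00


def inflated_input_cost(old_token_count: int,
                        inflation: float = MAX_TOKEN_INFLATION) -> float:
    """Input cost after the same text is re-tokenized with `inflation` more tokens."""
    if not 0.0 <= inflation <= MAX_TOKEN_INFLATION:
        raise ValueError("inflation must be between 0 and the documented maximum")
    new_token_count = old_token_count * (1.0 + inflation)
    return new_token_count * INPUT_RATE
```

Because the per-token rate is unchanged, the cost grows by the same fraction as the token count: text that used to be 10,000 tokens can become up to 13,500 tokens, so its input cost can rise by up to 35%.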

Pricing

Model | Input (Credits/Token) | Cache Write (Credits/Token) | Cache Read (Credits/Token) | Output (Credits/Token)
Claude Opus 4.8 | 5.00 | 6.25 | 0.50 | 25.00
  • Prompt caching: Cache writes at 1.25x (5-minute TTL) or 2x (1-hour TTL) base input price; cache reads at 0.1x base input price. Minimum 1,024 tokens for caching.
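
The caching multipliers above can be combined with the table rates to estimate a request's cost. This is a rough sketch under stated assumptions: `request_cost` is a hypothetical helper, the rates are used in the table's own unit, and billing a too-short cache prefix as ordinary input is an assumption about how the 1,024-token minimum behaves.

```python
# Rates copied from the pricing table, in the table's own unit.
BASE_INPUT_RATE = 5.00
OUTPUT_RATE = 25.00
# Multipliers on the base input rate, from the prompt caching note.
CACHE_WRITE_MULTIPLIER = {"5m": 1.25, "1h": 2.0}
CACHE_READ_MULTIPLIER = 0.1
MIN_CACHEABLE_TOKENS = 1024


def request_cost(input_tokens: int, output_tokens: int,
                 cache_write_tokens: int = 0, cache_read_tokens: int = 0,
                 ttl: str = "5m") -> float:
    """Estimate the cost of one request from its token counts."""
    if ttl not in CACHE_WRITE_MULTIPLIER:
        raise ValueError(f"unknown TTL: {ttl!r}")
    # Assumption: a prefix below the minimum is not cached and is billed as plain input.
    if cache_write_tokens < MIN_CACHEABLE_TOKENS:
        input_tokens += cache_write_tokens
        cache_write_tokens = 0
    return (
        input_tokens * BASE_INPUT_RATE
        + cache_write_tokens * BASE_INPUT_RATE * CACHE_WRITE_MULTIPLIER[ttl]
        + cache_read_tokens * BASE_INPUT_RATE * CACHE_READ_MULTIPLIER
        + output_tokens * OUTPUT_RATE
    )
```

With the 5-minute TTL, one write plus one read of the same prefix costs 1.35x the base input price, compared with 2x for sending it uncached twice, so caching pays off from the second request. With the 1-hour TTL, one write plus one read costs 2.1x, so the saving starts at the third request (2.2x versus 3x).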