Introduction: A New Standard for AI
Anthropic has officially released Claude Opus 4.5, positioning it as its most intelligent and efficient model to date, specifically optimized for coding, agentic tasks, and computer use. Available today via the API and major cloud platforms, Opus 4.5 represents a significant leap in AI's ability to perform "real work" rather than just process text.
Perhaps the most impactful news is the pricing strategy: $5 per million input tokens and $25 for output. This aggressive pricing makes Opus-level capabilities accessible to a much wider range of developers and enterprises than ever before.
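At those rates, per-call costs are easy to estimate. A minimal sketch using the prices from the announcement (the example token counts are illustrative):

```python
# Opus 4.5 list pricing: $5 per million input tokens, $25 per million output tokens.
INPUT_RATE = 5.00 / 1_000_000
OUTPUT_RATE = 25.00 / 1_000_000

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single API call at list pricing."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A hypothetical agentic call: 20k tokens of context in, 2k tokens out.
print(f"${estimate_cost(20_000, 2_000):.2f}")  # 20k * $5/M + 2k * $25/M = $0.15
```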
Context: Outperforming Human Engineers
To grasp the significance of this release, consider Anthropic's internal benchmark: the company gives candidates for performance engineering roles a notoriously difficult two-hour take-home exam. Claude Opus 4.5 took the same exam and, within the time limit, scored higher than any human candidate Anthropic has ever evaluated.
"Testers noted that Claude Opus 4.5 handles ambiguity and reasons about tradeoffs without hand-holding. They told us that, when pointed at a complex, multi-system bug, Opus 4.5 figures out the fix. [...] Overall, our testers told us that Opus 4.5 just 'gets it.'"
Assessment Team, Anthropic
It’s not just about writing code; it’s about understanding the problem. The model leads in 7 out of 8 programming languages on the SWE-bench Multilingual test and demonstrates superior vision and mathematical reasoning skills compared to its predecessors.
The Problem: Rigidity vs. Creativity
A historic limitation of LLMs is their inability to think outside the box when bound by strict rules. Opus 4.5 appears to have overcome this barrier through what can be described as "creative problem solving."
A cited example involves τ2-bench, a benchmark in which the AI acts as an airline service agent. The task was to modify a flight for a customer holding a "basic economy" ticket, a fare class whose flights the airline's policy strictly forbids changing. Most models fail the task or simply refuse.
Opus 4.5's Solution
The model analyzed the policies and found a legitimate loophole:
- The policy forbids flight changes for basic economy, but allows cabin upgrades.
- The policy allows flight changes for higher classes (standard economy/business).
The AI's strategy? Upgrade the cabin first (for a fee) and then modify the flight dates. Although the benchmark technically scored this as a failure (because it was unanticipated), it demonstrates an impressive capacity for lateral reasoning.
New for Developers: Efficiency and Control
Alongside the model, Anthropic is introducing significant updates to the Claude Developer Platform:
- "Effort" Parameter: Developers can now trade speed against capability. At a medium setting, Opus 4.5 matches Sonnet 4.5's best performance while using 76% fewer tokens; at maximum effort, it significantly exceeds that performance.
- Claude Code: Now available on desktop, allowing for parallel sessions (e.g., one agent fixes bugs while another updates docs).
- Memory and Context: Long conversations in apps no longer "hit a wall"; the system automatically summarizes earlier context to maintain continuity.
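The effort setting above might appear in a request roughly like this. A hedged sketch: the endpoint shape and standard Messages API fields are public, but the `"effort"` field's exact name and placement here are an assumption for illustration, so check Anthropic's API documentation before relying on it.

```python
import json

# Sketch of a Messages API request body with an effort setting.
# NOTE: "effort" as a top-level field is an assumption for illustration;
# the real parameter name and location may differ from this sketch.
payload = {
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    "effort": "medium",  # hypothetical: trade some capability for far fewer tokens
    "messages": [{"role": "user", "content": "Summarize this diff."}],
}
print(json.dumps(payload, indent=2))
```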
Safety and Robustness
Despite this increased "creativity," Anthropic claims Opus 4.5 is its most robustly aligned model released to date. Testing with Gray Swan shows it to be the frontier model most resistant to "prompt injection" attacks (attempts to smuggle malicious instructions into content the model processes), providing crucial assurance for enterprise use.
For more technical details and to view the original performance charts, you can visit the official Anthropic announcement page.
Conclusion
Claude Opus 4.5 isn't just "faster": it's a model starting to show signs of engineering intuition. With aggressive pricing and integrated tools for Excel and Chrome (now available to Max/Team users), Anthropic is pushing AI from a support tool to an autonomous partner in the workflow.