News

Claude Opus 4.5: The AI That Outcodes Humans (And Costs Less)

Article Highlights:
  • Claude Opus 4.5 outscores human candidates in Anthropic engineering tests
  • Pricing set at $5/million input and $25/million output
  • World's best model for coding, agents, and computer use
  • New 'Effort' parameter to balance API costs and performance
  • Claude Code now available on desktop with parallel sessions
  • Industry-leading resistance against prompt injection attacks
  • Creative problem solving demonstrated in complex agent scenarios
Introduction: A New Standard for AI

Anthropic has officially released Claude Opus 4.5, positioning it as their most intelligent and efficient model to date, specifically optimized for coding, agentic tasks, and computer use. Available today via API and major cloud platforms, Opus 4.5 represents a significant leap forward in AI's ability to perform "real work" rather than just processing text.

Perhaps the most impactful news is the pricing strategy: $5 per million input tokens and $25 for output. This aggressive pricing makes Opus-level capabilities accessible to a much wider range of developers and enterprises than ever before.
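At those rates, the cost of a single request is simple arithmetic; a minimal sketch (the helper function below is illustrative, not part of any SDK):

```python
def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one Opus 4.5 request at the announced rates."""
    INPUT_PER_MTOK = 5.0    # $5 per million input tokens
    OUTPUT_PER_MTOK = 25.0  # $25 per million output tokens
    return (input_tokens / 1_000_000) * INPUT_PER_MTOK + \
           (output_tokens / 1_000_000) * OUTPUT_PER_MTOK

# Example: 10,000 input tokens + 2,000 output tokens
# = 0.05 + 0.05 = $0.10 per request
```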

Context: Outperforming Human Engineers

To grasp the significance of this release, one must look at Anthropic's internal benchmarks. The company tests prospective performance engineering candidates with a notoriously difficult 2-hour take-home exam. Claude Opus 4.5 was subjected to this same test and achieved a score higher than any human candidate ever evaluated within the time limit.

"Testers noted that Claude Opus 4.5 handles ambiguity and reasons about tradeoffs without hand-holding. They told us that, when pointed at a complex, multi-system bug, Opus 4.5 figures out the fix. [...] Overall, our testers told us that Opus 4.5 just 'gets it.'"

Assessment Team, Anthropic

It’s not just about writing code; it’s about understanding the problem. The model leads in 7 out of 8 programming languages on the SWE-bench Multilingual test and demonstrates superior vision and mathematical reasoning skills compared to its predecessors.

The Problem: Rigidity vs. Creativity

A historic limitation of LLMs is their inability to think outside the box when bound by strict rules. Opus 4.5 appears to have overcome this barrier through what can be described as "creative problem solving."

A cited example involves a benchmark (τ2-bench) where the AI acts as an airline service agent. The task was to modify a flight for a customer holding a "basic economy" ticket, a fare class for which policy strictly forbids flight changes. Most models fail or simply refuse.

Opus 4.5's Solution

The model analyzed the policies and found a legitimate loophole:

  1. The policy forbids flight changes for basic economy, but allows cabin upgrades.
  2. The policy allows flight changes for higher classes (standard economy/business).

The AI's strategy? Upgrade the cabin first (for a fee) and then modify the flight dates. Although the benchmark technically scored this as a failure (because it was unanticipated), it demonstrates an impressive capacity for lateral reasoning.
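The loophole the model found can be sketched as a search over permitted actions. The policy encoding below is hypothetical, not the actual τ2-bench implementation:

```python
# Hypothetical encoding of the airline policy; names are illustrative.
POLICY = {
    "basic_economy":    {"change_flight": False, "upgrade_cabin": True},
    "standard_economy": {"change_flight": True,  "upgrade_cabin": True},
    "business":         {"change_flight": True,  "upgrade_cabin": False},
}

def plan_flight_change(cabin: str) -> list[str]:
    """Return a sequence of permitted actions that achieves a flight change."""
    if POLICY[cabin]["change_flight"]:
        return ["change_flight"]
    # Lateral route: upgrade to a cabin that does allow changes, then change.
    if POLICY[cabin]["upgrade_cabin"]:
        return ["upgrade_cabin", "change_flight"]
    return []  # no permitted path

# plan_flight_change("basic_economy") -> ["upgrade_cabin", "change_flight"]
```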

New for Developers: Efficiency and Control

Alongside the model, Anthropic is introducing significant updates to the Claude Developer Platform:

  • "Effort" Parameter: Developers can now balance speed vs. capability. At a medium setting, Opus 4.5 matches Sonnet 4.5's best performance while using 76% fewer tokens. At maximum effort, it exceeds it significantly.
  • Claude Code: Now available on desktop, allowing for parallel sessions (e.g., one agent fixes bugs while another updates docs).
  • Memory and Context: Long conversations in apps no longer "hit a wall"; the system automatically summarizes earlier context to maintain continuity.
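A request using the new "Effort" setting might look like the sketch below. The field name, its placement, and the accepted values ("low"/"medium"/"high") are assumptions based on the announcement; consult Anthropic's official API reference before relying on them:

```python
# Sketch of a request payload with the "Effort" setting. Field names and
# the model id are assumptions, not verified against the live API.
def build_request(prompt: str, effort: str = "medium") -> dict:
    assert effort in ("low", "medium", "high")
    return {
        "model": "claude-opus-4-5",   # illustrative model id
        "max_tokens": 1024,
        "effort": effort,             # medium ~ Sonnet 4.5 quality, fewer tokens
        "messages": [{"role": "user", "content": prompt}],
    }
```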

Safety and Robustness

Despite this increased "creativity," Anthropic claims Opus 4.5 is the most robustly aligned model released to date. Tests with Gray Swan show it is the industry's most resistant frontier model against "prompt injection" attacks (attempts to hide malicious instructions in content the model processes), providing crucial assurance for enterprise use.

For more technical details and to view the original performance charts, you can visit the official Anthropic announcement page.

Conclusion

Claude Opus 4.5 isn't just "faster": it's a model starting to show signs of engineering intuition. With aggressive pricing and integrated tools for Excel and Chrome (now available to Max/Team users), Anthropic is pushing AI from a support tool to an autonomous partner in the workflow.

