GPT-5.1-Codex-Max: The AI That Codes for 24 Hours Straight (Here's How)

Article Highlights:
  • New GPT-5.1-Codex-Max model optimized for agentic coding
  • 'Compaction' technology to handle millions of context tokens
  • Native support for Windows environments and improved CLI
  • About 30% fewer thinking tokens than GPT-5.1-Codex at "medium" reasoning effort
  • 79.9% accuracy on the SWE-Lancer benchmark
  • Capable of working autonomously for over 24 hours
  • Available now in Codex for Plus, Pro, Business, Edu, and Enterprise users

Introduction: A New Frontier for Agentic Coding

OpenAI has raised the bar in AI-assisted software development with the launch of GPT-5.1-Codex-Max. The new model is more than an incremental update: it is a step toward coding agents that can operate autonomously for extended periods. Available today within the Codex ecosystem, GPT-5.1-Codex-Max is designed to be faster, smarter, and more token-efficient than its predecessor.

The key change is its ability to handle complex, long-running workflows, which turns it from a simple assistant into a reliable programming partner. Thanks to new memory management techniques, the model can take on project-scale refactors and deep debugging sessions that previously failed because of context-window limits.

What is "Compaction" Technology?

Compaction is the cornerstone of GPT-5.1-Codex-Max. Unlike previous models limited by a fixed context window that forced the interruption of long tasks, this new model uses a dynamic process to "prune" conversation history while retaining crucial context.

In practical terms, this means the AI can work over millions of tokens in a single task. As the context window limit approaches, the system automatically compacts the session, freeing up space for new reasoning without losing the project's logical thread. This enables agentic loops that can last for hours, iterating on tests and fixes until the goal is achieved.
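
OpenAI has not published how compaction works internally, so the following is only a minimal sketch of the general idea described above: when a session grows past a token budget, older steps are folded into a short summary while pinned context and the most recent steps are kept. Every name here (Message, compact, append_with_compaction), the word-count "tokenizer", and the placeholder summary text are illustrative assumptions, not part of Codex.

```python
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    pinned: bool = False        # hypothetical flag: context that must survive compaction
    summarized_steps: int = 0   # > 0 marks a summary standing in for that many steps


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def total_tokens(history: list[Message]) -> int:
    return sum(count_tokens(m.text) for m in history)


def compact(history: list[Message], keep_recent: int = 2) -> list[Message]:
    """Fold older, unpinned messages (including earlier summaries) into one summary."""
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    pinned = [m for m in older if m.pinned]
    dropped = [m for m in older if not m.pinned]
    if not dropped:
        return list(history)
    # A real system would summarize the dropped content; this sketch only counts it.
    steps = sum(m.summarized_steps or 1 for m in dropped)
    summary = Message(f"[summary of {steps} earlier steps]", summarized_steps=steps)
    return pinned + [summary] + recent


def append_with_compaction(
    history: list[Message], message: Message, limit: int
) -> list[Message]:
    """Append a message, compacting the session if it exceeds the token limit."""
    history = history + [message]
    if total_tokens(history) > limit:
        history = compact(history)
    return history


if __name__ == "__main__":
    session = [Message("goal: refactor module", pinned=True)]
    for i in range(10):
        step = Message(f"step {i} ran tests and fixed one failure")
        session = append_with_compaction(session, step, limit=30)
    for m in session:
        print(m.text)
```

In this toy loop the session never grows beyond the pinned goal, one rolling summary, and the two latest steps, which is the property that lets an agentic loop keep iterating without hitting the window limit.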

Performance and Benchmarks: The Numbers

Released data shows significant improvements over previous versions. GPT-5.1-Codex-Max was specifically trained on real-world software engineering tasks, such as PR creation, code review, and frontend development.

  • SWE-Lancer IC SWE: The model achieves 79.9% accuracy, a clear step up from GPT-5.1-Codex's 66.3%.
  • SWE-bench Verified: With "Extra High" (xhigh) reasoning effort, it hits 77.9%.
  • Efficiency: At the "medium" reasoning level, it outperforms GPT-5.1-Codex at the same effort while consuming about 30% fewer thinking tokens.

This efficiency boost translates into concrete savings for developers, who can obtain high-quality frontend designs and complex functionality at lower cost.
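
To make the 30% figure concrete, here is a small back-of-the-envelope calculation. The baseline token count and the price per million tokens are hypothetical placeholders chosen for illustration, not published numbers; only the 30% reduction comes from the reported data.

```python
def thinking_tokens_after(baseline_tokens: int, reduction_pct: int = 30) -> int:
    """Thinking tokens left after cutting `reduction_pct` percent from a baseline."""
    return baseline_tokens * (100 - reduction_pct) // 100


def thinking_cost(tokens: int, price_per_million: float) -> float:
    """Cost of a number of tokens at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


baseline = 200_000   # hypothetical thinking tokens used by the older model on a task
price = 10.0         # hypothetical price per million tokens, for illustration only

after = thinking_tokens_after(baseline)
saved = baseline - after
print(f"thinking tokens: {baseline} -> {after} (saved {saved})")
print(f"estimated saving: {thinking_cost(saved, price):.2f} per task")
```

The saving scales linearly with usage, so it matters most for long agentic sessions that spend a large share of their budget on reasoning.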

Windows Support and CLI Integration

A highly anticipated feature is the expansion of operating environments. GPT-5.1-Codex-Max is the first OpenAI model natively trained to operate in Windows environments. This opens the door for a vast segment of enterprise and .NET developers working on Microsoft tech stacks.

Additionally, training included specific tasks to improve collaboration within the Codex CLI (Command Line Interface), making terminal interaction smoother and more natural.

Safety and Reliability in Long-Term Tasks

With the ability to work autonomously for over 24 hours, safety becomes a priority. OpenAI has implemented a "sandbox by default" approach:

  • File writes are limited to the project workspace.
  • Network access is disabled unless explicitly enabled by the developer.
  • Dedicated monitoring is in place to detect cybersecurity abuse.
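
The first two rules can be pictured with a short sketch. This is not how Codex enforces its sandbox (that happens at the operating-system level and is not described in the announcement); SandboxPolicy and its methods are hypothetical names used only to illustrate "writes stay inside the workspace, network is off unless enabled".

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy object; not part of any real Codex API."""
    workspace: Path
    network_enabled: bool = False  # network is off unless the developer opts in

    def allows_write(self, target: Path) -> bool:
        """Allow a write only if the resolved target stays inside the workspace."""
        root = self.workspace.resolve()
        # Relative targets are interpreted against the workspace; absolute ones stand alone.
        candidate = (root / target).resolve()
        return candidate == root or root in candidate.parents

    def allows_network(self) -> bool:
        return self.network_enabled


if __name__ == "__main__":
    policy = SandboxPolicy(workspace=Path("/tmp/project"))
    for target in ["src/main.py", "../secrets.txt", "/etc/passwd"]:
        print(target, policy.allows_write(Path(target)))
    print("network", policy.allows_network())
```

Resolving the path before checking it is the important design choice: it prevents "../" segments from escaping the workspace.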

Although the model does not yet reach "High" cybersecurity capability under OpenAI's Preparedness Framework, it is the most capable model the company has released in this domain. OpenAI still recommends treating Codex as an additional reviewer (a tireless "junior") rather than a replacement for human review, especially before deploying to production.

Availability

GPT-5.1-Codex-Max is available immediately in Codex for users on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. It replaces GPT-5.1-Codex as the default model across Codex surfaces. API access, for developers who use the Codex CLI with an API key, is coming soon.

Conclusion

The introduction of GPT-5.1-Codex-Max marks a turning point. Internally at OpenAI, 95% of engineers use Codex weekly, and those engineers ship roughly 70% more pull requests since adopting it. The combination of advanced reasoning, efficient context management via compaction, and multi-platform support promises to "supercharge" developer productivity worldwide.

GPT-5.1-Codex-Max FAQ

What is GPT-5.1-Codex-Max?

It is OpenAI's new agentic coding model, designed to handle long and complex tasks with greater efficiency and intelligence.

How does compaction technology work?

Compaction allows the model to work across millions of tokens by pruning less relevant history while preserving essential context for completing long-term tasks.

Does GPT-5.1-Codex-Max support Windows?

Yes, it is the first OpenAI model natively trained to operate in Windows environments. Its training also included tasks that improve interaction in the Codex CLI.

How much does it cost to use GPT-5.1-Codex-Max?

The model is included in ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. It is also more efficient: at "medium" reasoning effort it uses about 30% fewer thinking tokens than GPT-5.1-Codex.

Is it safe to use the agent for production code?

Codex operates in a secure sandbox, but OpenAI always recommends human review of the code before deployment, treating the AI as an additional collaborator.

Evol Magazine