Introduction
Archon is a desktop copilot that turns natural language instructions into UI actions by combining GPT‑5 for planning and a small grounding model for precise click coordinates.
Context
The design splits responsibilities: a powerful reasoner (GPT‑5) decides what to do and emits semantic actions, while a lightweight executor (archon‑mini) grounds each action into exact screen coordinates. The goal is more autonomous, natural desktop control while keeping latency and cost under control.
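A minimal sketch of what that boundary could look like as a data contract; the class and field names here are illustrative assumptions, not Archon's actual API:

```python
from dataclasses import dataclass

@dataclass
class SemanticAction:
    """One step emitted by the planner, e.g. click the 'Save' button."""
    verb: str          # "click", "type", "scroll", ...
    target: str        # natural-language description of the UI element
    payload: str = ""  # text to type, scroll distance, etc.

@dataclass
class GroundedAction:
    """The executor's resolution of a SemanticAction into screen space."""
    x: int
    y: int
    confidence: float  # consumed later by the routing policy
```

Keeping the interface this narrow is what lets the heavy reasoner and the light grounder be swapped, cached, and scaled independently.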
How it works
Core flow: Archon captures screenshots, runs a saliency scorer to extract the most relevant patches, reuses cached results for unchanged regions, then sends semantic descriptions of the target to the grounding model, which returns precise (x, y) coordinates. A routing policy escalates to the planner only when the action is ambiguous, keeping the common path fast.
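To make the patch-extraction step concrete, here is a minimal sketch assuming the screenshot arrives as a numpy array; the contrast-based scorer is a crude stand-in, since the actual saliency model is not described:

```python
import numpy as np

def top_k_patches(screenshot: np.ndarray, k: int = 8, patch: int = 224):
    """Tile the screenshot, score each tile, and keep the k most salient ones."""
    h, w = screenshot.shape[:2]
    scored = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tile = screenshot[y:y + patch, x:x + patch]
            score = float(tile.std())  # crude stand-in for a learned saliency score
            scored.append((score, x, y, tile))
    scored.sort(key=lambda t: t[0], reverse=True)
    # Only these tiles, plus their offsets so patch-local clicks can be mapped
    # back to full-screen coordinates, are sent to the grounding model.
    return scored[:k]
```

Sending only the top‑K tiles instead of the full frame is what keeps the visual token count, and therefore cost and latency, low.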
Technical highlights
- Hierarchical split: GPT‑5 for reasoning, archon‑mini for grounding
- Patching: top‑K patch extraction reduces visual tokens and raises precision
- Cache: reusing invariant patches (70%+ hit rate) lowers latency and GPU cost; see the cache sketch after this list
- Training: archon‑mini, a 7B model based on Qwen‑2.5‑VL, trained with GRPO on synthetic rollouts
- Adaptive routing: a fast path (~50 ms) with escalation to the planner when signals indicate uncertainty; see the routing sketch after this list
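One plausible implementation of the invariant‑patch cache keys entries on a content hash of the raw pixels, so an unchanged region never hits the GPU twice. The exact cache policy isn't documented, so treat this as a sketch; the 70%+ hit rate is the figure reported for Archon, not something this code guarantees.

```python
import hashlib
from typing import Callable, Dict

class PatchCache:
    """Reuse grounding-model encodings for screen regions that did not change."""

    def __init__(self) -> None:
        self._store: Dict[str, object] = {}
        self.hits = 0
        self.misses = 0

    def get_or_encode(self, patch_bytes: bytes, encode: Callable[[bytes], object]) -> object:
        key = hashlib.sha256(patch_bytes).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = encode(patch_bytes)  # pay the GPU cost only on a miss
        return self._store[key]

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```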
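The adaptive routing policy can be read as a cheap boolean decision sitting in front of the planner. The signals and thresholds below are assumptions for illustration; the write‑up only says escalation happens when signals indicate uncertainty.

```python
from dataclasses import dataclass

@dataclass
class RoutingSignals:
    grounding_confidence: float  # reported by archon-mini with each click
    screen_changed: bool         # large diff against the previous frame
    retries: int                 # times this step has already failed

def should_escalate(s: RoutingSignals,
                    min_confidence: float = 0.8,
                    max_retries: int = 1) -> bool:
    """Stay on the ~50 ms fast path by default; call the GPT-5 planner otherwise."""
    if s.retries > max_retries:
        return True
    if s.grounding_confidence < min_confidence:
        return True
    if s.screen_changed and s.retries > 0:
        return True
    return False
```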
Main trade‑offs
- Accuracy vs latency: deeper reasoning improves robustness but increases delay
- Vision token cost: mitigated via patching, downsampling and caching
- UI robustness: some element types need more data to handle reliably
Conclusion
Archon shows that separating the planner from the executor, and pairing patch‑based grounding with adaptive caching, is a practical path toward a self‑driving computer. Source: Surya Dantuluri.
FAQ
How does Archon build a self‑driving computer on the desktop?
It splits planning (GPT‑5) and grounding (archon‑mini): the planner outputs semantic actions and the executor returns precise pixel coordinates from salient patches.
What latency and cost limits affect Archon?
Visual tokens and deep reasoning raise latency and cost; Archon mitigates this with patching, caching, and an adaptive routing policy.
How does archon‑mini perform GUI grounding?
It extracts the top‑K patches with a saliency scorer, encodes them, and outputs (x, y) clicks; training uses GRPO on synthetic rollouts for robustness.
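The core of GRPO is scoring each sampled rollout against its own group rather than against a learned value function. Below is a minimal sketch of that group‑relative advantage, paired with a hypothetical click‑distance reward; the actual reward shaping used for archon‑mini is not public.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantage: normalize each reward within its own group."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

def click_reward(pred_xy, target_xy, tol: float = 12.0) -> float:
    """Hypothetical reward: 1.0 for a click within `tol` px of the element center,
    decaying toward 0 as the click drifts away."""
    dist = float(np.linalg.norm(np.asarray(pred_xy, float) - np.asarray(target_xy, float)))
    return 1.0 if dist <= tol else max(0.0, 1.0 - dist / 100.0)

# Four sampled clicks for the same instruction and screenshot (one synthetic group).
preds = [(410, 225), (415, 230), (600, 480), (412, 228)]
target = (412, 227)
rewards = np.array([click_reward(p, target) for p in preds])
print(grpo_advantages(rewards))  # near-hits get positive advantage, the outlier negative
```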
What safety risks should be considered for a self‑driving computer?
Risks include incorrect actions on sensitive UIs and interface drift; the planner acts as a safety guard for ambiguous cases.
How do you measure efficiency of Archon in practice?
Useful metrics: per‑action latency (ms), patch cache hit‑rate, escalation frequency to the planner, and end‑to‑end success rate.
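A small sketch of how those metrics could be aggregated from per‑action logs; the log schema is an assumption, not something Archon is documented to expose.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, List

@dataclass
class ActionLog:
    latency_ms: float
    cache_hit: bool
    escalated: bool   # was the GPT-5 planner consulted for this action?
    succeeded: bool   # did the step achieve its intended effect?

def summarize(logs: List[ActionLog]) -> Dict[str, float]:
    """Aggregate per-action logs into the efficiency metrics listed above."""
    n = len(logs)
    return {
        "median_latency_ms": median(l.latency_ms for l in logs),
        "mean_latency_ms": mean(l.latency_ms for l in logs),
        "cache_hit_rate": sum(l.cache_hit for l in logs) / n,
        "escalation_rate": sum(l.escalated for l in logs) / n,
        "success_rate": sum(l.succeeded for l in logs) / n,
    }
```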