Introduction
Context engineering is the new frontier in building effective AI agents. Even as models grow more powerful, how an agentic system manages its context determines whether it succeeds or fails. The Manus team learned this lesson across four complete rewrites of their agent framework, distilling principles that transformed their approach to agentic AI.
The Strategic Choice: Context vs Fine-Tuning
When the Manus team started their project, they faced a crucial decision: train an end-to-end model or build an agent based on in-context learning. They chose context engineering for one fundamental reason: iteration speed.
In the early days of NLP, fine-tuning required weeks per iteration. Today, context engineering enables shipping improvements in hours rather than weeks, and it keeps the product orthogonal to the underlying models: a boat floating on the rising tide of model progress, rather than a pillar fixed to the seabed.
KV-Cache: The Most Important Metric
The KV-cache hit rate is the most critical metric for a production AI agent. It directly impacts both latency and costs, with dramatic effects: with Claude Sonnet, cached tokens cost 0.30 USD/MTok versus 3 USD/MTok for non-cached ones.
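To make the cost impact concrete, here is a minimal sketch of the blended input cost as a function of cache hit rate, using the cached and uncached prices quoted above (the function name is ours, for illustration):

```python
# Prices quoted above, in USD per million input tokens.
CACHED = 0.30
UNCACHED = 3.00

def blended_cost_per_mtok(hit_rate: float) -> float:
    """Average input cost when `hit_rate` of tokens are served from cache."""
    return hit_rate * CACHED + (1 - hit_rate) * UNCACHED

# A 90% hit rate cuts the effective input cost from 3.00 to 0.57 USD/MTok.
print(blended_cost_per_mtok(0.9))
```

Since agent contexts are dominated by a long shared prefix of prior turns, even modest improvements in hit rate compound across the dozens of calls in a single task.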
Strategies to Optimize KV-Cache
- Keep the prompt prefix stable: A single changed token invalidates the cache from that point onward
- Make context append-only: Avoid modifications to previous actions or observations
- Deterministic serialization: Ensure stable ordering of JSON keys
- Explicit breakpoints: Strategically mark cache points when necessary
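The deterministic-serialization point is easy to get wrong in practice: most JSON libraries preserve insertion order, so two logically identical observations can serialize differently and silently break the prefix. A minimal sketch (the helper name is ours):

```python
import json

def serialize_observation(obs: dict) -> str:
    """Serialize with sorted keys and fixed separators so the same
    observation always produces byte-identical text (cache-friendly)."""
    return json.dumps(obs, sort_keys=True, separators=(", ", ": "), ensure_ascii=False)

# Two dicts built in different insertion orders serialize identically,
# so the token prefix stays stable across runs.
a = serialize_observation({"tool": "browser", "status": "ok"})
b = serialize_observation({"status": "ok", "tool": "browser"})
assert a == b
```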
Mask Tools, Don't Remove Them
As the agent's capabilities expand, the action space grows complex. The temptation to load and unload tools dynamically is strong, but it carries significant risks for the KV-cache and for model consistency.
Instead of dynamically removing tools, Manus uses a state machine that masks logits during decoding. This approach maintains context stability while controlling action selection through three modes:
- Auto: The model chooses whether to call a function
- Required: The model must call a function
- Specified: The model must choose from a specific subset
File System as Extended Context
Even with 128K token context windows, real-world agents often hit limits. Observations can be huge, performance degrades with long contexts, and costs increase proportionally.
Manus treats the file system as the definitive context: unlimited, persistent, and directly operable. The model learns to use files not just as storage, but as structured, externalized memory. Compression strategies are always recoverable, maintaining references that allow information retrieval when needed.
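The recoverable-compression idea can be sketched as follows: a large observation is written to disk and replaced in context by a short reference plus a preview, which the agent can expand again on demand. The directory and function names here are ours, for illustration:

```python
from pathlib import Path

WORKSPACE = Path("workspace")  # hypothetical agent working directory

def externalize(name: str, content: str, preview_chars: int = 200) -> str:
    """Write a large observation to disk and return a short, restorable
    reference to keep in context instead of the full text."""
    WORKSPACE.mkdir(exist_ok=True)
    path = WORKSPACE / name
    path.write_text(content, encoding="utf-8")
    return f"[saved to {path} | {len(content)} chars] {content[:preview_chars]}..."

def restore(reference_path: str) -> str:
    """Re-read the full content when the agent needs it again."""
    return Path(reference_path).read_text(encoding="utf-8")

# A long page body is replaced in context by a compact reference:
ref = externalize("page_1.html", "<html>" + "x" * 5000 + "</html>")
print(ref[:80])
```

The key property is that compression loses nothing permanently: as long as the path survives in context, the full observation remains one tool call away.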
Manipulating Attention Through Rehearsal
A distinctive behavior of Manus is that it constantly creates and updates todo.md files during complex tasks. This isn't cosmetic: it's a deliberate mechanism for steering the model's attention.
With an average of 50 tool calls per task, Manus risks losing focus on its objectives. By constantly rewriting the to-do list, it pushes the global plan into the recent attention span, avoiding "lost in the middle" problems and maintaining goal alignment.
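The rehearsal mechanic reduces to re-rendering the plan at the tail of the context on every step. A minimal sketch (function and variable names are ours):

```python
def rehearse_plan(context: list[str], todo_items: list[tuple[str, bool]]) -> None:
    """Append a freshly rendered todo.md to the end of the context so the
    global plan sits inside the model's most recent attention span."""
    lines = ["# todo.md"]
    for item, done in todo_items:
        lines.append(f"- [{'x' if done else ' '}] {item}")
    context.append("\n".join(lines))

ctx: list[str] = []
rehearse_plan(ctx, [("collect sources", True), ("draft report", False)])
print(ctx[-1])
```

Because the list is appended rather than edited in place, the scheme stays compatible with the append-only, cache-friendly context discipline described earlier.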
Keep Errors in Context
Against the common instinct to hide errors, Manus maintains wrong paths in context. When the model sees a failed action and the resulting error observation, it implicitly updates its internal beliefs, reducing the likelihood of repeating the same mistake.
Error recovery is one of the clearest indicators of true agentic behavior, even though it remains underrepresented in academic benchmarks that focus on success under ideal conditions.
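In code, keeping errors in context simply means appending the failure observation instead of swallowing the exception and retrying silently. A hedged sketch (the step runner and error format are ours):

```python
def run_step(context: list[str], action: str, execute) -> None:
    """Execute a tool call and append the observation either way;
    failures stay in context instead of being silently discarded."""
    context.append(f"action: {action}")
    try:
        result = execute(action)
        context.append(f"observation: {result}")
    except Exception as exc:
        # Seeing the error text shifts the model's beliefs away from
        # repeating the same failing action on the next step.
        context.append(f"observation: ERROR: {exc}")

def failing_tool(action: str) -> str:
    raise FileNotFoundError("missing.txt")

ctx: list[str] = []
run_step(ctx, "cat missing.txt", failing_tool)
print(ctx[-1])  # the error is now part of the context, not hidden
```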
Avoiding the Few-Shot Trap
Language models are excellent imitators and tend to follow patterns in context. In repetitive tasks, this can lead to drift and over-generalizations. The solution is to introduce structured diversity: variations in serialization patterns, alternative formulations, and small changes in order or formatting.
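One cheap way to introduce that structured diversity is to rotate among a few equivalent observation templates, so the surface form varies while the content stays identical. The templates and names below are our own illustration, not Manus's format:

```python
import random

# Equivalent renderings of the same observation, rotated to avoid
# presenting the model with one rigid pattern to imitate.
TEMPLATES = [
    "Observation: {tool} returned {result}",
    "{tool} -> {result}",
    "Result of {tool}: {result}",
]

def render_observation(tool: str, result: str, rng: random.Random) -> str:
    """Vary the surface form across steps so repetitive tasks don't drift
    into blind pattern imitation."""
    return rng.choice(TEMPLATES).format(tool=tool, result=result)

rng = random.Random(0)
for i in range(3):
    print(render_observation("fetch_page", f"doc_{i}", rng))
```

Note the tension with the KV-cache advice: this variation belongs in the per-step observations, not in the shared prompt prefix, which must stay byte-stable.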
Conclusion
Context engineering is an emerging but already essential science for agentic systems. How you model context defines agent behavior: speed, recovery capability, and scalability. These lessons, learned through millions of real interactions, offer practical guidance for those developing AI agents in the real world.
FAQ
What is context engineering for AI agents?
Context engineering is the discipline that deals with designing and optimizing how AI agents manage information during task execution, directly influencing performance and costs.
Why is KV-cache so important for AI agents?
KV-cache dramatically reduces latency and costs: with Claude Sonnet, cached tokens cost 0.30 USD/MTok versus 3 USD/MTok for non-cached ones, a 10x difference.
How do you optimize KV-cache hit rate?
By keeping the prompt prefix stable, making context append-only, ensuring deterministic serialization, and marking explicit breakpoints when necessary.
Why not dynamically remove tools from AI agents?
Dynamically removing tools invalidates KV-cache and confuses the model when previous actions reference tools no longer defined in the current context.
How do you use the file system as extended context?
By treating the file system as structured, externalized memory where the agent can write and read information on demand, overcoming traditional context window limitations.
Why keep errors in AI agent context?
Errors provide valuable evidence that allows the model to update its internal beliefs and reduce the likelihood of repeating the same mistakes in the future.