Introduction
Context engineering represents the natural evolution of prompt engineering in managing AI agents. This emerging discipline focuses on curating and optimizing context—the set of tokens provided to large language models (LLMs)—to maximize the operational effectiveness of intelligent agents. As AI applications grow in complexity, the challenge is no longer just writing perfect prompts, but strategically managing the entire informational state available to the model at any given moment.
Context is a critical but limited resource. While advanced models can handle increasingly larger context windows, recent studies have revealed the phenomenon of "context rot": as the number of tokens increases, the model's ability to accurately retrieve information progressively diminishes. This makes context engineering fundamental to building agents capable of operating effectively over extended time horizons.
From Prompt Engineering to Context Engineering
Prompt engineering focuses on methods for writing and organizing instructions for LLMs, primarily optimizing system prompts for specific tasks. This practice dominated the early years of AI application development, when most use cases required one-shot classification or simple text generation tasks.
Context engineering expands this perspective. It includes all strategies for curating and maintaining the optimal set of tokens during inference, considering not just prompts, but also tools, external data, message history, and protocols like the Model Context Protocol (MCP). Modern agents operate in continuous loops, progressively generating data that might be relevant for subsequent iterations. Context engineering is the art of selecting what to include in the limited context window from this ever-expanding informational universe.
Why Context Engineering is Crucial
Despite their speed and ability to process large data volumes, LLMs—like humans—lose focus when overloaded. Context must be treated as a finite resource with diminishing marginal returns. Models have a limited "attention budget" that is progressively consumed by processing each additional token.
Architectural Limitations of Transformers
This attention scarcity stems from the architectural constraints of LLMs. Transformers allow every token to attend to all other tokens in the context, creating n² pairwise relationships for n tokens. As context length increases, the model's ability to capture these relationships diminishes, creating a natural tension between context size and attention focus.
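As a rough, back-of-the-envelope illustration of this quadratic growth (the token counts below are arbitrary, not benchmarks), consider how quickly the number of pairwise relationships climbs with context length:

```python
# Pairwise attention relationships grow quadratically: n tokens -> n * n pairs.
for n_tokens in (1_000, 10_000, 100_000):
    pairs = n_tokens ** 2
    print(f"{n_tokens:>7,} tokens -> {pairs:>18,} pairwise relationships")
```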
Models develop attention patterns based on training data distributions where short sequences are typically more common than long ones. Techniques like position encoding interpolation allow handling longer sequences, but with some degradation in token position understanding. These factors create a performance gradient rather than a hard cliff: models remain capable even with long contexts, but may show reduced precision in information retrieval and long-range reasoning.
Anatomy of Effective Context
Good context engineering means identifying the smallest possible set of high-signal tokens that maximize the likelihood of achieving the desired outcome.
Optimized System Prompts
System prompts should be extremely clear and use simple, direct language, presenting ideas at the "right altitude." This optimal zone avoids two common failure modes: on one extreme, hardcoding complex, brittle logic in prompts to elicit exact behaviors; on the other, vague high-level guidance that fails to provide concrete signals to the model.
It's recommended to organize prompts into distinct sections using XML tags or Markdown headers, although exact formatting is becoming less critical as models evolve. The goal is to provide the minimum set of information that fully outlines expected behavior; minimal doesn't necessarily mean short, it means including only what's essential.
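As a concrete illustration, here is a hypothetical system prompt skeleton organized into sections. The section names and wording are illustrative, not a prescribed template:

```python
# A hypothetical system prompt skeleton, organized into distinct sections.
SYSTEM_PROMPT = """
<background_information>
You are a coding agent working inside a Python monorepo.
</background_information>

<instructions>
- Prefer small, reviewable changes.
- Run the test suite before declaring a task complete.
</instructions>

<tool_guidance>
Use the search tool to locate code before editing; avoid guessing file paths.
</tool_guidance>

<output_format>
Reply with a short summary of the change followed by the diff.
</output_format>
"""
```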
Token-Efficient Tools
Tools allow agents to interact with their environment and retrieve new context while working. They should return token-efficient information and encourage efficient agent behaviors, and they should be self-contained, robust to errors, and unambiguous about their intended use.
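To make this concrete, here is a sketch of what a token-efficient tool definition might look like. The tool name, schema details, and limits are assumptions for illustration, not a specific API:

```python
# Hypothetical tool definition: caps result count and returns short snippets
# instead of dumping whole files into the agent's context.
SEARCH_CODE_TOOL = {
    "name": "search_code",
    "description": (
        "Search the repository for a pattern. Returns at most `max_results` "
        "matches, each as 'path:line: snippet' truncated to 200 characters. "
        "Use this before reading files to avoid loading irrelevant content."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "pattern": {"type": "string", "description": "Regex to search for."},
            "max_results": {"type": "integer", "default": 10},
        },
        "required": ["pattern"],
    },
}
```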
One of the most common failure modes is bloated tool sets that cover too much functionality or create ambiguous decision points. If a human engineer can't definitively say which tool to use in a situation, an AI agent can't be expected to do better. Curating a minimal viable set of tools also facilitates context maintenance in prolonged interactions.
Canonical Examples and Few-Shot Prompting
Providing examples—known as few-shot prompting—remains a strongly recommended best practice. However, it's discouraged to stuff prompts with exhaustive lists of edge cases. It's preferable to curate a set of diverse, canonical examples that effectively represent the agent's expected behavior. For LLMs, examples are the "pictures" worth a thousand words.
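Here is a hedged sketch of what such a curated set might look like for a hypothetical support-ticket triage agent; the tickets, categories, and actions are invented for illustration. The point is three diverse, canonical cases rather than a long tail of edge cases:

```python
# Few-shot examples for a hypothetical triage agent: diverse and canonical,
# not an exhaustive enumeration of edge cases.
FEW_SHOT_EXAMPLES = """
<examples>
<example>
Ticket: "I was charged twice for my subscription this month."
Category: billing | Priority: high | Action: refund_review
</example>
<example>
Ticket: "How do I export my data to CSV?"
Category: how_to | Priority: low | Action: send_docs_link
</example>
<example>
Ticket: "The app crashes on startup since the last update."
Category: bug | Priority: high | Action: escalate_to_engineering
</example>
</examples>
"""
```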
Context Retrieval and Agentic Search
Anthropic defines agents simply: LLMs autonomously using tools in a loop. As underlying models improve, the level of agent autonomy can scale, allowing them to independently navigate complex problem spaces and recover from errors.
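That definition is easy to see in code. The following is a minimal sketch of the tools-in-a-loop pattern; `call_model` and `run_tool` are stand-ins for a model API and a tool executor, not any particular SDK:

```python
def call_model(messages, tools):
    """Stand-in for an LLM API call; returns either a tool call or a final answer."""
    raise NotImplementedError

def run_tool(name, arguments):
    """Stand-in for executing a tool and returning its (token-efficient) result."""
    raise NotImplementedError

def agent_loop(task, tools, max_steps=20):
    """An LLM autonomously using tools in a loop, with a step budget."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages, tools)
        messages.append({"role": "assistant", "content": reply})
        if reply["type"] != "tool_call":
            return reply["content"]  # the model decided it is done
        result = run_tool(reply["name"], reply["arguments"])
        # Tool results become new context for the next iteration.
        messages.append({"role": "tool", "content": result})
    return None  # step budget exhausted
```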
"Just in Time" Strategy
A shift is occurring in how engineers design context for agents. Many AI-native applications use embedding-based retrieval systems before inference. However, more teams are adopting "just in time" context strategies.
Instead of pre-processing all relevant data upfront, agents maintain lightweight identifiers (file paths, stored queries, web links) and use these references to dynamically load data into context at runtime via tools. Anthropic's Claude Code uses this approach to perform complex analysis on large databases, writing targeted queries and using Bash commands like head and tail without ever loading complete data objects into context.
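A minimal sketch of the idea, assuming a local file system and hypothetical tool functions, exposes only lightweight listings and partial reads rather than whole files:

```python
from pathlib import Path

def list_files(directory: str) -> list[str]:
    """Lightweight references only: paths and sizes, no file contents."""
    return [f"{p} ({p.stat().st_size} bytes)"
            for p in Path(directory).rglob("*") if p.is_file()]

def peek_file(path: str, head: int = 20, tail: int = 5) -> str:
    """Load only the head and tail of a file into context, head/tail-style."""
    lines = Path(path).read_text(errors="replace").splitlines()
    if len(lines) <= head + tail:
        return "\n".join(lines)
    return "\n".join(lines[:head] + ["... [truncated] ..."] + lines[-tail:])
```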
This approach mirrors human cognition: we generally don't memorize entire information corpuses, but introduce external organization and indexing systems like file systems, inboxes, and bookmarks to retrieve relevant information on demand.
Progressive Disclosure and Metadata
Allowing agents to navigate and retrieve data autonomously enables "progressive disclosure": agents can incrementally discover relevant context through exploration. Each interaction yields context that informs the next decision: file sizes suggest complexity, naming conventions hint at purpose, timestamps can proxy for relevance.
Reference metadata provides mechanisms to efficiently refine behavior. For an agent operating in a file system, the presence of a file named test_utils.py in a tests folder implies a different purpose than the same name in src/core_logic.py. Folder hierarchies, naming conventions, and timestamps provide important signals.
Hybrid Strategies
There's a trade-off: runtime exploration is slower than retrieving pre-computed data. Additionally, thoughtful engineering is required to ensure the LLM has the right tools and heuristics to effectively navigate its information landscape. The most effective agents might employ a hybrid strategy, retrieving some data upfront for speed and pursuing further autonomous exploration at their discretion.
Claude Code implements this hybrid model: CLAUDE.md files are dropped into context upfront, while primitives like glob and grep allow navigating the environment and retrieving files just-in-time. As model capabilities improve, agentic design will trend toward letting intelligent models act intelligently, with progressively less human curation.
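A hedged sketch of that hybrid shape, with file and function names chosen only for illustration: a small project guide is loaded upfront, while navigation primitives stay available for just-in-time retrieval.

```python
from pathlib import Path

def build_initial_context(project_root: str) -> str:
    """Upfront context: a CLAUDE.md-style project guide, if one exists."""
    guide = Path(project_root) / "CLAUDE.md"
    return guide.read_text() if guide.exists() else ""

def glob_files(project_root: str, pattern: str) -> list[str]:
    """Just-in-time navigation: list matching paths without reading contents."""
    return [str(p) for p in Path(project_root).glob(pattern)]
```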
Context Engineering for Long-Horizon Tasks
Long-horizon tasks require agents to maintain coherence, context, and goal-directed behavior over action sequences where token count exceeds the LLM's context window. For tasks spanning tens of minutes to hours of continuous work, like large codebase migrations or comprehensive research projects, agents require specialized techniques.
Compaction: Intelligent Context Compression
Compaction is the practice of taking a conversation nearing the context window limit, summarizing its contents, and reinitiating a new window with the summary. It typically serves as the first lever in context engineering to drive better long-term coherence.
In Claude Code, this is implemented by passing message history to the model to summarize and compress the most critical details. The model preserves architectural decisions, unresolved bugs, and implementation details while discarding redundant tool outputs or messages. The agent can then continue with this compressed context plus the five most recently accessed files.
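A minimal sketch of such a compaction pass, assuming a helper `summarize_with_model` that stands in for a model call with a tuned compaction prompt and a token count supplied by the caller:

```python
def summarize_with_model(messages: list[dict]) -> str:
    """Stand-in for an LLM call that compresses history, preserving decisions,
    unresolved bugs, and key implementation details."""
    raise NotImplementedError

def compact(messages: list[dict], recent_files: list[str],
            token_count: int, limit: int, threshold: float = 0.9) -> list[dict]:
    """Near the context limit, restart the conversation from a summary plus
    the most recently accessed files."""
    if token_count < threshold * limit:
        return messages  # plenty of room left; no compaction needed
    summary = summarize_with_model(messages)
    reloaded = [{"role": "user", "content": f"Recently accessed file: {path}"}
                for path in recent_files[-5:]]
    return [{"role": "user", "content": f"Summary of prior work:\n{summary}"}] + reloaded
```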
The art of compaction lies in selecting what to keep versus what to discard. Overly aggressive compaction can result in the loss of subtle but critical context whose importance becomes apparent only later. It's recommended to carefully tune the prompt on complex agent traces, first maximizing recall to capture all relevant information, then iterating to improve precision by eliminating superfluous content.
Structured Note-Taking: Persistent Agentic Memory
Structured note-taking, or agentic memory, is a technique where the agent regularly writes notes persisted to memory outside the context window. These notes are retrieved into context at later times.
This strategy provides persistent memory with minimal overhead. Like Claude Code creating a to-do list, or a custom agent maintaining a NOTES.md file, this simple pattern allows the agent to track progress across complex tasks, maintaining critical context and dependencies that would otherwise be lost.
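A minimal sketch of the pattern, assuming a NOTES.md file like the one mentioned above (the function names are illustrative, not a standard tool interface):

```python
from pathlib import Path

NOTES_PATH = Path("NOTES.md")

def write_note(note: str) -> str:
    """Append a note to persistent memory that lives outside the context window."""
    with NOTES_PATH.open("a") as f:
        f.write(note.rstrip() + "\n")
    return "note saved"

def read_notes() -> str:
    """Pull the notes back into context later, e.g. after a context reset."""
    return NOTES_PATH.read_text() if NOTES_PATH.exists() else "(no notes yet)"
```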
An emblematic example is Claude playing Pokémon: the agent maintains precise tallies across thousands of game steps, tracking objectives like "for the last 1,234 steps I've been training my Pokémon in Route 1, Pikachu has gained 8 levels toward the target of 10." Without any prompting about memory structure, it develops maps of explored regions, remembers which key achievements it has unlocked, and maintains strategic notes on combat strategies.
After context resets, the agent reads its own notes and continues multi-hour training sequences or dungeon explorations. With the Sonnet 4.5 launch, Anthropic released a memory tool in public beta on the Claude Developer Platform that facilitates storing and consulting information outside the context window through a file-based system.
Multi-Agent Architectures
Multi-agent architectures provide another way around context limitations. Instead of one agent attempting to maintain state across an entire project, specialized subagents can handle focused tasks with clean context windows. The main agent coordinates with a high-level plan while subagents perform deep technical work or use tools to find relevant information.
Each subagent might explore extensively, using tens of thousands of tokens or more, yet return only a condensed, distilled summary of its work (often 1,000-2,000 tokens). This approach achieves a clear separation of concerns: detailed search context remains isolated within subagents, while the lead agent focuses on synthesizing and analyzing results.
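The coordination pattern can be sketched briefly; `run_subagent` is an assumed helper that wraps a fresh agent loop with its own clean context window:

```python
def run_subagent(task: str) -> str:
    """Stand-in: run an isolated agent on one focused subtask and return only a
    condensed summary (on the order of 1,000-2,000 tokens), not its full context."""
    raise NotImplementedError

def lead_agent(plan: list[str]) -> str:
    """The lead agent coordinates the plan and synthesizes condensed results;
    detailed search context stays isolated inside each subagent."""
    summaries = [run_subagent(subtask) for subtask in plan]
    return "\n\n".join(summaries)
```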
Choosing the Optimal Strategy
The choice between these approaches depends on task characteristics:
- Compaction maintains conversational flow for tasks requiring extensive back-and-forth
- Note-taking excels in iterative development with clear milestones
- Multi-agent architectures handle complex research and analysis where parallel exploration pays dividends
Even as models continue to improve, the challenge of maintaining coherence across extended interactions will remain central to building more effective agents.
Conclusion
Context engineering represents a fundamental shift in how we build with LLMs. As models become more capable, the challenge isn't just crafting the perfect prompt, but thoughtfully curating what information enters the model's limited attention budget at each step. Whether implementing compaction for long-horizon tasks, designing token-efficient tools, or enabling agents to explore their environment just-in-time, the guiding principle remains the same: find the smallest set of high-signal tokens that maximize the likelihood of your desired outcome.
The techniques outlined will continue evolving as models improve. We're already seeing that smarter models require less prescriptive engineering, allowing agents to operate with more autonomy. But even as capabilities scale, treating context as a precious, finite resource will remain central to building reliable, effective agents. The Claude Developer Platform offers tools to get started with context engineering, including cookbooks on memory and context management.
FAQ
What is context engineering for AI agents?
Context engineering is the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference, including prompts, tools, external data, and message history. It goes beyond traditional prompt engineering by managing the entire informational state available to the agent.
Why is context a limited resource in LLMs?
LLMs have a finite "attention budget" due to architectural constraints of transformers. As tokens increase, the ability to capture pairwise relationships diminishes, causing the "context rot" phenomenon where precision in information retrieval progressively decreases.
How does the "just in time" context retrieval strategy work?
Agents maintain lightweight identifiers (file paths, queries, links) and dynamically load data into context at runtime using tools, instead of pre-processing everything upfront. This approach mirrors human cognition and keeps context focused only on what's immediately relevant.
What's the difference between compaction and structured note-taking in context engineering?
Compaction summarizes and compresses message history when approaching the context window limit, reinitiating with a summary. Structured note-taking writes persistent notes outside the context window that are retrieved later, providing long-term memory with minimal overhead.
When should you use multi-agent architectures instead of a single agent?
Multi-agent architectures are ideal for complex research and analysis where parallel exploration is needed. Specialized sub-agents handle focused tasks with clean contexts, returning condensed summaries to the main agent that coordinates and synthesizes results.
How do you optimize system prompts for context engineering?
Prompts should be clear, use direct language, and present ideas at the "right altitude": specific enough to guide behavior effectively, yet flexible enough to give the model strong heuristics rather than brittle, hardcoded rules. It's recommended to organize prompts into distinct sections and aim for the minimum set of essential information.
What role do tools play in context engineering?
Tools allow agents to interact with their environment and retrieve context dynamically. They should be token-efficient, self-contained, clear in intended use, and constitute a minimal viable set without functional overlaps or decision ambiguities.
Will context engineering still be necessary with larger context windows?
Yes, even with larger context windows, strategic context management will remain crucial. "Context rot" and the problem of selecting the most relevant information persist at any context size, so curating what enters the window remains essential wherever maximum agent performance is desired.