From Prompt to Context
The conversation around AI interaction is evolving. While Prompt Engineering focuses on crafting the perfect question, Context Engineering builds the entire information environment the AI operates in. This shift is critical for creating robust, autonomous, and reliable AI agents.
| Dimension | Prompt Engineering | Context Engineering |
|---|---|---|
| Focus | The query | The entire information payload |
| Goal | Maximize single output quality | Enable reliable, autonomous behavior |
| Analogy | Asking a question | Building a workspace (RAM) |
The Agent Token Ratio
100:1
In typical AI agents like Manus, the ratio of input tokens (context) to output tokens (action) is heavily skewed. This makes efficient context management paramount for performance and cost.
The Impact of Smart Context
KV-Cache: The 10x Cost Saver
By keeping the initial context stable, agents can leverage KV-caching, drastically reducing latency and cost. For models like Claude Sonnet, cached input tokens can cost one-tenth as much as uncached ones.
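A back-of-the-envelope sketch of why this matters: only the 100:1 ratio and the ~10x cache discount come from the figures above; the absolute prices and variable names below are illustrative assumptions.

```python
# Back-of-the-envelope agent economics. The 100:1 ratio and ~10x cache
# discount come from the text above; the absolute prices are assumptions.
input_tokens = 1_000_000                 # context fed in over a task
output_tokens = input_tokens // 100      # ~100:1 input-to-output ratio

price_uncached = 3.00 / 1_000_000        # assumed $ per uncached input token
price_cached = price_uncached / 10       # cached tokens ~10x cheaper

print(f"input cost, uncached: ${input_tokens * price_uncached:.2f}")  # $3.00
print(f"input cost, cached:   ${input_tokens * price_cached:.2f}")    # $0.30
print(f"output tokens: {output_tokens:,} (a rounding error by volume)")
```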
The Agent's "Working Memory"
An agent's context is a dynamic assembly of instructions, history, retrieved knowledge, and tools. Managing this payload effectively prevents "context poisoning" and keeps the agent on track.
The Context Engineering Toolkit
RAG: Grounding LLMs in Reality
Retrieval-Augmented Generation (RAG) is a foundational pattern. It connects the LLM to external, up-to-date knowledge sources, reducing hallucinations and grounding responses in verifiable information. A code sketch of the flow follows the four steps below.
💬
1. User Query
A question is asked.
🔍
2. Retrieve
System searches vector DB for relevant documents.
➕
3. Augment
Retrieved info is added to the prompt.
💡
4. Generate
LLM provides a grounded, informed response.
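The minimal sketch below mirrors the four steps. The `embed` function is a toy bag-of-words stand-in for a real embedding model, so the snippet runs with only NumPy installed; the documents and names are invented for illustration.

```python
import numpy as np

# Toy embedding: a bag-of-words hash, normalized to a unit vector.
# A real system would call an embedding model here.
def embed(text: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

documents = [
    "The KV-cache stores attention keys and values for reuse.",
    "RAG retrieves documents and adds them to the prompt.",
    "Agents loop over action selection and observation.",
]
doc_vectors = np.stack([embed(d) for d in documents])  # index built once

def retrieve(query: str, k: int = 1) -> list[str]:
    # Step 2: score every document against the query (cosine similarity,
    # since all vectors are unit length) and keep the top k.
    scores = doc_vectors @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(query: str) -> str:
    # Step 3: augment the prompt with the retrieved passages.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Step 4: this prompt would be sent to the LLM for a grounded response.
print(build_prompt("How does RAG reduce hallucinations?"))
```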
Memory: Short & Long Term
Effective agents need memory. Short-term memory (like chat history) provides conversational flow, while long-term memory (user preferences, past interactions) enables personalization and learning across sessions.
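A minimal sketch of the split, with illustrative names: a bounded window stands in for short-term memory, and a plain dict stands in for a durable long-term store (in practice, a database or vector index).

```python
from collections import deque

class AgentMemory:
    """Sketch of two-tier memory; names and structure are illustrative."""

    def __init__(self, short_term_limit: int = 20):
        # Short-term: a bounded window of recent turns (conversational flow).
        self.short_term = deque(maxlen=short_term_limit)
        # Long-term: durable facts that survive across sessions.
        self.long_term: dict[str, str] = {}

    def remember_turn(self, role: str, text: str) -> None:
        self.short_term.append((role, text))

    def remember_fact(self, key: str, value: str) -> None:
        self.long_term[key] = value  # e.g. "preferred_language" -> "Python"

    def context_snippet(self) -> str:
        facts = "\n".join(f"- {k}: {v}" for k, v in self.long_term.items())
        turns = "\n".join(f"{role}: {text}" for role, text in self.short_term)
        return f"Known facts:\n{facts}\n\nRecent conversation:\n{turns}"

memory = AgentMemory()
memory.remember_fact("preferred_language", "Python")
memory.remember_turn("user", "Refactor my parser module.")
print(memory.context_snippet())
```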
Tool Use: From Words to Actions
Context engineering defines which tools an agent can use and when. This allows the LLM to move beyond text generation to interact with APIs, databases, and the file system, turning intent into execution.
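A sketch of the idea with a hypothetical two-tool registry: real agents describe tools with richer schemas (JSON Schema, function-calling APIs), but the dispatch shape is the same.

```python
import json
from pathlib import Path

# Hypothetical tool registry; context engineering decides which of these
# definitions are serialized into the model's context at any given step.
TOOLS = {
    "read_file": lambda path: Path(path).read_text(),
    "shell": lambda cmd: f"(would run: {cmd})",   # stubbed for the sketch
}

def tool_schema() -> str:
    """Render the available tools into the context so the model can pick one."""
    return json.dumps([{"name": name} for name in TOOLS])

def execute(action_json: str) -> str:
    """Dispatch a model-emitted action like {"tool": "shell", "arg": "ls"}."""
    action = json.loads(action_json)
    handler = TOOLS.get(action["tool"])
    if handler is None:
        return f"error: unknown tool {action['tool']!r}"  # fed back as context
    return handler(action["arg"])

print(execute('{"tool": "shell", "arg": "ls -la"}'))  # (would run: ls -la)
```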
Lessons from the Field: The Manus Agent
The team behind the Manus agent shared key principles discovered through extensive trial-and-error experimentation, a process they call "Stochastic Graduate Descent."
⚡️
Design Around the KV-Cache
Keep prompt prefixes stable and append-only to maximize cache hits, drastically cutting latency and cost.
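A sketch of the principle with placeholder prompt text: the assert at the end verifies the prefix-stability property that cache hits depend on.

```python
import time

# Anti-pattern: a timestamp at the head of the prompt changes every call,
# invalidating the KV-cache for everything after it.
bad_prefix = f"Current time: {time.ctime()}\nYou are a helpful agent.\n"

# The pattern from the text: keep the prefix byte-stable and append-only.
STABLE_PREFIX = "You are a helpful agent.\n"  # never edited mid-task

def build_context(history: list[str], new_event: str) -> str:
    history.append(new_event)      # append; never rewrite earlier turns
    return STABLE_PREFIX + "".join(history)

history: list[str] = []
ctx1 = build_context(history, "obs: opened file\n")
ctx2 = build_context(history, "obs: ran tests\n")
assert ctx2.startswith(ctx1)  # prior context is a strict prefix -> cache hits
```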
🎭
Mask, Don't Remove
Instead of dynamically changing the toolset (which breaks cache), mask unavailable tool logits during decoding to guide the model.
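A toy decoding step to show the idea; the tool names and logit values are invented, and production systems do this via constrained decoding or logit bias rather than a hand-rolled argmax.

```python
import numpy as np

tools = ["browser", "shell", "file_read", "file_write"]
logits = np.array([2.1, 0.3, 1.7, 1.2])  # made-up model scores per tool

def masked_choice(logits: np.ndarray, allowed: set[str]) -> str:
    masked = logits.copy()
    for i, name in enumerate(tools):
        if name not in allowed:
            masked[i] = -np.inf  # tool stays defined but cannot be sampled
    return tools[int(np.argmax(masked))]

# All tool definitions stay in the prompt (cache intact); only sampling changes.
print(masked_choice(logits, allowed={"file_read", "file_write"}))  # file_read
```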
💾
Use the File System as Context
Treat the file system as infinite, persistent, and operable memory. The agent learns to write/read files on demand to overcome context window limits.
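A sketch under assumed names and thresholds: large observations are spilled to disk with a short stub left in context, and nothing is lost because the file can be re-read on demand.

```python
from pathlib import Path

MAX_INLINE = 500  # assumed threshold for keeping an observation inline

def compress_observation(name: str, content: str, workdir: Path) -> str:
    """Spill a large observation to disk; keep a restorable stub in context."""
    if len(content) <= MAX_INLINE:
        return content
    workdir.mkdir(exist_ok=True)
    path = workdir / f"{name}.txt"
    path.write_text(content)               # nothing is lost, only moved
    return f"{content[:200]}...\n[truncated; full text at {path}]"

def restore(path: str) -> str:
    """Re-read the file later to recover the dropped detail."""
    return Path(path).read_text()

note = compress_observation("webpage", "x" * 10_000, Path("agent_scratch"))
print(note.splitlines()[-1])  # [truncated; full text at agent_scratch/webpage.txt]
```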
🎯
Manipulate Attention Through Recitation
By constantly rewriting a `todo.md` file, the agent pushes its main objectives to the end of the context, keeping its focus sharp and avoiding "lost-in-the-middle" issues.
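A minimal sketch of the recitation trick, with hypothetical goals:

```python
from pathlib import Path

TODO = Path("todo.md")  # the recitation file named in the text

def recite(goals: list[str], done: set[int]) -> str:
    """Rewrite todo.md every step; return it for appending to the context."""
    lines = [f"- [{'x' if i in done else ' '}] {g}" for i, g in enumerate(goals)]
    TODO.write_text("\n".join(lines))
    # Appending the fresh copy at the *end* of the context keeps the goals in
    # the model's recent attention span, countering lost-in-the-middle decay.
    return TODO.read_text()

context_tail = recite(["clone repo", "run tests", "fix failures"], done={0})
print(context_tail)
```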
🐛
Keep the Wrong Stuff In
Don't hide errors. Including failed actions and stack traces in the context teaches the model to adapt and avoid repeating mistakes.
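A sketch of the principle using Python's standard traceback module; the action and context shapes are illustrative.

```python
import traceback

def run_action(action, context: list[str]) -> None:
    """Execute an action; on failure, append the evidence instead of hiding it."""
    try:
        result = action()
        context.append(f"OK: {result}")
    except Exception:
        # The stack trace stays in context so the model sees what went wrong
        # and is less likely to retry the same failing action.
        context.append(f"FAILED:\n{traceback.format_exc()}")

context: list[str] = []
run_action(lambda: 1 / 0, context)  # deliberate failure for the sketch
print(context[-1])                   # the ZeroDivisionError trace, kept in
```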
🎲
Don't Get Few-Shotted
Avoid overly repetitive examples in the context. Introduce structured variation to prevent the model from getting stuck in a suboptimal loop.
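One simple counter-measure, sketched with made-up templates: vary the serialization of identical information so the context never becomes a rhythm the model mimics blindly.

```python
import random

# Assumed alternative phrasings for serializing the same observation, so the
# context doesn't become a wall of identical-looking few-shot examples.
TEMPLATES = [
    "Result of {tool}: {result}",
    "{tool} returned: {result}",
    "[{tool}] -> {result}",
]

def serialize(tool: str, result: str) -> str:
    # Same information every time; only the surface form varies.
    return random.choice(TEMPLATES).format(tool=tool, result=result)

for _ in range(3):
    print(serialize("shell", "2 files changed"))
```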
The Core Agentic Loop
User Input
Task is initiated.
Context Assembly
System prompt, history, RAG, and tools are combined.
LLM Action Selection
Model chooses the next action (e.g., a tool call).
Execution in Environment
The action is performed (API call, file write).
Observation & Append
Result is observed and appended back to the context, and the loop repeats (sketched in code below).
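The whole loop condensed into a sketch: `llm`, the action format, and the tool handlers are assumptions standing in for a real model API and environment.

```python
def agent_loop(task: str, llm, tools: dict, max_steps: int = 20) -> str:
    """Sketch of the five steps above; `llm` is an assumed callable that
    returns an action dict like {"tool": "shell", "arg": "ls"}."""
    context = [f"Task: {task}"]                        # 1. user input
    for _ in range(max_steps):
        prompt = "\n".join(context)                    # 2. context assembly
        action = llm(prompt)                           # 3. LLM action selection
        if action["tool"] == "finish":
            return action["arg"]
        handler = tools[action["tool"]]
        observation = handler(action["arg"])           # 4. execution in environment
        context.append(f"{action}\n-> {observation}")  # 5. observe & append
    return "step budget exhausted"
```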
The Future is Contextual
Rising Context Complexity
As agents tackle more complex tasks and integrate more data modalities (text, image, audio), the challenge of managing context will only grow. Future work will focus on advanced memory systems and cognition-inspired management techniques.