Introduction
A code-based MCP exposes a single tool that executes code (Python or JavaScript), which changes how agents interact with systems. This pattern reduces CLI fragility, simplifies the composition of complex flows, and yields reusable scripts, but it also raises practical security and control concerns.
Context
Agents often rely on many command-line tools. CLIs are platform- and version-dependent, affected by encoding issues, and sometimes undocumented; this leads to brittle runs and frequent failures, especially for tools not present in the model’s training data.
Code-based MCP: approach and benefits
Expose an “ubertool” that evaluates code inside a persistent interpreter, so that the MCP becomes a stateful Python/JS REPL with useful libraries available (e.g., pexpect, Playwright). Key practical benefits:
- Stateful sessions: variables and objects persist across invocations;
- Natural composition: loops, error handling and synchronization are written in code;
- Reusable scripts: session interactions can be dumped as standalone playbooks;
- Large functional surface: language introspection functions help discover available capabilities.
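The first three benefits can be illustrated with a minimal sketch. It assumes a hypothetical `CodeSession` class standing in for the tool's backend; the class name, its methods, and the result format are illustrative and not part of any MCP SDK.

```python
import contextlib
import io
import traceback


class CodeSession:
    """Hypothetical backend for a single code-execution tool."""

    def __init__(self):
        # One namespace shared by every call: this is what makes the tool stateful.
        self.namespace = {"__name__": "__session__"}
        self.history = []  # snippets that ran without raising

    def run(self, code: str) -> dict:
        """Execute `code` in the shared namespace and return captured output."""
        buffer = io.StringIO()
        error = None
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
            self.history.append(code)
        except Exception:
            # Report the traceback to the agent instead of crashing the session.
            error = traceback.format_exc()
        return {"stdout": buffer.getvalue(), "error": error}

    def export_script(self) -> str:
        """Join the successful snippets into one standalone script."""
        return "\n".join(self.history) + "\n"


session = CodeSession()
session.run("counter = 41")                          # first tool call
result = session.run("counter += 1\nprint(counter)")  # second call sees `counter`
```

Here the second call reads the variable defined by the first, and `session.export_script()` returns the successful snippets as a script that runs without the session. Discovery of the available surface works the same way: the agent can send code that calls `dir()` or `help()` on a library and read the printed result.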
Practical examples: pexpect and Playwright
With pexpect, the MCP provides a Python environment where the agent sends code that spawns interactive programs and reacts to their prompts (e.g., LLDB), collapsing many tool calls into a few code executions. With Playwright, the MCP runs JS against the Playwright API, forwarding console logs and preserving state across page interactions for efficient scraping or testing.
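As a rough sketch of the pexpect side, the kind of code an agent might send could look like the following. The helper `drive` and its `dialogue` format are hypothetical; only `pexpect.spawn`, `expect_exact`, `expect`, `sendline`, `before`, `after`, and `close` are real pexpect calls. The sketch assumes the child program exits after the last reply.

```python
def drive(command, args, dialogue, timeout=10):
    """Run an interactive program, answering each expected prompt in turn.

    `dialogue` is a list of (expected_text, reply) pairs (hypothetical format).
    Returns everything the child printed, as one string.
    """
    import pexpect  # third-party; imported here so the helper loads without it

    child = pexpect.spawn(command, args, encoding="utf-8", timeout=timeout)
    transcript = []
    try:
        for expected, reply in dialogue:
            child.expect_exact(expected)  # block until the prompt appears
            transcript.append(child.before + child.after)
            if reply is not None:
                child.sendline(reply)     # the trailing newline is added for us
        child.expect(pexpect.EOF)         # wait for the program to finish
        transcript.append(child.before)
    finally:
        child.close()
    return "".join(transcript)
```

One call such as `drive(sys.executable, ["-c", "name = input('name? '); print('hello ' + name)"], [("name? ", "world")])` replaces a spawn, a wait, a send, and a read that would otherwise be separate tool calls. For a debugger such as LLDB, the dialogue would need to end with a quit command so that the child reaches EOF.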
The problem: CLI frictions
CLIs introduce encoding headaches, newline requirements, and the need for explicit session management (e.g., tmux) that agents handle poorly. Security preflights and extra validation can further slow multi-turn interactions. Small mistakes break state and force restarts.
Recommended approach
Provide an interpreter as the single MCP tool and include:
- A controlled runtime/virtualenv with common libs;
- Timeouts, structured logging and automatic forwarding of stdout/stderr;
- Facilities to export session logic into reusable scripts;
- Sandboxing and syscall-level controls where feasible.
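The timeout and output-forwarding items above can be sketched as follows. This is one possible shape, not a prescribed design: `run_with_timeout` and the record fields are hypothetical names, and the sketch runs each snippet in a fresh interpreter, which gives up the persistent state in exchange for a timeout that is easy to enforce.

```python
import json
import subprocess
import sys
import time


def _text(value):
    """Normalize partial output, which may arrive as bytes, str, or None."""
    if value is None:
        return ""
    return value.decode("utf-8", "replace") if isinstance(value, bytes) else value


def run_with_timeout(code: str, timeout_s: float = 5.0) -> dict:
    """Run `code` in a child interpreter and return a structured record."""
    started = time.monotonic()
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,  # collect stdout and stderr for forwarding
            text=True,
            timeout=timeout_s,    # the child is killed when this expires
        )
        record = {
            "status": "ok" if proc.returncode == 0 else "error",
            "returncode": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        }
    except subprocess.TimeoutExpired as exc:
        record = {
            "status": "timeout",
            "returncode": None,
            "stdout": _text(exc.stdout),
            "stderr": _text(exc.stderr),
        }
    record["elapsed_s"] = round(time.monotonic() - started, 3)
    return record


record = run_with_timeout("print('hi')")
log_line = json.dumps(record)  # one JSON object per call, suitable for a log
```

Returning a plain dictionary keeps the same record usable both as the tool result sent back to the agent and as a structured log entry.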
Security considerations
Eval-style execution opens large attack surfaces because the interpreter can spawn processes and run arbitrary commands. True protection may require syscall interception and strict sandboxing; balancing utility and security is nontrivial and residual risk remains.
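To make the gap concrete, here is a minimal sketch of the cheapest layer of defense: per-process resource limits on a child interpreter. It is explicitly not a sandbox. The child can still read files, open sockets, and spawn processes; syscall filtering and filesystem or network isolation would have to be added separately and are not shown. The function name `run_restricted` is hypothetical, and the `resource` module is available on Unix only.

```python
import resource  # Unix only
import subprocess
import sys


def run_restricted(code: str, cpu_seconds: int = 2, wall_seconds: float = 10.0):
    """Run `code` in a child interpreter with basic resource limits applied."""

    def apply_limits():
        # Runs in the child process just before the interpreter starts.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # no core dumps

    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* vars
        capture_output=True,
        text=True,
        timeout=wall_seconds,     # wall-clock cap in addition to the CPU cap
        preexec_fn=apply_limits,
    )


ok = run_restricted("print(2 + 2)")
```

A snippet that spins forever, such as `while True: pass`, is terminated by the kernel once it exhausts its CPU budget, which shows up as a negative `returncode`. Everything the snippet is still allowed to do within that budget is the residual risk described above.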
When to use a code-based MCP
Use this pattern when you need persistent session state, complex composition, and the ability to convert explorations into reusable playbooks. Typical cases are interactive debugging, multi-page scraping, and E2E testing.
Conclusion
A code-based MCP that exposes a language interpreter provides clear operational advantages over many discrete CLI tools: less brittleness, easier composition and higher reuse. Still, significant security and operational trade-offs must be addressed by careful runtime design and isolation.
FAQ
- When should I use a code-based MCP for debugging C programs?
  When interactive state and reproducible playbooks matter (e.g., LLDB sessions): it reduces turns and speeds up repeated runs.
- What practical gains come from a code-based MCP versus multiple CLIs?
  Fewer tool calls, direct logs, easier synchronization and scripts that can be reused outside the MCP.
- How can I mitigate security risks of an MCP that runs eval?
  Apply strict sandboxing, syscall filtering, minimal privileges, and thorough logging and timeout policies.
- Can session code produced within the MCP run independently later?
  Yes: one major advantage is that agents often output runnable scripts that work without the MCP.
- Where does the code-based MCP approach struggle?
  When a domain or library is novel and the model lacks patterns for it; results depend on library familiarity and prompt quality.