Wanxin's Blog

When You Believe
🤖 Explore the full interactive guide → agentwiki.wanxinbai.com
Includes interactive ReAct simulators, token economics comparisons, failover demos, and a live AI quiz.

Overview

The core operational logic of the Agentic Coding System lives inside a while-true loop — a classic implementation of the ReAct pattern (Reason + Act). The model first reasons about what needs to be done, invokes specific tools to execute those actions, then reasons again over the results — cycling until the task is complete.
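The cycle can be sketched in a few lines. This is a minimal illustration with a stubbed model and tool registry — every name here (runAgentLoop, ModelTurn, and so on) is illustrative, not the system's actual API:

```typescript
// Minimal ReAct loop sketch: reason (call the model), act (run tools),
// feed results back, repeat until the model requests no more tools.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Model = (history: string[]) => ModelTurn;
type Tools = Record<string, (args: Record<string, unknown>) => string>;

function runAgentLoop(model: Model, tools: Tools, task: string): string[] {
  const history: string[] = [`user: ${task}`];
  while (true) {                                // the while-true loop itself
    const turn = model(history);                // 1. reason
    history.push(`assistant: ${turn.text}`);
    if (turn.toolCalls.length === 0) break;     // no tool calls -> done
    for (const call of turn.toolCalls) {        // 2. act
      const result = tools[call.name](call.args);
      history.push(`tool(${call.name}): ${result}`); // 3. observe
    }
  }
  return history;
}

// Stub model: first turn requests a tool, second turn finishes.
let step = 0;
const stubModel: Model = () =>
  step++ === 0
    ? { text: "Listing files...", toolCalls: [{ name: "ls", args: {} }] }
    : { text: "Done.", toolCalls: [] };

const transcript = runAgentLoop(stubModel, { ls: () => "main.ts" }, "what files exist?");
```

Everything this post covers — compression, recovery, hooks, speculation — wraps around those dozen lines.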

What separates a production-grade agentic system from a simple chatbot wrapper is everything around that loop: how context is managed, how failures are recovered, how security is enforced, and how multiple agents are coordinated. This post breaks it all down.


The Core ReAct Loop

The main loop runs through five distinct stages on every iteration.

Stage 1: Context Preparation

Before invoking the model, the system prunes outdated historical messages and applies light compression to cached tool results. If the context becomes excessively long, it triggers a comprehensive summarization process. These operations ensure the context delivered to the model in every iteration is complete and within token limits.
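A rough shape of that pruning-then-summarizing escalation, assuming a crude chars/4 token estimate — all names and heuristics here are illustrative, not the real implementation:

```typescript
// Stage 1 sketch: prune stale tool results first, then fall back to a
// whole-conversation summary if the context is still over budget.
type Msg = { role: "user" | "assistant" | "tool"; content: string };

const estimateTokens = (msgs: Msg[]) =>
  msgs.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);

function prepareContext(history: Msg[], budgetTokens: number): Msg[] {
  const ctx = [...history];
  // Light pruning: cached tool results are the cheapest context to lose.
  while (estimateTokens(ctx) > budgetTokens) {
    const i = ctx.findIndex((m) => m.role === "tool");
    if (i === -1) break; // nothing prunable left
    ctx.splice(i, 1);
  }
  // Still too long: comprehensive summarization (stubbed as a placeholder).
  if (estimateTokens(ctx) > budgetTokens) {
    return [
      { role: "assistant", content: "[summary of earlier conversation]" },
      ctx[ctx.length - 1],
    ];
  }
  return ctx;
}

const sample: Msg[] = [
  { role: "user", content: "x".repeat(40) },      // ~10 tokens
  { role: "tool", content: "y".repeat(400) },     // ~100 tokens: pruned first
  { role: "assistant", content: "z".repeat(40) }, // ~10 tokens
];
```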

Stage 2: Streaming Model Invocation

The system packages the current conversation history, system prompts, and available tools, then sends this bundle to the cloud model using streaming output — the model transmits its response in real-time as it generates content rather than waiting for the full output. Two types of information are collected simultaneously: the textual response, and any tool invocation intentions — signals from the model indicating it wants to call a specific tool.
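Collecting both kinds of information in one pass looks roughly like this. The event shape is an illustrative stand-in for a real streaming API (which would be consumed with `for await` over an async stream rather than a sync generator):

```typescript
// Stage 2 sketch: accumulate text deltas for immediate rendering while
// tool-call intents queue up for the executor.
type StreamEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_use"; name: string; args: Record<string, unknown> };

function collectStream(stream: Iterable<StreamEvent>) {
  let text = "";
  const toolCalls: { name: string; args: Record<string, unknown> }[] = [];
  for (const ev of stream) {
    if (ev.type === "text_delta") text += ev.text;          // render as it arrives
    else toolCalls.push({ name: ev.name, args: ev.args });  // tool intent
  }
  return { text, toolCalls };
}

// Stub stream standing in for the model's real-time output.
function* stubStream(): Generator<StreamEvent> {
  yield { type: "text_delta", text: "Reading the " };
  yield { type: "text_delta", text: "file now." };
  yield { type: "tool_use", name: "read_file", args: { path: "main.ts" } };
}

const collected = collectStream(stubStream());
```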

Stage 3: Tool Execution

Two executors handle tool calls:

  • Streaming Executor — begins executing tools in parallel while the model is still generating output
  • Batch Executor — waits until all tool invocation requests are finalized, then executes collectively

All requests pass through a permissions check and a hook system before execution. Operations requiring missing permissions trigger a confirmation prompt; hooks allow interception or modification of tool behaviors.
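The gate in front of both executors might look like this — a permission check with a confirmation fallback, then a hook chain that can rewrite or block the call. All names here are illustrative:

```typescript
// Pre-execution gate sketch: permission check, then hooks.
type ToolCall = { name: string; args: Record<string, unknown> };
type Hook = (call: ToolCall) => ToolCall | null; // null blocks the call

function gateToolCall(
  call: ToolCall,
  granted: Set<string>,
  hooks: Hook[],
  confirm: (c: ToolCall) => boolean, // stands in for the confirmation prompt
): ToolCall | null {
  // A missing permission triggers a prompt instead of a hard failure.
  if (!granted.has(call.name) && !confirm(call)) return null;
  // Hooks may intercept (return null) or rewrite the call before execution.
  let current: ToolCall | null = call;
  for (const hook of hooks) {
    if (current === null) return null;
    current = hook(current);
  }
  return current;
}

const hooks: Hook[] = [
  (c) => (c.name === "bash" ? null : c), // example hook: intercept bash calls
];
const granted = new Set(["read_file"]);
```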

Stage 4: Artifact Collection

Once tool execution completes, the system gathers supplementary artifacts — task notifications, memory content updates, file change logs — to prepare for the subsequent model invocation round.

Stage 5: Termination & Recovery

If the model proposes no further tool invocations, the task is complete and the loop terminates. Otherwise the loop continues. On error, the system self-heals rather than crashing:

  • 413 (context overflow) → triggers context compression, then retries
  • Truncated output → auto-upgrades token limit from 8K to 64K, retries up to 3×
  • Seven-layer recovery cascade: API exponential backoff → overload handling → token recovery → compressed response contexts → context purging → unattended persistent retries → emergency compaction

The maximum backoff is 5 minutes with a 6-hour reset cycle. Network jitter, API overload, context explosions — the system recovers from all of them.
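The backoff schedule matching those numbers can be sketched directly — delays double per attempt and cap at 5 minutes. The base delay is an assumption, and the 6-hour reset cycle is omitted for brevity:

```typescript
// Capped exponential backoff: 1s, 2s, 4s, ... up to the 5-minute ceiling.
const MAX_BACKOFF_MS = 5 * 60 * 1000; // 5-minute cap from the text

function backoffDelay(attempt: number, baseMs = 1000): number {
  return Math.min(baseMs * 2 ** attempt, MAX_BACKOFF_MS);
}

// Wrap any API call: retry with growing delays instead of crashing.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 10): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // out of budget: surface error
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
}
```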

The entire loop is implemented as an async generator (an async function* that yields), pushing every message and progress update over SSE (Server-Sent Events) so the UI renders updates in real time.
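In sketch form: each yield becomes one SSE frame. The frame formatting below follows the standard text/event-stream framing; the event names are illustrative:

```typescript
// The loop as an async generator whose yields stream to the UI as SSE frames.
type AgentEvent = { type: "text" | "tool" | "done"; data: string };

async function* agentLoop(): AsyncGenerator<AgentEvent> {
  yield { type: "text", data: "Planning..." };
  yield { type: "tool", data: "read_file(main.ts)" };
  yield { type: "done", data: "ok" };
}

// Standard text/event-stream framing: event line, data line, blank line.
function formatSse(ev: AgentEvent): string {
  return `event: ${ev.type}\ndata: ${ev.data}\n\n`;
}
```

A server would iterate `for await (const ev of agentLoop())` and write `formatSse(ev)` to the response, so the terminal or browser UI paints each step the moment it happens.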


6 Engineering Highlights

1. Prompt Cache Segmentation

The system prompt is split into two halves at a boundary marker named system_prompt_dynamic_boundary:

  • Static — role declarations, tool rules, coding philosophy → ✅ global cache shared across all users
  • Dynamic — memory content, MCP instructions, environment info → ❌ never cached

A cache hit on Anthropic’s API can cut input token costs by up to 90% and dramatically reduce Time-To-First-Token. By keeping the shared prefix exactly identical across users, the system maximizes its cache-hit rate — a level of optimization only achievable by a team with deep knowledge of the underlying API.
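The split itself is simple. The cache_control shape below follows Anthropic's prompt-caching API (a cache breakpoint on the static block); everything else is an illustrative assumption:

```typescript
// Split the system prompt at the boundary marker: only the byte-identical
// static half carries a cache breakpoint, so it hits the shared cache.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

const BOUNDARY = "system_prompt_dynamic_boundary";

function buildSystemBlocks(fullPrompt: string): SystemBlock[] {
  const [staticPart, dynamicPart = ""] = fullPrompt.split(BOUNDARY);
  return [
    // Static half: identical across all users -> cacheable.
    { type: "text", text: staticPart.trim(), cache_control: { type: "ephemeral" } },
    // Dynamic half (memory, MCP instructions, environment): never cached.
    { type: "text", text: dynamicPart.trim() },
  ];
}

const blocks = buildSystemBlocks(
  `You are a coding agent. Follow the tool rules.\n${BOUNDARY}\nMemory: user prefers tabs.`,
);
```

Because the cache key is a prefix match, a single character of per-user content in the static half would destroy the hit rate — which is exactly why the boundary marker exists.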

2. Four-Tier Context Compression

Rather than simply truncating history when the context gets long, the system uses a four-tier strategy:

  1. Snip — lightweight trim of older messages before each API call
  2. Micro-compact — three sub-strategies: cache-aware, time-based, and API-level compression
  3. Auto-compact — triggered at a token threshold; AI-powered summarization condenses the full conversation into a structured summary
  4. reactorcompact — emergency compression on a genuine 413 context-overflow error, followed by intelligent restoration of recently accessed files (the active “play” file and invoked skills)

The post-compression recovery logic is the clever part: it doesn’t blindly discard and restart — it compresses to make room, then restores only the most critical context.
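The escalation between tiers can be sketched as a simple dispatcher. The token thresholds below are invented for illustration — only the tier names and the 413 trigger come from the text:

```typescript
// Which compression tier fires depends on estimated size and on whether the
// API has already rejected the request with a 413 context-overflow error.
type Tier = "snip" | "micro-compact" | "auto-compact" | "reactorcompact";

function chooseTier(estimatedTokens: number, overflowed413: boolean): Tier {
  if (overflowed413) return "reactorcompact";        // emergency: API said 413
  if (estimatedTokens > 150_000) return "auto-compact";   // AI summarization
  if (estimatedTokens > 100_000) return "micro-compact";  // targeted compression
  return "snip";                                     // default lightweight trim
}
```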

3. Speculative Execution

The system begins pre-executing an operation before the user explicitly confirms it, using a copy-on-write overlay filesystem:

  • All write operations go to a temporary overlay directory
  • If confirmed → overlay files are merged to the real filesystem
  • If declined → overlay is deleted, real filesystem untouched

The process is pipelined: while the user reviews the current suggestion, the system has already begun speculatively executing the next one in the queue — mirroring CPU instruction pipelines to mask confirmation latency.
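The confirm/decline semantics of the overlay can be modeled with in-memory maps standing in for real directories — a sketch, not the system's actual filesystem code:

```typescript
// Copy-on-write overlay: speculative writes land in the overlay; reads see
// the overlay first; confirm merges down, decline discards.
class OverlayFs {
  constructor(
    private real: Map<string, string>,           // the real filesystem
    private overlay = new Map<string, string>(), // speculative writes land here
  ) {}

  write(path: string, content: string) {
    this.overlay.set(path, content); // never touches the real FS
  }
  read(path: string): string | undefined {
    return this.overlay.get(path) ?? this.real.get(path); // overlay wins
  }
  confirm() {
    for (const [p, c] of this.overlay) this.real.set(p, c); // merge down
    this.overlay.clear();
  }
  decline() {
    this.overlay.clear(); // the real FS was never touched
  }
}
```

On confirm the overlay merges into the real map; on decline nothing happened from the real filesystem's point of view — which is what makes pre-execution safe to pipeline.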

4. 20-Check Command Security

Every shell command passes 20 security checks before execution, including:

  • Incomplete command detection
  • JQ function injection (dangerous source-character embedding)
  • Newline injection attacks
  • Command substitution patterns
  • IFS injection
  • Token theft attempts
  • Unicode whitespace masquerading
  • Dangerous shell command patterns

In automatic mode, an interpreter blacklist prevents Python, Node.js, Ruby, Perl, and PHP from executing without explicit user confirmation — the AI cannot silently run scripts in these languages.
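A handful of those checks, plus the automatic-mode interpreter blacklist, might look like this. The regexes are deliberately simplified illustrations — a real gate needs far more than pattern matching:

```typescript
// Simplified command gate: four of the named checks plus the auto-mode
// interpreter blacklist. Returns the list of violations (empty = may run).
const INTERPRETERS = ["python", "node", "ruby", "perl", "php"];

function checkCommand(cmd: string, autoMode: boolean): string[] {
  const violations: string[] = [];
  if (/\n|\r/.test(cmd)) violations.push("newline injection");
  if (/\$\(|`/.test(cmd)) violations.push("command substitution");
  if (/IFS=/.test(cmd)) violations.push("IFS injection");
  // Unicode whitespace masquerading as an ordinary space.
  if (/[\u00a0\u2000-\u200b\u3000]/.test(cmd)) violations.push("unicode whitespace");
  if (autoMode) {
    const first = cmd.trim().split(/\s+/)[0];
    if (INTERPRETERS.includes(first)) violations.push("interpreter requires confirmation");
  }
  return violations;
}
```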

5. Zustand-Style State Store

Rather than using an off-the-shelf state library, the system implements a custom lightweight store optimized for terminal rendering with React Ink. Key properties:

  • Object-identity reference comparisons — re-renders fire only when the subscribed field actually changes
  • Selector subscriptions — components subscribe to specific fields, not the whole store
  • Immutable store instance — guarantees no cascading re-renders

The global state holds 100+ properties: settings, task queues, tool configurations, permissions, MCP status, speculative execution state, and more.
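All three properties fit in a tiny store. This is a generic Zustand-style sketch, not the system's actual store code:

```typescript
// Minimal selector-subscription store: listeners fire only when the identity
// of their selected slice changes, and every update produces a new object.
type Listener = () => void;

function createStore<S>(initial: S) {
  let state = initial;
  const listeners = new Set<Listener>();
  return {
    getState: () => state,
    setState: (partial: Partial<S>) => {
      state = { ...state, ...partial }; // immutable update: fresh object each time
      listeners.forEach((l) => l());
    },
    // Subscribe to a slice; notify only when its identity changes.
    subscribe<T>(selector: (s: S) => T, onChange: (v: T) => void) {
      let prev = selector(state);
      const l = () => {
        const next = selector(state);
        if (!Object.is(next, prev)) { // identity comparison, not deep equality
          prev = next;
          onChange(next);
        }
      };
      listeners.add(l);
      return () => listeners.delete(l); // unsubscribe
    },
  };
}

const store = createStore({ taskCount: 0, mode: "auto" });
const notified: number[] = [];
store.subscribe((s) => s.taskCount, (v) => notified.push(v));
store.setState({ mode: "manual" }); // taskCount slice unchanged -> no notification
store.setState({ taskCount: 1 });   // slice changed -> listener fires
```

With 100+ properties in the store, that identity check is the difference between one terminal repaint per relevant change and a repaint storm on every update.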

6. Worker System (6 types · 24 events)

Six worker types cover every execution mode:

  • Command — direct shell execution
  • Prompt — LLM-powered review
  • Agent — full multi-turn agent session
  • HTTP — external endpoint calls
  • Callback — internal TypeScript functions
  • Function — boolean checks

24 event types span pre/post tool execution, API requests, conversation lifecycle, compression triggers, and user input. This lets enterprise teams deeply customize behavior — auto-logging every Bash call, running security reviews before writes — without modifying core source code.
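The customization mechanism reduces to an event bus: handlers register for named lifecycle events and run when the runtime emits them. Event names and payload shapes here are illustrative, echoing the Bash-logging example above:

```typescript
// Sketch of the hook/event system behind worker customization.
type EventName = "pre_tool" | "post_tool" | "pre_api_request" | "user_input";
type Handler = (payload: Record<string, unknown>) => void;

class EventBus {
  private handlers = new Map<EventName, Handler[]>();
  on(event: EventName, h: Handler) {
    this.handlers.set(event, [...(this.handlers.get(event) ?? []), h]);
  }
  emit(event: EventName, payload: Record<string, unknown>) {
    for (const h of this.handlers.get(event) ?? []) h(payload);
  }
}

const bus = new EventBus();
const auditLog: string[] = [];
// Enterprise customization without touching core source: log every Bash call.
bus.on("pre_tool", (p) => {
  if (p.tool === "bash") auditLog.push(`bash: ${String(p.command)}`);
});
bus.emit("pre_tool", { tool: "bash", command: "ls" });
bus.emit("pre_tool", { tool: "read_file", path: "a.ts" });
```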


Why It’s a Mature Agent Runtime

Multi-Agent Architecture

Three agent execution models are supported:

  • Fork Agent — child inherits parent’s full context, runs in an independent process branch
  • In-Process Agent — same process, AsyncLocalStorage for context isolation, lower overhead
  • Split-Pane Agent — leader + teammate rendered side-by-side in a Tmux split, visible parallelism

Coordinator Mode

A central orchestrator decomposes tasks into sub-tasks and assigns each to a worker agent with its own prompt, tool set, and model. Four built-in phases: Research → Synthesis → Implementation → Verification.

Built-in roles: Planning Agent, Exploration Agent, Verification Agent, Agentic Coding Guide Agent. Custom agents can be defined via config.

This is not a monolithic single-agent tool — it is a hierarchical, multi-agent architecture with Manager-Worker patterns, task decomposition, concurrent execution, and a clear separation of planning and execution.


Key Takeaways

The Agentic Coding System demonstrates that production-grade agentic infrastructure requires deliberate engineering on multiple fronts simultaneously:

  • Reliability — a seven-layer recovery cascade ensures self-healing, not crashing
  • Cost — prompt cache segmentation and context compression keep token costs manageable
  • Latency — speculative execution and streaming mask wait times
  • Security — 20-point command validation and interpreter blacklists limit blast radius
  • Scalability — a multi-agent coordinator with concurrent execution handles complex tasks

Any one of these six design patterns is worth a dedicated post. Together, they define what it means to build an agent system with engineering maturity.

Want to go deeper? The full interactive guide at agentwiki.wanxinbai.com covers orchestration frameworks, advanced RAG patterns, MCP, token economics, and an AI-powered knowledge quiz.