Wanxin's Blog

When You Believe
🤖 Explore the full interactive guide → agentwiki.wanxinbai.com
Includes interactive ReAct simulators, token economics comparisons, failover demos, and a live AI quiz.

Overview

The core operational logic of the Agentic Coding System lives inside a while-true loop — a classic implementation of the ReAct pattern (Reason + Act). The model first reasons about what needs to be done, invokes specific tools to execute those actions, then reasons again over the results — cycling until the task is complete.
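The cycle can be sketched in a few lines. This is a minimal illustration with a stubbed model and tool registry — every name here (runAgentLoop, ModelTurn, and so on) is illustrative, not the system's actual API:

```typescript
// Minimal ReAct loop sketch: reason (call the model), act (run tools),
// feed results back, repeat until the model requests no more tools.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Model = (history: string[]) => ModelTurn;
type Tools = Record<string, (args: Record<string, unknown>) => string>;

function runAgentLoop(model: Model, tools: Tools, task: string): string[] {
  const history: string[] = [`user: ${task}`];
  while (true) {                                // the while-true loop itself
    const turn = model(history);                // 1. reason
    history.push(`assistant: ${turn.text}`);
    if (turn.toolCalls.length === 0) break;     // no tool calls -> done
    for (const call of turn.toolCalls) {        // 2. act
      const result = tools[call.name](call.args);
      history.push(`tool(${call.name}): ${result}`); // 3. observe
    }
  }
  return history;
}

// Stub model: first turn requests a tool, second turn finishes.
let step = 0;
const stubModel: Model = () =>
  step++ === 0
    ? { text: "Listing files...", toolCalls: [{ name: "ls", args: {} }] }
    : { text: "Done.", toolCalls: [] };

const transcript = runAgentLoop(stubModel, { ls: () => "main.ts" }, "what files exist?");
```

Everything this post covers — compression, recovery, hooks, speculation — wraps around those dozen lines.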

What separates a production-grade agentic system from a simple chatbot wrapper is everything around that loop: how context is managed, how failures are recovered, how security is enforced, and how multiple agents are coordinated. This post breaks it all down.


The Core ReAct Loop

The main loop runs through five distinct stages on every iteration.

Stage 1: Context Preparation

Before invoking the model, the system prunes outdated historical messages and applies light compression to cached tool results. If the context becomes excessively long, it triggers a comprehensive summarization process. These operations ensure the context delivered to the model in every iteration is complete and within token limits.
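A rough shape of that pruning-then-summarizing escalation, assuming a crude chars/4 token estimate — all names and heuristics here are illustrative, not the real implementation:

```typescript
// Stage 1 sketch: prune stale tool results first, then fall back to a
// whole-conversation summary if the context is still over budget.
type Msg = { role: "user" | "assistant" | "tool"; content: string };

const estimateTokens = (msgs: Msg[]) =>
  msgs.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);

function prepareContext(history: Msg[], budgetTokens: number): Msg[] {
  const ctx = [...history];
  // Light pruning: cached tool results are the cheapest context to lose.
  while (estimateTokens(ctx) > budgetTokens) {
    const i = ctx.findIndex((m) => m.role === "tool");
    if (i === -1) break; // nothing prunable left
    ctx.splice(i, 1);
  }
  // Still too long: comprehensive summarization (stubbed as a placeholder).
  if (estimateTokens(ctx) > budgetTokens) {
    return [
      { role: "assistant", content: "[summary of earlier conversation]" },
      ctx[ctx.length - 1],
    ];
  }
  return ctx;
}

const sample: Msg[] = [
  { role: "user", content: "x".repeat(40) },      // ~10 tokens
  { role: "tool", content: "y".repeat(400) },     // ~100 tokens: pruned first
  { role: "assistant", content: "z".repeat(40) }, // ~10 tokens
];
```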

Stage 2: Streaming Model Invocation

The system packages the current conversation history, system prompts, and available tools, then sends this bundle to the cloud model using streaming output — the model transmits its response in real-time as it generates content rather than waiting for the full output. Two types of information are collected simultaneously: the textual response, and any tool invocation intentions — signals from the model indicating it wants to call a specific tool.
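Collecting both kinds of information in one pass looks roughly like this. The event shape is an illustrative stand-in for a real streaming API (which would be consumed with `for await` over an async stream rather than a sync generator):

```typescript
// Stage 2 sketch: accumulate text deltas for immediate rendering while
// tool-call intents queue up for the executor.
type StreamEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_use"; name: string; args: Record<string, unknown> };

function collectStream(stream: Iterable<StreamEvent>) {
  let text = "";
  const toolCalls: { name: string; args: Record<string, unknown> }[] = [];
  for (const ev of stream) {
    if (ev.type === "text_delta") text += ev.text;          // render as it arrives
    else toolCalls.push({ name: ev.name, args: ev.args });  // tool intent
  }
  return { text, toolCalls };
}

// Stub stream standing in for the model's real-time output.
function* stubStream(): Generator<StreamEvent> {
  yield { type: "text_delta", text: "Reading the " };
  yield { type: "text_delta", text: "file now." };
  yield { type: "tool_use", name: "read_file", args: { path: "main.ts" } };
}

const collected = collectStream(stubStream());
```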

Stage 3: Tool Execution

Two executors handle tool calls:

  • Streaming Executor — begins executing tools in parallel while the model is still generating output
  • Batch Executor — waits until all tool invocation requests are finalized, then executes collectively

All requests pass through a permissions check and a hook system before execution. Operations requiring missing permissions trigger a confirmation prompt; hooks allow interception or modification of tool behaviors.
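The gate in front of both executors might look like this — a permission check with a confirmation fallback, then a hook chain that can rewrite or block the call. All names here are illustrative:

```typescript
// Pre-execution gate sketch: permission check, then hooks.
type ToolCall = { name: string; args: Record<string, unknown> };
type Hook = (call: ToolCall) => ToolCall | null; // null blocks the call

function gateToolCall(
  call: ToolCall,
  granted: Set<string>,
  hooks: Hook[],
  confirm: (c: ToolCall) => boolean, // stands in for the confirmation prompt
): ToolCall | null {
  // A missing permission triggers a prompt instead of a hard failure.
  if (!granted.has(call.name) && !confirm(call)) return null;
  // Hooks may intercept (return null) or rewrite the call before execution.
  let current: ToolCall | null = call;
  for (const hook of hooks) {
    if (current === null) return null;
    current = hook(current);
  }
  return current;
}

const hooks: Hook[] = [
  (c) => (c.name === "bash" ? null : c), // example hook: intercept bash calls
];
const granted = new Set(["read_file"]);
```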

Stage 4: Artifact Collection

Once tool execution completes, the system gathers supplementary artifacts — task notifications, memory content updates, file change logs — to prepare for the subsequent model invocation round.

Stage 5: Termination & Recovery

If the model proposes no further tool invocations, the task is complete and the loop terminates. Otherwise the loop continues. On error, the system self-heals rather than crashing:

  • 413 (context overflow) → triggers context compression, then retries
  • Truncated output → auto-upgrades token limit from 8K to 64K, retries up to 3×
  • Seven-layer recovery cascade: API exponential backoff → overload handling → token recovery → compressed response contexts → context purging → unattended persistent retries → emergency compaction

The maximum backoff is 5 minutes with a 6-hour reset cycle. Network jitter, API overload, context explosions — the system recovers from all of them.
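The backoff schedule matching those numbers can be sketched directly — delays double per attempt and cap at 5 minutes. The base delay is an assumption, and the 6-hour reset cycle is omitted for brevity:

```typescript
// Capped exponential backoff: 1s, 2s, 4s, ... up to the 5-minute ceiling.
const MAX_BACKOFF_MS = 5 * 60 * 1000; // 5-minute cap from the text

function backoffDelay(attempt: number, baseMs = 1000): number {
  return Math.min(baseMs * 2 ** attempt, MAX_BACKOFF_MS);
}

// Wrap any API call: retry with growing delays instead of crashing.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 10): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // out of budget: surface error
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
}
```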

The entire loop is implemented as an async generator (an async function* that yields), pushing every message and progress update over SSE (Server-Sent Events) so the UI renders updates in real time.
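In sketch form: each yield becomes one SSE frame. The frame formatting below follows the standard text/event-stream framing; the event names are illustrative:

```typescript
// The loop as an async generator whose yields stream to the UI as SSE frames.
type AgentEvent = { type: "text" | "tool" | "done"; data: string };

async function* agentLoop(): AsyncGenerator<AgentEvent> {
  yield { type: "text", data: "Planning..." };
  yield { type: "tool", data: "read_file(main.ts)" };
  yield { type: "done", data: "ok" };
}

// Standard text/event-stream framing: event line, data line, blank line.
function formatSse(ev: AgentEvent): string {
  return `event: ${ev.type}\ndata: ${ev.data}\n\n`;
}
```

A server would iterate `for await (const ev of agentLoop())` and write `formatSse(ev)` to the response, so the terminal or browser UI paints each step the moment it happens.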


6 Engineering Highlights

1. Prompt Cache Segmentation

The system prompt is split into two halves at a boundary marker named system_prompt_dynamic_boundary:

  • Static — role declarations, tool rules, coding philosophy → ✅ global cache shared across all users
  • Dynamic — memory content, MCP instructions, environment info → ❌ never cached

A cache hit on Anthropic’s API can cut input token costs by up to 90% and dramatically reduce Time-To-First-Token. By keeping the shared prefix exactly identical across users, the system maximizes its cache-hit rate — a level of optimization only achievable by a team with deep knowledge of the underlying API.
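The split itself is simple. The cache_control shape below follows Anthropic's prompt-caching API (a cache breakpoint on the static block); everything else is an illustrative assumption:

```typescript
// Split the system prompt at the boundary marker: only the byte-identical
// static half carries a cache breakpoint, so it hits the shared cache.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

const BOUNDARY = "system_prompt_dynamic_boundary";

function buildSystemBlocks(fullPrompt: string): SystemBlock[] {
  const [staticPart, dynamicPart = ""] = fullPrompt.split(BOUNDARY);
  return [
    // Static half: identical across all users -> cacheable.
    { type: "text", text: staticPart.trim(), cache_control: { type: "ephemeral" } },
    // Dynamic half (memory, MCP instructions, environment): never cached.
    { type: "text", text: dynamicPart.trim() },
  ];
}

const blocks = buildSystemBlocks(
  `You are a coding agent. Follow the tool rules.\n${BOUNDARY}\nMemory: user prefers tabs.`,
);
```

Because the cache key is a prefix match, a single character of per-user content in the static half would destroy the hit rate — which is exactly why the boundary marker exists.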

2. Four-Tier Context Compression

Rather than simply truncating history when the context gets long, the system uses a four-tier strategy:

  1. Snip — lightweight trim of older messages before each API call
  2. Micro-compact — three sub-strategies: cache-aware, time-based, and API-level compression
  3. Auto-compact — triggered at a token threshold; AI-powered summarization condenses the full conversation into a structured summary
  4. reactorcompact — emergency compression on a genuine 413 context-overflow error, followed by intelligent restoration of recently accessed files (the active “play” file and invoked skills)

The post-compression recovery logic is the clever part: it doesn’t blindly discard and restart — it compresses to make room, then restores only the most critical context.
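The escalation between tiers can be sketched as a simple dispatcher. The token thresholds below are invented for illustration — only the tier names and the 413 trigger come from the text:

```typescript
// Which compression tier fires depends on estimated size and on whether the
// API has already rejected the request with a 413 context-overflow error.
type Tier = "snip" | "micro-compact" | "auto-compact" | "reactorcompact";

function chooseTier(estimatedTokens: number, overflowed413: boolean): Tier {
  if (overflowed413) return "reactorcompact";        // emergency: API said 413
  if (estimatedTokens > 150_000) return "auto-compact";   // AI summarization
  if (estimatedTokens > 100_000) return "micro-compact";  // targeted compression
  return "snip";                                     // default lightweight trim
}
```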

3. Speculative Execution

The system begins pre-executing an operation before the user explicitly confirms it, using a copy-on-write overlay filesystem:

  • All write operations go to a temporary overlay directory
  • If confirmed → overlay files are merged to the real filesystem
  • If declined → overlay is deleted, real filesystem untouched

The process is pipelined: while the user reviews the current suggestion, the system has already begun speculatively executing the next one in the queue — mirroring CPU instruction pipelines to mask confirmation latency.
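The confirm/decline semantics of the overlay can be modeled with in-memory maps standing in for real directories — a sketch, not the system's actual filesystem code:

```typescript
// Copy-on-write overlay: speculative writes land in the overlay; reads see
// the overlay first; confirm merges down, decline discards.
class OverlayFs {
  constructor(
    private real: Map<string, string>,           // the real filesystem
    private overlay = new Map<string, string>(), // speculative writes land here
  ) {}

  write(path: string, content: string) {
    this.overlay.set(path, content); // never touches the real FS
  }
  read(path: string): string | undefined {
    return this.overlay.get(path) ?? this.real.get(path); // overlay wins
  }
  confirm() {
    for (const [p, c] of this.overlay) this.real.set(p, c); // merge down
    this.overlay.clear();
  }
  decline() {
    this.overlay.clear(); // the real FS was never touched
  }
}
```

On confirm the overlay merges into the real map; on decline nothing happened from the real filesystem's point of view — which is what makes pre-execution safe to pipeline.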

4. 20-Check Command Security

Every shell command passes 20 security checks before execution, including:

  • Incomplete command detection
  • JQ function injection (dangerous source-character embedding)
  • Newline injection attacks
  • Command substitution patterns
  • IFS injection
  • Token theft attempts
  • Unicode whitespace masquerading
  • Dangerous shell command patterns

In automatic mode, an interpreter blacklist prevents Python, Node.js, Ruby, Perl, and PHP from executing without explicit user confirmation — the AI cannot silently run scripts in these languages.
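A handful of those checks, plus the automatic-mode interpreter blacklist, might look like this. The regexes are deliberately simplified illustrations — a real gate needs far more than pattern matching:

```typescript
// Simplified command gate: four of the named checks plus the auto-mode
// interpreter blacklist. Returns the list of violations (empty = may run).
const INTERPRETERS = ["python", "node", "ruby", "perl", "php"];

function checkCommand(cmd: string, autoMode: boolean): string[] {
  const violations: string[] = [];
  if (/\n|\r/.test(cmd)) violations.push("newline injection");
  if (/\$\(|`/.test(cmd)) violations.push("command substitution");
  if (/IFS=/.test(cmd)) violations.push("IFS injection");
  // Unicode whitespace masquerading as an ordinary space.
  if (/[\u00a0\u2000-\u200b\u3000]/.test(cmd)) violations.push("unicode whitespace");
  if (autoMode) {
    const first = cmd.trim().split(/\s+/)[0];
    if (INTERPRETERS.includes(first)) violations.push("interpreter requires confirmation");
  }
  return violations;
}
```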

5. Zustand-Style State Store

Rather than using an off-the-shelf state library, the system implements a custom lightweight store optimized for terminal rendering with React Ink. Key properties:

  • Object-identity reference comparisons — re-renders fire only when the subscribed field actually changes
  • Selector subscriptions — components subscribe to specific fields, not the whole store
  • Immutable store instance — guarantees no cascading re-renders

The global state holds 100+ properties: settings, task queues, tool configurations, permissions, MCP status, speculative execution state, and more.
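All three properties fit in a tiny store. This is a generic Zustand-style sketch, not the system's actual store code:

```typescript
// Minimal selector-subscription store: listeners fire only when the identity
// of their selected slice changes, and every update produces a new object.
type Listener = () => void;

function createStore<S>(initial: S) {
  let state = initial;
  const listeners = new Set<Listener>();
  return {
    getState: () => state,
    setState: (partial: Partial<S>) => {
      state = { ...state, ...partial }; // immutable update: fresh object each time
      listeners.forEach((l) => l());
    },
    // Subscribe to a slice; notify only when its identity changes.
    subscribe<T>(selector: (s: S) => T, onChange: (v: T) => void) {
      let prev = selector(state);
      const l = () => {
        const next = selector(state);
        if (!Object.is(next, prev)) { // identity comparison, not deep equality
          prev = next;
          onChange(next);
        }
      };
      listeners.add(l);
      return () => listeners.delete(l); // unsubscribe
    },
  };
}

const store = createStore({ taskCount: 0, mode: "auto" });
const notified: number[] = [];
store.subscribe((s) => s.taskCount, (v) => notified.push(v));
store.setState({ mode: "manual" }); // taskCount slice unchanged -> no notification
store.setState({ taskCount: 1 });   // slice changed -> listener fires
```

With 100+ properties in the store, that identity check is the difference between one terminal repaint per relevant change and a repaint storm on every update.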

6. Worker System (6 types · 24 events)

Six worker types cover every execution mode:

  • Command — direct shell execution
  • Prompt — LLM-powered review
  • Agent — full multi-turn agent session
  • HTTP — external endpoint calls
  • Callback — internal TypeScript functions
  • Function — boolean checks

24 event types span pre/post tool execution, API requests, conversation lifecycle, compression triggers, and user input. This lets enterprise teams deeply customize behavior — auto-logging every Bash call, running security reviews before writes — without modifying core source code.
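The customization mechanism reduces to an event bus: handlers register for named lifecycle events and run when the runtime emits them. Event names and payload shapes here are illustrative, echoing the Bash-logging example above:

```typescript
// Sketch of the hook/event system behind worker customization.
type EventName = "pre_tool" | "post_tool" | "pre_api_request" | "user_input";
type Handler = (payload: Record<string, unknown>) => void;

class EventBus {
  private handlers = new Map<EventName, Handler[]>();
  on(event: EventName, h: Handler) {
    this.handlers.set(event, [...(this.handlers.get(event) ?? []), h]);
  }
  emit(event: EventName, payload: Record<string, unknown>) {
    for (const h of this.handlers.get(event) ?? []) h(payload);
  }
}

const bus = new EventBus();
const auditLog: string[] = [];
// Enterprise customization without touching core source: log every Bash call.
bus.on("pre_tool", (p) => {
  if (p.tool === "bash") auditLog.push(`bash: ${String(p.command)}`);
});
bus.emit("pre_tool", { tool: "bash", command: "ls" });
bus.emit("pre_tool", { tool: "read_file", path: "a.ts" });
```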


Why It’s a Mature Agent Runtime

Multi-Agent Architecture

Three agent execution models are supported:

  • Fork Agent — child inherits parent’s full context, runs in an independent process branch
  • In-Process Agent — same process, AsyncLocalStorage for context isolation, lower overhead
  • Split-Pane Agent — leader + teammate rendered side-by-side in a Tmux split, visible parallelism

Coordinator Mode

A central orchestrator decomposes tasks into sub-tasks and assigns each to a worker agent with its own prompt, tool set, and model. Four built-in phases: Research → Synthesis → Implementation → Verification.

Built-in roles: Planning Agent, Exploration Agent, Verification Agent, Agentic Coding Guide Agent. Custom agents can be defined via config.

This is not a monolithic single-agent tool — it is a hierarchical, multi-agent architecture with Manager-Worker patterns, task decomposition, concurrent execution, and a clear separation of planning and execution.


Key Takeaways

The Agentic Coding System demonstrates that production-grade agentic infrastructure requires deliberate engineering on multiple fronts simultaneously:

  • Reliability — a seven-layer recovery cascade ensures self-healing, not crashing
  • Cost — prompt cache segmentation and context compression keep token costs manageable
  • Latency — speculative execution and streaming mask wait times
  • Security — 20-point command validation and interpreter blacklists limit blast radius
  • Scalability — a multi-agent coordinator with concurrent execution handles complex tasks

Any one of these six design patterns is worth a dedicated post. Together, they define what it means to build an agent system with engineering maturity.

Want to go deeper? The full interactive guide at agentwiki.wanxinbai.com covers orchestration frameworks, advanced RAG patterns, MCP, token economics, and an AI-powered knowledge quiz.