Upstream comparison — Claude Code source vs. our v0
Read date: 2026-05-16. Source: vendored copy of the Claude Code TS source
(~1,902 files). The public SDK surface is in src/entrypoints/sdk/; the
runtime engine is src/QueryEngine.ts + src/query.ts + src/Tool.ts
src/services/tools/*+src/services/mcp/*.
The honest framing: I designed v0 against the public API shape of the Claude Agent SDK, not the actual source. Reading the source raises the bar. Below is what I missed, what's correctly assumed, and what we should change before M1.
What the upstream actually is
A drop-in analog is not just "Anthropic client + tool loop." The upstream engine carries a production agent runtime:
| Subsystem | What it does | Files |
|---|---|---|
| QueryEngine | Owns conversation state across turns. Constructs the system prompt from many parts, calls query, accumulates usage, handles compaction triggers, runs hooks. |
QueryEngine.ts (1,295 LOC) |
query loop |
The hot path. Calls the model, streams events, drives the streaming tool executor, handles auto-compact + reactive compact + snip + context-collapse, retries on fallback. | query.ts (1,729 LOC) |
| StreamingToolExecutor | Starts tool calls as they arrive in the assistant stream, not after the assistant message finalizes. Concurrent-safe tools run in parallel up to a cap; non-safe tools serialize. | services/tools/StreamingToolExecutor.ts (530 LOC) |
| toolOrchestration | Partitions a tool batch by tool.isConcurrencySafe(parsedInput) and runs each partition. Concurrency cap default 10 (env). |
services/tools/toolOrchestration.ts |
| toolExecution | Runs one tool: parse input, validate, ask permissions, execute, capture progress, format result. ~1,745 LOC because real tools generate progress messages mid-execution. | services/tools/toolExecution.ts |
| Permissions | canUseTool callback + 28 hook events (PreToolUse, PostToolUse, Stop, SubagentStart, …). Allow/deny/ask rules per source (user/project/local). |
Tool.ts, hooks/useCanUseTool.ts, services/tools/toolHooks.ts |
| MCP client | stdio + sse + http + in-process ("sdk") + claudeai-proxy transports. Server discovery, auth, OAuth port routing, elicitation (URL params mid-tool via JSON-RPC error -32042), channel allowlist, vscode SDK bridge. | services/mcp/* (~24 files) |
| Compaction | Auto-compact when budget low, reactive compact, snip compaction, context-collapse. Multiple strategies behind feature flags. | services/compact/* |
| Skills | Separate from tools. searchHint, alwaysLoad, skill prefetch in query loop. |
services/SkillSearch/*, tools/SkillTool/* |
| Memory | The MEMORY.md tree system — separate "memdir" subsystem with auto-loading + nested paths. | memdir/* |
| Sessions | Persistence to disk, sessionId tracking, fork/resume, replay, message-queue manager. | bootstrap/state.ts, utils/sessionStorage.ts, V2 session API |
| Cost tracking | Per-model pricing table, usage accumulation, costUSD reported in events. |
cost-tracker.ts, services/api/logging.ts |
| File state cache | Tracks file mtimes/contents to detect external edits between turns. | utils/fileStateCache.ts, utils/fileHistory.ts |
| Thinking config | adaptive (Opus 4.6+) / enabled with budgetTokens / disabled | utils/thinking.ts |
| Output enforcement | JSON-schema-constrained output via SYNTHETIC_OUTPUT_TOOL_NAME. |
tools/SyntheticOutputTool/* |
| Hooks | 28 events; each can mutate the request or block it. | utils/hooks.ts, services/tools/toolHooks.ts |
| Agent definitions | Filesystem-loaded sub-agent definitions (AgentDefinition), spawned via AgentTool. |
tools/AgentTool/* |
Public SDK surface (entrypoints/agentSdkTypes.ts) is small by comparison:
function tool(name, description, zodSchema, handler, extras?) -> SdkMcpToolDefinition
function createSdkMcpServer({name, version?, tools?}) -> McpSdkServerConfigWithInstance
function query({prompt, options}) -> Query // async iterable of SDKMessage
class AbortError
// V2 (alpha)
unstable_v2_createSession(options) -> SDKSession
unstable_v2_resumeSession(id, options) -> SDKSession
unstable_v2_prompt(message, options) -> Promise<SDKResultMessage>
// Session ops
listSessions, getSessionInfo, getSessionMessages, renameSession, tagSession,
forkSession
// Remote control + scheduled tasks
connectRemoteControl, watchScheduledTasks, buildMissedTaskNotificationSDKMessage is a discriminated union — assistant chunks, user echoes,
tool-use, tool-result, system init, compact boundaries, status changes, permission denials.
What I got right in v0
- Universal Message + ContentBlock model with tagged unions. Their
Messageis also a tagged union (AssistantMessage | UserMessage | SystemMessage | ProgressMessage | AttachmentMessage | TombstoneMessage | ToolUseSummaryMessage | SystemLocalCommandMessage). Same idea. - Normalized streaming event model that all providers fold into. They don't multi-provider, but their
StreamEventshape is the same Anthropic SSE taxonomy I mirrored. - Provider adapter as a protocol; lazy registry. Right call — they don't have to do this because they're Anthropic-only, but for a multi-provider SDK it's the correct abstraction.
- Shared
httpx.AsyncClientwith HTTP/2. Mirrors the long-lived@anthropic-ai/sdkclient they hold open. @tooldecorator + schema auto-derivation. They usezodschemas passed explicitly; I auto-derive from type hints. Theirs is more explicit, mine is more ergonomic. I should support both — explicitinput_schema=override is already there, just need to document.- anyio + task group for parallel dispatch. Right primitive. They use a generator merge helper (
all) with concurrency limit; same thing different shape.
What I got wrong or missed (rank-ordered)
1. Tool dispatch happens AFTER the assistant message finalizes. Wrong.
Their model: StreamingToolExecutor watches the assistant stream and kicks off each tool_use block as soon as it finishes streaming, while later content is still arriving. Concurrent-safe tools run in parallel up to a cap; non-safe ones serialize.
My model: I assemble the full AssistantMessage, then dispatch all tool calls in a task_group. That's a real latency tax on multi-tool turns — if a tool takes 5 s and the assistant streams 3 more tool calls after it, my version waits 5 s before starting any.
Fix: Move tool dispatch into the stream consumer. When a ContentBlockStop event fires for a tool_use block, kick the dispatch task immediately. The agent loop becomes: stream → for each tool_use block that closes, start it → after message_stop, await all and assemble results → next turn.
2. parallel_safe: bool is too coarse.
Theirs: Tool.isConcurrencySafe(parsedInput) -> bool. Function of the inputs. Two bash calls writing to different paths can run in parallel; two writing to the same path can't.
Mine: static parallel_safe per tool.
Fix: Change Tool.parallel_safe from bool to Callable[[dict], bool] | bool (bool stays for backwards-compat). Partition runs by walking the input list.
3. No concurrency cap.
Theirs: CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY=10 default, env-overridable.
Mine: task_group.start_soon for every call — unbounded.
Fix: Wrap in anyio.Semaphore(cap) or use anyio.create_memory_object_stream + a worker pool. Cap configurable per-Agent.
4. No sibling-abort.
Theirs: Each tool batch runs under a child AbortController rooted at the turn's. If one tool errors (esp. bash), siblings get aborted so subprocesses die immediately rather than running to completion.
Mine: Errors are caught per-task and turned into ToolResultBlock(is_error=True); siblings keep running.
Fix: Open an anyio.CancelScope per concurrent batch; cancel on first error if a per-tool flag (abort_siblings_on_error) is set.
5. No permission / hook system.
Theirs: canUseTool(tool, input, ctx, message, useID, force?) is required on every tool call. Returns PermissionResult (allow / deny / ask). Hook events fire at 28 named points (PreToolUse, PostToolUse, Stop, SubagentStart/Stop, …) — each hook can mutate the request or block.
Mine: Tools just run. No gating, no hooks.
Fix: Add Agent(can_use_tool=async_fn, hooks=Hooks(...)). Start with a minimal hook set (PreToolUse, PostToolUse, Stop) and grow. The 28-event taxonomy in their HOOK_EVENTS const is a good steal — copy it verbatim.
6. No token budget / max-budget-usd / fallback model.
Theirs: maxTurns, maxBudgetUsd, taskBudget.total, fallbackModel. Auto-fallback when primary errors with FallbackTriggeredError.
Mine: max_steps only.
Fix: Add max_budget_usd + a per-model pricing table (Anthropic publishes them; ship a JSON). Track total_cost on the Agent; abort when exceeded. Add fallback_model knob and an error class to trigger it.
7. No thinking config.
Theirs: {type: 'adaptive'} | {type: 'enabled', budgetTokens?} | {type: 'disabled'}.
Mine: Nothing. Just temperature.
Fix: Add thinking: ThinkingConfig. Pass through to Anthropic adapter as thinking param; ignore on other providers.
8. No auto-compact.
Theirs: Multiple strategies — autoCompact (default), reactiveCompact, snipCompact, contextCollapse — all feature-gated. When token usage crosses a threshold, summarize older messages into a single SDKCompactBoundaryMessage and drop them.
Mine: None. Long sessions will OOM the context.
Fix: Ship SimpleCompactor for v0 — when total tokens > context_window * 0.85, summarize the oldest N messages with a cheap model. Boundary message in the history preserves audit trail. More strategies can plug in later.
9. MCP is not implemented.
Theirs: Four transports (stdio, sse, http, in-process "sdk"), OAuth flows, server registry, channel allowlist, elicitation, connection manager, vscode SDK bridge.
Mine: Nothing. My plan mentions M2 = MCP client.
Fix: Keep MCP at M2 but write the types now (server config union, transport enum) so the agent loop already handles MCP-provided tools alongside local ones. Otherwise the integration is going to require an invasive refactor.
10. Skills ≠ tools.
Theirs: Skills are first-class alongside tools and have a separate registry, searchHint, alwaysLoad, skill-prefetch in the query loop. The agent might "discover" a skill at turn start and inject it as additional context.
Mine: Skills don't exist.
Fix: Defer to M3. Note it in the plan so users know the data model accommodates it later.
11. Smaller misses
| Miss | Where | Fix priority |
|---|---|---|
webSearchRequests in Usage |
their ModelUsageSchema |
low — keep our Usage minimal, add as needed |
costUSD in usage |
their schema | medium — emit from a pricing table |
contextWindow/maxOutputTokens in usage |
their schema | low |
| File state cache (detect external edits) | utils/fileStateCache.ts |
low — agents that touch the FS need this |
| Permission modes (default/auto/bypass) | Tool.ts |
medium — needed for any non-trivial deployment |
SystemPrompt from parts (not a single string) |
utils/queryContext.ts |
low — string works for v0 |
| Cost tracker per-model | cost-tracker.ts |
medium |
| Session forking | V2 SDK | low — defer |
| Output JSON-schema enforcement (synthetic tool) | tools/SyntheticOutputTool |
low — defer |
Revised milestones
Original M0 was "Anthropic-only single-turn done." That's still right but with three corrections before moving to M1:
M0.1 — Hot-path fixes (this week)
- Streaming tool dispatch (kick off tool tasks at
content_block_stop, not aftermessage_stop). is_concurrency_safe: Callable[[input], bool]+ concurrency cap (default 10, env-overridable).- Sibling-abort via per-batch
CancelScope. - Tests + a recorded SSE fixture (vcr-style).
M0.2 — Safety surface
5. can_use_tool callback wiring.
6. Hooks dataclass with the same 28 event names as upstream. v0 implements 3: pre_tool_use, post_tool_use, stop.
7. max_budget_usd + per-model pricing JSON.
M1 — Multi-provider (as before)
M1.5 — Compaction & thinking
8. thinking: ThinkingConfig knob.
9. SimpleCompactor.
M2 — MCP + sub-agents (as before)
M3 — Production polish + skills, output-schema enforcement, session fork.
Decision asks
- Drop-in API shape. Upstream's public surface is
query({prompt, options}) -> Query(async iterable). Mine isAgent(...).run(messages) -> messages. The Agent class is cleaner for typed code; theirquery()is more declarative. Recommendation: ship both.Agentis the primary,query()is a thin wrapper for drop-in compatibility. - Zod-equivalent schema. Upstream uses
zodfor tool schemas (gives JSON schema for free + runtime validation). I'm using type-hint introspection. Recommendation: keep my approach as the default, add an explicitinput_schema=override (already exists), document that users who want zod-style validation can pass amsgspec.Structand we'll derive the schema from it (cleaner than adding a pydantic/zod dep). - MCP types in v0 even though MCP itself ships M2. I want to write the
McpServerConfigunion, theToolregistry to accept MCP-provided tools, and the transport enum now so the agent loop already polymorphic. Cost: a few hundred lines of typed scaffolding that does nothing. Benefit: M2 is purely additive instead of an invasive refactor. Recommendation: yes, do it.
What I'm going to do next unless you say otherwise
- Land the four M0.1 fixes on a branch and PR them.
- Write the comparison this doc captures into a structured changelog.
- Add MCP type scaffolding to
types.pyso the agent loop is already polymorphic.
If you want me to also wire OpenAI/Gemini before M0.1, say so — but I think the speed + safety hot-path fixes are more important than a second provider with a slow tool loop.