Upstream comparison — Claude Code source vs. our v0

Read date: 2026-05-16. Source: vendored copy of the Claude Code TS source (~1,902 files). The public SDK surface is in src/entrypoints/sdk/; the runtime engine is src/QueryEngine.ts + src/query.ts + src/Tool.ts

src/services/tools/* + src/services/mcp/*.

The honest framing: I designed v0 against the public API shape of the Claude Agent SDK, not the actual source. Reading the source raises the bar. Below is what I missed, what's correctly assumed, and what we should change before M1.

What the upstream actually is

A drop-in analog is not just "Anthropic client + tool loop." The upstream engine carries a production agent runtime:

Subsystem	What it does	Files
QueryEngine	Owns conversation state across turns. Constructs the system prompt from many parts, calls `query`, accumulates usage, handles compaction triggers, runs hooks.	`QueryEngine.ts` (1,295 LOC)
`query` loop	The hot path. Calls the model, streams events, drives the streaming tool executor, handles auto-compact + reactive compact + snip + context-collapse, retries on fallback.	`query.ts` (1,729 LOC)
StreamingToolExecutor	Starts tool calls as they arrive in the assistant stream, not after the assistant message finalizes. Concurrent-safe tools run in parallel up to a cap; non-safe tools serialize.	`services/tools/StreamingToolExecutor.ts` (530 LOC)
toolOrchestration	Partitions a tool batch by `tool.isConcurrencySafe(parsedInput)` and runs each partition. Concurrency cap default 10 (env).	`services/tools/toolOrchestration.ts`
toolExecution	Runs one tool: parse input, validate, ask permissions, execute, capture progress, format result. ~1,745 LOC because real tools generate progress messages mid-execution.	`services/tools/toolExecution.ts`
Permissions	`canUseTool` callback + 28 hook events (PreToolUse, PostToolUse, Stop, SubagentStart, …). Allow/deny/ask rules per source (user/project/local).	`Tool.ts`, `hooks/useCanUseTool.ts`, `services/tools/toolHooks.ts`
MCP client	stdio + sse + http + in-process ("sdk") + claudeai-proxy transports. Server discovery, auth, OAuth port routing, elicitation (URL params mid-tool via JSON-RPC error -32042), channel allowlist, vscode SDK bridge.	`services/mcp/*` (~24 files)
Compaction	Auto-compact when budget low, reactive compact, snip compaction, context-collapse. Multiple strategies behind feature flags.	`services/compact/*`
Skills	Separate from tools. `searchHint`, `alwaysLoad`, skill prefetch in query loop.	`services/SkillSearch/`, `tools/SkillTool/`
Memory	The MEMORY.md tree system — separate "memdir" subsystem with auto-loading + nested paths.	`memdir/*`
Sessions	Persistence to disk, sessionId tracking, fork/resume, replay, message-queue manager.	`bootstrap/state.ts`, `utils/sessionStorage.ts`, V2 session API
Cost tracking	Per-model pricing table, usage accumulation, `costUSD` reported in events.	`cost-tracker.ts`, `services/api/logging.ts`
File state cache	Tracks file mtimes/contents to detect external edits between turns.	`utils/fileStateCache.ts`, `utils/fileHistory.ts`
Thinking config	adaptive (Opus 4.6+) / enabled with budgetTokens / disabled	`utils/thinking.ts`
Output enforcement	JSON-schema-constrained output via `SYNTHETIC_OUTPUT_TOOL_NAME`.	`tools/SyntheticOutputTool/*`
Hooks	28 events; each can mutate the request or block it.	`utils/hooks.ts`, `services/tools/toolHooks.ts`
Agent definitions	Filesystem-loaded sub-agent definitions (`AgentDefinition`), spawned via `AgentTool`.	`tools/AgentTool/*`

Public SDK surface (entrypoints/agentSdkTypes.ts) is small by comparison:

function tool(name, description, zodSchema, handler, extras?) -> SdkMcpToolDefinition
function createSdkMcpServer({name, version?, tools?}) -> McpSdkServerConfigWithInstance
function query({prompt, options}) -> Query   // async iterable of SDKMessage
class AbortError
 
// V2 (alpha)
unstable_v2_createSession(options) -> SDKSession
unstable_v2_resumeSession(id, options) -> SDKSession
unstable_v2_prompt(message, options) -> Promise<SDKResultMessage>
 
// Session ops
listSessions, getSessionInfo, getSessionMessages, renameSession, tagSession,
forkSession
 
// Remote control + scheduled tasks
connectRemoteControl, watchScheduledTasks, buildMissedTaskNotification

SDKMessage is a discriminated union — assistant chunks, user echoes, tool-use, tool-result, system init, compact boundaries, status changes, permission denials.

What I got right in v0

Universal Message + ContentBlock model with tagged unions. Their Message is also a tagged union (AssistantMessage | UserMessage | SystemMessage | ProgressMessage | AttachmentMessage | TombstoneMessage | ToolUseSummaryMessage | SystemLocalCommandMessage). Same idea.
Normalized streaming event model that all providers fold into. They don't multi-provider, but their StreamEvent shape is the same Anthropic SSE taxonomy I mirrored.
Provider adapter as a protocol; lazy registry. Right call — they don't have to do this because they're Anthropic-only, but for a multi-provider SDK it's the correct abstraction.
Shared httpx.AsyncClient with HTTP/2. Mirrors the long-lived @anthropic-ai/sdk client they hold open.
@tool decorator + schema auto-derivation. They use zod schemas passed explicitly; I auto-derive from type hints. Theirs is more explicit, mine is more ergonomic. I should support both — explicit input_schema= override is already there, just need to document.
anyio + task group for parallel dispatch. Right primitive. They use a generator merge helper (all) with concurrency limit; same thing different shape.

What I got wrong or missed (rank-ordered)

1. Tool dispatch happens AFTER the assistant message finalizes. Wrong.

Their model: StreamingToolExecutor watches the assistant stream and kicks off each tool_use block as soon as it finishes streaming, while later content is still arriving. Concurrent-safe tools run in parallel up to a cap; non-safe ones serialize.

My model: I assemble the full AssistantMessage, then dispatch all tool calls in a task_group. That's a real latency tax on multi-tool turns — if a tool takes 5 s and the assistant streams 3 more tool calls after it, my version waits 5 s before starting any.

Fix: Move tool dispatch into the stream consumer. When a ContentBlockStop event fires for a tool_use block, kick the dispatch task immediately. The agent loop becomes: stream → for each tool_use block that closes, start it → after message_stop, await all and assemble results → next turn.

2. `parallel_safe: bool` is too coarse.

Theirs: Tool.isConcurrencySafe(parsedInput) -> bool. Function of the inputs. Two bash calls writing to different paths can run in parallel; two writing to the same path can't.

Mine: static parallel_safe per tool.

Fix: Change Tool.parallel_safe from bool to Callable[[dict], bool] | bool (bool stays for backwards-compat). Partition runs by walking the input list.

3. No concurrency cap.

Theirs: CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY=10 default, env-overridable.

Mine: task_group.start_soon for every call — unbounded.

Fix: Wrap in anyio.Semaphore(cap) or use anyio.create_memory_object_stream + a worker pool. Cap configurable per-Agent.

4. No sibling-abort.

Theirs: Each tool batch runs under a child AbortController rooted at the turn's. If one tool errors (esp. bash), siblings get aborted so subprocesses die immediately rather than running to completion.

Mine: Errors are caught per-task and turned into ToolResultBlock(is_error=True); siblings keep running.

Fix: Open an anyio.CancelScope per concurrent batch; cancel on first error if a per-tool flag (abort_siblings_on_error) is set.

5. No permission / hook system.

Theirs: canUseTool(tool, input, ctx, message, useID, force?) is required on every tool call. Returns PermissionResult (allow / deny / ask). Hook events fire at 28 named points (PreToolUse, PostToolUse, Stop, SubagentStart/Stop, …) — each hook can mutate the request or block.

Mine: Tools just run. No gating, no hooks.

Fix: Add Agent(can_use_tool=async_fn, hooks=Hooks(...)). Start with a minimal hook set (PreToolUse, PostToolUse, Stop) and grow. The 28-event taxonomy in their HOOK_EVENTS const is a good steal — copy it verbatim.

6. No token budget / max-budget-usd / fallback model.

Theirs: maxTurns, maxBudgetUsd, taskBudget.total, fallbackModel. Auto-fallback when primary errors with FallbackTriggeredError.

Mine: max_steps only.

Fix: Add max_budget_usd + a per-model pricing table (Anthropic publishes them; ship a JSON). Track total_cost on the Agent; abort when exceeded. Add fallback_model knob and an error class to trigger it.

7. No thinking config.

Theirs: {type: 'adaptive'} | {type: 'enabled', budgetTokens?} | {type: 'disabled'}.

Mine: Nothing. Just temperature.

Fix: Add thinking: ThinkingConfig. Pass through to Anthropic adapter as thinking param; ignore on other providers.

8. No auto-compact.

Theirs: Multiple strategies — autoCompact (default), reactiveCompact, snipCompact, contextCollapse — all feature-gated. When token usage crosses a threshold, summarize older messages into a single SDKCompactBoundaryMessage and drop them.

Mine: None. Long sessions will OOM the context.

Fix: Ship SimpleCompactor for v0 — when total tokens > context_window * 0.85, summarize the oldest N messages with a cheap model. Boundary message in the history preserves audit trail. More strategies can plug in later.

9. MCP is not implemented.

Theirs: Four transports (stdio, sse, http, in-process "sdk"), OAuth flows, server registry, channel allowlist, elicitation, connection manager, vscode SDK bridge.

Mine: Nothing. My plan mentions M2 = MCP client.

Fix: Keep MCP at M2 but write the types now (server config union, transport enum) so the agent loop already handles MCP-provided tools alongside local ones. Otherwise the integration is going to require an invasive refactor.

10. Skills ≠ tools.

Theirs: Skills are first-class alongside tools and have a separate registry, searchHint, alwaysLoad, skill-prefetch in the query loop. The agent might "discover" a skill at turn start and inject it as additional context.

Mine: Skills don't exist.

Fix: Defer to M3. Note it in the plan so users know the data model accommodates it later.

11. Smaller misses

Miss	Where	Fix priority
`webSearchRequests` in `Usage`	their `ModelUsageSchema`	low — keep our Usage minimal, add as needed
`costUSD` in usage	their schema	medium — emit from a pricing table
`contextWindow`/`maxOutputTokens` in usage	their schema	low
File state cache (detect external edits)	`utils/fileStateCache.ts`	low — agents that touch the FS need this
Permission modes (default/auto/bypass)	`Tool.ts`	medium — needed for any non-trivial deployment
`SystemPrompt` from parts (not a single string)	`utils/queryContext.ts`	low — string works for v0
Cost tracker per-model	`cost-tracker.ts`	medium
Session forking	V2 SDK	low — defer
Output JSON-schema enforcement (synthetic tool)	`tools/SyntheticOutputTool`	low — defer

Revised milestones

Original M0 was "Anthropic-only single-turn done." That's still right but with three corrections before moving to M1:

M0.1 — Hot-path fixes (this week)

Streaming tool dispatch (kick off tool tasks at content_block_stop, not after message_stop).
is_concurrency_safe: Callable[[input], bool] + concurrency cap (default 10, env-overridable).
Sibling-abort via per-batch CancelScope.
Tests + a recorded SSE fixture (vcr-style).

M0.2 — Safety surface 5. can_use_tool callback wiring. 6. Hooks dataclass with the same 28 event names as upstream. v0 implements 3: pre_tool_use, post_tool_use, stop. 7. max_budget_usd + per-model pricing JSON.

M1 — Multi-provider (as before)

M1.5 — Compaction & thinking 8. thinking: ThinkingConfig knob. 9. SimpleCompactor.

M2 — MCP + sub-agents (as before)

M3 — Production polish + skills, output-schema enforcement, session fork.

Decision asks

Drop-in API shape. Upstream's public surface is query({prompt, options}) -> Query (async iterable). Mine is Agent(...).run(messages) -> messages. The Agent class is cleaner for typed code; their query() is more declarative. Recommendation: ship both. Agent is the primary, query() is a thin wrapper for drop-in compatibility.
Zod-equivalent schema. Upstream uses zod for tool schemas (gives JSON schema for free + runtime validation). I'm using type-hint introspection. Recommendation: keep my approach as the default, add an explicit input_schema= override (already exists), document that users who want zod-style validation can pass a msgspec.Struct and we'll derive the schema from it (cleaner than adding a pydantic/zod dep).
MCP types in v0 even though MCP itself ships M2. I want to write the McpServerConfig union, the Tool registry to accept MCP-provided tools, and the transport enum now so the agent loop already polymorphic. Cost: a few hundred lines of typed scaffolding that does nothing. Benefit: M2 is purely additive instead of an invasive refactor. Recommendation: yes, do it.

What I'm going to do next unless you say otherwise

Land the four M0.1 fixes on a branch and PR them.
Write the comparison this doc captures into a structured changelog.
Add MCP type scaffolding to types.py so the agent loop is already polymorphic.

If you want me to also wire OpenAI/Gemini before M0.1, say so — but I think the speed + safety hot-path fixes are more important than a second provider with a slow tool loop.