mantis
claude code · claude agent sdk · for open models

Claude Code, for open source.
Any model you can serve, on any provider.

A Claude-Code-style coding agent in your terminal, and Anthropic's claude-agent-sdk surface as a library — both running on Llama, Qwen, DeepSeek, GLM, or anything behind Ollama, vLLM, Groq, or your own GPU box. The migration is one import.

agent.py
-from claude_agent_sdk import query, ClaudeAgentOptions, tool
+from

That's the whole diff. Every canonical Claude SDK example runs verbatim — the surface is Anthropic-shaped, the wire format underneath is OpenAI-compat or Ollama.

The mantis terminal ships in the same install — a Claude-Code-style coding agent driving the open model you choose.

831 testsPython 3.11–3.13Apache-2.0v1.21.0

One kwarg between

OllamavLLMllama.cppTGITogetherFireworksGroqOpenRouterCerebrasOpenAIGeminiModal
two ways in · one pip install

A terminal to code in, and a library to build with.

01

The mantis terminal

Point it at any directory. It reads, writes, edits, greps, and runs shell commands — Claude Code's feel, driving your local Ollama, your vLLM box, or a hosted endpoint. The input stays pinned to the bottom; replies render as Markdown; file edits come back as real, line-numbered diffs.

mantis — ~/code/todo-api
            ▄▀▄▀
           ▄█▀
        ▄██▀▀█▀
    ▄█ ▄███▀▀
 ▄▄██▀▀██▀▀▀▀▀
 ▀▀ █  █▀ ▀▄
 ▄▄▀  ▄▀   ▀▄
build me a fastapi todo app
⚒ Edit app/main.py +12 -0
1+ from fastapi import FastAPI
2+ app = FastAPI()
3+ todos: list[str] = []
Done — run it with uvicorn app.main:app --reload.
✻ Undulating…(3s · esc to interrupt)
02

The Python library

The same engine, as an SDK. A tool-calling loop is a few lines away — and the exact same script runs against Together, Fireworks, vLLM, or Groq by changing one string.

quickstart.py
import asyncio
from mantis_agent import query, ClaudeAgentOptions, tool, AssistantMessage

@tool
async def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"{city}: 67°F"

async def main():
    async for msg in query(
        prompt="What's the weather in SF?",
        options=ClaudeAgentOptions(
            model="qwen2.5:1.5b",   # routes to local Ollama automatically
            tools=[get_weather],
            max_turns=5,
        ),
    ):
        if isinstance(msg, AssistantMessage):
            for block in msg.content:
                if hasattr(block, "text"):
                    print(block.text)

asyncio.run(main())
# same script, three backends — change one line
options = ClaudeAgentOptions(model="qwen2.5:7b")                       # → local Ollama
options = ClaudeAgentOptions(model="Qwen/Qwen2.5-72B-Instruct-Turbo")  # → Together
options = ClaudeAgentOptions(model="llama-3.3-70b-versatile",
                            backend="https://api.groq.com/openai/v1")  # → Groq
the whole surface

Streaming dispatch, hooks, permissions, MCP, sub-agents, sessions. None of the OSS alternatives ship the whole set.

one api, many backends

Route from the model name

qwen3:8b → Ollama. Qwen/… → Together. gpt-4o-mini → OpenAI. The URL is inferred from the model name shape; pass backend= to override.

real tool use

Native, prompted, or grammar-constrained

Native tools[] where supported, prompt-engineered <tool_call> XML where not, grammar-constrained JSON where the server enforces it. Chosen per model, automatically.

full mcp

Four transports, both directions

In-process via create_sdk_mcp_server, plus stdio / sse / http. Elicitation lets servers prompt the user; sampling lets them call back into the model.

sessions

Survive restarts, fork, resume

JSONL transcript persistence, fork from any checkpoint, resume from an arbitrary one, auto-compaction at a token threshold.

sub-agents + plugins

Compose agents as tools

Plugin(tools=, system_prompt_addition=, hooks=) merges at session start. Rewrite tool args before dispatch with PermissionResultAllow(updated_input=…).

budget

A ceiling on every run

Per-model pricing table, max_usd and max_turns ceilings, BudgetExceededError, total_cost_usd on every ResultMessage.

universal tool use

Every model gets tool use — through whichever path it can actually take.

A capability table (30+ models) picks the path per model, automatically. You write one@tool; the library figures out how the model in front of it can call it.

ANative

OpenAI-compat tools[]. The fast path for anything that speaks function-calling — Qwen 2.5+, Llama 3.1+, gpt-oss.

BPrompted

Prompt-engineered <tool_call> XML, parsed back out. Brings tool use to Llama 2, Mistral 7B, and older Qwens that never learned the schema.

CConstrained

Grammar-constrained JSON when the server can enforce it. The model physically cannot emit an invalid call.

ranked · picked by where they run

Pick the highest-ranked model that fits your hardware.

ModelRunsmodel=Notable
Kimi K2.6cloudmoonshotai/Kimi-K2.6-Instruct#1 open-weights GPQA
Qwen3 235B-A22Bcloud · 64 GB+Qwen/Qwen3-235B-A22B-Instruct-TurboApache 2.0, broad leader
GLM-5cloudzai-org/GLM-5Best open Arena Elo
MiniMax M2.5cloudminimaxai/MiniMax-M2.580.2% SWE-bench
DeepSeek-V3.2cloud · 80 GB+deepseek-ai/DeepSeek-V3.2Top general-purpose OSS
gpt-oss-120bcloud · 80 GBgpt-oss:120bOpenAI open, ~o4-mini class
Qwen2.5-Coder 7B8 GB localqwen2.5-coder:7bStrongest small coder
qwen2.5:1.5b4 GB localqwen2.5:1.5bCPU default, tool-capable

Full ranked catalog — 20 hosted + 10 CPU-friendly tiers — in Models & backends.

observability, shipped

A full span tree of every run — tokens and cost on the root.

agent.run → agent.turn → llm.call + tool.call, with per-model usage on the root span. Swap InMemoryTracer for OTelTracer to ship the same spans to your pipeline. Tool spans record input keys, never values — the safe choice is the only choice.

tracing.py
from mantis_agent import Agent, InMemoryTracer

tracer = InMemoryTracer()
agent  = Agent(model="qwen2.5:7b", tools=[...], tracer=tracer)
await agent.run(...)

tracer.summary()            # turns / tokens / cost_usd on the root span
tracer.write_jsonl("t.jsonl")

# ship the same spans to Datadog / Honeycomb / Tempo — zero extra code
from mantis_agent import OTelTracer
agent = Agent(model="qwen2.5:7b", tracer=OTelTracer(service_name="my-agent"))
does it actually work?

On a fresh machine, no GPU. Works on the first try.

pip install mantis-agent-sdk
mantis-agent setup-local     # pulls a CPU-friendly model, smoke-tests
python my_agent.py           # two tools, a 5-turn task — first try

Change one word — model= — and the same script runs against Together, Fireworks, vLLM, llama.cpp, or Groq.