guides
Models and backends
mantis-agent-sdk ships seven backends. You almost never pick one by name —
the SDK auto-routes from the model string.
Backends at a glance
| Backend | Use when | Routes from |
|---|---|---|
ollama |
You're running Ollama locally (or remote). | Model names with a tag form: llama3.2:3b, qwen2.5:7b, deepseek-r1:1.5b. |
openai_compat |
Hosted OpenAI-compatible endpoints: vLLM, Together, Fireworks, Groq, OpenRouter, Cerebras. | MANTIS_AGENT_BASE_URL set; or org-prefixed names like Qwen/Qwen2.5-72B-Instruct. |
openai |
OpenAI proper. | gpt-4*, gpt-3.5*, o1*, o3*, o4*. |
gemini |
Google Gemini via the OpenAI-compat endpoint. | gemini-*. |
llamacpp |
Local llama.cpp llama-server. |
--backend llamacpp or base_url=http://localhost:8080/v1. |
tgi |
HuggingFace text-generation-inference. | --backend tgi. |
modal |
Modal serverless GPUs. | --backend modal with a MODAL_* env. |
anthropic_passthrough |
Parity testing against real Claude. | claude-* model names (only when ANTHROPIC_API_KEY is set). |
mock |
Tests / smoke runs. | MANTIS_AGENT_MOCK=1 env. |
Auto-routing rules
The routing module maps model names to backends in priority order:
- Explicit
backend=orMANTIS_AGENT_BACKENDenv wins. MANTIS_AGENT_MOCK=1→mock.- Ollama tag form (
name:tag) →ollama. gpt-*/o[134]*→openai.gemini-*→gemini.claude-*withANTHROPIC_API_KEY→anthropic_passthrough.- Org-prefixed (
Qwen/...,meta-llama/...) →openai_compat(needsMANTIS_AGENT_BASE_URL). - Otherwise → error with a hint about which env vars to set.
If you're unsure what a name will route to:
from mantis_agent.routing import resolve_backend
print(resolve_backend("qwen2.5:7b")) # → 'ollama'
print(resolve_backend("gpt-4o-mini")) # → 'openai'
print(resolve_backend("Qwen/Qwen2.5-72B")) # → 'openai_compat'Forcing a backend
options = {
"model": "Qwen/Qwen2.5-72B-Instruct",
"backend": "openai_compat",
"base_url": "https://api.together.xyz/v1",
"api_key": os.environ["TOGETHER_API_KEY"],
}Or set MANTIS_AGENT_BACKEND=openai_compat, MANTIS_AGENT_BASE_URL=…,
MANTIS_AGENT_API_KEY=… and skip the explicit fields.
Capabilities
Every model also carries a ModelCapability row that tells the runtime
how to drive tool use. The capability table currently covers 30+ models:
from mantis_agent import lookup_model, resolve_tool_use_path
cap = lookup_model("deepseek-r1:1.5b")
print(cap.tool_use_path) # ToolUsePath.XML_PROMPT_ENGINEERED
print(cap.supports_thinking) # True
print(cap.context_window) # 128_000resolve_tool_use_path() chooses between three strategies:
NATIVE_TOOLS— passtools[]in the request body. Modern OpenAI, Claude, most Qwens, llama 3.1+.XML_PROMPT_ENGINEERED— inject<tool_call>XML into the system prompt and parse it back out of completions. Llama 2, Mistral 7B, older Qwens.GRAMMAR_CONSTRAINED_JSON— use a JSON-schema grammar (llama.cpp, vLLM) to force valid tool-call JSON.
You don't normally pick this manually — the routing module handles it. But you can override if a specific model needs a different path:
options = {
"model": "qwen2.5:0.5b",
"tool_use_path": "xml_prompt_engineered",
}Per-backend notes
Ollama
- Auto-discovered on
http://localhost:11434unless overridden byMANTIS_AGENT_BASE_URL. - Native tool use supported for Llama 3.1+ and Qwen 2.5+.
setup-localwrites a startup-on-first-run launcher for the Ollama daemon — see Local setup.
OpenAI-compat (vLLM, Together, Fireworks, Groq, OpenRouter, Cerebras)
- Set
MANTIS_AGENT_BASE_URLandMANTIS_AGENT_API_KEY. - Tool use goes via native
tools[]. - Some providers (Cerebras, Groq) have stricter context windows; the capability table tracks these.
llama.cpp
- Use
--jinjato enable native tool-use templates.mantis-agent-sdkdoes this for you when starting viasetup-local-llamacpp. - Without
--jinja, falls back toGRAMMAR_CONSTRAINED_JSON.
Modal serverless
- For multi-hour or GPU runs: launch a model on Modal, point the SDK at the Modal URL. The Modal adapter handles cold-start delays and per-request keepalives.
Anthropic passthrough
- Only for parity testing. Pins to the real Anthropic API so you can compare mantis-agent-sdk's behaviour against the source.
- Not part of the 1.0 public surface — don't build production code on it.