Quickstart

Five minutes from a clean Python env to a streaming agent with tool use.

1. Install

```bash
pip install mantis-agent-sdk
```

2. Choose a backend

Any of these works. Pick whichever you have credentials for.

=== "Ollama (local)"

```bash
ollama pull qwen2.5:7b
```
 
No environment variables are needed. `mantis-agent-sdk` auto-discovers Ollama at
`http://localhost:11434`.

=== "Together / Fireworks / Groq / vLLM (hosted OpenAI-compat)"

```bash
export MANTIS_AGENT_BASE_URL=https://api.together.xyz/v1
export MANTIS_AGENT_API_KEY=$TOGETHER_API_KEY
export MANTIS_AGENT_MODEL=Qwen/Qwen2.5-72B-Instruct
```

=== "OpenAI"

```bash
export OPENAI_API_KEY=sk-...
```

=== "Anthropic (parity testing)"

```bash
export ANTHROPIC_API_KEY=sk-ant-...
```
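
To confirm which of the backends above your shell is set up for, a small check along these lines can help. This is only a sketch: `configured_backends` is a hypothetical helper, not part of `mantis-agent-sdk`, and it only looks for the environment variable names shown above. It does not claim to mirror the SDK's own resolution order.

```python
import os
from typing import Mapping


def configured_backends(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return labels for the backends whose credentials appear in `env`.

    Hypothetical helper for illustration; not part of mantis-agent-sdk.
    """
    found = []
    # The hosted OpenAI-compatible route needs both a base URL and an API key.
    if env.get("MANTIS_AGENT_BASE_URL") and env.get("MANTIS_AGENT_API_KEY"):
        found.append("openai-compat")
    if env.get("OPENAI_API_KEY"):
        found.append("openai")
    if env.get("ANTHROPIC_API_KEY"):
        found.append("anthropic")
    # Ollama needs no environment variables, so it cannot be detected here.
    return found


if __name__ == "__main__":
    print(configured_backends() or "no hosted credentials found; Ollama may still work")
```

An empty result does not mean nothing will work, since the local Ollama route needs no credentials at all.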

3. Your first agent

Save the following as `quickstart.py`:

```python
import asyncio

from mantis_agent import query, tool


@tool
async def get_weather(city: str) -> str:
    """Get the current weather for a city. Returns a one-line summary."""
    # Canned response so the example runs without a real weather API.
    return f"{city}: 67°F, partly cloudy, wind 8 mph NW"


async def main() -> None:
    # query() yields messages as the agent loop progresses.
    async for msg in query(
        prompt="What's the weather in Lagos?",
        options={
            "model": "qwen2.5:7b",       # or "gpt-4o-mini", or
                                          # "Qwen/Qwen2.5-72B-Instruct"
            "tools": [get_weather],
            "max_turns": 4,              # hard ceiling on the agent loop
        },
    ):
        if msg.type == "assistant":
            # Print only the text blocks of each assistant message.
            for block in msg.message["content"]:
                if block["type"] == "text":
                    print(block["text"])
        elif msg.type == "result":
            # The final message carries the total cost of the run.
            print(f"\n[done - cost ${msg.total_cost_usd:.4f}]")


if __name__ == "__main__":
    asyncio.run(main())
```

Run it:

```bash
python quickstart.py
```

You should see the assistant pick `get_weather`, run it, then narrate the result in plain English.
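
If you end up handling assistant messages in more than one place, the text-extraction branch of the loop can be pulled into a small helper. The sketch below assumes only the block shape used in `quickstart.py` (dicts with a `type` key, plus `text` on text blocks). `text_from_content` is a hypothetical name, and the fields on the sample `tool_use` block are illustrative rather than taken from the SDK.

```python
from typing import Any


def text_from_content(content: list[dict[str, Any]]) -> str:
    """Join the text blocks of one assistant message, skipping other block types."""
    return "".join(
        block["text"] for block in content if block.get("type") == "text"
    )


# Example content list; the "tool_use" block's fields are illustrative only.
sample = [
    {"type": "text", "text": "Checking the weather. "},
    {"type": "tool_use", "name": "get_weather", "input": {"city": "Lagos"}},
    {"type": "text", "text": "Lagos: 67°F, partly cloudy."},
]
print(text_from_content(sample))
```

Inside the `async for` loop this would be called as `text_from_content(msg.message["content"])`.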

What just happened

`query()` is the same function the Claude Agent SDK ships. It returns an async iterator of `SDKMessage` objects, each with a `type` of `assistant`, `user`, `system`, or `result`.
`@tool` decorates an async Python function. The function's signature becomes the JSON schema sent to the model.
`options={"model": "qwen2.5:7b"}`: `mantis-agent-sdk` routes automatically based on the model name, so `qwen2.5:7b` goes to Ollama and no `backend=` argument is needed.
`max_turns=4` puts a hard ceiling on the agent loop. Pair it with `max_usd=0.10` to cap cost as well; see Budget.
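
To make the signature-to-schema step concrete, here is a rough sketch of how a function signature can be turned into a JSON-Schema-style tool description using only the standard library. It is not the SDK's implementation: `schema_from_signature`, the type mapping, and the output layout (including the `input_schema` key) are assumptions chosen for illustration.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names (assumed).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_signature(fn: Callable[..., Any]) -> dict[str, Any]:
    """Build a JSON-Schema-style tool description from a function signature."""
    hints = get_type_hints(fn)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unmapped types fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


async def get_weather(city: str, days: int = 1) -> str:
    """Get the current weather for a city."""
    return f"{city}: 67°F"


print(schema_from_signature(get_weather))
```

The practical takeaway is that precise type annotations and a clear docstring are what the model has to go on when it decides whether and how to call a tool.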

Streaming with `ClaudeSDKClient`

For a session that survives multiple `query()` calls:

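```python
import asyncio

from mantis_agent import ClaudeAgentOptions, ClaudeSDKClient

# Assumes get_weather, the @tool function from quickstart.py, is defined above.


async def main() -> None:
    options = ClaudeAgentOptions(
        model="qwen2.5:7b",
        tools=[get_weather],
    )
    # The client keeps one session open, so the second query can build on the first.
    async with ClaudeSDKClient(options) as client:
        async for msg in client.query("What's the weather in Lagos?"):
            ...  # handle messages as in quickstart.py
        async for msg in client.query("Now compare it to Lisbon."):
            ...


asyncio.run(main())
```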

The transcript is persisted to `~/.mantis-agent/sessions/{session_id}.jsonl` between calls and can be forked or resumed later.
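
Because the transcript is a `.jsonl` file, it can be inspected with the standard library alone. The sketch below assumes only what the path implies: one JSON object per line. The record fields are not specified here, so the code does not depend on any; `session_path` and `read_transcript` are hypothetical helpers, not SDK functions.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def session_path(session_id: str) -> Path:
    """Build the transcript path described above for a given session id."""
    return Path.home() / ".mantis-agent" / "sessions" / f"{session_id}.jsonl"


def read_transcript(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example, `sum(1 for _ in read_transcript(session_path(sid)))` counts the records in a session, where `sid` is a session id you already have.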

Next steps

Pick the right backend for your model
Write more tools, including parallel-safe ones
Plug in MCP servers
Stream and dispatch tools mid-response
Look up the full API