Thinking blocks · mantis-agent-sdk

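Models that reason out loud, such as DeepSeek-R1, QwQ, Marco-o1, and the R1 distills, emit "thinking" content separately from the final answer, either inline in the text or in a dedicated field. mantis-agent-sdk normalises both forms into a single ThinkingBlock you can render or hide. The OpenAI reasoning family also reasons before answering, but its API reports only token counts, so there is nothing to surface.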

Two formats, one block

Source	Wire format	Surface
DeepSeek-R1 / QwQ / Marco-o1	Inline `<think>...</think>` in the text stream.	`ThinkingBlock`
DeepSeek API (out-of-band)	`thinking` field on the message, separate from `content`.	`ThinkingBlock`
OpenAI o1 / o3 / o4	API does not expose the reasoning tokens, only their count.	Not surfaced.

The runtime strips inline <think>...</think> spans out of the text stream, so by the time you see content, thinking is already its own block.

Streaming

async for event in agent.run_iter("..."):
    if event.type == "thinking_delta":
        print(f"\033[2m{event.text}\033[0m", end="", flush=True)
    elif event.type == "text_delta":
        print(event.text, end="", flush=True)

thinking_delta events carry the reasoning as it arrives (printed dimmed above); text_delta events carry the final answer.

Flat-shape messages

In query() output, thinking appears as a block of type "thinking" in the assistant message's content list, alongside the regular "text" blocks:

async for msg in query(prompt="...", options={"model": "deepseek-r1:1.5b"}):
    if msg.type == "assistant":
        for block in msg.message["content"]:
            if block["type"] == "thinking":
                # Hide from end users; show in a "details" UI element
                ...
            elif block["type"] == "text":
                print(block["text"])
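If you need the reasoning and the answer as two plain strings, a small helper along these lines may be enough. This is a sketch only: `split_content` is a hypothetical name, and it assumes the reasoning text lives under a "thinking" key on the block, which this page does not confirm.

```python
from typing import Dict, List, Tuple


def split_content(content: List[Dict[str, str]]) -> Tuple[str, str]:
    """Return (thinking, answer) joined from a flat content list."""
    thinking: List[str] = []
    answer: List[str] = []
    for block in content:
        if block.get("type") == "thinking":
            # Assumption: the reasoning text sits under a "thinking" key.
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer.append(block.get("text", ""))
    # Other block types are ignored here.
    return "\n".join(thinking), "".join(answer)
```

Inside the loop above you would call it as `thinking, answer = split_content(msg.message["content"])` and pass only `answer` to the user-facing view.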

Enabling / disabling

Some models gate thinking behind a request flag. Setting include_thinking=False drops that flag from the request body where supported, and strips any inline <think> blocks the model emits anyway:

options = {
    "model": "deepseek-r1:1.5b",
    "include_thinking": False,
}

The default is True (surface everything).

Capability check

from mantis_agent import lookup_model

cap = lookup_model("deepseek-r1:1.5b")
print(cap.supports_thinking)        # True
print(cap.thinking_format)          # 'inline' | 'out_of_band' | 'hidden'
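One possible use of these two fields is deciding up front whether a UI should reserve space for a reasoning panel. The sketch below uses a stand-in `Capability` dataclass so it runs on its own; in practice you would pass the object returned by `lookup_model`. `can_render_thinking` is a hypothetical helper, and treating 'hidden' as "nothing will be surfaced" is an assumption based on the table above.

```python
from dataclasses import dataclass

# Formats for which thinking text actually reaches the caller (assumed).
SURFACED_FORMATS = {"inline", "out_of_band"}


@dataclass
class Capability:
    """Stand-in exposing only the two attributes shown above."""
    supports_thinking: bool
    thinking_format: str


def can_render_thinking(cap) -> bool:
    """True when the model thinks and the thinking text is surfaced."""
    return bool(cap.supports_thinking) and cap.thinking_format in SURFACED_FORMATS
```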

When to use thinking

Thinking blocks are useful for:

Debugging. When a tool call goes wrong, the thinking trail shows why.
Self-consistency. Re-running with a different seed and comparing the thinking across runs can hint at confidence: agreement between runs is a better sign than divergence.
UX. Render thinking in a collapsed disclosure so users can audit reasoning without it dominating the chat.

They are not useful as final output. Always render the regular text blocks as the user-facing answer.
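For a web chat, one way to follow this advice is to wrap thinking in an HTML details element, which browsers render collapsed by default, and emit text blocks as ordinary paragraphs. This is a minimal sketch: `render_html` is a hypothetical helper, and it assumes the reasoning text is stored under a "thinking" key on the block.

```python
from html import escape
from typing import Dict, List


def render_html(content: List[Dict[str, str]]) -> str:
    """Render a flat content list: thinking collapsed, text as the answer."""
    parts: List[str] = []
    for block in content:
        if block.get("type") == "thinking":
            # Assumption: the reasoning text sits under a "thinking" key.
            body = escape(block.get("thinking", ""))
            parts.append(
                f"<details><summary>Reasoning</summary><pre>{body}</pre></details>"
            )
        elif block.get("type") == "text":
            parts.append(f"<p>{escape(block.get('text', ''))}</p>")
    return "\n".join(parts)
```

Escaping both kinds of block matters because model output can contain angle brackets, including stray tags, that would otherwise be interpreted as markup.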