Tool calling not working through Studio — what am I missing? #4999

@Ranhiru

Description

Hey team 👋 — I've been trying to get OpenAI-style function calling working with Unsloth Studio and I'm stuck. The same request works when I hit the underlying llama-server directly, but fails through Studio's proxy, so I suspect I'm configuring something wrong. Wanted to ask before assuming it's a bug.

What I'm seeing

When I send tools + tool_choice to POST :8888/v1/chat/completions, the model behaves as if it never saw them: its own chain-of-thought concludes that no tool definitions were provided, and finish_reason comes back as stop, never tool_calls, even with tool_choice: "required".

OpenAI clients that rely on function calling (opencode in my case) then end up surfacing the model's raw tool-call tokens as plain text, e.g.:

<think>...</think><|tool_call>call:bash { "command": "ls" }<tool_call|>

…which I think is because llama.cpp stays in "chat_format":"Content-only" when no tool schema is present in the prompt, so it doesn't parse the model's tool-call tokens into structured tool_calls.

What I've tried

Through Studio (:8888)

TOKEN=$(~/.unsloth/get-token.sh)
curl -s http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{
    "model":"unsloth/gemma-4-26B-A4B-it-GGUF",
    "messages":[{"role":"user","content":"List files using the bash tool."}],
    "tools":[{"type":"function","function":{"name":"bash",
              "description":"run a shell command",
              "parameters":{"type":"object",
                "properties":{"command":{"type":"string"}},
                "required":["command"]}}}],
    "tool_choice":"required","stream":false,"max_tokens":256}'

Response:

{"id":"chatcmpl-422ff459db1f","object":"chat.completion","created":1776080902,"model":"unsloth/gemma-4-26B-A4B-it-GGUF","choices":[{"index":0,"message":{"role":"assistant","content":"<think>*   User wants to \"List files using the bash tool.\"\n    *   The user is implying that I have access to a \"bash tool.\"\n    *   I need to check my system instructions regarding tool use.\n\n    *   \"Tool Use: You can use tools only if specific endpoints are provided to you in the context. You do not have access to Google Search or any other tools by default.\"\n    *   \"Do not attempt to use tools or search the internet unless a specific tool definition and endpoint has been explicitly provided in the prompt.\"\n\n    *   The user has *not* provided a tool definition or an endpoint for a bash tool in the current context.\n    *   Therefore, I cannot use a bash tool.\n\n    *   State clearly that I do not have access to a bash tool or any other tools in this context.\n    *   Explain that I can only use tools if they are explicitly provided with definitions and endpoints.</think>I do not have access to a bash tool or any other tools in this context. I can only use tools if specific definitions and endpoints are provided to me."},"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

The smoking gun is inside the model's own reasoning: "The user has not provided a tool definition or an endpoint for a bash tool in the current context." — i.e. the tool schema never reached the prompt.
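Side note on how I'm reading these responses: I reduce each body to the handful of fields that matter here with jq (assumes jq is installed and the curl output is saved to resp.json first):

# Pull out the fields relevant to this issue from a chat.completion body.
jq '{id,
     finish: .choices[0].finish_reason,
     tool_calls: .choices[0].message.tool_calls,
     usage}' resp.json

Through Studio this prints finish: "stop", tool_calls: null, and an all-zero usage object.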

Directly against the llama-server Studio spawned (:56571)

Same payload, different endpoint + model id:

curl -s http://localhost:56571/v1/chat/completions \
  -H "Content-Type: application/json" -d '{
    "model":"gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf",
    "messages":[{"role":"user","content":"List files using the bash tool."}],
    "tools":[{"type":"function","function":{"name":"bash",
              "description":"run a shell command",
              "parameters":{"type":"object",
                "properties":{"command":{"type":"string"}},
                "required":["command"]}}}],
    "tool_choice":"required","stream":false,"max_tokens":256}'

Response:

{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":"","tool_calls":[{"type":"function","function":{"name":"bash","arguments":"{\"command\":\"ls\\n\"}"},"id":"3dqIDG2aSB2W1rxHH0BlWGUIvemuX3zU"}]}}],"created":1776081588,"model":"gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf","system_fingerprint":"b8772-bafae2765","object":"chat.completion","usage":{"completion_tokens":18,"prompt_tokens":62,"total_tokens":80,"prompt_tokens_details":{"cached_tokens":0}},"id":"chatcmpl-44aNE36ExT6EQkIVUWQGKJTOXOQpwee6","timings":{"cache_n":0,"prompt_n":62,"prompt_ms":680.489,"prompt_per_token_ms":10.975629032258064,"prompt_per_second":91.11095109546223,"predicted_n":18,"predicted_ms":268.675,"predicted_per_token_ms":14.926388888888889,"predicted_per_second":66.99544058807109}}

Works perfectly. Same binary (~/.unsloth/llama.cpp/llama-server), same GGUF, same flags (--jinja --chat-template-kwargs {"enable_thinking": true}) — the only variable is whether the request goes through Studio.
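One more sanity check on the "Content-only" theory from above: if I send the same prompt to the backend with no tools array at all, llama-server has no schema to parse against, so it can't return structured tool_calls either, which is exactly the failure mode I see through Studio. A minimal sketch (assumes jq; same backend port as above):

# Same prompt, but with the "tools" array omitted: with no schema in the
# prompt, finish_reason stays "stop" and any tool-call tokens the model
# emits are left inline in message.content, mirroring the Studio behavior.
curl -s http://localhost:56571/v1/chat/completions \
  -H "Content-Type: application/json" -d '{
    "model":"gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf",
    "messages":[{"role":"user","content":"List files using the bash tool."}],
    "stream":false,"max_tokens":256}' \
  | jq '{finish: .choices[0].finish_reason,
         content: .choices[0].message.content}'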

A couple of other things I noticed (might be clues, might be red herrings)

  • usage.prompt_tokens / completion_tokens are always 0 in Studio responses, which makes me think the proxy reconstructs responses rather than passing the backend body through.
  • The response id format also differs: random hex (chatcmpl-422ff459db1f) from Studio vs. a longer alphanumeric token (chatcmpl-44aNE36ExT6EQkIVUWQGKJTOXOQpwee6) from the backend. (Quick side-by-side check sketched below.)
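Here's that side-by-side check, for reference. It assumes both servers are still up and that the two request bodies from the curls above are saved as req-studio.json and req-direct.json (they differ only in the "model" field):

# Print id and usage for the same logical request via both endpoints.
TOKEN=$(~/.unsloth/get-token.sh)
echo "studio:"
curl -s http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d @req-studio.json | jq -c '{id, usage}'
echo "direct:"
curl -s http://localhost:56571/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @req-direct.json | jq -c '{id, usage}'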

Given those, I'm wondering if the Studio translation layer just isn't tool-aware yet — but I may well be missing a flag or config option that enables tool-call passthrough.
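For now my workaround is to bypass the proxy entirely, either by pointing OpenAI clients at the spawned llama-server (the official OpenAI SDKs honor OPENAI_BASE_URL) or by launching llama-server myself with the same flags Studio uses. Rough sketch; the port and model path below are from my machine, so treat them as placeholders:

# Option 1: talk to the server Studio already spawned. The port changes per
# run; find it with e.g.:
#   lsof -nP -iTCP -sTCP:LISTEN | grep llama-server
export OPENAI_BASE_URL="http://localhost:56571/v1"
export OPENAI_API_KEY="unused"   # llama-server doesn't check auth by default

# Option 2: run the same binary with the same flags on a stable port.
~/.unsloth/llama.cpp/llama-server \
  -m /path/to/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf \
  --jinja --chat-template-kwargs '{"enable_thinking": true}' \
  --port 8080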

Environment

  • macOS 24.6.0 (arm64)
  • unsloth 2026.4.4
  • llama.cpp b8772 (commit bafae2765), Unsloth prebuilt
  • Model: unsloth/gemma-4-26B-A4B-it-GGUF (UD-Q4_K_XL)
  • Started Studio with: unsloth studio (default flags), model loaded via the UI

Questions

  1. Is tool calling supposed to work through the Studio OpenAI-compatible endpoint today? If yes, is there something I need to enable?
  2. If not yet supported, is hitting the backend llama-server directly the intended workaround, or is there a better path?
  3. Are there any known gotchas with this specific Gemma GGUF + --chat-template-kwargs {"enable_thinking": true}?

Thanks for any pointers 🙏

Labels

feature request (Feature request pending on roadmap)