Tool calling not working through Studio — what am I missing? #4999

@Ranhiru

Description

Hey team 👋 — I've been trying to get OpenAI-style function calling working with Unsloth Studio and I'm stuck. The same request works when I hit the underlying llama-server directly, but fails through Studio's proxy, so I suspect I'm configuring something wrong. Wanted to ask before assuming it's a bug.

What I'm seeing

When I send tools + tool_choice to POST :8888/v1/chat/completions, the model behaves as if it never saw them: its own chain-of-thought concludes that no tool definitions were provided, and finish_reason comes back as stop, never tool_calls, even with tool_choice: "required".

OpenAI clients that rely on function calling (opencode in my case) then end up surfacing the model's raw tool-call tokens as plain text, e.g.:

<think>...</think><|tool_call>call:bash { "command": "ls" }<tool_call|>

…which I think is because llama.cpp stays in "chat_format":"Content-only" when no tool schema is present in the prompt, so it doesn't parse the model's tool-call tokens into structured tool_calls.

What I've tried

Through Studio (:8888)

TOKEN=$(~/.unsloth/get-token.sh)
curl -s http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{
    "model":"unsloth/gemma-4-26B-A4B-it-GGUF",
    "messages":[{"role":"user","content":"List files using the bash tool."}],
    "tools":[{"type":"function","function":{"name":"bash",
              "description":"run a shell command",
              "parameters":{"type":"object",
                "properties":{"command":{"type":"string"}},
                "required":["command"]}}}],
    "tool_choice":"required","stream":false,"max_tokens":256}'

Response:

{"id":"chatcmpl-422ff459db1f","object":"chat.completion","created":1776080902,"model":"unsloth/gemma-4-26B-A4B-it-GGUF","choices":[{"index":0,"message":{"role":"assistant","content":"<think>*   User wants to \"List files using the bash tool.\"\n    *   The user is implying that I have access to a \"bash tool.\"\n    *   I need to check my system instructions regarding tool use.\n\n    *   \"Tool Use: You can use tools only if specific endpoints are provided to you in the context. You do not have access to Google Search or any other tools by default.\"\n    *   \"Do not attempt to use tools or search the internet unless a specific tool definition and endpoint has been explicitly provided in the prompt.\"\n\n    *   The user has *not* provided a tool definition or an endpoint for a bash tool in the current context.\n    *   Therefore, I cannot use a bash tool.\n\n    *   State clearly that I do not have access to a bash tool or any other tools in this context.\n    *   Explain that I can only use tools if they are explicitly provided with definitions and endpoints.</think>I do not have access to a bash tool or any other tools in this context. I can only use tools if specific definitions and endpoints are provided to me."},"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

The smoking gun is inside the model's own reasoning: "The user has not provided a tool definition or an endpoint for a bash tool in the current context." — i.e. the tool schema never reached the prompt.
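Side note on how I'm reading these responses: I reduce each body to the handful of fields that matter here with jq (assumes jq is installed and the curl output is saved to resp.json first):

# Pull out the fields relevant to this issue from a chat.completion body.
jq '{id,
     finish: .choices[0].finish_reason,
     tool_calls: .choices[0].message.tool_calls,
     usage}' resp.json

Through Studio this prints finish: "stop", tool_calls: null, and an all-zero usage object.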

Directly against the llama-server Studio spawned (:56571)

Same payload, different endpoint + model id:

curl -s http://localhost:56571/v1/chat/completions \
  -H "Content-Type: application/json" -d '{
    "model":"gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf",
    "messages":[{"role":"user","content":"List files using the bash tool."}],
    "tools":[{"type":"function","function":{"name":"bash",
              "description":"run a shell command",
              "parameters":{"type":"object",
                "properties":{"command":{"type":"string"}},
                "required":["command"]}}}],
    "tool_choice":"required","stream":false,"max_tokens":256}'

Response:

{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":"","tool_calls":[{"type":"function","function":{"name":"bash","arguments":"{\"command\":\"ls\\n\"}"},"id":"3dqIDG2aSB2W1rxHH0BlWGUIvemuX3zU"}]}}],"created":1776081588,"model":"gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf","system_fingerprint":"b8772-bafae2765","object":"chat.completion","usage":{"completion_tokens":18,"prompt_tokens":62,"total_tokens":80,"prompt_tokens_details":{"cached_tokens":0}},"id":"chatcmpl-44aNE36ExT6EQkIVUWQGKJTOXOQpwee6","timings":{"cache_n":0,"prompt_n":62,"prompt_ms":680.489,"prompt_per_token_ms":10.975629032258064,"prompt_per_second":91.11095109546223,"predicted_n":18,"predicted_ms":268.675,"predicted_per_token_ms":14.926388888888889,"predicted_per_second":66.99544058807109}}

Works perfectly. Same binary (~/.unsloth/llama.cpp/llama-server), same GGUF, same flags (--jinja --chat-template-kwargs {"enable_thinking": true}) — the only variable is whether the request goes through Studio.
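One more sanity check on the "Content-only" theory from above: if I send the same prompt to the backend with no tools array at all, llama-server has no schema to parse against, so it can't return structured tool_calls either, which is exactly the failure mode I see through Studio. A minimal sketch (assumes jq; same backend port as above):

# Same prompt, but with the "tools" array omitted: with no schema in the
# prompt, finish_reason stays "stop" and any tool-call tokens the model
# emits are left inline in message.content, mirroring the Studio behavior.
curl -s http://localhost:56571/v1/chat/completions \
  -H "Content-Type: application/json" -d '{
    "model":"gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf",
    "messages":[{"role":"user","content":"List files using the bash tool."}],
    "stream":false,"max_tokens":256}' \
  | jq '{finish: .choices[0].finish_reason,
         content: .choices[0].message.content}'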

A couple of other things I noticed (might be clues, might be red herrings)

  • usage.prompt_tokens / completion_tokens are always 0 in Studio responses, which makes me think the proxy reconstructs responses rather than passing the backend body through.
  • The response id format also differs: random hex (chatcmpl-422ff459db1f) from Studio vs. a longer alphanumeric token (chatcmpl-44aNE36ExT6EQkIVUWQGKJTOXOQpwee6) from the backend. (Quick side-by-side check sketched below.)
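Here's that side-by-side check, for reference. It assumes both servers are still up and that the two request bodies from the curls above are saved as req-studio.json and req-direct.json (they differ only in the "model" field):

# Print id and usage for the same logical request via both endpoints.
TOKEN=$(~/.unsloth/get-token.sh)
echo "studio:"
curl -s http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d @req-studio.json | jq -c '{id, usage}'
echo "direct:"
curl -s http://localhost:56571/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @req-direct.json | jq -c '{id, usage}'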

Given those, I'm wondering if the Studio translation layer just isn't tool-aware yet — but I may well be missing a flag or config option that enables tool-call passthrough.
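For now my workaround is to bypass the proxy entirely, either by pointing OpenAI clients at the spawned llama-server (the official OpenAI SDKs honor OPENAI_BASE_URL) or by launching llama-server myself with the same flags Studio uses. Rough sketch; the port and model path below are from my machine, so treat them as placeholders:

# Option 1: talk to the server Studio already spawned. The port changes per
# run; find it with e.g.:
#   lsof -nP -iTCP -sTCP:LISTEN | grep llama-server
export OPENAI_BASE_URL="http://localhost:56571/v1"
export OPENAI_API_KEY="unused"   # llama-server doesn't check auth by default

# Option 2: run the same binary with the same flags on a stable port.
~/.unsloth/llama.cpp/llama-server \
  -m /path/to/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf \
  --jinja --chat-template-kwargs '{"enable_thinking": true}' \
  --port 8080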

Environment

  • macOS 24.6.0 (arm64)
  • unsloth 2026.4.4
  • llama.cpp b8772 (commit bafae2765), Unsloth prebuilt
  • Model: unsloth/gemma-4-26B-A4B-it-GGUF (UD-Q4_K_XL)
  • Started Studio with: unsloth studio (default flags), model loaded via the UI

Questions

  1. Is tool calling supposed to work through the Studio OpenAI-compatible endpoint today? If yes, is there something I need to enable?
  2. If not yet supported, is hitting the backend llama-server directly the intended workaround, or is there a better path?
  3. Are there any known gotchas with this specific Gemma GGUF + --chat-template-kwargs {"enable_thinking": true}?

Thanks for any pointers 🙏

Labels

feature request (Feature request pending on roadmap)