Tool calling not working through Studio — what am I missing?
Hey team 👋 — I've been trying to get OpenAI-style function calling working with Unsloth Studio and I'm stuck. The same request works when I hit the underlying llama-server directly, but fails through Studio's proxy, so I suspect I'm configuring something wrong. Wanted to ask before assuming it's a bug.
What I'm seeing
When I send tools + tool_choice to POST :8888/v1/chat/completions, the model behaves as if it never saw them — its own chain-of-thought says, in effect, that no tool definitions were provided. finish_reason comes back as stop, never tool_calls, even with tool_choice: "required".
OpenAI clients that rely on function calling (opencode in my case) then end up surfacing the model's raw tool-call tokens as plain text, e.g.:
<think>...</think><|tool_call>call:bash { "command": "ls" }<tool_call|>
…which I think is because llama.cpp stays in "chat_format":"Content-only" when no tool schema is present in the prompt, so it doesn't parse the model's tool-call tokens into structured tool_calls.
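(Throughout this post I'm piping responses through a small jq helper of my own, nothing Studio-specific, just to compare runs at a glance:)

# My own convenience filter: summarizes a chat/completions response down to
# the fields that matter here: finish_reason, whether structured tool_calls
# came back, and usage.
summarize() {
  jq '{finish_reason: .choices[0].finish_reason,
       has_tool_calls: (.choices[0].message.tool_calls != null),
       usage: .usage}'
}
# e.g.: curl -s ... | summarize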
What I've tried
Through Studio (:8888)
TOKEN=$(~/.unsloth/get-token.sh)
curl -s http://localhost:8888/v1/chat/completions \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{
"model":"unsloth/gemma-4-26B-A4B-it-GGUF",
"messages":[{"role":"user","content":"List files using the bash tool."}],
"tools":[{"type":"function","function":{"name":"bash",
"description":"run a shell command",
"parameters":{"type":"object",
"properties":{"command":{"type":"string"}},
"required":["command"]}}}],
"tool_choice":"required","stream":false,"max_tokens":256}'
Response:
{"id":"chatcmpl-422ff459db1f","object":"chat.completion","created":1776080902,"model":"unsloth/gemma-4-26B-A4B-it-GGUF","choices":[{"index":0,"message":{"role":"assistant","content":"<think>* User wants to \"List files using the bash tool.\"\n * The user is implying that I have access to a \"bash tool.\"\n * I need to check my system instructions regarding tool use.\n\n * \"Tool Use: You can use tools only if specific endpoints are provided to you in the context. You do not have access to Google Search or any other tools by default.\"\n * \"Do not attempt to use tools or search the internet unless a specific tool definition and endpoint has been explicitly provided in the prompt.\"\n\n * The user has *not* provided a tool definition or an endpoint for a bash tool in the current context.\n * Therefore, I cannot use a bash tool.\n\n * State clearly that I do not have access to a bash tool or any other tools in this context.\n * Explain that I can only use tools if they are explicitly provided with definitions and endpoints.</think>I do not have access to a bash tool or any other tools in this context. I can only use tools if specific definitions and endpoints are provided to me."},"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
The smoking gun is inside the model's own reasoning: "The user has not provided a tool definition or an endpoint for a bash tool in the current context." — i.e. the tool schema never reached the prompt.
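(If anyone wants a quicker repro than reading reasoning traces: this probe just asks the model to name the tools it can see. The prompt wording is mine, purely diagnostic, but given the trace above I'd expect it to report none when going through Studio.)

# Probe: attach the same bash tool and ask the model to enumerate the tool
# definitions visible to it. The prompt is my own invention, nothing official.
curl -s http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d '{
  "model":"unsloth/gemma-4-26B-A4B-it-GGUF",
  "messages":[{"role":"user","content":"Name every tool definition visible in your context."}],
  "tools":[{"type":"function","function":{"name":"bash",
    "description":"run a shell command",
    "parameters":{"type":"object",
      "properties":{"command":{"type":"string"}},
      "required":["command"]}}}],
  "stream":false,"max_tokens":128}' | jq -r '.choices[0].message.content'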
Directly against the llama-server Studio spawned (:56571)
Same payload, different endpoint + model id:
curl -s http://localhost:56571/v1/chat/completions \
-H "Content-Type: application/json" -d '{
"model":"gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf",
"messages":[{"role":"user","content":"List files using the bash tool."}],
"tools":[{"type":"function","function":{"name":"bash",
"description":"run a shell command",
"parameters":{"type":"object",
"properties":{"command":{"type":"string"}},
"required":["command"]}}}],
"tool_choice":"required","stream":false,"max_tokens":256}'
Response:
{"choices":[{"finish_reason":"tool_calls","index":0,"message":{"role":"assistant","content":"","tool_calls":[{"type":"function","function":{"name":"bash","arguments":"{\"command\":\"ls\\n\"}"},"id":"3dqIDG2aSB2W1rxHH0BlWGUIvemuX3zU"}]}}],"created":1776081588,"model":"gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf","system_fingerprint":"b8772-bafae2765","object":"chat.completion","usage":{"completion_tokens":18,"prompt_tokens":62,"total_tokens":80,"prompt_tokens_details":{"cached_tokens":0}},"id":"chatcmpl-44aNE36ExT6EQkIVUWQGKJTOXOQpwee6","timings":{"cache_n":0,"prompt_n":62,"prompt_ms":680.489,"prompt_per_token_ms":10.975629032258064,"prompt_per_second":91.11095109546223,"predicted_n":18,"predicted_ms":268.675,"predicted_per_token_ms":14.926388888888889,"predicted_per_second":66.99544058807109}}
Works perfectly. Same binary (~/.unsloth/llama.cpp/llama-server), same GGUF, same flags (--jinja --chat-template-kwargs {"enable_thinking": true}) — the only variable is whether the request goes through Studio.
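(One more check that might help localize where the schema gets dropped, hedged because I'm not sure every build has it: newer llama-server builds expose POST /apply-template, which renders the chat template for a request body without generating. If the tool schema shows up in the rendered prompt here, the template side is fine and the schema is being dropped before it reaches the backend. Whether Studio proxies this route at all is also an assumption on my part.)

# Sketch, assuming this backend build has /apply-template and that it returns
# {"prompt": ...} for the same body shape as /v1/chat/completions.
curl -s http://localhost:56571/apply-template \
  -H "Content-Type: application/json" -d '{
  "messages":[{"role":"user","content":"List files using the bash tool."}],
  "tools":[{"type":"function","function":{"name":"bash",
    "description":"run a shell command",
    "parameters":{"type":"object",
      "properties":{"command":{"type":"string"}},
      "required":["command"]}}}]}' | jq -r '.prompt'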
A couple of other things I noticed (might be clues, might be red herrings)
- usage.prompt_tokens / completion_tokens are always 0 in Studio responses — makes me think the proxy is reconstructing responses rather than passing the backend body through.
- Response id format differs between the two: random hex (chatcmpl-422ff459db1f) from Studio vs. an alphanumeric token (chatcmpl-44aNE36ExT6EQkIVUWQGKJTOXOQpwee6) from the backend.
Given those, I'm wondering if the Studio translation layer just isn't tool-aware yet — but I may well be missing a flag or config option that enables tool-call passthrough.
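(In case it helps triage, here's the whole comparison collapsed into one loop. payload.json is the exact body from the Studio request above; I'm assuming the raw llama-server ignores both the model field, since it serves whatever is loaded, and the bearer token, since no --api-key is set. Correct me if either assumption is wrong for this build.)

# Minimal side-by-side repro: identical payload to both ports, condensed
# with jq. payload.json = the Studio request body shown earlier.
for url in http://localhost:8888/v1/chat/completions \
           http://localhost:56571/v1/chat/completions; do
  echo "== $url"
  curl -s "$url" -H "Authorization: Bearer $TOKEN" \
       -H "Content-Type: application/json" -d @payload.json |
    jq '{finish_reason: .choices[0].finish_reason,
         has_tool_calls: (.choices[0].message.tool_calls != null)}'
done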
Environment
- macOS (Darwin 24.6.0, arm64)
- unsloth 2026.4.4
- llama.cpp b8772 (commit bafae2765), Unsloth prebuilt
- Model: unsloth/gemma-4-26B-A4B-it-GGUF (UD-Q4_K_XL)
- Started Studio with: unsloth studio (default flags), model loaded via the UI
Questions
- Is tool calling supposed to work through the Studio OpenAI-compatible endpoint today? If yes, is there something I need to enable?
- If not yet supported, is hitting the backend llama-server directly the intended workaround, or is there a better path?
- Any known gotchas with this specific Gemma GGUF + --chat-template-kwargs {"enable_thinking": true} that I should know about?
Thanks for any pointers 🙏