fix(guard): use UTF-8 safe string truncation in output preview logging #3690

Closed
lietblue wants to merge 1 commit into github:main from lietblue:fix/utf8-safe-output-preview

@lietblue lietblue commented Apr 13, 2026

Summary

  • label_response panics when the serialized output JSON contains multi-byte UTF-8 characters (CJK, emoji, etc.) and byte index 500 falls in the middle of a character
  • Replace &output_json[..500] with a slice that ends at output_json.floor_char_boundary(500), at both occurrences (line 808 and line 939), so the cut lands on the largest char boundary at or below byte 500
  • This crash poisons the WASM guard for the entire session — all subsequent MCP calls to that server fail with "unavailable after a previous trap"

Root cause

// Before — panics on CJK/emoji when byte 500 is mid-character:
let output_preview = if output_json.len() > 500 {
    &output_json[..500]  // ← byte-index slice
} else {
    &output_json
};

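// After: safe truncation. floor_char_boundary(500) returns the largest char
// boundary <= 500, or output_json.len() if the string is shorter than that: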
let preview_end = output_json.floor_char_boundary(500);
let output_preview = &output_json[..preview_end];
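
On toolchains that do not provide str::floor_char_boundary, the same truncation can be sketched with str::is_char_boundary, which has been stable for much longer. The helper name truncate_at_char_boundary below is hypothetical and not part of this PR; it only illustrates why byte 500 can fall mid-character and how backing up to a boundary avoids the panic.

```rust
/// Hypothetical helper (not from the PR): returns the longest prefix of `s`
/// that is at most `max_bytes` long and ends on a UTF-8 char boundary.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    // Short strings are returned unchanged, like the `else` branch in the
    // original code.
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates. A UTF-8
    // char is at most 4 bytes, so it backs up by at most 3 bytes.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For example, "中".repeat(200) is 600 bytes long at 3 bytes per character. Byte 500 is not a char boundary there, so &s[..500] panics, while the helper returns the first 498 bytes.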

Evidence

Discovered via the moeru-ai/airi PR triage workflow (run #24311673575). PR #1649 has a Chinese body: pull_request_read with method: "get" returns the PR body in the tool result, the tool result is cloned into LabelResponseOutput, and the serialized JSON then contains CJK characters within the first 500 bytes.

From mcp-gateway.log:

  • Last successful log: "generated 1 labeled items" (lib.rs around L930)
  • The output_preview=... log line is absent, which is consistent with a crash between serialization and the preview log
  • Next log lines are dealloc calls (Go defer cleanup after WASM trap)
  • All subsequent MCP calls fail: "WASM guard 'github' is unavailable after a previous trap"

Test plan

  • cargo test --lib — 279 tests pass
  • build.sh — WASM compiles (295 KB)
  • Manual: trigger an agentic workflow against a PR with CJK content in the body — label_response should log a truncated preview without crashing
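
A regression check for this class of bug could exercise every possible cut point in a string that mixes 1-, 2-, 3- and 4-byte characters. The sketch below is an assumption about how such a check might look, not a test from this PR: preview is a hypothetical stand-in for the truncation logic in label_response, and check_all_cut_points is an illustrative name.

```rust
/// Hypothetical stand-in for the preview truncation (the real logic lives in
/// `label_response`): longest prefix of `s` with at most `max` bytes that
/// ends on a char boundary.
fn preview(s: &str, max: usize) -> &str {
    let limit = max.min(s.len());
    // Index 0 is always a char boundary, so `find` always succeeds here.
    let end = (0..=limit)
        .rev()
        .find(|&i| s.is_char_boundary(i))
        .unwrap_or(0);
    &s[..end]
}

/// Tries every byte limit from 0 to one past the end of `s` and asserts that
/// truncation never panics and always yields a valid prefix.
/// Returns the number of limits checked.
fn check_all_cut_points(s: &str) -> usize {
    for max in 0..=s.len() + 1 {
        let p = preview(s, max);
        assert!(p.len() <= max);
        assert!(s.starts_with(p));
        // At most 3 bytes are dropped, since a UTF-8 char is at most 4 bytes.
        assert!(max.min(s.len()) - p.len() <= 3);
    }
    s.len() + 2
}
```

Calling check_all_cut_points("aé中🙂") covers a limit that lands inside each multi-byte character, which is the case the original byte-index slice did not handle.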

`label_response` panics when the serialized output JSON contains
multi-byte UTF-8 characters (CJK, emoji) and byte index 500 falls
mid-character. Replace `&output_json[..500]` with
`floor_char_boundary(500)` which finds the nearest char boundary.

This crash poisons the WASM guard for the entire session — all
subsequent MCP calls fail with "unavailable after a previous trap".

Discovered via moeru-ai/airi PR triage workflow (run #24311673575)
on a PR with Chinese text in the body.

@lpcox lpcox (Collaborator) commented Apr 13, 2026

@lietblue thanks for this! Though please file issues that we can work on rather than PRs. I will close this PR and have created a new issue describing the bug: #3711

@lpcox lpcox closed this Apr 13, 2026