
WASM guard panics on multi-byte UTF-8 in tool response preview, poisoning the entire session #3711

@lpcox

Description


Problem

The label_response function in the Rust WASM guard panics when a tool response contains multi-byte UTF-8 characters (CJK, emoji, accented characters, etc.) and byte index 500 falls in the middle of a multi-byte UTF-8 sequence.

The root cause is two places in lib.rs that truncate a &str by slicing it at a fixed byte index, which panics when that index is not a character boundary:

// lib.rs line 808 (path-specific output preview)
let output_preview = if output_json.len() > 500 {
    &output_json[..500]  // panics if byte 500 is mid-character
} else {
    &output_json
};

// lib.rs line 939 (general output preview)  
let output_preview = if output_json.len() > 500 {
    &output_json[..500]  // same panic
} else {
    &output_json
};

In Rust, slicing a &str at a byte position that is not a character boundary panics with a message of the form "byte index N is not a char boundary". Because the slicing happens inside the WASM guest, the panic becomes a WASM trap that poisons the guard instance for the rest of the session.
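A minimal native (non-WASM) reproduction of the panic, assuming a standard std target with the default `panic = "unwind"` strategy so that `std::panic::catch_unwind` can observe it. The helper names `slice_at` and `slicing_panics` are hypothetical; in the WASM guest there is no unwinding to catch, which is why the same panic ends up as a trap.

```rust
use std::panic;

/// Hypothetical helper mirroring the original preview code: slices `s` at a raw
/// byte index, which panics if `end` is not on a UTF-8 character boundary.
fn slice_at(s: &str, end: usize) -> &str {
    &s[..end]
}

/// Returns true if slicing `s` at byte `end` panics.
/// Note: the default panic hook still prints the panic message to stderr.
fn slicing_panics(s: &str, end: usize) -> bool {
    // `&str` is UnwindSafe, so the closure can be passed to catch_unwind directly.
    panic::catch_unwind(|| slice_at(s, end).len()).is_err()
}
```

For example, `"中文"` is 6 bytes (3 per character), so slicing at byte 3 succeeds while slicing at byte 1 panics; likewise, 499 ASCII bytes followed by `中` panics at byte 500 but not at byte 499.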

Impact

Severity: High — this is a session-killing bug.

  1. Immediate: The label_response call panics and returns an error to the gateway
  2. Cascading: The WASM instance is trapped and cannot recover. Every subsequent MCP tool call routed through that guard fails with: "WASM guard 'github' is unavailable after a previous trap"
  3. Scope: Affects any workflow whose serialized tool response JSON is longer than 500 bytes and has a multi-byte character straddling byte index 500. Content that can trigger it includes:
    • PRs/issues with CJK (Chinese, Japanese, Korean) text in the title or body
    • Content with emoji near the 500-byte mark of the serialized JSON
    • Any Unicode content where a multi-byte sequence crosses the 500-byte boundary

Evidence

Discovered in the moeru-ai/airi repository during a PR triage workflow (run #24311673575). PR #1649 has a Chinese-language body — pull_request_read with method: "get" returns the PR body in the tool result, and the serialized JSON has CJK characters within the first 500 bytes.

Log analysis from mcp-gateway.log:

  • Last successful log: "generated 1 labeled items" (lib.rs ~L930)
  • The output_preview=... log line is absent — confirming the crash occurs between JSON serialization and the preview log statement
  • Next log lines are dealloc calls (Go defer cleanup after WASM trap)
  • All subsequent MCP calls fail: "WASM guard 'github' is unavailable after a previous trap"

Additional note

A third byte-slicing pattern, at line 752, slices a &[u8] (not a &str) and passes it to std::str::from_utf8(), which returns a Result instead of panicking. It therefore cannot crash, but it silently skips the preview log whenever the truncation splits a character. It is not a crash bug, but it could be made consistent with the other two.

let preview_len = std::cmp::min(500, input_bytes.len());
if let Ok(preview) = std::str::from_utf8(&input_bytes[..preview_len]) {
    // safe — from_utf8 returns Err instead of panicking
}
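One way to make the line 752 preview log something instead of nothing, sketched under the assumption that a shortened preview is acceptable: on error, `Utf8Error::valid_up_to()` gives the length of the valid UTF-8 prefix, and that prefix can be decoded instead. The function name `bytes_preview` and the `limit` parameter are hypothetical.

```rust
use std::str;

/// Hypothetical replacement for the `&[u8]` preview: instead of dropping the log
/// when the cut splits a character, keep the longest valid UTF-8 prefix of the
/// first `limit` bytes.
fn bytes_preview(input_bytes: &[u8], limit: usize) -> &str {
    let preview_len = std::cmp::min(limit, input_bytes.len());
    let head = &input_bytes[..preview_len];
    match str::from_utf8(head) {
        Ok(preview) => preview,
        // `valid_up_to` is the length of the prefix already known to be valid
        // UTF-8, so decoding that prefix cannot fail; `unwrap_or` avoids any
        // panic path in guard code regardless.
        Err(e) => str::from_utf8(&head[..e.valid_up_to()]).unwrap_or(""),
    }
}
```

With `"中文".as_bytes()` and a limit of 4, the first 4 bytes end partway through `文`, so the preview is `"中"` rather than no preview at all.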

Fix

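Replace byte-index slicing with str::floor_char_boundary(500), which returns the nearest valid character boundary at or before the given byte index (or the string length, if the index is past the end). This method is a recent stabilization (it was still unstable in Rust 1.80), so the guard needs a current stable toolchain: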

let preview_end = output_json.floor_char_boundary(500);
let output_preview = &output_json[..preview_end];

See PR #3690 for the fix.
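If the guard's toolchain does not provide `floor_char_boundary`, an equivalent can be built on the long-stable `str::is_char_boundary`. This is a sketch, not the code from the PR; `floor_char_boundary_compat` and `output_preview` are hypothetical names.

```rust
/// Hypothetical stand-in for `str::floor_char_boundary`: returns the largest
/// index `<= index` that lies on a character boundary, clamped to `s.len()`.
fn floor_char_boundary_compat(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // Index 0 is always a boundary, and a UTF-8 character is at most 4 bytes,
    // so this loop steps back at most 3 times.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Panic-free version of the output preview: never slices mid-character.
fn output_preview(output_json: &str, limit: usize) -> &str {
    &output_json[..floor_char_boundary_compat(output_json, limit)]
}
```

Because the helper clamps to the string length, the separate `output_json.len() > 500` check from the original code is no longer needed; short strings come back whole.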

Reproduction

Any workflow that calls a GitHub MCP tool returning content with multi-byte UTF-8 characters positioned such that byte index 500 falls within a multi-byte sequence. Example: pull_request_read on a PR with a CJK body, or search_code returning source files with Unicode comments.
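A hypothetical offline reproduction of the alignment dependence: `fake_response` is an assumed stand-in for serialized tool output (a 9-byte `{"body":"` prefix, some ASCII padding, then CJK text), not the real GitHub MCP payload. It lists which paddings would make the original `&output_json[..500]` preview panic.

```rust
/// Hypothetical stand-in for a serialized tool response: `pad` ASCII bytes after
/// the JSON prefix, followed by a CJK body (each character is 3 bytes in UTF-8).
fn fake_response(pad: usize) -> String {
    format!("{{\"body\":\"{}{}\"}}", "x".repeat(pad), "中文".repeat(200))
}

/// Paddings in 0..3 for which slicing the response at byte `limit` would panic,
/// using the same condition as the original code: longer than `limit` and
/// byte `limit` not on a character boundary.
fn panicking_paddings(limit: usize) -> Vec<usize> {
    (0..3)
        .filter(|&pad| {
            let s = fake_response(pad);
            s.len() > limit && !s.is_char_boundary(limit)
        })
        .collect()
}
```

`panicking_paddings(500)` returns `[0, 1]`: with 3-byte characters, two of the three possible alignments put byte 500 mid-character, even though every padding contains the same CJK text.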

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
