ggml-rpc: fix 32-bit ARM (ILP32) serialization bugs #21828
Open
rovmo wants to merge 1 commit into ggml-org:master from
The RPC backend assumes size_t and pointers are 8 bytes in its wire protocol serialization. On ILP32 platforms (ARMv7, x86-32) where size_t is 4 bytes, this causes silent hangs, data corruption, and server crashes, making RPC completely non-functional.

Four fixes:

- send_msg/send_rpc_cmd: use uint64_t for the size header instead of size_t, matching the uint64_t the receiver reads via recv_msg
- set_tensor: serialize offset as uint64_t (8 bytes) instead of size_t (4 bytes), and fix the subsequent data placement offset
- serialize_graph: widen node pointers to uint64_t before memcpy instead of reading sizeof(uint64_t) bytes from a 4-byte pointer
- serialize_tensor: zero-initialize rpc_tensor before populating fields to avoid uninitialized upper bytes in uint64_t fields assigned from narrower types

Tested on Android ARMv7 (Termux, clang 21) with Qwen2.5-0.5B over RPC: previously hung on HELLO handshake, now loads and runs inference.
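The size-header mismatch in the first fix can be illustrated with a small framing sketch. The helper names `frame_message` and `read_size_header` are hypothetical and are not taken from ggml-rpc; the sketch only shows the idea of widening `size_t` to `uint64_t` before copying it onto the wire. It copies raw bytes and therefore assumes both peers use the same byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not the actual ggml-rpc code): frame a message as an
// 8-byte size header followed by the payload. The header is a uint64_t, so it
// occupies 8 bytes whether size_t is 4 or 8 bytes wide on the sender.
static std::vector<uint8_t> frame_message(const void * msg, size_t msg_size) {
    const uint64_t wire_size = msg_size;  // widen before serializing
    std::vector<uint8_t> out(sizeof(wire_size) + msg_size);
    std::memcpy(out.data(), &wire_size, sizeof(wire_size));
    if (msg_size > 0) {
        std::memcpy(out.data() + sizeof(wire_size), msg, msg_size);
    }
    return out;
}

// Receiver side: read the 8-byte header back.
// Assumes the frame holds at least sizeof(uint64_t) bytes.
static uint64_t read_size_header(const std::vector<uint8_t> & frame) {
    uint64_t wire_size = 0;
    std::memcpy(&wire_size, frame.data(), sizeof(wire_size));
    return wire_size;
}
```

Had the sender copied `sizeof(size_t)` bytes instead, an ILP32 build would emit a 4-byte header and a receiver reading 8 bytes would wait for data that never arrives.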
Hi @rovmo, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
Fix RPC backend on 32-bit ARM (ILP32) platforms

Problem

The RPC backend is completely non-functional on 32-bit ARM (armv7, ILP32) platforms. Any attempt to use `--rpc` causes either a silent hang, server-side "Expected HELLO command" errors, `set_tensor` out-of-bounds crashes, or `graph_compute` failures with corrupted tensor IDs.

Root cause: several places in the RPC serialization code assume `size_t` and pointers are 8 bytes. On 32-bit platforms `size_t` is 4 bytes and pointers are 4 bytes, causing the wire protocol to produce undersized or misaligned data that the receiving side (which reads fixed `uint64_t` fields) cannot parse.

Fixes

1. `send_msg/send_rpc_cmd`: message size field is 4 bytes instead of 8

`send_msg()` sends `sizeof(msg_size)` bytes for the size header, where `msg_size` is `size_t` (4 bytes on ILP32). The receiver `recv_msg()` reads `uint64_t` (8 bytes). The receiver blocks forever waiting for the remaining 4 bytes, causing a silent hang on the very first RPC command (HELLO).

2. `ggml_backend_rpc_buffer_set_tensor`: offset field is 4 bytes instead of 8

The `offset` parameter is `size_t` but the wire format specifies 8 bytes. The `memcpy` copies only `sizeof(offset)` = 4 bytes, and the subsequent data copy starts at the wrong position. The server reads a garbage 64-bit offset and rejects with "out of buffer bounds".

3. `serialize_graph`: node pointer memcpy reads 8 bytes from a 4-byte pointer

`memcpy(dest, &cgraph->nodes[i], sizeof(uint64_t))` reads 8 bytes starting at the address of a 4-byte pointer. The extra 4 bytes are adjacent memory, producing corrupted tensor IDs that fail with "failed to create graph node".

4. `serialize_tensor`: zero-initialize rpc_tensor struct

Move `memset` before the null check so all serializations start zeroed, preventing uninitialized upper bytes in `uint64_t` fields assigned from narrower types.

Platforms affected

All 32-bit platforms where `sizeof(size_t) == 4`:

Testing

Tested on Android ARMv7 (Termux, clang 21) with Qwen2.5-0.5B-Instruct Q4_K_M:

`llama-cli --rpc` hangs indefinitely with zero output
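As a rough illustration of the set_tensor offset fix described above, the sketch below builds a payload with an 8-byte offset field and places the data right after it. The payload layout (header, then offset, then data), the `fake_tensor_header` struct, and the function name are assumptions made for this example; the real `rpc_tensor` layout is not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the serialized tensor header.
struct fake_tensor_header {
    uint64_t id;
};

// Build a set_tensor-style payload: | header | offset (8 bytes) | data |.
// This layout is assumed for the sketch, not copied from ggml-rpc.
static std::vector<uint8_t> build_set_tensor_payload(
        const fake_tensor_header & hdr, size_t offset,
        const void * data, size_t size) {
    const uint64_t wire_offset = offset;  // always 8 bytes on the wire
    std::vector<uint8_t> out(sizeof(hdr) + sizeof(wire_offset) + size);
    uint8_t * p = out.data();
    std::memcpy(p, &hdr, sizeof(hdr));
    p += sizeof(hdr);
    std::memcpy(p, &wire_offset, sizeof(wire_offset));
    p += sizeof(wire_offset);  // advance by 8, not by sizeof(size_t)
    if (size > 0) {
        std::memcpy(p, data, size);
    }
    return out;
}
```

Both the copy length and the pointer advance use `sizeof(wire_offset)`, so the data always starts at the same position regardless of how wide `size_t` is on the client.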
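The serialize_graph fix above comes down to widening the pointer value before copying it, rather than copying 8 bytes from the pointer variable itself. A minimal sketch, using a hypothetical `node` type and `write_node_id` helper that do not appear in the PR:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a graph node.
struct node {
    int value;
};

// Write the node's address as an 8-byte ID. The pointer is converted to an
// integer and widened to uint64_t first, so the memcpy source is always a
// full 8-byte object. Copying sizeof(uint64_t) bytes directly from a 4-byte
// pointer variable would read 4 bytes of adjacent memory.
static void write_node_id(uint8_t * dest, const node * n) {
    const uint64_t id = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(n));
    std::memcpy(dest, &id, sizeof(id));
}
```

On a 64-bit build the widening is a no-op; on ILP32 it fills the upper 4 bytes of the ID with zeros instead of whatever happens to follow the pointer in memory.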