Incorrect buffer skipping for V4 Union types in IPC skip_field #9828

@pchintar

Description

When reading IPC data with column projection enabled, skipping a Union column encoded with V4 metadata can lead to buffer misalignment and incorrect decoding of subsequent columns.


Root Cause

In arrow-ipc/src/reader.rs, skip_field does not correctly handle the buffer layout of Union types for V4.

Current implementation:

Union(fields, mode) => {
    self.skip_buffer(); // Nulls

    match mode {
        UnionMode::Dense => self.skip_buffer(),
        UnionMode::Sparse => {}
    };

    ...
}

However, under the V4 layout a Union array carries:

  • a null buffer
  • a type_ids buffer
  • (for dense) an offsets buffer

And create_array correctly consumes:

if self.version < MetadataVersion::V5 {
    self.next_buffer()?; // null
}
let type_ids = self.next_buffer()?; // type_ids
// optionally offsets for dense

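So under V4 the current skip_field logic skips one buffer fewer than create_array consumes: the unconditional skip accounts for the null buffer, but nothing accounts for type_ids, so every later buffer is read one position off.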
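
The mismatch can be made concrete by counting buffers. The sketch below is a self-contained model rather than arrow-rs code: `Version`, `Mode`, and both functions are hypothetical names that simply mirror the two snippets quoted above, and child buffers (the elided `...`) are left out.

```rust
/// Hypothetical stand-in for the IPC metadata version (not the arrow-rs type).
#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
enum Version {
    V4,
    V5,
}

/// Hypothetical stand-in for the union mode.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Mode {
    Sparse,
    Dense,
}

/// Buffers consumed for the union itself, following the create_array snippet:
/// a null buffer before V5, then type_ids, then offsets for dense unions.
fn buffers_consumed_by_create_array(version: Version, mode: Mode) -> usize {
    let nulls = if version < Version::V5 { 1 } else { 0 };
    let type_ids = 1;
    let offsets = if mode == Mode::Dense { 1 } else { 0 };
    nulls + type_ids + offsets
}

/// Buffers skipped for the union itself by the quoted skip_field:
/// one unconditional skip, plus one more for dense unions.
fn buffers_skipped_by_current_skip_field(mode: Mode) -> usize {
    1 + if mode == Mode::Dense { 1 } else { 0 }
}
```

In this model the two counts agree for V5 and differ by one for V4 in both modes, which matches the condition listed under Impact.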


Impact

  • Can lead to:

    • incorrect decoding of subsequent columns
    • runtime errors (e.g., invalid buffer sizes)
  • Only occurs when:

    • projection is enabled
    • a Union column is skipped
    • IPC metadata version is V4
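
To illustrate how a one-buffer shortfall corrupts the next column, here is a toy simulation of a cursor walking a flat buffer list. Everything in it is an assumption made for illustration: `BufferCursor`, the buffer labels, and the layout (a sparse Union with a single Int32 child followed by an Int32 column, each Int32 contributing a validity and a data buffer) are hypothetical and do not come from arrow-rs.

```rust
/// Hypothetical cursor over a record batch's flat buffer list.
struct BufferCursor<'a> {
    buffers: &'a [&'a str],
    pos: usize,
}

impl<'a> BufferCursor<'a> {
    /// Return the buffer at the cursor and advance by one.
    fn take(&mut self) -> &'a str {
        let b = self.buffers[self.pos];
        self.pos += 1;
        b
    }

    /// Advance past `n` buffers without reading them.
    fn skip(&mut self, n: usize) {
        self.pos += n;
    }
}

/// Assumed V4 buffer order for: sparse Union<Int32> (skipped), Int32 (projected).
static BUFFERS: [&str; 6] = [
    "union.nulls",
    "union.type_ids",
    "union.child.validity",
    "union.child.data",
    "values.validity",
    "values.data",
];

/// Skip the union using `union_own_buffers` skips for the union itself, then
/// return the (validity, data) buffers the projected Int32 column would read.
fn read_values_after_skipping_union(union_own_buffers: usize) -> (&'static str, &'static str) {
    let mut cursor = BufferCursor { buffers: &BUFFERS, pos: 0 };
    cursor.skip(union_own_buffers); // the union's own buffers
    cursor.skip(2); // the Int32 child: validity + data
    (cursor.take(), cursor.take())
}
```

With one skip for the union's own buffers (the current behaviour for sparse), the projected column reads `union.child.data` as its validity and `values.validity` as its data; with two skips it reads its own buffers.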

Reproduction

A minimal test case:

// Schema:
// union: Union<Int32> (skipped)
// values: Int32 (projected)

let options = IpcWriteOptions::try_new(8, false, MetadataVersion::V4)?;
let mut writer = FileWriter::try_new_with_options(..., options)?;

let reader = FileReader::try_new(cursor, Some(vec![1]))?;

Before fix:

InvalidArgumentError("Need at least 12 bytes in buffers[0] in array of type Int32, but got 1")

Proposed Fix

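Update skip_field to match the buffer layout that create_array consumes: for versions before V5, skip the null buffer, then type_ids, then (for dense) offsets.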
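
One possible shape for the corrected skip logic is sketched below. This is not the actual patch: `Skipper` and `skip_union_own_buffers` are hypothetical, the enums are local stand-ins for the arrow-rs types of the same name, and skipping the child fields (the elided `...` in the current implementation) is omitted.

```rust
/// Local stand-in for the arrow-rs metadata version enum.
#[derive(Clone, Copy, PartialEq, PartialOrd)]
enum MetadataVersion {
    V4,
    V5,
}

/// Local stand-in for the arrow-rs union mode enum.
#[derive(Clone, Copy, PartialEq)]
enum UnionMode {
    Sparse,
    Dense,
}

/// Hypothetical reader state that only counts how many buffers were skipped.
struct Skipper {
    version: MetadataVersion,
    skipped: usize,
}

impl Skipper {
    fn skip_buffer(&mut self) {
        self.skipped += 1;
    }

    /// Skip the union's own buffers in the same order create_array reads them.
    fn skip_union_own_buffers(&mut self, mode: UnionMode) {
        if self.version < MetadataVersion::V5 {
            self.skip_buffer(); // null buffer, present before V5 only
        }
        self.skip_buffer(); // type_ids
        if mode == UnionMode::Dense {
            self.skip_buffer(); // offsets
        }
    }
}

/// Convenience wrapper: number of buffers skipped for the union itself.
fn skipped_for(version: MetadataVersion, mode: UnionMode) -> usize {
    let mut s = Skipper { version, skipped: 0 };
    s.skip_union_own_buffers(mode);
    s.skipped
}
```

Mirroring the order used by create_array keeps the skip path and the decode path in step, so a later change to one has an obvious counterpart in the other.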
