Description
Decoding a `ColumnIndex` assumes that the page-aligned arrays (`null_pages`, `min_values`, `max_values`) have matching lengths. This assumption is never validated, so inconsistent lengths cause a panic.
Root Cause
In `parquet/src/file/page_index/column_index.rs`, decoding indexes into the min/max arrays without checking their lengths:

```rust
let len = null_pages.len();
for (i, is_null) in null_pages.iter().enumerate().take(len) {
    if !is_null {
        let min = min_bytes[i];
        let max = max_bytes[i];
        ...
    }
}
```
The byte array index does the same:

```rust
let min = min_values[i];
let max = max_values[i];
```
However, nothing validates that:
- `min_values.len() == null_pages.len()`
- `max_values.len() == null_pages.len()`

The `.take(len)` bound does not help, because `len` is derived from `null_pages` itself.
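The failure can be reproduced outside the parquet crate. Below is a minimal, self-contained sketch that mirrors the loop above using plain slices; `decode_unchecked` is a hypothetical name (not a parquet API), and the elided `...` body is replaced by a simple counter.

```rust
use std::panic;

/// Hypothetical stand-in for the unchecked decoding loop: it trusts that
/// `min_bytes` and `max_bytes` have one entry per page in `null_pages`.
fn decode_unchecked(null_pages: &[bool], min_bytes: &[&[u8]], max_bytes: &[&[u8]]) -> usize {
    let mut decoded = 0;
    for (i, is_null) in null_pages.iter().enumerate() {
        if !is_null {
            // Panics with "index out of bounds" when the min/max arrays are shorter.
            let _min = min_bytes[i];
            let _max = max_bytes[i];
            decoded += 1;
        }
    }
    decoded
}

fn main() {
    let min0 = [1u8, 0, 0, 0];
    let max0 = [10u8, 0, 0, 0];
    let null_pages = vec![false, false]; // two pages declared
    let min_bytes: Vec<&[u8]> = vec![&min0[..]]; // only one entry
    let max_bytes: Vec<&[u8]> = vec![&max0[..]]; // only one entry

    // catch_unwind turns the panic into an Err so the outcome can be inspected.
    let result = panic::catch_unwind(|| decode_unchecked(&null_pages, &min_bytes, &max_bytes));
    println!("panicked: {}", result.is_err());
}
```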
Impact
- Panic (`index out of bounds`) on malformed or corrupted metadata
- Inconsistent with the expected behavior, which is to return a `ParquetError`
- Reduces robustness when handling external or untrusted Parquet files
Reproduction
```rust
// Two pages are declared via null_pages
// But only ONE min/max entry is provided --> length mismatch
let column_index = ThriftColumnIndex {
    null_pages: vec![false, false],   // 2 pages
    min_values: vec![&[1, 0, 0, 0]],  // only 1 entry
    max_values: vec![&[10, 0, 0, 0]], // only 1 entry
    null_counts: None,
    repetition_level_histograms: None,
    definition_level_histograms: None,
    boundary_order: BoundaryOrder::UNORDERED,
};
let _ = PrimitiveColumnIndex::<i32>::try_from_thrift(column_index);
```
This results in a panic:

```
index out of bounds: the len is 1 but the index is 1
```
Expected Behavior
Return a `ParquetError` when the array lengths do not match the number of pages.
Proposed Fix
Validate the array lengths in the following constructors, before indexing into `min_values` / `max_values`:
- `PrimitiveColumnIndex::try_new`
- `ByteArrayColumnIndex::try_new`
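A minimal sketch of such a check, assuming it runs at the start of each `try_new` before the decoding loop. Since the parquet crate is not a dependency here, `ParquetError` below is a local stand-in that only mimics the shape of `parquet::errors::ParquetError::General`; `validate_page_lengths` and `decode_checked` are hypothetical names, not existing parquet functions.

```rust
use std::fmt;

/// Local stand-in for `parquet::errors::ParquetError` (message variant only).
#[derive(Debug, PartialEq)]
enum ParquetError {
    General(String),
}

impl fmt::Display for ParquetError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParquetError::General(msg) => write!(f, "Parquet error: {msg}"),
        }
    }
}

/// Hypothetical helper: checks that each page-aligned array has exactly one
/// entry per page before any indexing happens.
fn validate_page_lengths<T>(
    null_pages: &[bool],
    min_values: &[T],
    max_values: &[T],
) -> Result<(), ParquetError> {
    let num_pages = null_pages.len();
    for (name, len) in [("min_values", min_values.len()), ("max_values", max_values.len())] {
        if len != num_pages {
            return Err(ParquetError::General(format!(
                "ColumnIndex {name} length {len} does not match number of pages {num_pages}"
            )));
        }
    }
    Ok(())
}

/// The decoding loop from the issue, guarded by the length check.
fn decode_checked(
    null_pages: &[bool],
    min_bytes: &[&[u8]],
    max_bytes: &[&[u8]],
) -> Result<usize, ParquetError> {
    validate_page_lengths(null_pages, min_bytes, max_bytes)?;
    let mut decoded = 0;
    for (i, is_null) in null_pages.iter().enumerate() {
        if !is_null {
            // In bounds: lengths were validated above.
            let _min = min_bytes[i];
            let _max = max_bytes[i];
            decoded += 1;
        }
    }
    Ok(decoded)
}

fn main() {
    let min0 = [1u8, 0, 0, 0];
    let max0 = [10u8, 0, 0, 0];
    let mins: Vec<&[u8]> = vec![&min0[..]]; // one entry for two pages
    let maxs: Vec<&[u8]> = vec![&max0[..]];
    match decode_checked(&[false, false], &mins, &maxs) {
        Ok(n) => println!("decoded {n} pages"),
        Err(e) => println!("{e}"),
    }
}
```

With the inputs from the reproduction (two pages, one min/max entry), `decode_checked` returns an error instead of panicking. Checking the lengths once up front keeps the loop itself unchanged and reports the mismatch explicitly, whereas zipping the iterators would silently drop the extra pages.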