Range predicates (e.g. value >= X) applied to DELTA_BINARY_PACKED columns
currently require decoding every value before filtering. The DELTA format's
per-miniblock headers (min_delta, bit_width) make it possible to compute a
conservative [lo, hi] range for an entire miniblock without decoding
individual values — if the predicate rejects the range, the miniblock can be
skipped in O(1).
Proposed addition: A new provided method Decoder::scan_filtered(num_values, out, predicate) on the Decoder trait. The default implementation ignores the
predicate and decodes everything (safe fallback for all encodings that don't
carry per-region metadata). DeltaBitPackDecoder overrides it to:
- Check
first_value as a point range [v, v] before entering the miniblock loop.
- Per miniblock, compute a conservative
[lo, hi] using last_value,
min_delta, bit_width, and the count of values in the miniblock.
- If the predicate rejects
[lo, hi], skip the miniblock (O(1) for bw=0;
decode-to-track-last_value for mid-stream bw>0 misses; bit-skip for terminal
bw>0 misses).
- If the predicate accepts, decode and emit the miniblock normally.
Returns (values_emitted, values_consumed).
Measured improvement:
- Wall-time scan on 1M-row monotone DELTA column: 1.96ms → 470µs (4.2x)
Note: scan_filtered is intentionally conservative — a range miss on a
miniblock is always safe (false positives are possible, false negatives are not).
Callers that need exact per-value filtering should apply a second pass on the
emitted values.
Range predicates (e.g.
value >= X) applied to DELTA_BINARY_PACKED columnscurrently require decoding every value before filtering. The DELTA format's
per-miniblock headers (
min_delta,bit_width) make it possible to compute aconservative
[lo, hi]range for an entire miniblock without decodingindividual values — if the predicate rejects the range, the miniblock can be
skipped in O(1).
Proposed addition: A new provided method
Decoder::scan_filtered(num_values, out, predicate)on theDecodertrait. The default implementation ignores thepredicate and decodes everything (safe fallback for all encodings that don't
carry per-region metadata).
DeltaBitPackDecoderoverrides it to:first_valueas a point range[v, v]before entering the miniblock loop.[lo, hi]usinglast_value,min_delta,bit_width, and the count of values in the miniblock.[lo, hi], skip the miniblock (O(1) for bw=0;decode-to-track-last_value for mid-stream bw>0 misses; bit-skip for terminal
bw>0 misses).
Returns
(values_emitted, values_consumed).Measured improvement:
Note:
scan_filteredis intentionally conservative — a range miss on aminiblock is always safe (false positives are possible, false negatives are not).
Callers that need exact per-value filtering should apply a second pass on the
emitted values.