## Use Case
Building a DuckDB extension for weather data using grib-rs. When users query with `LIMIT N`, we want to avoid parsing the entire file.
```rust
// Current: parses the entire file upfront (~4s for a 147MB file)
let grib2 = grib::from_reader(reader)?;
for (msg_idx, submessage) in grib2.iter() {
    // decode values...
}
```
Even with streaming data extraction on our side, the `from_reader()` call parses all message metadata upfront. For a 147MB GFS file:
- `LIMIT 10`: ~4.5s (dominated by the initial parse)
- Full scan (177M rows): ~3.5 minutes
## Request
Lazy/incremental message parsing - only parse message N when accessed.
Possible API:
```rust
// Iterator that yields messages lazily
let iter = grib::lazy_iter(reader)?;
for submessage in iter.take(10) {
    // Only the first 10 messages are parsed
}
```
Or:
```rust
// Seek to a specific message without parsing earlier ones
let reader = grib::LazyGrib2::new(file)?;
let msg = reader.get_message(5)?; // Only parses message 5
```
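For concreteness, here is a rough sketch of how our extension would drive the first proposed API for `LIMIT` pushdown. `decode_values`, `emit_row`, and `row_limit` are stand-ins for our DuckDB glue, not existing functions:

```rust
// Sketch only: early termination over the proposed `lazy_iter` API.
// `decode_values` and `emit_row` are hypothetical placeholders.
fn scan_with_limit<R: std::io::Read + std::io::Seek>(
    reader: R,
    row_limit: usize,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut rows = 0;
    for submessage in grib::lazy_iter(reader)? {
        for value in decode_values(&submessage)? {
            emit_row(value);
            rows += 1;
            if rows >= row_limit {
                return Ok(()); // later messages are never parsed
            }
        }
    }
    Ok(())
}
```

Either API shape works for us; the key property is that message N's sections are only read when message N is actually consumed.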
## Context
- Large GRIB files (GFS global forecasts: 100MB+)
- Database integration needs early termination for `LIMIT` pushdown
- Related to roadmap item "Efficient read from cloud sources such as S3" - cloud reads would also benefit from lazy parsing
Would this be feasible given GRIB2's structure?
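For what it's worth, GRIB2's framing seems to make this tractable: every message begins with a fixed 16-byte Indicator Section (Section 0) whose last 8 octets hold the total message length, so a scanner can hop from message to message with seeks alone, deferring all section parsing. A minimal sketch (my own illustration, not grib-rs code), assuming edition-2 messages stored back-to-back:

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Record the byte offset and length of each GRIB2 message, reading only
/// the 16-byte Indicator Section (Section 0) of each one.
fn scan_message_offsets<R: Read + Seek>(reader: &mut R) -> io::Result<Vec<(u64, u64)>> {
    let end = reader.seek(SeekFrom::End(0))?;
    let mut pos = reader.seek(SeekFrom::Start(0))?;
    let mut offsets = Vec::new();

    while pos + 16 <= end {
        let mut sec0 = [0u8; 16];
        reader.read_exact(&mut sec0)?;
        if &sec0[0..4] != b"GRIB" {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "expected GRIB magic"));
        }
        // Octets 9-16 of Section 0: total length of this message (big-endian u64).
        let len = u64::from_be_bytes(sec0[8..16].try_into().unwrap());
        offsets.push((pos, len));
        // Jump straight to the start of the next message; no sections are parsed.
        pos = reader.seek(SeekFrom::Start(pos + len))?;
    }
    Ok(offsets)
}
```

With an offset index like that, `get_message(5)` becomes a single seek plus one message parse, and on S3 the same offsets would translate directly into ranged GET requests.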