
Feature request: Lazy/streaming message iterator for large GRIB2 files #153

@onnimonni

Description


Use Case

Building a DuckDB extension for weather data using grib-rs. When users query with LIMIT N, we want to avoid parsing the entire file.

// Current: parses entire file upfront (~4s for 147MB file)
let grib2 = grib::from_reader(reader)?;
for (msg_idx, submessage) in grib2.iter() {
    // decode values...
}

Even with streaming data extraction on our side, the from_reader() call parses all message metadata upfront. For a 147MB GFS file:

  • LIMIT 10: ~4.5s (dominated by initial parse)
  • Full scan (177M rows): ~3.5 minutes

Request

Lazy/incremental message parsing: parse message N only when it is accessed.

Possible API:

// Iterator that yields messages lazily
let iter = grib::lazy_iter(reader)?;
for submessage in iter.take(10) {
    // Only first 10 messages parsed
}

Or:

// Seek to specific message without parsing earlier ones
let grib = grib::LazyGrib2::new(file)?;
let msg = grib.get_message(5)?; // Only parses message 5

Context

  • Large GRIB files (GFS global forecasts: 100MB+)
  • Database integration needs early termination for LIMIT pushdown
  • Related to roadmap item "Efficient read from cloud sources such as S3" - cloud reads would also benefit from lazy parsing

Would this be feasible given GRIB2's structure?
