Skip to content

Bug: Parquet export crashes with AttributeError on binary file handle in SystemTableExporter.dump_table #727

@coderabbitai

Description

@coderabbitai

Summary

During review of PR #725 (QA: Migrate type checker from mypy to ty), a potential runtime bug was identified in cratedb_toolkit/cfr/systable.py.

Problem

In SystemTableExporter.dump_table, the parquet branch at line 154 calls:

return frame.write_parquet(file and file.buffer)  # ty: ignore[invalid-argument-type]

When the parquet format is selected, the file is opened with mode "wb" (line 196 in _save), which yields an io.BufferedWriter. However, io.BufferedWriter does not have a .buffer attribute — only io.TextIOWrapper does (to expose the underlying binary buffer). Accessing .buffer on a binary file handle will raise an AttributeError at runtime.

Additionally, line 198 casts the file handle as t.cast(t.TextIO, fh_data), which is incorrect — the actual object is a binary handle (BinaryIO), not a text handle.

The same .buffer access also exists in the NDJSON branch, though that path opens the file with mode "w" (text mode), so it works correctly there.

Impact

Any call to SystemTableExporter.save() with data_format="parquet" or data_format="pq" will crash at runtime with an AttributeError.

Root Cause

The parquet code path was likely written by analogy with the NDJSON path (which correctly accesses .buffer on a text handle), but the binary open mode for parquet means the handle already is the binary stream — no .buffer unwrapping is needed.

Suggested Fix

def dump_table(self, frame: "pl.DataFrame", file: t.Optional[t.IO[t.Any]] = None):
    if self.data_format == "csv":
        return frame.to_pandas().to_csv(file)
    elif self.data_format in ["jsonl", "ndjson"]:
        target = None if file is None else (file.buffer if hasattr(file, "buffer") else file)
        return frame.write_ndjson(target)
    elif self.data_format in ["parquet", "pq"]:
        target = None if file is None else (file.buffer if hasattr(file, "buffer") else file)
        return frame.write_parquet(target)
    else:
        raise NotImplementedError(f"Output format not implemented: {self.data_format}")

And the corresponding caller in _save should use t.cast(t.BinaryIO, fh_data) instead of t.cast(t.TextIO, fh_data) for the parquet branch.

Additional Notes

  • The test suite does not cover the parquet export code path (test_cfr_sys_export_success uses the default jsonl format), so this bug is not caught by existing tests.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions