Summary
During review of PR #725 (QA: Migrate type checker from mypy to ty), a potential runtime bug was identified in cratedb_toolkit/cfr/systable.py.
Problem
In SystemTableExporter.dump_table, the parquet branch at line 154 calls:
return frame.write_parquet(file and file.buffer) # ty: ignore[invalid-argument-type]
When the parquet format is selected, the file is opened with mode "wb" (line 196 in _save), which yields an io.BufferedWriter. However, io.BufferedWriter does not have a .buffer attribute — only io.TextIOWrapper does (to expose the underlying binary buffer). Accessing .buffer on a binary file handle will raise an AttributeError at runtime.
Additionally, line 198 casts the file handle as t.cast(t.TextIO, fh_data), which is incorrect — the actual object is a binary handle (BinaryIO), not a text handle.
The same .buffer access also exists in the NDJSON branch, though that path opens the file with mode "w" (text mode), so it works correctly there.
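The text-mode vs. binary-mode distinction can be checked directly against the standard library, independent of the toolkit (a minimal sketch; the probe path is arbitrary):

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "probe.txt")

# Text mode: open() returns an io.TextIOWrapper, which wraps a binary
# buffer and exposes it via `.buffer` -- the NDJSON branch relies on this.
with open(path, "w") as fh_text:
    assert isinstance(fh_text, io.TextIOWrapper)
    assert hasattr(fh_text, "buffer")

# Binary mode: open() returns an io.BufferedWriter, which *is* the binary
# stream and has no `.buffer` attribute -- the parquet branch trips here.
with open(path, "wb") as fh_binary:
    assert isinstance(fh_binary, io.BufferedWriter)
    assert not hasattr(fh_binary, "buffer")

os.remove(path)
```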
Impact
Any call to SystemTableExporter.save() with data_format="parquet" or data_format="pq" will crash at runtime with an AttributeError.
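The failing expression can be reproduced outside of CrateDB Toolkit (a sketch simulating the parquet branch; the file name is illustrative, and `fh_data` stands in for the handle opened by `_save`):

```python
import os
import tempfile

# Hypothetical target path; _save opens the parquet output in binary
# mode ("wb"), which yields an io.BufferedWriter.
path = os.path.join(tempfile.mkdtemp(), "sys-cluster.parquet")
fh_data = open(path, "wb")

error = None
try:
    # The exact expression from the parquet branch of dump_table:
    target = fh_data and fh_data.buffer
except AttributeError as ex:
    error = ex
finally:
    fh_data.close()
    os.remove(path)

print(type(error).__name__, error)
```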
Root Cause
The parquet code path was likely written by analogy with the NDJSON path (which correctly accesses .buffer on a text handle), but because parquet opens the file in binary mode, the handle is already the binary stream — no .buffer unwrapping is needed.
Suggested Fix
def dump_table(self, frame: "pl.DataFrame", file: t.Optional[t.IO[t.Any]] = None):
    if self.data_format == "csv":
        return frame.to_pandas().to_csv(file)
    elif self.data_format in ["jsonl", "ndjson"]:
        target = None if file is None else (file.buffer if hasattr(file, "buffer") else file)
        return frame.write_ndjson(target)
    elif self.data_format in ["parquet", "pq"]:
        target = None if file is None else (file.buffer if hasattr(file, "buffer") else file)
        return frame.write_parquet(target)
    else:
        raise NotImplementedError(f"Output format not implemented: {self.data_format}")
And the corresponding caller in _save should use t.cast(t.BinaryIO, fh_data) instead of t.cast(t.TextIO, fh_data) for the parquet branch.
Additional Notes
- The test suite does not cover the parquet export code path (test_cfr_sys_export_success uses the default jsonl format), so this bug is not caught by existing tests.
References
- PR #725 (QA: Migrate type checker from mypy to ty)
- PR #725 (comment)