Bug: Parquet export crashes with AttributeError on binary file handle in SystemTableExporter.dump_table #727

## Summary

During review of PR #725 (QA: Migrate type checker from `mypy` to `ty`), a potential runtime bug was identified in `cratedb_toolkit/cfr/systable.py`.

## Problem

In `SystemTableExporter.dump_table`, the parquet branch at line 154 calls:

```python
return frame.write_parquet(file and file.buffer)  # ty: ignore[invalid-argument-type]
```

When the parquet format is selected, `_save` opens the file with mode `"wb"` (line 196), which yields an `io.BufferedWriter`. An `io.BufferedWriter` does **not** have a `.buffer` attribute; only text wrappers such as `io.TextIOWrapper` do, to expose the underlying binary buffer. Because an open file handle is truthy, `file and file.buffer` evaluates `file.buffer`, which raises an `AttributeError` at runtime on a binary handle.

Additionally, line 198 wraps the file handle in `t.cast(t.TextIO, fh_data)`, which is incorrect for this branch: the actual object is a binary handle (`BinaryIO`), not a text handle. Since `t.cast` has no runtime effect, it only hides the mismatch from the type checker.

The same `.buffer` access also exists in the NDJSON branch, though that path opens the file with mode `"w"` (text mode), so it works correctly there.
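The difference between the two open modes can be reproduced with the standard library alone. The following is a minimal sketch, not code from the toolkit: the scratch file name and all variable names are made up for illustration, and only the `file and file.buffer` expression mirrors the line quoted above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.out"  # hypothetical scratch file

    # Text mode: open() returns an io.TextIOWrapper, which exposes `.buffer`.
    with open(path, "w") as text_fh:
        text_type = type(text_fh).__name__
        text_target_type = type(text_fh and text_fh.buffer).__name__

    # Binary mode: open() returns an io.BufferedWriter, which has no `.buffer`.
    with open(path, "wb") as binary_fh:
        binary_type = type(binary_fh).__name__
        try:
            _ = binary_fh and binary_fh.buffer
            binary_error = None
        except AttributeError as exc:
            binary_error = type(exc).__name__

print(text_type, "->", text_target_type)
print(binary_type, "->", binary_error)
```

In text mode the expression resolves to the underlying binary writer, while in binary mode the attribute lookup itself fails.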

## Impact

Any call to `SystemTableExporter.save()` with `data_format="parquet"` or `data_format="pq"` will crash at runtime with an `AttributeError`.

## Root Cause

The parquet code path was likely written by analogy with the NDJSON path (which correctly accesses `.buffer` on a text handle), but the binary open mode for parquet means the handle already **is** the binary stream — no `.buffer` unwrapping is needed.

## Suggested Fix

```python
def dump_table(self, frame: "pl.DataFrame", file: t.Optional[t.IO[t.Any]] = None):
    if self.data_format == "csv":
        return frame.to_pandas().to_csv(file)
    elif self.data_format in ["jsonl", "ndjson"]:
        target = None if file is None else (file.buffer if hasattr(file, "buffer") else file)
        return frame.write_ndjson(target)
    elif self.data_format in ["parquet", "pq"]:
        target = None if file is None else (file.buffer if hasattr(file, "buffer") else file)
        return frame.write_parquet(target)
    else:
        raise NotImplementedError(f"Output format not implemented: {self.data_format}")
```

The corresponding caller in `_save` should also use `t.cast(t.BinaryIO, fh_data)` instead of `t.cast(t.TextIO, fh_data)` for the parquet branch.
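The unwrapping step that both branches share could also be factored into a small helper. This is only a sketch under assumptions: `binary_target` is a hypothetical name that does not exist in the toolkit, and in-memory streams stand in for real files.

```python
import io
import typing as t


def binary_target(file: t.Optional[t.IO[t.Any]]) -> t.Optional[t.IO[t.Any]]:
    """Return the binary stream behind `file` (hypothetical helper).

    Text wrappers expose the underlying stream as `.buffer`;
    binary handles are already the stream and are returned unchanged.
    """
    if file is None:
        return None
    return getattr(file, "buffer", file)


# A text wrapper is unwrapped to the binary stream it was built on.
raw = io.BytesIO()
text_fh = io.TextIOWrapper(raw, encoding="utf-8")
unwrapped = binary_target(text_fh)

# A binary handle passes through untouched.
binary_fh = io.BytesIO()
passthrough = binary_target(binary_fh)

# Both results accept bytes, which is what a binary writer needs.
unwrapped.write(b"ndjson bytes")
passthrough.write(b"parquet bytes")
```

Using `getattr` with a default keeps the behaviour of the `hasattr` check in the suggested fix while avoiding the repeated expression.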

## Additional Notes

- The test suite does not cover the parquet export code path (`test_cfr_sys_export_success` uses the default `jsonl` format), so this bug is not caught by existing tests.

## References

- PR: https://github.com/crate/cratedb-toolkit/pull/725
- Review comment: https://github.com/crate/cratedb-toolkit/pull/725#discussion_r2990648151
- Reported by: @amotl
