Restore per-transaction .csv.zst output via --full flag by russfellows · Pull Request #475 · minio/warp

russfellows · 2026-04-07T22:49:04Z

bench: restore per-transaction `.csv.zst` output via `--full` flag

Summary

Prior to commit 483f844, warp wrote a per-transaction CSV/TSV file after every
benchmark run, enabling fine-grained post-hoc analysis with warp analyze. That
commit introduced addCollector() but registered the --full flag only in
analyzeFlags — never in benchFlags — so the flag silently had no effect and
the per-transaction file was never written.

This PR restores the intended behavior with a minimal, targeted fix across three
files, adds unit tests, and updates documentation.

Problem

After 483f844, running any benchmark with --full:

warp put --host=... --full

produced only a .json.zst aggregate file. The .csv.zst per-transaction file was
never written, making post-hoc per-operation analysis impossible. The flag appeared
to be accepted but was completely ignored.

Changes

`cli/benchmark.go` — Core fix (single-node mode)

addCollector() now gates on ctx.Bool("full"):

- Without --full (default): a LiveCollector only. Live throughput display and auto-termination work as before. Output: .json.zst only.
- With --full: ops are fanned out to both a NewOpsCollector (capturing every individual request) and the LiveCollector (for live display). After the benchmark, .csv.zst is written first, then .json.zst. --full is strictly additive: both files are always produced together.
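The gating and fan-out described above can be sketched as follows. This is a minimal illustration, not warp's code: `Op`, `Collector`, `liveCollector`, `opsCollector`, `fanOut`, and `newCollector` are hypothetical stand-ins for the real types behind `addCollector()`, and a plain `bool` stands in for `ctx.Bool("full")`.

```go
package main

import "fmt"

// Op is a stand-in for one recorded request (hypothetical; the real type has more fields).
type Op struct {
	Type  string
	Bytes int64
}

// Collector is a hypothetical interface used only for this sketch.
type Collector interface {
	Add(op Op)
}

// liveCollector keeps running totals only (the aggregate path).
type liveCollector struct {
	ops   int
	bytes int64
}

func (l *liveCollector) Add(op Op) { l.ops++; l.bytes += op.Bytes }

// opsCollector retains every operation (the per-transaction path).
type opsCollector struct{ ops []Op }

func (o *opsCollector) Add(op Op) { o.ops = append(o.ops, op) }

// fanOut forwards each op to every wrapped collector.
type fanOut []Collector

func (f fanOut) Add(op Op) {
	for _, c := range f {
		c.Add(op)
	}
}

// newCollector returns the live collector alone by default, or a fan-out to
// both the per-op and the live collector when full is set.
func newCollector(full bool) (Collector, *liveCollector, *opsCollector) {
	live := &liveCollector{}
	if !full {
		return live, live, nil
	}
	all := &opsCollector{}
	return fanOut{all, live}, live, all
}

func main() {
	c, live, all := newCollector(true)
	for i := 0; i < 3; i++ {
		c.Add(Op{Type: "PUT", Bytes: 1024})
	}
	fmt.Println(live.ops, live.bytes, len(all.ops)) // 3 3072 3
}
```

In the sketch the live collector sees every op in both modes, so anything driven by it (live display, auto-termination) does not depend on whether per-op capture is enabled.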

A duplicate --full flag registration in benchFlags that caused a startup panic
(since --full already lives in analyzeFlags, which is combined with benchFlags
for all benchmark commands) has been removed.

`cli/benchserver.go` — Distributed mode mirror

- With --full: Downloads raw ops → writes .csv.zst → downloads aggregate → writes .json.zst.
- Without --full: Downloads aggregate → writes .json.zst only.

`cli/analyze.go` — User-facing warning

warp analyze file.csv.zst without --full re-aggregates individual operations into
1-second buckets before reporting, producing results equivalent to reading the
.json.zst aggregate. This loses per-operation resolution — latency percentiles and
throughput figures can differ, and per-request filtering flags have no additional
effect.

A warning is now printed in this case:

WARNING: Analyzing .csv.zst without --full produces aggregated results (1-second
         buckets), not accurate per-operation statistics.
         For precise latency percentiles and throughput, use: warp analyze --full <file>

The warning is suppressed when --quiet or --json output is active.

Note: Passing --full to warp analyze on a .json.zst file is silently
ignored — aggregate files contain no individual operations.
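The conditions under which the message appears can be summarized in a small predicate. This is a sketch under stated assumptions: `analyzeOpts` and `shouldNote` are hypothetical names, and detecting the input type by file suffix is an assumption made for illustration; the actual check in `cli/analyze.go` may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// analyzeOpts mirrors the three flags mentioned above (hypothetical struct).
type analyzeOpts struct {
	Full, Quiet, JSON bool
}

// shouldNote reports whether the re-aggregation message applies.
// Assumption: per-op files are recognized by their .csv.zst suffix.
func shouldNote(file string, o analyzeOpts) bool {
	if !strings.HasSuffix(file, ".csv.zst") {
		return false // aggregate files contain no individual operations
	}
	return !o.Full && !o.Quiet && !o.JSON
}

func main() {
	fmt.Println(shouldNote("run.csv.zst", analyzeOpts{}))            // true
	fmt.Println(shouldNote("run.csv.zst", analyzeOpts{Full: true}))  // false
	fmt.Println(shouldNote("run.csv.zst", analyzeOpts{Quiet: true})) // false
	fmt.Println(shouldNote("run.json.zst", analyzeOpts{}))           // false
}
```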

`cli/addcollector_test.go` — New unit tests (7 tests)

| Test | Validates |
|------|-----------|
| `TestAddCollectorDefaultNoOps` | Default mode stores no individual ops |
| `TestAddCollectorFullCollectsOps` | `--full` captures all operations |
| `TestAddCollectorFullFansOutToLive` | `--full` also feeds the LiveCollector |
| `TestAddCollectorDefaultUpdatesChannel` | Default mode updates channel is functional |
| `TestAddCollectorDiscardOutput` | DiscardOutput suppresses collection in both modes |
| `TestAddCollectorFullIsAdditive` | `--full` always produces both files |
| `TestAddCollectorOpValues` | Op field values are preserved through the collector |

`README.md` — Documentation

Added an Analyzing Results subsection under Full Per-Transaction Logging
documenting:

- Use warp analyze --full file.csv.zst for accurate per-operation stats
- Default analyze on .csv.zst re-aggregates (same as .json.zst) and now warns
- --full is silently ignored when analyzing .json.zst

Behavior Before / After

| Scenario | Before this PR | After this PR |
|----------|----------------|---------------|
| `warp put` (no flags) | `.json.zst` written | `.json.zst` written ✓ |
| `warp put --full` | `.json.zst` written (bug) | `.csv.zst` and `.json.zst` written ✓ |
| `warp analyze file.csv.zst` | Silent re-aggregation | Re-aggregation with warning ✓ |
| `warp analyze --full file.csv.zst` | Full per-op analysis | Full per-op analysis ✓ |
| `warp analyze --full file.json.zst` | Ignored silently | Ignored silently (noted in docs) ✓ |

Testing

Validated against a real MinIO S3 cluster. With --full:

$ ls -lh warp-put-*.{csv,json}.zst
-rw-r--r-- 1 user user 142K warp-put-2026-04-07[162451]-abcd.csv.zst
-rw-r--r-- 1 user user 1.2K warp-put-2026-04-07[162451]-abcd.json.zst

$ warp analyze warp-put-2026-04-07[162451]-abcd.csv.zst
WARNING: Analyzing .csv.zst without --full produces aggregated results...

$ warp analyze --full warp-put-2026-04-07[162451]-abcd.csv.zst
# → exact per-op percentiles, 4x1s throughput windows

All 7 unit tests pass: go test ./cli/ -run TestAddCollector -v

klauspost

Sure. It is overly dramatic, though, and fails to mention the main downside.

russfellows · 2026-04-16T15:47:43Z

Klaus,

I didn't want to presume to own the warning messages that your code shows to your clients. I am happy to craft the output messages, but since you all own the product, I would think you would prefer to control them.

I will run some analysis of the memory usage and time issues you cited. I don't find them to be of much concern, but you seem to feel they are, so I will run the analysis in order to provide fact-based messages. If you already have data points you would rather output, that is of course up to you.

To be clear, this is just fixing a documented feature that was recently broken. A better solution all around would be to collect the metrics with histograms, such as HDR histograms. That way you can combine the results from multiple clients accurately and keep statistics such as mean, median, p90, p95, and p99 valid. The current 5-second averaging method you use by default loses statistical validity. In contrast, the detailed zst trace log files you can collect do provide statistically valid results, as you well know, but at the expense of processing time at the end.

Regards,
Russ
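To illustrate the statistical point in this comment, the sketch below compares averaging per-client percentiles with taking the percentile of merged histograms. It is not warp code and not an HDR histogram: `hist`, `newHist`, `merge`, and `percentile` are hypothetical, and a plain count-per-millisecond histogram is swapped in because it shows the same mergeability property with far less code. The latency values are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// hist counts samples per 1 ms bucket (hypothetical, for illustration only).
type hist map[int]int

func newHist(latenciesMs []int) hist {
	h := hist{}
	for _, v := range latenciesMs {
		h[v]++
	}
	return h
}

// merge adds bucket counts, so the result equals a histogram built from all samples.
func merge(hs ...hist) hist {
	out := hist{}
	for _, h := range hs {
		for k, n := range h {
			out[k] += n
		}
	}
	return out
}

// percentile returns the nearest-rank p-th percentile (0 < p <= 100).
func (h hist) percentile(p int) int {
	keys := make([]int, 0, len(h))
	total := 0
	for k, n := range h {
		keys = append(keys, k)
		total += n
	}
	sort.Ints(keys)
	rank := (p*total + 99) / 100 // integer ceil(p*total/100)
	seen := 0
	for _, k := range keys {
		seen += h[k]
		if seen >= rank {
			return k
		}
	}
	return 0 // empty histogram
}

func main() {
	a := newHist([]int{1, 1, 1, 1, 1, 1, 1, 1, 1, 1})
	b := newHist([]int{1, 1, 1, 1, 1, 1, 1, 1, 100, 100})
	avg := float64(a.percentile(90)+b.percentile(90)) / 2
	fmt.Println("mean of per-client p90:", avg)                          // 50.5
	fmt.Println("p90 of merged histogram:", merge(a, b).percentile(90)) // 1
}
```

With these inputs the averaged figure (50.5 ms) is far from the p90 of the combined samples (1 ms), which is the kind of distortion that merging counts avoids.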


On Apr 16, 2026, at 8:47 AM, Klaus Post wrote:

@klauspost commented on this pull request. In README.md:

> +using the `--benchdata` parameter.
> +
> +## Full Per-Transaction Logging (`--full`)
> +
> +Adding `--full` to any benchmark command enables full per-transaction logging. With this
> +flag warp writes **both** output files:
> +
> +| File | Contents |
> +|------|----------|
> +| `<benchdata>.csv.zst` | Every individual request — one row per operation, [zstandard](https://facebook.github.io/zstd/)-compressed CSV/TSV |
> +| `<benchdata>.json.zst` | Aggregated summary (same as default) |
> +
> +```bash
> +warp put --host=... --full
> +```
> +

Then the issue just gets closed as not planned. If you want to take ownership of the fix, I am offering code review. That is it.

Prior to commit 483f844, warp wrote a per-transaction CSV/TSV file after every benchmark run. That commit introduced addCollector() but registered --full only in analyzeFlags, never in benchFlags, so the flag had no effect and the per-transaction file was never written.

This commit restores the intended behavior:

* addCollector() now gates on ctx.Bool("full"). Without --full, only a LiveCollector (aggregating only) is used and .json.zst is the sole output, the same default behavior as pre-483f844.
* With --full, ops are fanned to both a NewOpsCollector (capturing every individual request) and the LiveCollector (live display and auto-termination). After the benchmark, .csv.zst is written first, then .json.zst is also written, so --full is strictly additive; both files are always produced together.
* benchserver.go (distributed mode) mirrors this: with --full it downloads raw ops, writes .csv.zst, then downloads the aggregate and writes .json.zst. Without --full it writes .json.zst only.
* The --full flag already exists in analyzeFlags, which is combined with benchFlags for all benchmark commands. A duplicate --full registration in benchFlags that caused a startup panic has been removed.
* warp analyze now prints a warning when a .csv.zst is analyzed without --full: re-aggregating ops into 1-second buckets loses per-operation resolution, producing results equivalent to reading the .json.zst. Users are directed to add --full for precise latency percentiles and throughput figures. The warning is suppressed when --quiet or --json output is active.
* README updated with an "Analyzing Results" subsection documenting the difference between default and --full analysis paths, and noting that --full is silently ignored on .json.zst files (which contain no individual operations).
* 7 unit tests added in cli/addcollector_test.go covering: default mode stores no ops, --full collects all ops, --full fans out to LiveCollector, default mode updates channel is functional, DiscardOutput suppresses both modes, --full is additive (both files written), and op value preservation.

…ad docs

- cli/analyze.go: soften warning from 'WARNING:' to 'Note:'; remove trailing newline from Println arg (fixes govet)
- cli/addcollector_test.go: fix misspell 'behaviour' -> 'behavior'; remove redundant embedded field selector b.Common.Collector -> b.Collector (fixes staticcheck QF1008)
- README.md: replace vague memory/perf caution with measured data: a table of RSS overhead at three concurrency/duration levels (~310 B/op at low load, ~850 B/op at high concurrency), an explanation of Go GC arena growth, and an analysis time comparison (csv.zst --full is ~5x slower than json.zst; re-aggregating without --full is ~13x slower)

russfellows · 2026-04-19T18:30:19Z

Thanks for the feedback. Pushed an updated commit that:

- Softens the analyze.go warning from WARNING: to Note: and shortens it to two lines (also fixes the trailing \n flagged by govet)
- Fixes behaviour → behavior in the test file (misspell) and removes the redundant b.Common.Collector selector (staticcheck QF1008)
- Replaces the vague memory note in the README with actual measurements from a local MinIO server. The overhead is negligible for short runs (+9 MB at 30s/8c) but grows to +124 MB at 60s/32c and projects to ~8 GB for a 1-hour high-concurrency run, so the caution is warranted at scale
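As a rough sanity check on the scale of that projection, a purely linear extrapolation of the 60s/32c measurement can be sketched as below. The assumptions are loud ones: overhead is taken to grow in proportion to run duration at fixed concurrency, and `projectMB` is a hypothetical helper, not part of warp. The ~8 GB figure above is the author's own projection; this sketch only shows that simple scaling lands in the same range.

```go
package main

import "fmt"

// projectMB scales a measured memory overhead linearly to a longer run.
// Assumption: overhead grows in proportion to duration at the same concurrency.
func projectMB(measuredMB, measuredSec, targetSec float64) float64 {
	return measuredMB * targetSec / measuredSec
}

func main() {
	// +124 MB measured over 60 s, projected to one hour (3600 s).
	fmt.Printf("%.0f MB\n", projectMB(124, 60, 3600)) // 7440 MB
}
```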

klauspost

lgtm

Copilot

Pull request overview

Restores the intended --full behavior for benchmarks so per-transaction .csv.zst output is produced again (alongside the aggregate .json.zst), and adds a user-facing warning when analyzing .csv.zst without --full.

Changes:

- Reworks collector wiring so --full collects per-op data while preserving live aggregation for UI/autoterm.
- Mirrors --full behavior for remote benchmarks by downloading ops for .csv.zst and still writing .json.zst.
- Adds an analyze-time warning for .csv.zst without --full, plus new unit tests and README updates.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


| File | Description |
|------|-------------|
| `cli/benchmark.go` | Wires `--full` to collect ops + keep live updates; writes `.json.zst` alongside `.csv.zst` in full mode. |
| `cli/benchserver.go` | In remote mode, downloads ops for `.csv.zst` when `--full` and also writes the aggregate `.json.zst`. |
| `cli/analyze.go` | Emits a warning when analyzing `.csv.zst` without `--full` (unless quiet/JSON). |
| `cli/addcollector_test.go` | Adds unit tests covering default/full/discard-output collector behavior. |
| `README.md` | Documents `--full` outputs and analysis behavior differences for `.csv.zst` vs `.json.zst`. |


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

russfellows mentioned this pull request Apr 7, 2026

bug: --full flag silently ignored during benchmark — per-transaction .csv.zst never written #476

Open

klauspost reviewed Apr 8, 2026


Comment thread cli/analyze.go

Comment thread README.md

russfellows added 2 commits April 19, 2026 11:33

russfellows force-pushed the fix/default-tsv-output branch from e6b62e1 to 8f5a66f Compare April 19, 2026 18:27

russfellows mentioned this pull request Apr 19, 2026

Feat/streaming op log: Implements streaming zst writer for tsv operation logs russfellows/warp-replay#5

Merged

klauspost approved these changes Apr 20, 2026


klauspost requested review from Copilot and harshavardhana April 20, 2026 11:40

Copilot started reviewing on behalf of klauspost April 20, 2026 11:40

Copilot AI reviewed Apr 20, 2026


Comment thread cli/benchmark.go

Comment thread cli/benchserver.go

Comment thread README.md

russfellows and others added 2 commits April 20, 2026 07:48

Update cli/benchmark.go

c352d5b

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update cli/benchserver.go

47e1534

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

russfellows wants to merge 4 commits into minio:master from russfellows:fix/default-tsv-output

3 participants
