Fix unreliable capture of standard/error outputs #184

Open
rudolf-adamkovic wants to merge 1 commit into nuprl:main from rudolf-adamkovic:fix-output-capture

@rudolf-adamkovic
Contributor

Fix potentially incomplete output capture when running unit tests, as seen with e.g. GNU Guile. Changes:

  • Drain standard output and error output pipes
  • Fix triply hard-coded stdout/stderr truncation
  • Increase stdout/stderr truncation from 2kB to 16kB
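The three changes can be pictured with a minimal Python sketch. It is not the code from this pull request: `run_and_capture` and `MAX_OUTPUT_BYTES` are hypothetical names, and the 15-second timeout is an arbitrary assumption. It only illustrates draining both pipes to EOF and truncating in one place with a single 16kB constant.

```python
import subprocess
import sys

# Hypothetical name: one shared constant instead of three hard-coded literals.
MAX_OUTPUT_BYTES = 16 * 1024


def run_and_capture(args, timeout=15):
    """Run a command, drain both pipes to EOF, then truncate once."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # communicate() reads stdout and stderr concurrently until EOF,
        # so the child cannot stall on a full pipe buffer.
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        # Collect whatever was written before the kill.
        out, err = proc.communicate()
    return proc.returncode, out[:MAX_OUTPUT_BYTES], err[:MAX_OUTPUT_BYTES]


if __name__ == "__main__":
    # A child that writes far more than the limit to stderr.
    code = "import sys; sys.stderr.write('x' * 100_000); print('done')"
    rc, out, err = run_and_capture([sys.executable, "-c", code])
    print(rc, out, len(err))
```

Note that this variant still buffers the full output in memory before truncating, which is only acceptable when the child's output is bounded by the timeout.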

@rudolf-adamkovic marked this pull request as ready for review March 26, 2026 14:16
Member

@arjunguha left a comment

The draining fix is right and we have that in other places.

Capturing the entire output can cause a lot of problems, especially when a program produces unbounded output.
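One way to drain the pipes without holding unbounded output in memory is to keep reading to EOF while storing only the first N bytes. The following is a rough sketch under assumptions, not code from this repository: `run_bounded` and `_drain` are hypothetical names, and the 8192-byte chunk size and 15-second timeout are arbitrary.

```python
import subprocess
import sys
import threading


def _drain(pipe, limit, sink):
    """Read pipe to EOF, keeping only the first `limit` bytes in `sink`."""
    kept = 0
    while True:
        chunk = pipe.read(8192)
        if not chunk:
            break
        if kept < limit:
            piece = chunk[: limit - kept]
            sink.append(piece)
            kept += len(piece)
        # Bytes past the limit are read and dropped, so the child never
        # blocks on a full pipe and memory use stays bounded.
    pipe.close()


def run_bounded(args, limit=16 * 1024, timeout=15):
    """Run a command, capturing at most `limit` bytes per stream."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = [], []
    threads = [
        threading.Thread(target=_drain, args=(proc.stdout, limit, out), daemon=True),
        threading.Thread(target=_drain, args=(proc.stderr, limit, err), daemon=True),
    ]
    for t in threads:
        t.start()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    # The pipes reach EOF once the child has exited or been killed.
    for t in threads:
        t.join()
    return proc.returncode, b"".join(out), b"".join(err)


if __name__ == "__main__":
    code = "import sys; sys.stdout.write('y' * 1_000_000)"
    rc, out, err = run_bounded([sys.executable, "-c", code])
    print(rc, len(out), len(err))
```

A caveat of this sketch: if the child spawns grandchildren that inherit the pipes, EOF may not arrive after the kill, so a production version would need to handle that case as well.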

@rudolf-adamkovic
Contributor Author

rudolf-adamkovic commented Apr 16, 2026

> Capturing the entire output can cause a lot of problems, especially when a program produces unbounded output.

Yes, a limit is necessary. I found 4kB insufficient and anything above 16kB problematic. In my measurements across 15 languages (3 million completions generated), the 4kB limit caused minimal changes in pass@1 (Instruct & Base), but it made the output files less useful for deeper analyses. More specifically (Instruct, t = 0.2), the 4kB limit reclassified 759 (2.32%) of 32,654 failing completions (χ² = 47, p < 0.001, ϕ = 0.04, n = 32,654), with 93.41% becoming identifier errors that truncation had obscured. Scheme (GNU Guile) was affected the most, as its errors include stack traces.
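As a quick consistency check on the figures above, the share and the effect size can be recomputed from the reported counts. This assumes ϕ was obtained as sqrt(χ²/n), the usual definition for a 2×2 table; the comment does not state how it was computed, and the variable names are illustrative.

```python
import math

# Figures quoted in the comment above.
reclassified = 759
failing = 32_654
chi2 = 47

# Share of failing completions that changed category.
share = reclassified / failing

# Assumption: phi = sqrt(chi2 / n), as for a 2x2 contingency table.
phi = math.sqrt(chi2 / failing)

print(f"{share:.2%}", round(phi, 2))  # prints: 2.32% 0.04
```

Both values match the reported 2.32% and ϕ = 0.04, which is a small effect by common conventions despite the very low p-value.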
