Fix unreliable capture of standard/error outputs #184
rudolf-adamkovic wants to merge 1 commit into nuprl:main from
Fix potentially incomplete output capture when running unit tests, as seen with e.g. GNU Guile.

Changes:
- Drain standard output and error output pipes
- Fix triply hard-coded stdout/stderr truncation
- Increase stdout/stderr truncation from 2kB to 16kB
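For readers unfamiliar with the issue, the following is a minimal sketch of what "drain both pipes, then truncate with a single shared limit" can look like. It is not the code from this PR: the names `MAX_OUTPUT_BYTES` and `run_and_capture`, the 15-second timeout, and the shape of the returned dictionary are all assumptions made for illustration. Only the 16 kB figure comes from the description above.

```python
import subprocess
import sys

# Hypothetical constant: one shared limit instead of several hard-coded copies.
# The 16 kB value mirrors the limit proposed in the PR description.
MAX_OUTPUT_BYTES = 16 * 1024


def run_and_capture(args, timeout_seconds=15):
    """Run a command, drain stdout and stderr fully, and keep a truncated copy."""
    proc = subprocess.Popen(
        args,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        # communicate() reads both pipes until EOF, so the child cannot block
        # on a full pipe buffer and no trailing output is lost.
        stdout, stderr = proc.communicate(timeout=timeout_seconds)
        timed_out = False
    except subprocess.TimeoutExpired:
        proc.kill()
        # Drain whatever the child wrote before it was killed.
        stdout, stderr = proc.communicate()
        timed_out = True
    return {
        "exit_code": proc.returncode,
        "timed_out": timed_out,
        # Truncate only after draining, using the single shared limit.
        "stdout": stdout[:MAX_OUTPUT_BYTES].decode("utf-8", errors="replace"),
        "stderr": stderr[:MAX_OUTPUT_BYTES].decode("utf-8", errors="replace"),
    }
```

Note that `communicate()` buffers the full output in memory before the slice is taken, so this sketch bounds what is stored in the results but not what is read from the child process.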
arjunguha left a comment
The draining fix is right and we have that in other places.
Capturing the entire output can cause a lot of problems, especially when a program produces unbounded output.
Yes, a limit is necessary. I found 4kB insufficient and anything above 16kB problematic. In my measurements, across 15 languages (3 million completions generated), the 4kB limit caused minimal changes in pass@1 (Instruct & Base), but it made the output files less useful for deeper analyses. More specifically (Instruct, t = 0.2), the 4kB limit reclassified 759 (2.32%) out of 32,654 failing completions (χ² = 47, p < 0.001, ϕ = 0.04, n = 32,654), 93.41% becoming identifier errors that truncation had obscured. Scheme (GNU Guile) was affected the most, as its errors include stack traces.
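As a quick sanity check of the figures quoted above, the reclassified share and the effect size can be recomputed from the reported counts. This sketch assumes the statistic comes from a 2×2 contingency table, where ϕ = √(χ²/n); the comment does not state the test setup, so that is an assumption, and the variable names are illustrative.

```python
import math

failing = 32_654      # failing completions (n), as reported
reclassified = 759    # completions reclassified under the 4kB limit, as reported
chi_squared = 47      # reported chi-squared statistic

share = reclassified / failing
# Assumed relation for a 2x2 table: phi = sqrt(chi^2 / n).
phi = math.sqrt(chi_squared / failing)

print(f"{share:.2%}")  # 2.32%
print(f"{phi:.2f}")    # 0.04
```

Both values match the reported 2.32% and ϕ = 0.04 under that assumption.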