Diagnose log injection smoke test flakiness instead of masking it#11075
Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 8 commits intomasterfrom Apr 23, 2026
Merged
Conversation
The `check raw file injection` test has been flaking across 11+ logging backend variants for months. CI Visibility data shows 90% of failures are `traceCount=0` at `waitForTraceCount(2)` after exactly 30s — the JVM + agent bytecode instrumentation simply takes >30s on overloaded CI machines. Changes: - Add `startupPoll` with 120s timeout for the initial `waitForTraceCount(2)` that covers JVM startup + agent init, giving 4x headroom over the current 30s `defaultPoll` - Add `waitForTraceCountAlive` that checks process liveness on each poll iteration, turning silent 30-120s timeouts into instant, actionable errors when the process crashes - Reorder `waitForTraceCount(4)` before `waitFor` to confirm all traces are delivered while the process is still alive - Assert `waitFor` return value for a clear error if the process hangs tag: no release note Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 59 metrics, 12 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~cad96739d1, baseline=1.62.0-SNAPSHOT~44874e2a03
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.057 s) : 0, 1056991
Total [baseline] (11.06 s) : 0, 11059703
Agent [candidate] (1.065 s) : 0, 1065098
Total [candidate] (11.094 s) : 0, 11093730
section appsec
Agent [baseline] (1.262 s) : 0, 1261525
Total [baseline] (11.005 s) : 0, 11004923
Agent [candidate] (1.256 s) : 0, 1256435
Total [candidate] (10.874 s) : 0, 10873665
section iast
Agent [baseline] (1.239 s) : 0, 1239286
Total [baseline] (11.403 s) : 0, 11403160
Agent [candidate] (1.251 s) : 0, 1251383
Total [candidate] (11.232 s) : 0, 11232365
section profiling
Agent [baseline] (1.181 s) : 0, 1181288
Total [baseline] (10.971 s) : 0, 10971441
Agent [candidate] (1.187 s) : 0, 1186786
Total [candidate] (11.001 s) : 0, 11001408
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~cad96739d1, baseline=1.62.0-SNAPSHOT~44874e2a03
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.22 ms) : 0, 1220
crashtracking [candidate] (1.223 ms) : 0, 1223
BytebuddyAgent [baseline] (631.523 ms) : 0, 631523
BytebuddyAgent [candidate] (637.088 ms) : 0, 637088
AgentMeter [baseline] (29.505 ms) : 0, 29505
AgentMeter [candidate] (29.798 ms) : 0, 29798
GlobalTracer [baseline] (248.982 ms) : 0, 248982
GlobalTracer [candidate] (250.458 ms) : 0, 250458
AppSec [baseline] (32.394 ms) : 0, 32394
AppSec [candidate] (32.51 ms) : 0, 32510
Debugger [baseline] (59.691 ms) : 0, 59691
Debugger [candidate] (60.024 ms) : 0, 60024
Remote Config [baseline] (583.434 µs) : 0, 583
Remote Config [candidate] (588.847 µs) : 0, 589
Telemetry [baseline] (8.788 ms) : 0, 8788
Telemetry [candidate] (8.097 ms) : 0, 8097
Flare Poller [baseline] (8.241 ms) : 0, 8241
Flare Poller [candidate] (9.148 ms) : 0, 9148
section appsec
crashtracking [baseline] (1.22 ms) : 0, 1220
crashtracking [candidate] (1.215 ms) : 0, 1215
BytebuddyAgent [baseline] (673.952 ms) : 0, 673952
BytebuddyAgent [candidate] (671.725 ms) : 0, 671725
AgentMeter [baseline] (12.143 ms) : 0, 12143
AgentMeter [candidate] (12.148 ms) : 0, 12148
GlobalTracer [baseline] (248.719 ms) : 0, 248719
GlobalTracer [candidate] (248.148 ms) : 0, 248148
AppSec [baseline] (185.836 ms) : 0, 185836
AppSec [candidate] (184.596 ms) : 0, 184596
Debugger [baseline] (66.84 ms) : 0, 66840
Debugger [candidate] (66.375 ms) : 0, 66375
Remote Config [baseline] (576.912 µs) : 0, 577
Remote Config [candidate] (580.742 µs) : 0, 581
Telemetry [baseline] (7.96 ms) : 0, 7960
Telemetry [candidate] (7.827 ms) : 0, 7827
Flare Poller [baseline] (3.496 ms) : 0, 3496
Flare Poller [candidate] (3.457 ms) : 0, 3457
IAST [baseline] (24.495 ms) : 0, 24495
IAST [candidate] (24.166 ms) : 0, 24166
section iast
crashtracking [baseline] (1.227 ms) : 0, 1227
crashtracking [candidate] (1.23 ms) : 0, 1230
BytebuddyAgent [baseline] (813.359 ms) : 0, 813359
BytebuddyAgent [candidate] (822.911 ms) : 0, 822911
AgentMeter [baseline] (11.53 ms) : 0, 11530
AgentMeter [candidate] (11.88 ms) : 0, 11880
GlobalTracer [baseline] (241.255 ms) : 0, 241255
GlobalTracer [candidate] (242.271 ms) : 0, 242271
AppSec [baseline] (30.583 ms) : 0, 30583
AppSec [candidate] (28.974 ms) : 0, 28974
Debugger [baseline] (64.929 ms) : 0, 64929
Debugger [candidate] (66.833 ms) : 0, 66833
Remote Config [baseline] (538.884 µs) : 0, 539
Remote Config [candidate] (540.984 µs) : 0, 541
Telemetry [baseline] (7.849 ms) : 0, 7849
Telemetry [candidate] (7.873 ms) : 0, 7873
Flare Poller [baseline] (3.522 ms) : 0, 3522
Flare Poller [candidate] (3.462 ms) : 0, 3462
IAST [baseline] (28.367 ms) : 0, 28367
IAST [candidate] (29.218 ms) : 0, 29218
section profiling
crashtracking [baseline] (1.192 ms) : 0, 1192
crashtracking [candidate] (1.201 ms) : 0, 1201
BytebuddyAgent [baseline] (689.422 ms) : 0, 689422
BytebuddyAgent [candidate] (692.479 ms) : 0, 692479
AgentMeter [baseline] (9.006 ms) : 0, 9006
AgentMeter [candidate] (8.991 ms) : 0, 8991
GlobalTracer [baseline] (207.339 ms) : 0, 207339
GlobalTracer [candidate] (207.537 ms) : 0, 207537
AppSec [baseline] (32.507 ms) : 0, 32507
AppSec [candidate] (32.655 ms) : 0, 32655
Debugger [baseline] (64.882 ms) : 0, 64882
Debugger [candidate] (65.705 ms) : 0, 65705
Remote Config [baseline] (574.921 µs) : 0, 575
Remote Config [candidate] (582.4 µs) : 0, 582
Telemetry [baseline] (8.544 ms) : 0, 8544
Telemetry [candidate] (7.938 ms) : 0, 7938
Flare Poller [baseline] (3.44 ms) : 0, 3440
Flare Poller [candidate] (3.519 ms) : 0, 3519
ProfilingAgent [baseline] (93.331 ms) : 0, 93331
ProfilingAgent [candidate] (94.362 ms) : 0, 94362
Profiling [baseline] (93.888 ms) : 0, 93888
Profiling [candidate] (94.938 ms) : 0, 94938
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~cad96739d1, baseline=1.62.0-SNAPSHOT~44874e2a03
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.057 s) : 0, 1057063
Total [baseline] (8.829 s) : 0, 8828652
Agent [candidate] (1.052 s) : 0, 1051741
Total [candidate] (8.826 s) : 0, 8825899
section iast
Agent [baseline] (1.226 s) : 0, 1226070
Total [baseline] (9.551 s) : 0, 9550559
Agent [candidate] (1.229 s) : 0, 1228943
Total [candidate] (9.504 s) : 0, 9504194
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~cad96739d1, baseline=1.62.0-SNAPSHOT~44874e2a03
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.228 ms) : 0, 1228
crashtracking [candidate] (1.214 ms) : 0, 1214
BytebuddyAgent [baseline] (634.251 ms) : 0, 634251
BytebuddyAgent [candidate] (631.268 ms) : 0, 631268
AgentMeter [baseline] (29.563 ms) : 0, 29563
AgentMeter [candidate] (29.429 ms) : 0, 29429
GlobalTracer [baseline] (248.988 ms) : 0, 248988
GlobalTracer [candidate] (248.194 ms) : 0, 248194
AppSec [baseline] (32.515 ms) : 0, 32515
AppSec [candidate] (32.372 ms) : 0, 32372
Debugger [baseline] (59.051 ms) : 0, 59051
Debugger [candidate] (58.799 ms) : 0, 58799
Remote Config [baseline] (599.852 µs) : 0, 600
Remote Config [candidate] (581.742 µs) : 0, 582
Telemetry [baseline] (8.755 ms) : 0, 8755
Telemetry [candidate] (8.758 ms) : 0, 8758
Flare Poller [baseline] (5.891 ms) : 0, 5891
Flare Poller [candidate] (5.061 ms) : 0, 5061
section iast
crashtracking [baseline] (1.213 ms) : 0, 1213
crashtracking [candidate] (1.212 ms) : 0, 1212
BytebuddyAgent [baseline] (805.604 ms) : 0, 805604
BytebuddyAgent [candidate] (807.579 ms) : 0, 807579
AgentMeter [baseline] (11.368 ms) : 0, 11368
AgentMeter [candidate] (11.4 ms) : 0, 11400
GlobalTracer [baseline] (237.915 ms) : 0, 237915
GlobalTracer [candidate] (238.419 ms) : 0, 238419
AppSec [baseline] (26.795 ms) : 0, 26795
AppSec [candidate] (27.349 ms) : 0, 27349
Debugger [baseline] (64.883 ms) : 0, 64883
Debugger [candidate] (65.102 ms) : 0, 65102
Remote Config [baseline] (533.65 µs) : 0, 534
Remote Config [candidate] (540.029 µs) : 0, 540
Telemetry [baseline] (7.728 ms) : 0, 7728
Telemetry [candidate] (7.83 ms) : 0, 7830
Flare Poller [baseline] (3.446 ms) : 0, 3446
Flare Poller [candidate] (3.468 ms) : 0, 3468
IAST [baseline] (30.732 ms) : 0, 30732
IAST [candidate] (30.1 ms) : 0, 30100
LoadParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 20 metrics, 16 unstable metrics. Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~cad96739d1, baseline=1.62.0-SNAPSHOT~44874e2a03
dateFormat X
axisFormat %s
section baseline
no_agent (19.33 ms) : 19129, 19531
. : milestone, 19330,
appsec (18.445 ms) : 18260, 18630
. : milestone, 18445,
code_origins (17.788 ms) : 17610, 17965
. : milestone, 17788,
iast (17.931 ms) : 17753, 18109
. : milestone, 17931,
profiling (18.058 ms) : 17881, 18235
. : milestone, 18058,
tracing (17.559 ms) : 17387, 17731
. : milestone, 17559,
section candidate
no_agent (18.315 ms) : 18125, 18504
. : milestone, 18315,
appsec (19.097 ms) : 18904, 19289
. : milestone, 19097,
code_origins (17.789 ms) : 17614, 17964
. : milestone, 17789,
iast (17.728 ms) : 17553, 17903
. : milestone, 17728,
profiling (18.027 ms) : 17846, 18207
. : milestone, 18027,
tracing (17.81 ms) : 17636, 17984
. : milestone, 17810,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~cad96739d1, baseline=1.62.0-SNAPSHOT~44874e2a03
dateFormat X
axisFormat %s
section baseline
no_agent (1.244 ms) : 1232, 1256
. : milestone, 1244,
iast (3.291 ms) : 3246, 3337
. : milestone, 3291,
iast_FULL (6.038 ms) : 5977, 6100
. : milestone, 6038,
iast_GLOBAL (3.709 ms) : 3648, 3770
. : milestone, 3709,
profiling (2.212 ms) : 2191, 2232
. : milestone, 2212,
tracing (1.856 ms) : 1841, 1872
. : milestone, 1856,
section candidate
no_agent (1.248 ms) : 1235, 1260
. : milestone, 1248,
iast (3.256 ms) : 3213, 3299
. : milestone, 3256,
iast_FULL (6.168 ms) : 6104, 6232
. : milestone, 6168,
iast_GLOBAL (3.7 ms) : 3637, 3763
. : milestone, 3700,
profiling (2.096 ms) : 2078, 2115
. : milestone, 2096,
tracing (1.945 ms) : 1928, 1961
. : milestone, 1945,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~cad96739d1, baseline=1.62.0-SNAPSHOT~44874e2a03
dateFormat X
axisFormat %s
section baseline
no_agent (1.493 ms) : 1481, 1504
. : milestone, 1493,
appsec (3.782 ms) : 3561, 4003
. : milestone, 3782,
iast (2.278 ms) : 2208, 2348
. : milestone, 2278,
iast_GLOBAL (2.322 ms) : 2252, 2392
. : milestone, 2322,
profiling (2.101 ms) : 2046, 2157
. : milestone, 2101,
tracing (2.089 ms) : 2035, 2143
. : milestone, 2089,
section candidate
no_agent (1.487 ms) : 1475, 1498
. : milestone, 1487,
appsec (3.829 ms) : 3605, 4053
. : milestone, 3829,
iast (2.281 ms) : 2210, 2351
. : milestone, 2281,
iast_GLOBAL (2.325 ms) : 2255, 2396
. : milestone, 2325,
profiling (2.108 ms) : 2052, 2163
. : milestone, 2108,
tracing (2.093 ms) : 2039, 2147
. : milestone, 2093,
Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~cad96739d1, baseline=1.62.0-SNAPSHOT~44874e2a03
dateFormat X
axisFormat %s
section baseline
no_agent (15.604 s) : 15604000, 15604000
. : milestone, 15604000,
appsec (14.559 s) : 14559000, 14559000
. : milestone, 14559000,
iast (18.428 s) : 18428000, 18428000
. : milestone, 18428000,
iast_GLOBAL (17.771 s) : 17771000, 17771000
. : milestone, 17771000,
profiling (15.621 s) : 15621000, 15621000
. : milestone, 15621000,
tracing (15.004 s) : 15004000, 15004000
. : milestone, 15004000,
section candidate
no_agent (15.036 s) : 15036000, 15036000
. : milestone, 15036000,
appsec (14.879 s) : 14879000, 14879000
. : milestone, 14879000,
iast (18.72 s) : 18720000, 18720000
. : milestone, 18720000,
iast_GLOBAL (17.769 s) : 17769000, 17769000
. : milestone, 17769000,
profiling (14.8 s) : 14800000, 14800000
. : milestone, 14800000,
tracing (15.026 s) : 15026000, 15026000
. : milestone, 15026000,
|
The `check raw file injection` test flakes across 11+ logging backend variants. CI Visibility data shows the failure is bimodal — successful runs complete in 3-9s, but failures sit at exactly 30s (the PollingConditions timeout) with traceCount=0. Nothing in between. This means the process either works or is totally broken — no amount of timeout increase will help. The current test is blind during the 30s wait — it just polls traceCount with no diagnostics when the process crashes or hangs. Changes: - Add `waitForTraceCountAlive` that checks process liveness on every poll iteration. If the process dies, it fails immediately with the exit code, RC poll count, and last 20 lines of process output. - On timeout, enrich the error with diagnostic state (process alive?, traceCount, RC polls received, last 30 lines of output) so the next CI failure tells us whether it's a crash, a hang, or a connectivity issue. - Reorder `waitForTraceCount(4)` before `waitFor` to confirm all traces are delivered while the process is still alive. - Assert `waitFor` return value for a clear error if the process hangs. tag: no release notes Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The liveness check fired before the trace count check, so a normal process exit after delivering all traces was treated as a failure. Check traceCount >= count first and return early if satisfied. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
PollingConditions.eventually only retries AssertionError. The liveness check was throwing AssertionError, so a dead process still waited the full 30s timeout. Switch to RuntimeException so it propagates immediately. Also narrow the catch from Throwable to AssertionError. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mhlidd
approved these changes
Apr 15, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
|
/merge |
|
View all feedbacks in Devflow UI.
The expected merge time in
|
51759df
into
master
751 of 754 checks passed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What Does This Do
Adds diagnostic instrumentation to the
check raw file injectionsmoke test so the next CI failure tells us the root cause instead of a bare "Condition not satisfied after 30s" withtraceCount=0.Changes to
LogInjectionSmokeTest:waitForTraceCountAlive— checks process liveness on every poll iteration; if the process dies, fails immediately with exit code + last 20 lines of process outputwaitForTraceCount(4)beforewaitFor+ assertwaitForreturn valueMotivation
CI Visibility data for the last 30 days on master shows 10 failures of
check raw file injection:traceCount=0atwaitForTraceCount(2)logLines.size()=3atassertRawLogLinesWithInjectionThe failure distribution is bimodal — successful runs complete in 3.5-8.7s (80 data points, zero above 9s), while failures sit at exactly 30.3s. There is nothing in between. This means the process either works or is totally broken — a timeout increase would just delay the same failure.
The current test is blind during the wait — it just polls
traceCountin a loop. We don't know if the process crashed, hung during agent init, failed to connect to the test server, or something else entirely. This PR makes the next failure self-diagnosing.Example output when process crashes:
Example output on timeout (process alive but not sending traces):
Additional Notes
LogInjectionSmokeTest.groovyis changeddefaultPollis kept as-isrcClientMessages.size()tells us whether the agent connected to the test server at all (RC polls hit/v0.7/configevery 200ms)Contributor Checklist
type:and (comp:orinst:) labelsclose,fix, or any linking keywords when referencing an issuetag: no release notes
tag: ai generated
🤖 Generated with Claude Code