changefeedccl: log on kafka sink v2 internal retry#169374
changefeedccl: log on kafka sink v2 internal retry#169374aerfrei wants to merge 1 commit intocockroachdb:masterfrom
Conversation
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
rharding6373
left a comment
There was a problem hiding this comment.
I only have a bunch of nits about the log message, but don't want to do too much bike shedding so consider them optional. Take 'em or leave 'em.
| if err := k.client.ProduceSync(ctx, msgs...).FirstErr(); err != nil { | ||
| if k.shouldTryResizing(err, msgs) { | ||
| a, b := msgs[0:len(msgs)/2], msgs[len(msgs)/2:] | ||
| log.Changefeed.Infof(ctx, |
There was a problem hiding this comment.
Should this use Errorf?
There was a problem hiding this comment.
My sense is that we weren't logging before because, in some sense, having to resize is expected. I think that's still accurate and this communicates that this isn't necessarily cause for concern.
| if k.shouldTryResizing(err, msgs) { | ||
| a, b := msgs[0:len(msgs)/2], msgs[len(msgs)/2:] | ||
| log.Changefeed.Infof(ctx, | ||
| "kafka v2 sink reducing batch from %d to halves of %d and %d due to error: %v", |
There was a problem hiding this comment.
Is including both len(a) and len(b) helpful? Seems verbose to me.
There was a problem hiding this comment.
In the case I was looking at in cloud we saw a steady stream of these internal retries (they were happening in some kind of infinite loop presumably). I think this is a helpful proxy towards whether or not we're actually making progress. e.g. If the numbers are getting smaller, that would be a good sign.
Previously we saw spikes in changefeed internal retries without accompanying error logs. This change adds a log to the missing code path causing internal retries. This change should make sure we can tell that this error path was the cause for the internal retries. Epic: none Release note: None
ee7a991 to
cbdd181
Compare
Previously we saw spikes in changefeed internal
retries without accompanying error logs. This change adds a log to the missing code path causing internal retries.
This change should make sure we can tell that this error path was the cause for the internal retries.
Epic: none
Release note: None