return write error only when all retries have failed #452

abuchanan-nr · 2020-05-21T23:51:11Z

This is a potential fix for #451. It passes my quick-and-dirty local test.

This changes the code to return an error (via the message's `res` channel) only when all write retries have failed.

This also moves the stat counters associated with a successful write (messages, bytes, and batchSize) so that they are recorded once per successful write, rather than being inflated by every call to write (i.e. every retry attempt).

VibhorGupta1991 · 2020-06-03T17:58:30Z

This change needs to be merged quickly. I had been struggling to find the cause and finally came across #451.
As of now, this affects only version 0.3.6. @abuchanan-nr, your change #382 was merged on Feb 22 and is included in this version.

achille-roussel · 2020-06-03T22:37:29Z

writer.go

```diff
+			res <- nil
+		}
+		for _, m := range batch {
+			w.stats.messages.observe(1)
```


Can we take this out of the loop? It seems like it could be `w.stats.messages.observe(int64(len(batch)))`.
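Here is a small Go sketch of the suggested refactor. It uses a hypothetical `counter` type in place of the writer's stats fields and a hypothetical `message` type. It also assumes the loop records a per-message value such as byte size, because the excerpt above does not show the rest of the loop. The per-message `observe(1)` calls collapse into a single `observe(int64(len(batch)))`. Anything that depends on each message still needs the loop.

```go
package main

import "fmt"

// counter is a hypothetical stand-in for a stats counter with an
// observe method; it is not the kafka-go type.
type counter struct{ total int64 }

func (c *counter) observe(n int64) { c.total += n }

// message is a hypothetical, simplified message type.
type message struct{ Value []byte }

// observeBatch records message and byte counts for a batch. The message
// count is observed once for the whole batch; bytes depend on each
// message, so that part stays in the loop.
func observeBatch(messages, bytes *counter, batch []message) {
	messages.observe(int64(len(batch)))
	for _, m := range batch {
		bytes.observe(int64(len(m.Value)))
	}
}

func main() {
	var messages, bytes counter
	batch := []message{{[]byte("a")}, {[]byte("bc")}, {[]byte("def")}}
	observeBatch(&messages, &bytes, batch)
	fmt.Println(messages.total, bytes.total) // 3 6
}
```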

stevevls · 2020-06-03T22:47:09Z

Thanks for the fix, @abuchanan-nr! I traced through the code and agree with your assessment that there may be no receiver on the channel for a second attempt.

I would prefer to leave the meaning of the stats as-is, though. Someone out there may be depending on them, and I'd rather not change them and surprise anyone.

jnjackins · 2020-06-15T20:31:31Z

Given the nature of the breakage, I believe we should revert #382 in the meantime (see #462). Since expectations for the reliability of async writes are low, we should give preference to sync writes, which are meant to be reliable and are currently broken. We should also add more tests to cover these changes: there is currently no test coverage for the partitionWriter abstraction, which is mocked out in tests (specifically in the only test we have for retries).

If we do want to move forward with retries in the partitionWriter, I'd suggest adding a new interface that abstracts the write method. It could have a simple (non-retrying) implementation, plus a retryWriter implementation that wraps the simple one. partitionWriters would normally be configured to use the retryWriter implementation, while in tests the simple writer can be faked and the retry functionality still exercised.
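The proposed design could be sketched in Go as follows. `batchWriter`, `fakeWriter`, and `message` are hypothetical names. `retryWriter` follows the name proposed above but is not an existing kafka-go type. Because the retry logic lives in a wrapper, a test can fake the simple writer and still exercise retries.

```go
package main

import (
	"errors"
	"fmt"
)

// message is a hypothetical, simplified message type.
type message struct{ Value []byte }

// batchWriter is a hypothetical interface abstracting a single write.
type batchWriter interface {
	write(batch []message) error
}

// retryWriter wraps another batchWriter and retries failed writes,
// making at least one and at most maxAttempts calls.
type retryWriter struct {
	next        batchWriter
	maxAttempts int
}

func (r retryWriter) write(batch []message) error {
	for attempt := 1; ; attempt++ {
		err := r.next.write(batch)
		if err == nil || attempt >= r.maxAttempts {
			return err
		}
	}
}

// fakeWriter is a test double that fails a fixed number of times
// before succeeding, recording how often it was called.
type fakeWriter struct {
	failures int
	calls    int
}

func (f *fakeWriter) write(batch []message) error {
	f.calls++
	if f.calls <= f.failures {
		return errors.New("simulated failure")
	}
	return nil
}

func main() {
	fake := &fakeWriter{failures: 2}
	w := retryWriter{next: fake, maxAttempts: 3}
	err := w.write([]message{{[]byte("hello")}})
	fmt.Println(err, fake.calls) // <nil> 3
}
```

Production code would wrap the real (simple) writer in `retryWriter`. Tests would wrap `fakeWriter` instead, so the retry path is covered without a broker.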

Note that this would leave no further use for the partitionWriter abstraction, which currently exists only so that it can be faked in tests.

achille-roussel · 2021-01-15T19:48:49Z

Considering we reverted the change that introduced the original regression, I believe we can close this PR. Feel free to reopen if it needs further discussion 👍

return write error only when all retries have failed

79a8fd0

achille-roussel reviewed Jun 3, 2020


achille-roussel mentioned this pull request Jun 15, 2020

0.4: kafka.Writer #461

Merged

achille-roussel closed this Jan 15, 2021
