
WriteMessages in Async mode is blocking #350


Closed
sagarkrkv opened this issue Sep 9, 2019 · 10 comments

@sagarkrkv
Contributor

sagarkrkv commented Sep 9, 2019

Describe the bug

The WriteMessages call should be non-blocking in async mode. However, when there are connection issues with the Kafka broker, i.e. when connections take too long to establish or are slow, we observed that a WriteMessages call that typically finishes in a microsecond blocks for a couple of minutes.

Kafka Version
Kafka 2.2

To Reproduce
It's hard to reproduce the exact scenario where this happens.

Expected behavior

In async mode, the WriteMessages call should drop messages instead of blocking, to avoid degrading system performance.

Additional context

We can either make the existing WriteMessages non-blocking, add a new config option to enable non-blocking mode, or add a new method to write messages in a non-blocking mode.
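
For reference, a minimal sketch of the setup in question, assuming the kafka-go WriterConfig API; the broker address, topic, and payload are placeholders rather than values from our actual deployment:

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Writer configured for async mode; WriteMessages is expected to enqueue
	// and return almost immediately.
	w := kafka.NewWriter(kafka.WriterConfig{
		Brokers: []string{"localhost:9092"}, // placeholder broker
		Topic:   "events",                   // placeholder topic
		Async:   true,
	})
	defer w.Close()

	// Normally this returns in about a microsecond, but once the internal
	// queue fills up (e.g. during broker connection issues) the call blocks.
	if err := w.WriteMessages(context.Background(), kafka.Message{Value: []byte("payload")}); err != nil {
		log.Printf("write error: %v", err)
	}
}
```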

@sagarkrkv added the bug label Sep 9, 2019
@achille-roussel
Contributor

@sagarkrkv you're correct; this could be better documented, but once the internal queue is full the call blocks.

Dropping messages can be controversial, especially if it happens during a Kafka outage. Backpressure can help control the amount of data loss that may occur, so it can be a preferable approach in some cases.

I'm not against changing the behavior; since it's not documented yet, I hope that no program depends on this implementation detail.

I'd love to get @stevevls's take on the topic as well.

In the meantime, you can pass a context with a timeout to unblock the call.
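
Something along these lines should work as a stopgap; the helper name and the 500ms budget are just illustrative:

```go
import (
	"context"
	"errors"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

// writeWithTimeout bounds how long WriteMessages may block when the
// writer's internal queue is full, e.g. during broker connection issues.
func writeWithTimeout(w *kafka.Writer, msg kafka.Message) {
	ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
	defer cancel()

	if err := w.WriteMessages(ctx, msg); err != nil {
		if errors.Is(err, context.DeadlineExceeded) {
			// The queue stayed full for the whole budget; the caller drops the message.
			log.Printf("kafka write timed out, dropping message")
			return
		}
		log.Printf("kafka write failed: %v", err)
	}
}
```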

@sagarkrkv
Contributor Author

Yes, we are currently adding a context with a timeout.

That said, async and backpressure are a little contradictory. I agree, though, that this might be a disruptive and unexpected change for users who have come to rely on this behavior. So if we want to minimize the disruption, we can consider either

adding a new config option to enable non-blocking mode, or adding a new method to write messages in a non-blocking mode.
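
For illustration, a non-blocking mode could look something like this hypothetical user-land wrapper (the dropWriter type, its buffer size, and the dropped counter are made up for the example; they are not an existing kafka-go API):

```go
import (
	"context"
	"log"
	"sync/atomic"

	"github.com/segmentio/kafka-go"
)

// dropWriter never blocks the caller: messages are handed off through a
// buffered channel and dropped when the buffer is full, while a single
// goroutine forwards them to the underlying kafka.Writer.
type dropWriter struct {
	ch      chan kafka.Message
	dropped uint64
}

func newDropWriter(w *kafka.Writer, buffer int) *dropWriter {
	d := &dropWriter{ch: make(chan kafka.Message, buffer)}
	go func() {
		for msg := range d.ch {
			// Errors are logged rather than returned, keeping the producer path non-blocking.
			if err := w.WriteMessages(context.Background(), msg); err != nil {
				log.Printf("kafka write failed: %v", err)
			}
		}
	}()
	return d
}

// Write enqueues the message if there is room and drops it otherwise.
func (d *dropWriter) Write(msg kafka.Message) {
	select {
	case d.ch <- msg:
	default:
		atomic.AddUint64(&d.dropped, 1)
	}
}
```

The point of the sketch is that the drop happens at the application boundary, so losses stay visible (via the counter) instead of surfacing as multi-minute stalls.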

@stevevls
Contributor

Coming late to this party. 😄 I think that if we add behavior to drop messages, it should definitely be guarded by a configuration flag that does not drop by default. Another option we already have is the QueueCapacity setting, which can be used to increase the amount of time before backpressure kicks in.
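
For example (the capacity value below is arbitrary, just to show the knob):

```go
import "github.com/segmentio/kafka-go"

// newTunedWriter returns an async writer with an enlarged internal queue.
// Raising QueueCapacity delays the point at which WriteMessages starts to
// block, but it does not remove the blocking behavior once the queue is full.
func newTunedWriter(brokers []string, topic string) *kafka.Writer {
	return kafka.NewWriter(kafka.WriterConfig{
		Brokers:       brokers,
		Topic:         topic,
		Async:         true,
		QueueCapacity: 10000, // example value; the default is much smaller
	})
}
```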

@sagarkrkv
Contributor Author

In a scenario where we have persistent connection issues with the Kafka cluster, or when the producer generates a large number of messages due to a traffic spike, QueueCapacity will not help much.

Is this something you're willing to accept a PR for, i.e. adding a config flag to make this non-blocking?

@illotum

illotum commented Jun 15, 2020

We encountered a similar issue in a different scenario. The Hash balancer combined with many hundreds of partitions leads to Writer.msgs becoming a bottleneck, because the scheduler can't keep up with draining it across all the runners.

Frankly, you seem to be going for a very "proper" Go design with nested runners and a hierarchy of channels, but it's counterproductive in this particular case. AFAIU partitionWriters always belong to one and only one Writer, so running them in goroutines only adds scheduling overhead. I'd rather see Writer.run() append messages to batches directly, or call the corresponding partitionWriter function synchronously. Very likely it would be faster (if you dispatch the actual network write in a goroutine).

Here's me raising QueueCapacity to 60k, only to delay the inevitable:
[Screenshot from 2020-06-15]

@illotum

illotum commented Jan 25, 2021

I missed your v0.4 merge; it sounds like this ticket can be closed. Is it resolved?

@gibsn

gibsn commented Mar 17, 2021

Hi! Is this issue still present in the latest release?

@illotum

illotum commented Mar 17, 2021

@gibsn I verified that it no longer bottlenecks on the queue. It happily chugs along in my application.

@achille-roussel
Contributor

Thanks for following up, everyone! Glad to hear 0.4 has been serving you well 👍

@gibsn

gibsn commented Apr 20, 2022

@illotum I missed your reply 😮 thank you
