Upgrading from version 0.5.5 to 0.7 created a pause of ~ 2.5 seconds between the time message is produced and the time it is consumed. #823
Reproduce
Producer
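A minimal sketch of the kind of producer assumed in this reproduction: it publishes the current timestamp to the greetings topic every 100 ms so the consumer can print the produce-to-consume diff. The script, topic name, and interval are assumptions, not the original snippet.

```ruby
require "kafka"

# Assumed producer: the message value carries the produce timestamp so the
# consumer can compute produce-to-consume latency.
kafka = Kafka.new(["127.0.0.1:9092"], client_id: "latency-test")

loop do
  kafka.deliver_message(Time.now.to_f.to_s, topic: "greetings")
  sleep 0.1
end
```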
Consumer 1 (min_bytes: 1, max_wait_time: 1)
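A sketch of the assumed consumer loop, with min_bytes and max_wait_time passed to each_message; the latency calculation is an assumption based on the "diff:" lines that appear in the logs below.

```ruby
require "kafka"

# Assumed consumer: min_bytes and max_wait_time are forwarded to the fetch
# requests; the "diff" printed here is the produce-to-consume latency computed
# from the timestamp carried in the message value.
kafka = Kafka.new(["127.0.0.1:9092"], client_id: "latency-test")
consumer = kafka.consumer(group_id: "my-consumer")
consumer.subscribe("greetings")

consumer.each_message(min_bytes: 1, max_wait_time: 1) do |message|
  latency = Time.now.to_f - message.value.to_f
  puts "diff: #{latency.round(2)}"
end
```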
Effect: Notice the diff values in the output, which show the latency in processing each message.
Consumer 2 (min_bytes: 1, max_wait_time: 0.1)
Effect
Consumer 3 (min_bytes: 1, max_wait_time: 1), with the hardcoded `sleep 2` in the ruby-kafka source changed to a smaller value
Effect:
D, [2020-04-24T11:08:09.218740 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:09.222593 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 8 from 127.0.0.1:9092
D, [2020-04-24T11:08:09.222770 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-24T11:08:09.222873 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 9 to 127.0.0.1:9092
D, [2020-04-24T11:08:09.222999 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 9 from 127.0.0.1:9092
diff: 0.1
D, [2020-04-24T11:08:09.320826 #43343] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:8222 as processed
D, [2020-04-24T11:08:09.320902 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:09.424535 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:09.527207 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:09.630089 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:09.734363 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:09.738358 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 9 from 127.0.0.1:9092
D, [2020-04-24T11:08:09.738516 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-24T11:08:09.738658 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 10 to 127.0.0.1:9092
D, [2020-04-24T11:08:09.738808 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 10 from 127.0.0.1:9092
diff: 0.11
D, [2020-04-24T11:08:09.837722 #43343] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:8223 as processed
D, [2020-04-24T11:08:09.838035 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:09.941652 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:10.042086 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:10.143646 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:10.242994 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 10 from 127.0.0.1:9092
D, [2020-04-24T11:08:10.243093 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-24T11:08:10.243203 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 11 to 127.0.0.1:9092
D, [2020-04-24T11:08:10.243425 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 11 from 127.0.0.1:9092
diff: 0.01
D, [2020-04-24T11:08:10.248061 #43343] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:8224 as processed
D, [2020-04-24T11:08:10.248143 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:10.352809 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:10.457203 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:10.561735 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:10.666170 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
D, [2020-04-24T11:08:10.755365 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Received response 11 from 127.0.0.1:9092
D, [2020-04-24T11:08:10.755508 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] Fetching batches
D, [2020-04-24T11:08:10.755690 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Sending fetch API request 12 to 127.0.0.1:9092
D, [2020-04-24T11:08:10.755849 #43343] DEBUG -- : [[my-consumer] {greetings: 0}:] [fetch] Waiting for response 12 from 127.0.0.1:9092
diff: 0.02
D, [2020-04-24T11:08:10.770807 #43343] DEBUG -- : [[my-consumer] {}:] Marking greetings/0:8225 as processed
D, [2020-04-24T11:08:10.770930 #43343] DEBUG -- : [[my-consumer] {}:] No batches to process
Summary
Manually changing the hardcoded sleep value is the only way to achieve low latency for topics with a low volume of writes.
It's very hard for me to say why this sleep is there and what the consequences are of making it smaller. Please at least make it configurable. I spent plenty of time playing with max_wait_time only to discover that it does not change anything, and that in the worst-case scenario latency was always much higher than I expected (~1.5 s), before realizing that this hardcoded sleep is what causes all the issues.
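For illustration only, this is the general shape of a fetch loop with a fixed back-off; it is a simplified sketch, not the actual ruby-kafka fetcher code.

```ruby
# Simplified illustration (not ruby-kafka's real fetcher): when a poll returns
# nothing, the loop sleeps a hardcoded interval before polling again. On a quiet
# topic, end-to-end latency is then dominated by this sleep, no matter how small
# max_wait_time is set.
def poll_broker(max_wait_time:)
  []  # stub: pretend the broker returned nothing within max_wait_time
end

max_wait_time = 0.1

loop do
  batches = poll_broker(max_wait_time: max_wait_time)
  if batches.empty?
    sleep 2  # hardcoded back-off: sets a fixed latency floor on low-traffic topics
  else
    batches.each { |batch| puts "processing #{batch}" }
  end
end
```

Replacing the fixed sleep with a value derived from max_wait_time removes that floor, which appears to be what the fix discussed below does.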
@paneq Thank you for adding great details! We also modified
I'd be open to a PR that changes the sleep to use the
@dasch Sounds good! May I assume that Ruby can do a sleep of 100 ms or 10 ms?
Yup, but you need to use a Float, e.g.
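For reference, Ruby's Kernel#sleep accepts a Float, so sub-second sleeps are straightforward:

```ruby
sleep 2     # two seconds
sleep 0.1   # 100 milliseconds
sleep 0.01  # 10 milliseconds
```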
@dasch We (Tesla vehicle backend services) are also experiencing this significant consumer lag and can help validate that the fix works well. This is currently blocking a transition from RabbitMQ to Kafka, so please let me know if anything can be done to accelerate the upgrade.
@renobeguh If you can validate that #825 works, that would be great!
@dasch We just validated the change. Using a max_wait_time of 0.01, we saw event processing lag drop from about 1500 ms to 150 ms.
Awesome, thanks!
@dasch Do you have an estimate of when this change will be merged and published? We'd like to reference the new version, and are weighing hosting our own forked version temporarily.
I think the PR can be merged and a new release cut – if there's a regression as mentioned, we can get that fixed once we get a reproducible report.
v1.1.0.beta1 has been released. @renobeguh, can you upgrade to that in production and verify that there are no regressions?
@dasch Thank you, we're upgrading and testing today.
Upgrading from version 0.5.5 to 0.7 created a pause of ~2.5 seconds between the time a message is produced and the time it is consumed.
Note that this issue never happened before the 0.7 upgrade, and when we downgraded back to 0.5.5 the issue went away.
We would greatly appreciate any suggestions on what configuration settings on consumer or producer side we should try to tune.
If this is a bug report, please fill out the following:
Please verify that the problem you're seeing hasn't been fixed by the current master of ruby-kafka.
I am talking to the team to see if we can actually do this.
Steps to reproduce
Expected outcome
No pause between the time a message is produced and the time it is picked up by the consumer.
Here is a graph showing the first block, when the provider-requests-events consumer finished processing a message and produced a new message for state-machine-events, and the second block, when the state-machine-events consumer started processing it. Note there is no gap between the two blocks (this was before the 0.7 upgrade):
Actual outcome
~1 second pause between the time a message is produced and the time it is picked up by the consumer.
Note the gap between the two blocks (after the 0.7 upgrade):
We are using the phobos lib, so these are our current consumer and producer settings:
Thank you so much for looking into this!