Compressed messages are yielded with incorrect offsets #505
Comments
The problem is that this breaks Kafka 0.11, which is the reason for the original bugfix: #457. The JVM Kafka client has a different algorithm for getting the correct offsets – could you look into that and see if you can port that algorithm over?
This is because the messages were in the 0.9 format in Kafka; it's not just 0.11, since the relative-offset feature was introduced in 0.10. In some cases the client will receive already-corrected (absolute) offsets. In all other cases the compressed inner messages carry relative offsets: the wrapper message holds the absolute offset of the last inner message, and the inner offsets count up from 0 relative to the start of the batch. To resolve relative offsets, the client has to add each inner message's relative offset to the base offset derived from the wrapper. You may check if …
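As an illustration of that resolution step (this is a sketch, not ruby-kafka's actual code; the method and variable names are made up):

```ruby
# Sketch: resolving relative inner offsets against the wrapper offset
# for the 0.10 (v1) message format. All names are illustrative.
#
# wrapper_offset - absolute offset the broker assigned to the wrapper,
#                  i.e. the absolute offset of the LAST inner message.
# inner_offsets  - relative offsets carried by the decompressed inner
#                  messages (0, 1, ..., N-1 for an uncompacted batch).
def resolve_offsets(wrapper_offset, inner_offsets)
  base_offset = wrapper_offset - inner_offsets.last
  inner_offsets.map { |relative| base_offset + relative }
end

# With the batch from this issue: a wrapper at 772043 holding ten inner
# messages with relative offsets 0..9 resolves to 772034..772043.
resolve_offsets(772043, (0..9).to_a)
# => [772034, 772035, ..., 772043]
```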
@zmstone can you write a PR that implements this?
I believe @klippx is working on it.
We verified that with the fix in #506 the result of the test is

```
> kafka.fetch_messages(topic: 'eu-live.kred.account-events', partition: 7, offset: 772033, max_bytes: 1024).map { |v| v.offset }
=> [772033]
```

And

```
> kafka.fetch_messages(topic: 'eu-live.kred.account-events', partition: 7, offset: 772034, max_bytes: 1024).map { |v| v.offset }
=> [772034, 772035, 772036, 772037, 772038, 772039, 772040, 772041, 772042, 772043]
```

It should be noted that we are testing our own scenario; we have no way of testing all possible real-life scenarios (especially in production).
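A quick way to exercise the failure mode with the same API is a continuity check: fetch a batch, then fetch again from the last yielded offset plus one. A sketch, reusing the `kafka` client object and topic from the snippets above:

```ruby
# Continuity check: after a fetch, the next fetch should start at
# batch.last.offset + 1. With the offset bug, the follow-up fetch
# starts in the wrong place and the consumer crashes or re-consumes.
# `kafka` is the client object used in the snippets above.
batch = kafka.fetch_messages(
  topic: 'eu-live.kred.account-events',
  partition: 7,
  offset: 772034,
  max_bytes: 1024
)

next_batch = kafka.fetch_messages(
  topic: 'eu-live.kred.account-events',
  partition: 7,
  offset: batch.last.offset + 1,
  max_bytes: 1024
)
```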
I'd have to do at least one pre-release and ask contributors to deploy it and test.
We are suffering from the same issue after upgrading from …; we are running …. Anyway, I can test this patch and see if it fixes the issue; we are consuming billions of messages and will quickly see whether it is resolved. Assuming that it is resolved, the next step is that we want to bump the message format to …. @dasch can you please push a pre-release to rubygems so we could test ASAP?
@piavka I've just released v0.5.2.beta1 – can you test that and report back? I won't be able to test in a real-life workload until January.
Should be fixed by #506.
@dasch it's been running stable with v0.5.2.beta2 for a couple of days already.
Great!
If this is a bug report, please fill out the following:
Version: master of ruby-kafka.

Steps to reproduce
See failing tests and investigation below.
Summary
We are investigating a case that causes our consumer to crash and re-consume its whole partition. The issue is that ruby-kafka does not return the correct offsets for a fetched batch of compressed messages.
This happens dozens of times a day.
We are using Phobos. Here you can see our debug-level log output:
https://gist.github.com/klippx/65c4ef06eb0e01cbde6d0640e3176f1f
Analysis
Offset 772033 is not compressed. Offset 772034, however, IS compressed: it contains a wrapper with 10 messages inside. But the offsets are not handled correctly by ruby-kafka; the fetch should return 10 messages ending with offset 772043.
Preparing an experiment
We got some hints from our Kafka cluster team that our client may not handle compressed messages well. So, in order to see where things go wrong, I added debug output in message.rb and in message_set.rb around the offset handling, as shown in the sketch below.
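The exact patch is in the gist linked above; what follows is only a sketch of this kind of instrumentation. The method shape and names are illustrative, and the computed value mirrors the sort of offset recomputation the bugfix performs:

```ruby
# Illustrative instrumentation only; the real debug patch is in the
# gist linked above. Prints the wrapper offset and, for each inner
# message (assumed to respond to #offset), its raw offset next to the
# offset a wrapper-based recomputation would assign it.
def debug_decompressed_offsets(wrapper_offset, inner_messages)
  puts "wrapper offset: #{wrapper_offset}"

  inner_messages.each_with_index do |message, i|
    computed = wrapper_offset + i - inner_messages.size + 1
    puts "inner ##{i}: raw offset=#{message.offset}, computed offset=#{computed}"
  end
end
```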
Running the experiment
After modifying ruby-kafka as above, I ran the fetch_messages call again, with the results shown in the debug log gist above: first the working, non-compressed message, then the compressed one.
Conclusions
What we can see here is that the code introduced in bugfix 42821e9 is the offender. The messages already carry correct offsets, but the bugfix code is RE-generating offsets which, as a result, are off by 10, causing our next fetch operation to crash.
We have added a test to expose the problem we are seeing.
We have verified that this works in v0.5.0.
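For reference, the direction the fix in #506 takes can be sketched as a guard: only apply the relative-offset correction when the inner offsets actually are relative (the 0.10+ format), and leave already-absolute 0.9-format offsets untouched. A minimal sketch under that assumption, with illustrative names rather than ruby-kafka's actual internals:

```ruby
# Sketch of a guarded correction; names are illustrative.
# v1 (0.10+): inner offsets are relative, so the last one is smaller
# than the wrapper's absolute offset and needs resolving.
# v0 (0.9):   inner offsets are already absolute and must be kept.
def correct_offsets(wrapper_offset, inner_messages)
  relative = inner_messages.last.offset < wrapper_offset
  base_offset = wrapper_offset - inner_messages.last.offset

  inner_messages.map do |message|
    relative ? base_offset + message.offset : message.offset
  end
end
```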