Critical bug fix: Extra sanity checking when marking offsets as processed #824
Conversation
Update: working on updating the specs to match.
Fixed tests and added two new cases to verify the sanity check.
Force-pushed from ebfdeb8 to e4348ff.
Seems the Kafka 1.1 integration test failed, but the failure appears to be totally unrelated to this pull request. I suspect it's a fluke that would go away if the CircleCI workflow is simply re-run.
I have an alternate solution that may be more performant: instead of checking partition assignment every time a message is marked as processed, it filters out unassigned partitions once, when the offsets are about to be committed:

```ruby
def offsets_to_commit(recommit = false)
  # Do not commit offsets for partitions not assigned to this consumer
  @processed_offsets.each do |topic, offsets|
    offsets.keep_if { |partition, _| @group.assigned_to?(topic, partition) }
  end

  if recommit
    offsets_to_recommit.merge!(@processed_offsets) do |_topic, committed, processed|
      committed.merge!(processed)
    end
  else
    @processed_offsets
  end
end
```

Let me know if you'd prefer I commit that solution instead. All the tests pass with either implementation.
A rebalance event is detected via two main operations sent to the Kafka brokers after a message is handled: heartbeat and offset-commit calls.
If automatically_mark_as_processed is true, then after each each_message or each_batch block is evaluated, the messages are marked as processed automatically, and the consumer then sends its heartbeat and commits offsets. When a rebalance event has already occurred, the consumer clears all messages in the current buffer and resumes consuming from the newly assigned partitions. Therefore, the consumer never marks stale messages as processed, and the sanity-checking code doesn't affect this case.
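For illustration, a minimal sketch of that default mode (broker, group, and topic names are placeholder assumptions, not taken from this project):

```ruby
require "kafka"

# Broker, group and topic names below are placeholders.
kafka = Kafka.new(["kafka1:9092"], client_id: "example-app")
consumer = kafka.consumer(group_id: "example-group")
consumer.subscribe("orders")

# automatically_mark_as_processed defaults to true: each message is marked as
# processed right after the block returns, and the consumer loop then sends
# heartbeats and commits offsets, so a rebalance is picked up before any stale
# offset could be marked.
consumer.each_message do |message|
  puts "#{message.topic}/#{message.partition} @ #{message.offset}: #{message.value}"
end
```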
If automatically_mark_as_processed is false, ruby-kafka lets users decide when to commit offsets (a sketch of this mode follows the example below). If a rebalance event occurs while there are still messages in a buffer, sanity-checking mark_message_as_processed is reasonable to prevent a race condition between stale and new offsets, but it leads to another issue: some messages in a buffer may be processed again by other consumers. For example:
Consumer 1:
Message 1 => Processed, marked as processed
Message 2 => Processed, marked as processed
Rebalance
Message 3 => Processed, marking as processed is ignored
Message 4 => Processed, marking as processed is ignored
Consumer 2:
Message 3 => Processed, marked as processed
Message 4 => Processed, marked as processed
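A minimal sketch of the manual mode being discussed (broker, group, and topic names are placeholders; the puts call stands in for application work):

```ruby
require "kafka"

# Broker, group and topic names below are placeholders.
kafka = Kafka.new(["kafka1:9092"], client_id: "example-app")
consumer = kafka.consumer(group_id: "example-group")
consumer.subscribe("orders")

consumer.each_message(automatically_mark_as_processed: false) do |message|
  puts "handling #{message.topic}/#{message.partition} @ #{message.offset}"

  # The application decides when the offset is marked. With the sanity check
  # from this PR, this call is ignored if a rebalance has already moved the
  # message's partition to another group member.
  consumer.mark_message_as_processed(message)
end
```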
Originally, the purpose of turning off automatic marking as processed is to store, and perhaps group, related messages in a temporary buffer so they can be processed together. This change may break that use case.
I would like to propose another approach, which solves your issue and also benefits the use case I mentioned above. Let's expose another API on the consumer, Consumer#assigned_to? or Consumer#assignments, to check the current assignment (see the sketch below).
- In your use case, just call that API before marking messages as processed.
- In other use cases, the consumer may be interested in checking the current partition assignment, either to reject a whole batch in the buffer to maintain processing integrity, or simply to detect a rebalance at the application layer.
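A rough sketch of how application code might use the proposed (and at this point hypothetical) Consumer#assigned_to? API together with a manual buffer; BATCH_SIZE, export_batch, broker and topic names are placeholder assumptions:

```ruby
require "kafka"

# Broker, group and topic names below are placeholders.
kafka = Kafka.new(["kafka1:9092"], client_id: "example-app")
consumer = kafka.consumer(group_id: "example-group")
consumer.subscribe("orders")

BATCH_SIZE = 100
buffer = []

# Stand-in for the application's atomic export of a batch.
def export_batch(messages)
  puts "exporting #{messages.size} messages"
end

consumer.each_message(automatically_mark_as_processed: false) do |message|
  buffer << message
  next unless buffer.size >= BATCH_SIZE

  export_batch(buffer)

  buffer.each do |m|
    # Proposed API: only mark messages whose partitions this member still owns.
    consumer.mark_message_as_processed(m) if consumer.assigned_to?(m.topic, m.partition)
  end

  buffer.clear
end
```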
I see what you're saying, but unfortunately, your proposed approach does not in fact solve the use case you mentioned above. The reason is that after the rebalancing, the other consumer has already been reassigned the partition containing the messages still in the first consumer's buffer, and it will process those messages regardless of whether or not the first consumer commits their offsets at that time.

Kafka inherently, by design, makes an "at least once" message processing guarantee. All correctly implemented consumer applications must – without exception – be written with the assumption that messages may occasionally be re-processed by another member of the consumer group. Relying on offset committing as the sole means of preventing message re-processing is an error. If re-processing the same message could lead to negative consequences (as in our case, where we process a stream of orders and insert them into a warehouse management system to be packed and shipped to customers), you must implement some idempotency mechanism that is not reliant upon Kafka and its consumer offset checkpointing (a sketch of such a mechanism follows at the end of this comment). There is no alternative to that. The correct default behavior for Kafka-based systems is to err on the side of some possibility of message re-processing.

While I agree that exposing the API you propose could be valuable, it does not solve this problem on its own. The reason your proposed solution will not be sufficient for the use case you mentioned is as follows. I'm updating and annotating your example timeline:
The incorrectly marked-as-processed offsets for the partition it is no longer processing will remain in ruby-kafka's @processed_offsets. As both consumers continue to process messages, both will keep committing offsets for that partition, back and forth: consumer 1 repeatedly commits its old, stale offset alongside the current offsets for its own partitions, while consumer 2 commits its genuinely up-to-date offset for the same partition.

This results in a race condition where, if consumer 1 is the last to commit its offsets for partition 1 before a rebalance is triggered, all of the message processing progress made by consumer 2 since the previous rebalancing is discarded. At that point, instead of only the two messages originally buffered by consumer 1 before the rebalance being reprocessed, those 2 messages get reprocessed along with all of the messages processed by consumer 2 since the last rebalancing. I hope this is clear enough to demonstrate that there is no use case in which it is ever appropriate for a consumer to commit offsets for partitions it is not currently assigned.
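To make the idempotency point above concrete, an illustrative sketch (not part of this PR) of a guard that does not rely on Kafka offsets, using the Sequel gem; the connection string and the processed_messages and orders table names are assumptions:

```ruby
require "sequel"

# Hypothetical destination database; URL and table names are illustrative only.
DB = Sequel.connect(ENV.fetch("WAREHOUSE_DATABASE_URL"))

def export_order(message)
  DB.transaction do
    key = { topic: message.topic, partition: message.partition, offset: message.offset }

    # With a unique index on (topic, partition, offset), a re-delivered message
    # is recognized and skipped instead of creating a duplicate order.
    next if DB[:processed_messages].where(key).count > 0

    DB[:processed_messages].insert(key)
    DB[:orders].insert(payload: message.value)
  end
end
```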
All that being said, it would be valuable to expose the assignment-checking API you describe. But I maintain that if a member of a consumer group tries to call mark_message_as_processed for a partition it is not currently assigned, that call should be ignored.
And since this is such a difficult thing to reason about, I strongly feel developers should be helped out here by being prevented from shooting themselves in the foot without realizing it.
@nguyenquangminh0711 @dasch -- bumping this. Thoughts? I know this can be a pretty complicated topic to reason about. At Dollar Shave Club we're now running in production on our fork that incorporates this patch.
Sorry for not getting back to this sooner. I've restarted the CI and will merge once they're green 👍
Awesome, thanks, @dasch!
This pull request resolves an edge case, but one that represents a critical bug for us: it has caused us a very significant amount of pain over the last couple of years, yet has been quite elusive to nail down.
We have a system that needs to buffer a certain number of messages and insert them into another system all at once as a batch, in a SQL transaction.
To do this, inside the each_message loop, we add messages to an array until we've accumulated enough for a batch. Once the buffer is large enough, we atomically export the batch to the external system, call mark_message_as_processed for each buffered message, and clear the buffer.

The problem we've encountered several times now is that the consumer group may rebalance and end up with different partition assignments before the buffer is full. Once processing resumes after the rebalancing, the next time the buffer is cleared, it will call mark_message_as_processed with some messages from partitions it was consuming from before the rebalance and some from partitions assigned after it.

Because there's no sanity check in mark_message_as_processed to make sure the partition and offset provided actually correspond to the partitions currently assigned to the group member, it will happily record the offset for a partition it's no longer supposed to be processing. The result is that it will then continue repeatedly re-committing that old, stale offset for the partition(s) it used to be consuming from, along with all the new offsets, as it continues processing.
This creates a race condition where two members of the consumer group constantly commit offsets for the same partition, back and forth: one member constantly commits the correct offset from its up-to-date processing of that partition, while the other constantly commits an old, stale offset from before the consumer-group rebalance.
If the consumer group is shut down or rebalances again, it's a race between the stale consumer and the active consumer as to which one will have the last word in committing the offsets for the partition they're fighting over. This can result in the consumers suddenly and unpredictably re-processing potentially hundreds of thousands of messages, depending on how long the consumer group remained stable after the last partition assignment.
The fix in this pull request is just to add a simple sanity check to prevent marking offsets as processed for partitions that are not presently assigned to the current consumer group member.
I also added a sanity check to prevent overwriting a newer offset with an older one, in the event that the messages are processed out of order.
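For concreteness, a rough sketch of what such sanity checks could look like inside an offset manager; this is an approximation written for this discussion, not the actual diff from this PR, and it assumes the @processed_offsets hash, a @group responding to assigned_to?, and a @logger, as referenced elsewhere in the thread:

```ruby
# Approximation only; not the actual change merged in this PR.
def mark_as_processed(topic, partition, offset)
  # Sanity check 1: ignore offsets for partitions this member is not assigned.
  unless @group.assigned_to?(topic, partition)
    @logger.debug "Not marking #{topic}/#{partition}:#{offset} as processed: partition not assigned to this member"
    return
  end

  @processed_offsets[topic] ||= {}

  # Sanity check 2: never overwrite a newer processed offset with an older one.
  last = @processed_offsets[topic][partition]
  if last && last > offset + 1
    @logger.debug "Not overwriting processed offset #{last} with older offset #{offset + 1}"
    return
  end

  # The stored value is the next offset to consume, hence offset + 1.
  @processed_offsets[topic][partition] = offset + 1
end
```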
I was able to simplify our use case to the following code that demonstrates the issue. You have to start up several instances of this consumer, then start and stop an instance to trigger rebalancing of the consumer group several times until the random timing and partition assignments of the consumer group finally trigger the bug.
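The original snippet is not reproduced here; a minimal sketch of the kind of buffering consumer described above might look like this (broker, group, topic, and batch size are placeholder assumptions):

```ruby
require "kafka"

# Broker, group, topic and batch size are placeholders.
kafka = Kafka.new(["kafka1:9092"], client_id: "repro")
consumer = kafka.consumer(group_id: "repro-group")
consumer.subscribe("orders")

BATCH_SIZE = 100
buffer = []

consumer.each_message(automatically_mark_as_processed: false) do |message|
  buffer << message
  next unless buffer.size >= BATCH_SIZE

  # "Export" the batch, then mark and commit everything in the buffer. If a
  # rebalance happened while the buffer was filling, some of these messages
  # belong to partitions this member no longer owns; without the sanity
  # check, their stale offsets get committed anyway.
  puts "exporting #{buffer.size} messages"
  buffer.each { |m| consumer.mark_message_as_processed(m) }
  consumer.commit_offsets
  buffer.clear
end
```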