#270 Support periodic manual commits #275

Open
wants to merge 1 commit into
base: main
19 changes: 17 additions & 2 deletions lib/logstash/inputs/kafka.rb
@@ -212,7 +212,7 @@ class LogStash::Inputs::Kafka < LogStash::Inputs::Base
# `key`: A ByteBuffer containing the message key
# `timestamp`: The timestamp of this message
config :decorate_events, :validate => :boolean, :default => false

config :manual_commit_interval_ms, :validate => :string

public
def register
@@ -221,6 +221,7 @@ def register

public
def run(logstash_queue)
@manual_commit_interval_ms = manual_commit_interval_ms.to_i
@runner_consumers = consumer_threads.times.map { |i| create_consumer("#{client_id}-#{i}") }
@runner_threads = @runner_consumers.map { |consumer| thread_runner(logstash_queue, consumer) }
@runner_threads.each { |t| t.join }
@@ -247,6 +248,7 @@ def thread_runner(logstash_queue, consumer)
else
consumer.subscribe(topics);
end
last_commit_time = timestamp_ms
codec_instance = @codec.clone
while !stop?
records = consumer.poll(poll_timeout_ms)
@@ -266,8 +268,9 @@ def thread_runner(logstash_queue, consumer)
end
end
# Manual offset commit
if @enable_auto_commit == "false"
if has_to_commit?(last_commit_time)
Member

By no longer committing on every poll, we now have two issues that must be addressed:

  1. If you receive and process some events before has_to_commit? returns true and then no further events arrive, we'll never commit the offset, because a guard at the start of the loop skips the iteration when poll returns no records.
  2. If events are processed and Logstash is then asked to terminate gracefully, we don't commit the offset, since the stop operation doesn't do it explicitly. Currently it relies on either a commit per poll or auto commit.
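
One way both cases might be handled is sketched below. This is a minimal illustration, not the plugin's code: `FakeConsumer`, `now_ms`, `run_loop`, and the `pending` flag are hypothetical names introduced here, and the fake consumer only mimics the `poll` and `commitSync` calls that appear in the diff. The interval check runs after every poll, including empty ones, and an `ensure` block performs a final commit when the loop exits.

```ruby
# Hypothetical stand-in for the Kafka consumer; it only mimics the two
# calls used in the diff (poll and commitSync) and counts commits.
class FakeConsumer
  attr_reader :commits

  def initialize(batches)
    @batches = batches
    @commits = 0
  end

  # Returns the next batch of records, or an empty array once input runs out.
  def poll(_timeout_ms)
    @batches.shift || []
  end

  def commitSync
    @commits += 1
  end
end

# Wall-clock time in milliseconds, same formula as timestamp_ms in the diff.
def now_ms
  (Time.now.to_f * 1000).to_i
end

# Polls until stop_check returns true. The commit check runs after every
# poll, even an empty one, and a final commit runs on the way out.
def run_loop(consumer, interval_ms, stop_check)
  last_commit_time = now_ms
  pending = false # true when records were processed since the last commit
  begin
    until stop_check.call
      records = consumer.poll(100)
      pending = true unless records.empty?
      # Issue 1: evaluated even when the poll returned nothing.
      if pending && (interval_ms <= 0 || last_commit_time + interval_ms < now_ms)
        consumer.commitSync
        last_commit_time = now_ms
        pending = false
      end
    end
  ensure
    # Issue 2: commit whatever is still uncommitted on graceful shutdown.
    consumer.commitSync if pending
  end
end
```

With a long interval and a short run, the only commit is the one made on shutdown; with an interval of 0, every non-empty poll commits. How this would be wired into the plugin's real stop handling is not shown here.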

Author

True, I only took into account my own case, in which I have a fairly stable flow.

consumer.commitSync
last_commit_time = timestamp_ms
end
end
rescue org.apache.kafka.common.errors.WakeupException => e
@@ -354,4 +357,16 @@ def set_sasl_config(props)

props.put("sasl.kerberos.service.name",sasl_kerberos_service_name) unless sasl_kerberos_service_name.nil?
end

def timestamp_ms
(Time.now.to_f * 1000).to_i
end

def has_to_commit?(last_commit_time)
# If auto_commit is enable we just leave the commit to the client library on poll and close actions
return false if @enable_auto_commit == "false"
Member

This should be `return false if @enable_auto_commit == "true"` (we don't want to commit manually if auto commit is on).


# If auto_commit is disable, we need to commit, we will do it depending on the manual_commit_interval option
@manual_commit_interval_ms <= 0 || (last_commit_time + @manual_commit_interval_ms) < timestamp_ms
Member

For clarity, we can split the big conditional into separate steps:

  def has_to_commit?(last_commit_time)
    # auto_commit is enabled so we just leave the commit to the client library on poll and close actions
    return false if @enable_auto_commit == "true"

    # auto_commit is disabled but interval committing is disabled as well, so commit on every poll
    return true if @manual_commit_interval_ms <= 0

    # auto_commit is disabled and an interval is set, so let's check if enough time passed since last commit
    (last_commit_time + @manual_commit_interval_ms) < timestamp_ms
  end
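
The three branches can be checked in isolation with a standalone variant of the predicate. This is only a sketch: `commit_due?` is a hypothetical name, and passing the state and the current time as arguments (instead of reading instance variables and the clock) is an assumption made here so the boundary cases are easy to exercise.

```ruby
# Standalone variant of the suggested predicate. All inputs are passed in
# explicitly (an assumption for testability, not how the plugin does it).
def commit_due?(enable_auto_commit, interval_ms, last_commit_time, now_ms)
  # Auto commit is on: leave committing to the client library.
  return false if enable_auto_commit == "true"

  # No interval configured: commit on every poll.
  return true if interval_ms <= 0

  # Interval configured: commit only once strictly more than interval_ms has passed.
  (last_commit_time + interval_ms) < now_ms
end
```

Because the diff converts the option with `to_i` and declares no default, an unset option (`nil.to_i` is `0`) falls into the second branch and keeps the existing commit-per-poll behaviour. The strict `<` means a commit at exactly `last_commit_time + interval_ms` is deferred to the next check.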

end
end #class LogStash::Inputs::Kafka