feat(crons): Limit clock ticks to the slowest partition #58003
Codecov Report
@@           Coverage Diff           @@
##           master   #58003   +/-   ##
=======================================
  Coverage   79.03%   79.03%
=======================================
  Files        5130     5129     -1
  Lines      223024   223059    +35
  Branches    37559    37567     +8
=======================================
+ Hits       176256   176304    +48
+ Misses      41131    41120    -11
+ Partials     5637     5635     -2
a0ea82d to d9f8f22
This function keeps a global clock driven by the monitor ingest topic to trigger the monitor tasks once per minute. This change updates this function to track the slowest partition within the topic. This will help to avoid the clock being moved forward when a single partition has a large number of check-ins read out of it (in a backlog situation), causing check-ins to be marked missed since they were not read before the clock ticked. This change does NOT yet use the slowest partition timestamp as the driver of the global clock, but simply logs the timestamp so we can validate in production that it is still accurately moving forward.
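As a rough illustration of the tracking described above, the sketch below keeps one timestamp per partition and reports the slowest one. It is a minimal in-memory stand-in, not the code in this PR: the `PartitionClocks` class and its `record` method are hypothetical names, and the real change stores its state in Redis rather than in a dict.

```python
class PartitionClocks:
    """Hypothetical in-memory tracker of the latest timestamp seen per partition."""

    def __init__(self) -> None:
        # partition number -> latest timestamp (epoch seconds) read from it
        self.clocks: dict = {}

    def record(self, partition: int, ts: float) -> float:
        """Record a timestamp for a partition and return the slowest partition's timestamp."""
        # Never move an individual partition's clock backwards.
        self.clocks[partition] = max(self.clocks.get(partition, ts), ts)
        # The slowest partition is the one with the smallest timestamp.
        return min(self.clocks.values())


clocks = PartitionClocks()
clocks.record(0, 100.0)
# Partition 1 races ahead while its backlog is drained, but the
# slowest timestamp stays pinned to partition 0.
slowest_after_burst = clocks.record(1, 500.0)
slowest_after_catch_up = clocks.record(0, 160.0)
```

Logging the returned value next to the existing clock is one way to check that the slowest timestamp keeps moving forward before anything depends on it.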
d9f8f22 to bed3e48
slowest_partitions = redis_client.zrange(
    name=MONITOR_TASKS_PARTITION_CLOCKS,
    withscores=True,
    start=0,
    end=0,
)
Lol so weird that start/end both 0 is how you fetch one item
yeah, the range is inclusive, so it starts at the first item and ends at the first item
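To make the inclusive indexing concrete, here is a small pure-Python stand-in for a sorted set. `zrange_inclusive` is a hypothetical helper that only mimics the index semantics of `ZRANGE`; it is not the redis-py client, and the member names are invented for the example.

```python
def zrange_inclusive(scores, start, end, withscores=False):
    """Mimic the inclusive start/end index semantics of ZRANGE on a plain dict."""
    # Sorted sets order members by score, then by member name.
    ordered = sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))
    n = len(ordered)
    # Negative indices count from the end of the set.
    if start < 0:
        start = max(n + start, 0)
    if end < 0:
        end = n + end
    if end < 0:
        return []
    # Both ends are inclusive, hence the +1 on the Python slice.
    selected = ordered[start:end + 1]
    return selected if withscores else [member for member, _ in selected]


# Hypothetical members: one entry per partition, scored by its timestamp.
clocks = {"partition-0": 120.0, "partition-1": 60.0, "partition-2": 180.0}
slowest = zrange_inclusive(clocks, 0, 0, withscores=True)
```

With `start=0` and `end=0` the result is a one-element list holding the lowest-scored member, which is why the call in the diff returns only the slowest partition.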
This is a follow-up to GH-58003. The clock which dispatches tasks will now only tick forward once all partitions have been read up to the synchronized time.
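One way to picture this gating is the sketch below. It rests on assumptions that are not in the PR: the set of partitions is known up front, timestamps are integer epoch seconds, and skipped minutes are emitted as catch-up ticks. `GatedClock` and `observe` are hypothetical names.

```python
class GatedClock:
    """Hypothetical clock that ticks once per minute, driven by the slowest partition."""

    def __init__(self, partitions):
        # Latest timestamp read from each partition; None until first read.
        self.partition_ts = {p: None for p in partitions}
        self.last_tick = None

    def observe(self, partition, ts):
        """Record a timestamp and return the minute ticks that became due."""
        current = self.partition_ts[partition]
        if current is None or ts > current:
            self.partition_ts[partition] = ts
        # Hold the clock until every partition has reported at least once.
        if any(v is None for v in self.partition_ts.values()):
            return []
        # Truncate the slowest partition's timestamp to the minute.
        slowest_minute = min(self.partition_ts.values()) // 60 * 60
        if self.last_tick is None:
            self.last_tick = slowest_minute
            return [slowest_minute]
        # Emit one tick per minute boundary the slowest partition has passed.
        ticks = list(range(self.last_tick + 60, slowest_minute + 60, 60))
        if ticks:
            self.last_tick = ticks[-1]
        return ticks
```

A partition that jumps far ahead produces no ticks on its own; ticks only appear once the partition that is furthest behind crosses a minute boundary.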
Part of #55821
In a future PR I will switch this function to use the slowest partition timestamp as the reference_ts and add tests to validate this works.