Skip to content

feat(crons): Limit clock ticks to the slowest partition #58003

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

evanpurkhiser
Copy link
Member

@evanpurkhiser evanpurkhiser commented Oct 12, 2023

Part of #55821

This function keeps a global clock driven by the monitor ingest topic to trigger the monitor tasks once per minute.

This change updates this function to track the slowest partition within the topic. This will help to avoid the clock being moved forward when a single partition has a large number of check-ins read out of it (in a backlog situation), causing check-ins to be marked missed since they were not read before the clock ticked.

This change does NOT yet use the slowest partition timestamp as the driver of the global clock, but simply logs the timestamp so we can validate in production that it is still accurately moving forward.

In a future PR I will switch this function to use the slowest partition timestamp as the reference_ts and add tests to validate this works

@evanpurkhiser evanpurkhiser requested a review from a team as a code owner October 12, 2023 19:21
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Oct 12, 2023
@codecov
Copy link

codecov bot commented Oct 12, 2023

Codecov Report

Merging #58003 (a0ea82d) into master (334dedf) will increase coverage by 0.00%.
Report is 3 commits behind head on master.
The diff coverage is 81.81%.

❗ Current head a0ea82d differs from pull request most recent head d9f8f22. Consider uploading reports for the commit d9f8f22 to get more accurate results

@@           Coverage Diff           @@
##           master   #58003   +/-   ##
=======================================
  Coverage   79.03%   79.03%           
=======================================
  Files        5130     5129    -1     
  Lines      223024   223059   +35     
  Branches    37559    37567    +8     
=======================================
+ Hits       176256   176304   +48     
+ Misses      41131    41120   -11     
+ Partials     5637     5635    -2     
Files Coverage Δ
src/sentry/monitors/consumers/monitor_consumer.py 85.65% <100.00%> (ø)
src/sentry/monitors/tasks.py 87.27% <75.00%> (-0.23%) ⬇️

... and 36 files with indirect coverage changes

@evanpurkhiser evanpurkhiser force-pushed the evanpurkhiser/feat-crons-limit-clock-ticks-to-the-slowest-partition branch from a0ea82d to d9f8f22 Compare October 12, 2023 20:37
This function keeps a global clock driven by the monitor ingest topic to
trigger the monitor tasks once per minute.

This change updates this function to track the slowest partition within
the topic. This will help to avoid the clock being moved forward when a
single partition has a large number of check-ins read out of it (in a
backlog situation), causing check-ins to be marked missed since they
were not read before the clock ticked.

This change does NOT yet use the slowest partition timestamp as the
driver of the global clock, but simply logs the timestamp so we can
validate in production that it is still accurately moving forward.
@evanpurkhiser evanpurkhiser force-pushed the evanpurkhiser/feat-crons-limit-clock-ticks-to-the-slowest-partition branch from d9f8f22 to bed3e48 Compare October 12, 2023 20:48
Comment on lines +121 to +126
slowest_partitions = redis_client.zrange(
name=MONITOR_TASKS_PARTITION_CLOCKS,
withscores=True,
start=0,
end=0,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol so weird that start/end both 0 is how you fetch one item

Copy link
Member Author

@evanpurkhiser evanpurkhiser Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah inclusive, starts with one and ends with one

@evanpurkhiser evanpurkhiser merged commit f33a3f7 into master Oct 12, 2023
@evanpurkhiser evanpurkhiser deleted the evanpurkhiser/feat-crons-limit-clock-ticks-to-the-slowest-partition branch October 12, 2023 21:45
evanpurkhiser added a commit that referenced this pull request Oct 12, 2023
This is a follow up after GH-58003.

The clock which dispatches tasks will now only tick forward once all
partitions have been read up to to the synchronized time
evanpurkhiser added a commit that referenced this pull request Oct 13, 2023
Part of #55821

This is a follow up after GH-58003.

The clock which dispatches tasks will now only tick forward once all
partitions have been read up to to the synchronized time
@github-actions github-actions bot locked and limited conversation to collaborators Oct 28, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants