Skip to content

Fix job scheduling for same scheduled time #64598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 4, 2020

Conversation

jaymode
Copy link
Member

@jaymode jaymode commented Nov 4, 2020

The SchedulerEngine used by SLM uses a custom runnable that will
schedule itself for its next execution if there is one to run. For the
majority of jobs, this scheduling could be many hours or days away. Due
to the scheduling so far in advance, there is a chance that time drifts
on the machine or even that time varies core to core so there is no
guarantee that the job actually runs on or after the scheduled time.
This can cause some jobs to reschedule themselves for the same
scheduled time even if they ran only a millisecond prior to the
scheduled time, which causes unexpected actions to be taken such as
what appears as duplicated snapshots.

This change resolves this by checking the triggered time against the
scheduled time and using the appropriate value to ensure that we do
not have unexpected job runs.

Relates #63754
Backport of #64501

The SchedulerEngine used by SLM uses a custom runnable that will
schedule itself for its next execution if there is one to run. For the
majority of jobs, this scheduling could be many hours or days away. Due
to the scheduling so far in advance, there is a chance that time drifts
on the machine or even that time varies core to core so there is no
guarantee that the job actually runs on or after the scheduled time.
This can cause some jobs to reschedule themselves for the same
scheduled time even if they ran only a millisecond prior to the
scheduled time, which causes unexpected actions to be taken such as
what appears as duplicated snapshots.

This change resolves this by checking the triggered time against the
scheduled time and using the appropriate value to ensure that we do
not have unexpected job runs.

Relates elastic#63754
Backport of elastic#64501
@jaymode jaymode merged commit 4c3300b into elastic:7.10 Nov 4, 2020
@jaymode jaymode deleted the job_rescheduling_time_710 branch November 4, 2020 17:15
@andreidan andreidan added v7.10.0 and removed v7.10.1 labels Nov 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants