DatafeedJobsIT.testRealtime_multipleStopCalls failure on CI because task doesn't exist #45518
Pinging @elastic/ml-core
dimitris-athanasiou
added a commit
to dimitris-athanasiou/elasticsearch
that referenced
this issue
Sep 9, 2019
Investigating the test failure reported in elastic#45518, it appears that the datafeed task was not found during a task state update. There are only two places where such an update is performed: when we set the state to `started` and when we set it to `stopping`. We handle `ResourceNotFoundException` in the latter but not in the former. Thus the test reveals a rare race condition where the datafeed is requested to stop before we have managed to update its state to `started`. I could not reproduce this scenario, but it is my best guess. This commit catches `ResourceNotFoundException` while updating the state to `started` and lets the task terminate smoothly. Closes elastic#45518
dimitris-athanasiou
added a commit
that referenced
this issue
Sep 10, 2019
Investigating the test failure reported in #45518, it appears that the datafeed task was not found during a task state update. There are only two places where such an update is performed: when we set the state to `started` and when we set it to `stopping`. We handle `ResourceNotFoundException` in the latter but not in the former. Thus the test reveals a rare race condition where the datafeed is requested to stop before we have managed to update its state to `started`. I could not reproduce this scenario, but it is my best guess. This commit catches `ResourceNotFoundException` while updating the state to `started` and lets the task terminate smoothly. Closes #45518
dimitris-athanasiou
added a commit
to dimitris-athanasiou/elasticsearch
that referenced
this issue
Sep 10, 2019
…astic#46495) Investigating the test failure reported in elastic#45518, it appears that the datafeed task was not found during a task state update. There are only two places where such an update is performed: when we set the state to `started` and when we set it to `stopping`. We handle `ResourceNotFoundException` in the latter but not in the former. Thus the test reveals a rare race condition where the datafeed is requested to stop before we have managed to update its state to `started`. I could not reproduce this scenario, but it is my best guess. This commit catches `ResourceNotFoundException` while updating the state to `started` and lets the task terminate smoothly. Closes elastic#45518 Backport of elastic#46495
dimitris-athanasiou
added a commit
that referenced
this issue
Sep 11, 2019
…6495) (#46542) Investigating the test failure reported in #45518, it appears that the datafeed task was not found during a task state update. There are only two places where such an update is performed: when we set the state to `started` and when we set it to `stopping`. We handle `ResourceNotFoundException` in the latter but not in the former. Thus the test reveals a rare race condition where the datafeed is requested to stop before we have managed to update its state to `started`. I could not reproduce this scenario, but it is my best guess. This commit catches `ResourceNotFoundException` while updating the state to `started` and lets the task terminate smoothly. Closes #45518 Backport of #46495
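For readers who want to see the shape of the race described in the commit message, here is a minimal, self-contained sketch. It is not the actual Elasticsearch code: `TaskRegistry`, `markStarted`, and the local `ResourceNotFoundException` class are hypothetical stand-ins invented for illustration, and the real fix may be structured quite differently. The only idea taken from the commit message is that a missing task during the update to `started` is caught and treated as a quiet termination rather than a failure.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DatafeedStartRaceSketch {

    // Hypothetical local stand-in for Elasticsearch's ResourceNotFoundException.
    static class ResourceNotFoundException extends RuntimeException {
        ResourceNotFoundException(String message) {
            super(message);
        }
    }

    // Hypothetical in-memory registry that maps a task id to its current state.
    static class TaskRegistry {
        private final Map<String, String> states = new ConcurrentHashMap<>();

        void register(String taskId) {
            states.put(taskId, "starting");
        }

        // Models a stop request that removes the task entirely.
        void remove(String taskId) {
            states.remove(taskId);
        }

        // Throws if the task no longer exists, mirroring the failure in the issue.
        void updateState(String taskId, String newState) {
            // computeIfPresent returns null when the key is absent.
            if (states.computeIfPresent(taskId, (id, old) -> newState) == null) {
                throw new ResourceNotFoundException("task [" + taskId + "] not found");
            }
        }

        String stateOf(String taskId) {
            return states.get(taskId);
        }
    }

    // Returns true if the state was set to "started"; returns false if the task
    // had already been removed, in which case we terminate quietly.
    static boolean markStarted(TaskRegistry registry, String taskId) {
        try {
            registry.updateState(taskId, "started");
            return true;
        } catch (ResourceNotFoundException e) {
            // Assumption: a stop request won the race, so there is nothing to start.
            return false;
        }
    }

    public static void main(String[] args) {
        TaskRegistry registry = new TaskRegistry();

        registry.register("datafeed-a");
        System.out.println("normal start: " + markStarted(registry, "datafeed-a"));

        registry.register("datafeed-b");
        registry.remove("datafeed-b"); // stop arrives before the state update
        System.out.println("stopped first: " + markStarted(registry, "datafeed-b"));
    }
}
```

Returning a boolean instead of rethrowing keeps the caller's start path free of exception handling. Whether the real code logs, returns, or invokes a listener at this point is not stated in the thread.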
Hit this on CI in an SLM PR that doesn't touch anything related to ML or tasks. Per build stats, this test has failed 7 times in the last 60 days, but this is the only time it failed in this specific way.
Build scan
Public Jenkins build
Reproduce line (does not reproduce locally):
Stack trace: