[ML] Revise refresh policy on requests to the config index #38089

Closed
dimitris-athanasiou opened this issue Jan 31, 2019 · 2 comments

@dimitris-athanasiou
Contributor

Configurations are now stored in the config index. Throughout the code, requests to the config index have their refresh policy set to IMMEDIATE. This is problematic: when an action causes multiple refreshes of the index (e.g. deleting multiple jobs/datafeeds), the refreshes are throttled and long delays (~10s) have been observed.

This is also probably the reason behind #30300.

Furthermore, requests that are not the result of a user action do not need a refresh, for example when the established model memory is updated during runtime. Refreshing on those delays the job for no reason.

In order to avoid these delays while keeping single requests responsive, we need to:

  • Use WAIT_FOR for requests that are a result of a user action
  • Use NONE for internal requests
  • Consider reducing the refresh_interval to a value that is responsive without putting pressure on the index (e.g. 100ms). Note that the config index is a low-volume index, so it should be able to handle frequent refreshing.
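
A rough sketch of the proposal above, expressed in terms of the REST `refresh` query parameter (`true`, `wait_for`, `false`) rather than the actual ML code. The `Origin` enum and both function names are hypothetical and exist only for illustration. In the Java API the `wait_for` behaviour corresponds to `WriteRequest.RefreshPolicy.WAIT_UNTIL`, and the 100ms default simply mirrors the example value in the list.

```python
from enum import Enum


class Origin(Enum):
    """Hypothetical classification of what triggered a config-index write."""
    USER_ACTION = "user_action"
    INTERNAL = "internal"


def refresh_param(origin: Origin) -> str:
    """Return the value of the `refresh` query parameter for a write request."""
    if origin is Origin.USER_ACTION:
        # Respond only after a scheduled refresh has made the change visible,
        # instead of forcing an immediate refresh on every request.
        return "wait_for"
    # Internal updates need no visibility guarantee, so never force a refresh.
    return "false"


def refresh_interval_settings(interval: str = "100ms") -> dict:
    """Build an index-settings body that shortens the refresh interval."""
    return {"index": {"refresh_interval": interval}}
```

With a short refresh interval, a `wait_for` request should return quickly, while a bulk operation such as deleting many jobs no longer triggers one forced refresh per request.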
dimitris-athanasiou added the :ml Machine learning label Jan 31, 2019
dimitris-athanasiou self-assigned this Jan 31, 2019
@elasticmachine
Collaborator

Pinging @elastic/ml-core

@dimitris-athanasiou
Contributor Author

This was a false alarm. The delays I observed were caused by a sleep in the master operation of the job delete action. This was blocking the master node and thus the datafeed delete action was stalled.
