[ML] Prevent node potentially going out of memory due to loading quantiles #70376
Merged
… avoid memory overhead. fixes elastic#70372
Pinging @elastic/ml-core (Team:ML)
droberts195
approved these changes
Mar 15, 2021
LGTM
hendrikmuhs
pushed a commit
to hendrikmuhs/elasticsearch
that referenced
this pull request
Mar 15, 2021
…tiles (elastic#70376) Large jobs with lots of partitions can get very big, and retrieving snapshots for such a job can cause a node to run out of memory. With this change, quantiles are not fetched when querying for (multiple) modelSnapshots, which avoids the memory overhead. Quantiles aren't needed by the APIs that use JobResultsProvider.modelSnapshots(...). fixes elastic#70372
This was referenced Mar 15, 2021
droberts195
added a commit
to droberts195/elasticsearch
that referenced
this pull request
Dec 18, 2023
As a followup to elastic#70376 this change further reduces the number of places where we fetch the `quantiles` field of model snapshot documents. The quantiles can be very large and can cause out-of-memory errors on small nodes, especially if more than one document containing quantiles is loaded into memory at one time. The method `JobManager.validateModelSnapshotIdUpdate` was a place where two model snapshot documents were being loaded simultaneously, both with their quantiles unnecessarily included. Following this change there should be no risk of that method causing an out-of-memory exception.
droberts195
added a commit
that referenced
this pull request
Dec 19, 2023
…103530) As a followup to #70376 this change further reduces the number of places where we fetch the `quantiles` field of model snapshot documents. The quantiles can be very large and can cause out-of-memory errors on small nodes, especially if more than one document containing quantiles is loaded into memory at one time. The method `JobManager.validateModelSnapshotIdUpdate` was a place where two model snapshot documents were being loaded simultaneously, both with their quantiles unnecessarily included. Following this change there should be no risk of that method causing an out-of-memory exception.
droberts195
added a commit
to droberts195/elasticsearch
that referenced
this pull request
Dec 19, 2023
…lastic#103530) As a followup to elastic#70376 this change further reduces the number of places where we fetch the `quantiles` field of model snapshot documents. The quantiles can be very large and can cause out-of-memory errors on small nodes, especially if more than one document containing quantiles is loaded into memory at one time. The method `JobManager.validateModelSnapshotIdUpdate` was a place where two model snapshot documents were being loaded simultaneously, both with their quantiles unnecessarily included. Following this change there should be no risk of that method causing an out-of-memory exception.
elasticsearchmachine
pushed a commit
that referenced
this pull request
Dec 19, 2023
…103530) (#103551) As a followup to #70376 this change further reduces the number of places where we fetch the `quantiles` field of model snapshot documents. The quantiles can be very large and can cause out-of-memory errors on small nodes, especially if more than one document containing quantiles is loaded into memory at one time. The method `JobManager.validateModelSnapshotIdUpdate` was a place where two model snapshot documents were being loaded simultaneously, both with their quantiles unnecessarily included. Following this change there should be no risk of that method causing an out-of-memory exception.
navarone-feekery
pushed a commit
to navarone-feekery/elasticsearch
that referenced
this pull request
Dec 22, 2023
…lastic#103530) As a followup to elastic#70376 this change further reduces the number of places where we fetch the `quantiles` field of model snapshot documents. The quantiles can be very large and can cause out-of-memory errors on small nodes, especially if more than one document containing quantiles is loaded into memory at one time. The method `JobManager.validateModelSnapshotIdUpdate` was a place where two model snapshot documents were being loaded simultaneously, both with their quantiles unnecessarily included. Following this change there should be no risk of that method causing an out-of-memory exception.
Large jobs with lots of partitions can get very big, and retrieving snapshots
for such a job can cause a node to run out of memory.
With this change, quantiles are no longer fetched when querying for (multiple)
modelSnapshots, which avoids the memory overhead. Quantiles aren't needed by
the APIs that use `JobResultsProvider.modelSnapshots(...)`.
fixes #70372
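For readers unfamiliar with the idea, here is a minimal plain-Java sketch of what "not fetching quantiles" means for a listing call: each returned document keeps its small metadata fields and omits the large `quantiles` field. This is an illustration only, not the code from this PR. The class `SnapshotListingSketch`, its methods, and the `job_id`/`snapshot_id` field names are hypothetical, and documents are modelled as simple maps. In Elasticsearch itself this kind of exclusion would normally be requested on the search (for example through source filtering), so that the large field never reaches the requesting node; the diff is not shown on this page, so how the PR actually does it is an assumption here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Plain-Java illustration of excluding a large field when listing documents.
 * All names are hypothetical; this is not Elasticsearch code.
 */
public class SnapshotListingSketch {

    /** Name of the large field, taken from the PR description. */
    static final String QUANTILES_FIELD = "quantiles";

    /** Returns a shallow copy of one document without the excluded top-level fields. */
    static Map<String, Object> withoutFields(Map<String, Object> source, Set<String> excluded) {
        Map<String, Object> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            if (!excluded.contains(entry.getKey())) {
                filtered.put(entry.getKey(), entry.getValue());
            }
        }
        return filtered;
    }

    /** Lists many snapshots while leaving the quantiles field out of every result. */
    static List<Map<String, Object>> listSnapshots(List<Map<String, Object>> stored) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> doc : stored) {
            results.add(withoutFields(doc, Set.of(QUANTILES_FIELD)));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("job_id", "big-job");
        doc.put("snapshot_id", "1");
        doc.put(QUANTILES_FIELD, "x".repeat(1_000_000)); // stand-in for a very large state blob

        List<Map<String, Object>> listed = listSnapshots(List.of(doc));
        System.out.println(listed.get(0).keySet()); // remaining field names only
    }
}
```

Note that the sketch filters documents that are already in memory, so it only shows the shape of the result. The memory saving described in this PR depends on the field being excluded before the document is loaded on the node.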