-
Notifications
You must be signed in to change notification settings - Fork 82
Make default target_bulk_bytes below AWS limit, but make it configurable #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,6 +34,20 @@ module APIConfigs | |
# this defaults to a concatenation of the path parameter and "_bulk" | ||
:bulk_path => { :validate => :string }, | ||
|
||
# Maximum number of bytes in bulk requests | ||
# The criteria for deciding the default value of target_bulk_bytes is: | ||
# 1. We need a number that's less than 10MiB because OpenSearch is commonly | ||
# configured (particular in AWS Opensearch Service) to not accept | ||
# bulks larger than that. | ||
# 2. It must be large enough to amortize the connection constant | ||
# across multiple requests. | ||
# 3. It must be small enough that even if multiple threads hit this size | ||
# we won't use a lot of heap. | ||
:target_bulk_bytes => { | ||
:validate => :number, | ||
:default => 9 * 1024 * 1024 # 9MiB | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not keep default as 20MiB for backward compatible and allow AWS OpenSearch Service users to set value? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because it doesn't work for many users giving problems that are hard to resolve. I think it's better if the plugin works out of the box. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changing default value will break backwards compatibility. Correct me if I am wrong, but if a user is sending bulk requests to a self managed cluster configured with Instead of changing default, maybe a better solution would be to update this documentation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not at all. The only drawback is a slightly lower maximum throughput. But with this fix a user can tweak to their hearts content. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see. Thanks for the correction. Still, I feel this may violate backwards compatibility. If a user is sending bulk requests to a self managed cluster configured with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One problem i could see is that we might be flushing bulk request frequent than before for size > 9 and < 20. Let's say, my total request size is 18 MB, with this change it will send two request than one, and proposal is to configure the value to say 20MB @msvticket can you correct if my understanding is wrong? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's correct. My opinion is that it is more important to prevent errors than preserving probably not that common cases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the only thing you're arguing about is whether to increment a minor version for this change, or a major version. Sounds like the default was poorly chosen. If it causes bugs, then it's a bug fix, and changing the value of the default in a minor version is OK. Otherwise do the change in a major version increment. My take for this case is that it's ok to do it in a minor version increment. Since it's configurable, you are preserving backwards compatibility. Behavior may change, but it's not broken. You will surprise some people who didn't read the release notes. |
||
}, | ||
|
||
# Pass a set of key value pairs as the URL query string. This query string is added | ||
# to every host listed in the 'hosts' configuration. If the 'hosts' list contains | ||
# urls that already have query strings, the one specified here will be appended. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we are here, does specifying timeout = nil make the request not timeout? For another PR/workitem I'm used to seeing
if params.key?(:timeout)
and support all values with predictable behavior.