Skip to content

Make default target_bulk_bytes below AWS limit, but make it configurable #71

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 28, 2021

Conversation

msvticket
Copy link
Contributor

@msvticket msvticket commented Sep 14, 2021

Description

This fix prevents http status 413

Fixes old bug carrying over from logstash-output-elasticsearch: logstash-plugins/logstash-output-elasticsearch#785

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has documentation added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

# we won't use a lot of heap.
:target_bulk_bytes => {
:validate => :number,
:default => 9 * 1024 * 1024 # 9MiB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep default as 20MiB for backward compatible and allow AWS OpenSearch Service users to set value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it doesn't work for many users giving problems that are hard to resolve. I think it's better if the plugin works out of the box.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing default value will break backwards compatibility. Correct me if I am wrong, but if a user is sending bulk requests to a self managed cluster configured with http.max_content_length > 9 MiB, upgrading to a version with this change would cause the plugin to fail to send all data.

Instead of changing default, maybe a better solution would be to update this documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing default value will break backwards compatibility. Correct me if I am wrong, but if a user is sending bulk requests to a self managed cluster configured with http.max_content_length > 9 MiB, upgrading to a version with this change would cause the plugin to fail to send all data.

Not at all. The only drawback is a slightly lower maximum throughput. But with this fix a user can tweak to their hearts content.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for the correction.

Still, I feel this may violate backwards compatibility. If a user is sending bulk requests to a self managed cluster configured with http.max_content_length > 9 MiB, and they upgrade to this version, they may see performance degradation due to the decrease in maximum throughput. So, Im still in favor of keeping it at 20 MiB and letting users modify to work with AWS OpenSearch Service. The documentation above could be updated to prevent confusion.

What are your thoughts @dblock and @VijayanB?

Copy link
Member

@VijayanB VijayanB Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem i could see is that we might be flushing bulk request frequent than before for size > 9 and < 20. Let's say, my total request size is 18 MB, with this change it will send two request than one, and proposal is to configure the value to say 20MB @msvticket can you correct if my understanding is wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's correct. My opinion is that it is more important to prevent errors than preserving probably not that common cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only thing you're arguing about is whether to increment a minor version for this change, or a major version.

Sounds like the default was poorly chosen. If it causes bugs, then it's a bug fix, and changing the value of the default in a minor version is OK. Otherwise do the change in a major version increment. My take for this case is that it's ok to do it in a minor version increment.

Since it's configurable, you are preserving backwards compatibility. Behavior may change, but it's not broken. You will surprise some people who didn't read the release notes.

Copy link
Member

@VijayanB VijayanB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@VijayanB VijayanB added the enhancement Increases software capabilities beyond original client specifications label Sep 22, 2021
# we won't use a lot of heap.
:target_bulk_bytes => {
:validate => :number,
:default => 9 * 1024 * 1024 # 9MiB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only thing you're arguing about is whether to increment a minor version for this change, or a major version.

Sounds like the default was poorly chosen. If it causes bugs, then it's a bug fix, and changing the value of the default in a minor version is OK. Otherwise do the change in a major version increment. My take for this case is that it's ok to do it in a minor version increment.

Since it's configurable, you are preserving backwards compatibility. Behavior may change, but it's not broken. You will surprise some people who didn't read the release notes.

@@ -35,6 +35,7 @@ def self.build(logger, hosts, params)
end

common_options[:timeout] = params["timeout"] if params["timeout"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are here, does specifying timeout = nil make the request not timeout? For another PR/workitem I'm used to seeing if params.key?(:timeout) and support all values with predictable behavior.

@VijayanB VijayanB requested a review from vamshin September 24, 2021 17:52
Copy link
Member

@vamshin vamshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

@VijayanB VijayanB merged commit aeee1cf into opensearch-project:main Sep 28, 2021
@VijayanB VijayanB added the 1.2.0 latest release label Sep 28, 2021
@VijayanB
Copy link
Member

Thank you @msvticket for your contribution and looking forward for more. We will release this feature as part of our next release (1.2.0) .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1.2.0 latest release enhancement Increases software capabilities beyond original client specifications
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants