
Default buffer type to 'heap' for 9.0 #16500


Merged

andsel merged 10 commits into elastic:main from feature/default_buffer_type_to_heap on Nov 13, 2024

Conversation

@andsel (Contributor) commented on Oct 2, 2024

Release notes

Switch the default value of `pipeline.buffer.type` to use heap memory instead of direct memory.

What does this PR do?

Change the default value of the setting `pipeline.buffer.type` from `direct` to `heap`, and update the documentation accordingly.

Why is it important/What is the impact to the user?

It's part of the work to make the memory usage of Netty-based plugins more explicit. Using heap instead of direct memory for Netty buffers makes memory issues easier to debug, at the cost of a larger heap size.
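
As a minimal sketch, this is how the setting looks in logstash.yml after the change, with `direct` remaining available as an explicit opt-out (the comments are illustrative, not part of the PR):

```yaml
# logstash.yml
pipeline.buffer.type: heap     # new 9.0 default: Netty plugin buffers are allocated on the Java heap

# To keep the pre-9.0 behavior, explicitly select off-heap (direct) buffers:
# pipeline.buffer.type: direct
```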

Checklist

  • My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files (and/or docker env variables)
  • [ ] I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

As a developer of a Netty-based plugin, I want better insight into memory usage patterns.

@@ -126,7 +126,7 @@ to provide better performance, especially when interacting with the network stac
Under heavy load, namely large number of connections and large messages, the direct memory space can be exhausted and lead to Out of Memory (OOM) errors in off-heap space.

An off-heap OOM is difficult to debug, so {ls} provides a `pipeline.buffer.type` setting in <<logstash-settings-file>> that lets you control where to allocate memory buffers for plugins that use them.
-Currently it is set to `direct` by default, but you can change it to `heap` to use Java heap space instead, which will be become the default in the future.
+Currently it is set to `heap` by default, but you can change it to `direct` to use direct memory space instead.
Member commented:

> It is important to note that the Java heap sizing requirements will be impacted by this change since
> allocations that previously resided on the direct memory will use heap instead.
> Performance-wise there shouldn't be a noticeable impact, since while direct memory IO is faster, Logstash Event objects produced by these plugins end up being allocated on the Java Heap, incurring the cost of copying from direct memory to heap memory regardless of the setting.

These must be changed as well, same with:

> When you set `pipeline.buffer.type` to `heap`, consider incrementing the Java heap by the
> amount of memory that had been reserved for direct space.
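
As a rough illustration of that sizing guidance (the numbers below are hypothetical, not taken from the PR), folding the former direct-memory footprint into the heap in jvm.options might look like:

```
## jvm.options (hypothetical sizing sketch)
## Before: 4g heap, with Netty buffers using roughly 1g of direct memory.
## After switching pipeline.buffer.type to heap, grow the heap by that amount:
-Xms5g
-Xmx5g
```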

@jsvd (Member) left a comment:

Please review the rest of the docs in the config-details asciidoc, as it assumes direct memory is the default, that heap will become the default, and that the user will be changing it to heap (which doesn't make sense for someone starting fresh with 9.x).

Comment on lines 132 to 134
Performance-wise there shouldn't be a noticeable impact in switching to `direct`, since while direct memory IO is faster,
Logstash Event objects produced by these plugins end up being allocated on the Java Heap,
incurring the cost of copying from direct memory to heap memory regardless of the setting.
Member commented:

Don't we use pooled buffers anyway? I'm under the impression that allocations of buffers may be faster in direct, but that our pool usage is a primary driver for our buffer use being performant. We also know that large buffer allocations can succeed in heap where they would fail in direct, because the GC will rearrange the existing object space to ensure that it can allocate, while the direct allocation will simply not have an appropriately-sized contiguous chunk. I would rather a small performance cost in this case than a crashing OOM :)

Suggested change
Performance-wise there shouldn't be a noticeable impact in switching to `direct`, since while direct memory IO is faster,
Logstash Event objects produced by these plugins end up being allocated on the Java Heap,
incurring the cost of copying from direct memory to heap memory regardless of the setting.
Performance-wise there shouldn't be a noticeable impact in switching to `direct`.
While allocating direct memory for an individual buffer is faster, these plugins use buffer pools to reduce allocations, and heap-managed allocations are significantly safer.
Logstash Event objects produced by these plugins also end up being allocated on the Java Heap, incurring the cost of copying from buffers to heap memory regardless of the setting.

@andsel (Author) commented:

We always use the Netty PooledAllocator, but instead of using direct memory buffers it uses Java heap byte[]
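
For context, here is a minimal, standalone sketch (assuming netty-buffer on the classpath; this is not Logstash's actual wiring) of how Netty's pooled allocator can be switched between heap-backed and direct buffers:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class PooledBufferTypeSketch {
    public static void main(String[] args) {
        // preferDirect = false: the pooled allocator hands out heap-backed buffers
        // (byte[] on the Java heap), i.e. the pipeline.buffer.type=heap case.
        PooledByteBufAllocator heapPool = new PooledByteBufAllocator(false);
        ByteBuf heapBuf = heapPool.buffer(1024);
        System.out.println("backed by byte[]: " + heapBuf.hasArray()); // true for heap buffers
        heapBuf.release();

        // preferDirect = true: buffers come from direct (off-heap) memory,
        // i.e. the old pipeline.buffer.type=direct default.
        PooledByteBufAllocator directPool = new PooledByteBufAllocator(true);
        ByteBuf directBuf = directPool.buffer(1024);
        System.out.println("backed by byte[]: " + directBuf.hasArray()); // false for direct buffers
        directBuf.release();
    }
}
```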

@andsel (Author) commented:

Hi @yaauie

> While allocating direct memory for an individual buffer is faster,

It's not the allocation that's faster, it's the transfer from OS buffers to direct buffers, which matters when data is moved around inside the OS, for example from network to filesystem or from network to network. In our case the content of the direct buffers always flows into Java heap space to inflate Logstash events, so we lose the benefit of direct buffers, or at least it is not so dominant.

@andsel andsel force-pushed the feature/default_buffer_type_to_heap branch from 0183fd0 to 44fdfaf Compare October 14, 2024 15:12
@andsel andsel requested review from jsvd and yaauie October 15, 2024 07:03
@jsvd jsvd changed the title Default buffer type to 'heap' fo 9.0 Default buffer type to 'heap' for 9.0 Oct 16, 2024
@andsel andsel requested a review from jsvd October 17, 2024 08:19
@jsvd (Member) left a comment:

Minor tweaks, otherwise LGTM, but I'd like @karenzone's input here as well.

@karenzone (Contributor) left a comment:

Left some suggestions for your consideration. Please LMKWYT.

@andsel (Author) commented on Nov 4, 2024:

Thanks @karen for all your precious suggestions. I've integrated them all, since they result in more fluent wording without changing the meaning.

@andsel andsel requested a review from karenzone November 4, 2024 08:13
Quality Gate passed (SonarQube)

Issues: 0 new, 0 fixed, 0 accepted
Measures: 0 security hotspots; no data about coverage or duplication

See analysis details on SonarQube

@elasticmachine (Collaborator) commented:

💚 Build Succeeded


cc @andsel

@karenzone (Contributor) left a comment:

LGTM. Thanks!

@andsel andsel merged commit a94659c into elastic:main Nov 13, 2024
7 checks passed
Development

Successfully merging this pull request may close these issues.

Set pipeline.buffer.type to heap by default
5 participants