Make behaviour consistent when big one-liner JSON is provided #44
Comments
I'm also facing this issue when using the logstash-to-logstash integration, which internally uses this codec. I think #43 completely broke logstash-input-http: the pipeline basically stops within a few minutes after startup. I'm not entirely sure how the current default 500MB buffer gets filled so fast; the forwarded data is nowhere near that size (at least it's much smaller at rest). Maybe the integration plugin is using this codec wrong? (If that's even possible.)
Looking into this; I'll post here soon with findings.
I think I know what's wrong. The logstash integration's "logstash input" is using the json_lines codec instead of the plain json codec. This is coupled with a separate bug in the BufferedTokenizer: the tokenizer doesn't properly reset its state when data only leaves it through flushing.
Because the logstash output sends JSON payloads without a delimiter (they're just JSON arrays), data only comes out of the codec through flushing, which triggers this bug.
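To make that concrete, here is a toy Ruby sketch (not the real BufferedTokenizerExt code; class and method names are invented) of an extract/flush buffer whose size accounting mirrors the suspected bug: the byte counter is trimmed when tokens are extracted, but never when data leaves through flush, so delimiter-less payloads eventually trip the limit even though each one is tiny.

```ruby
# Toy illustration of the suspected accounting bug; NOT the real
# BufferedTokenizerExt implementation, just an invented extract/flush buffer.
class ToyTokenizer
  class SizeLimitError < StandardError; end

  def initialize(delimiter: "\n", size_limit: 50)
    @delimiter = delimiter
    @size_limit = size_limit
    @buffer = +""
    @accumulated = 0
  end

  # Called for every incoming chunk; returns any complete (delimited) tokens.
  def extract(data)
    @accumulated += data.bytesize
    raise SizeLimitError, "input buffer full" if @accumulated > @size_limit

    @buffer << data
    return [] unless @buffer.include?(@delimiter)

    tokens = @buffer.split(@delimiter, -1)
    @buffer = tokens.pop                 # keep the trailing partial token
    @accumulated = @buffer.bytesize      # counter is trimmed ONLY on this path
    tokens
  end

  # Hands back whatever is buffered (the codec's flush path).
  def flush
    out = @buffer
    @buffer = +""
    # Bug mirrored here: @accumulated is NOT reset, so it keeps growing.
    out
  end
end

tok = ToyTokenizer.new(size_limit: 50)
# JSON-array payloads contain no newline, so #extract never emits tokens and
# every byte is drained through #flush instead.
5.times do |i|
  tok.extract(%([{"n":#{i}}]))  # 9 bytes each, far below the 50-byte limit
  tok.flush                     # buffer empties, but the counter does not
end
tok.extract(%([{"n":5}]))       # => SizeLimitError, even though the buffer is empty
```

Under this reading, the limit fills up because it effectively counts every byte ever received rather than the bytes currently buffered.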
Fixes are therefore needed in:
[edit] After chatting with @yaauie, it makes more sense to fix the logstash output to properly emit ndjson, as it was supposed to in the first place.
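For reference, a hypothetical contrast of the two payload shapes (the field names are made up): a JSON array carries no trailing delimiter for the tokenizer to split on, whereas ndjson delimits every event with a newline.

```ruby
# Hypothetical payloads, for illustration only.
array_payload  = '[{"message":"a"},{"message":"b"}]'    # no delimiter: only flush ever sees the data
ndjson_payload = %({"message":"a"}\n{"message":"b"}\n)  # newline-delimited: extract emits events as data arrives
```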
@andsel Can this be closed? |
No, it can't be closed because the feature still needs to be implemented. As it's at version
PR #43 introduced the `decode_size_limit_bytes` parameter to limit the length of the JSON line that can be parsed, to avoid potential OOM errors with very big one-liner files. The setting has a default value of 20MB, and this introduces a breakage in behaviour: a user who normally consumes lines bigger than 20MB without hitting an OOM error will, with the new version `3.2.0`, experience a looping error that leaves the pipeline stuck without making any progress.
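A minimal reproduction sketch, assuming a Logstash environment with logstash-core and logstash-codec-json_lines 3.2.0 on the load path; the payload contents are made up and the exact error text is not reproduced here.

```ruby
# Sketch only: assumes logstash-core and logstash-codec-json_lines >= 3.2.0
# are available to `require`.
require "logstash/codecs/json_lines"

codec = LogStash::Codecs::JSONLines.new(
  # The limit introduced by #43, set explicitly rather than relying on the default.
  "decode_size_limit_bytes" => 20 * 1024 * 1024
)
codec.register

# One JSON line just over the limit. Before 3.2.0 this decoded fine (memory
# permitting); with the limit in place it fails instead of yielding an event,
# and an input that retries the same payload ends up looping.
big_line = %({"message":"#{"x" * (21 * 1024 * 1024)}"}\n)

codec.decode(big_line) do |event|
  puts event.get("message").bytesize
end
```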
Proposal
This issue proposes to restore the original behaviour by default and, eventually, when the codec has a `decode_size_limit_bytes` configured and a line is bigger than the limit, to still create an event containing the partial string data, tagging the event so that the pipeline can route and handle the error condition. This can be implemented after `BufferedTokenizerExt` is fixed to throw an exception also when the offending token is not the first one in the fragment (elastic/logstash#17017). Ideally the tokenizer should return an iterator that verifies the size limit on every `next` call.
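A rough Ruby sketch of that iterator idea (class and method names are invented for illustration; this is not the planned logstash-core API): each call to `next` validates the token it is about to hand out, so an oversized token surfaces at consumption time and the caller can tag and route the partial data.

```ruby
# Illustration only: names are invented, not the planned logstash-core API.
class TokenSizeExceeded < StandardError; end

# Wraps raw tokens in an Enumerator whose #next enforces the size limit, so
# the caller can decide per token whether to tag/route the partial data.
def size_checked_tokens(raw_tokens, size_limit_bytes)
  Enumerator.new do |yielder|
    raw_tokens.each do |token|
      if token.bytesize > size_limit_bytes
        raise TokenSizeExceeded, "token of #{token.bytesize} bytes exceeds #{size_limit_bytes}"
      end
      yielder << token
    end
  end
end

tokens = size_checked_tokens(['{"small":1}', %({"big":"#{"x" * 64}"})], 32)
puts tokens.next          # => {"small":1}
begin
  tokens.next             # the second token is over the 32-byte limit
rescue TokenSizeExceeded => e
  warn "would emit a tagged event with partial data: #{e.message}"
end
```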