7.3.1 master-only nodes (not acting as ingestion nodes) receive java.lang.NullPointerException / HTTP 500 error during _BULK operations from filebeat #46678
Comments
Pinging @elastic/es-distributed
Don't hesitate to reach out to me if you need more data, material, or assistance in investigating this reported issue. Thanks.
Thanks @gregwjacobs :).
The stack trace suggests that the problem is with an update request whose response has somehow lost the id field. @rtkgjacobs can you share the full input with me? Please send an e-mail to yannick AT elastic co. Thank you.
(emailed pcap of above captured issue directly to yannick)
Quick update here. While I have not yet tried to reproduce this with the provided inputs, I was able to identify an issue when using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat as well) where such a NullPointerException could be produced. I'm looking into our options on how to patch this.
When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat as well) + coordinating nodes that do not have the ingest processor functionality, this can lead to a NullPointerException. The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when the request contains auto-generated IDs. The response serialization is lenient for our REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for communication between nodes) assumes this field to be non-null, which means that it can't be serialized between nodes. Bulk requests with ingest functionality are processed on the coordinating node if the node has the ingest capability, and only otherwise sent to a different node. This means that, in order to reproduce this, one needs two nodes, with the coordinating node not having the ingest functionality. Closes #46678
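For illustration, a minimal sketch of the scenario described above could look like the following, assuming a two-node 7.3.x cluster where the node answering on localhost:9200 does not have the ingest role; the pipeline name drop-everything and the index name test are made up for this sketch, not taken from the original report:

```
# Hypothetical pipeline whose drop processor discards every document.
curl -s -X PUT 'localhost:9200/_ingest/pipeline/drop-everything' \
  -H 'Content-Type: application/json' \
  -d '{"description": "drop all documents", "processors": [{"drop": {}}]}'

# Bulk request with auto-generated IDs routed through the pipeline. Per the
# explanation above, when this is sent to a coordinating node without the
# ingest role, the dropped item is reported via an UpdateResponse with no id,
# which cannot be serialized over the internal transport format.
curl -s -X POST 'localhost:9200/_bulk?pipeline=drop-everything' \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary $'{"index":{"_index":"test"}}\n{"message":"hello"}\n'
```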
Elasticsearch version (bin/elasticsearch --version): 7.3.1
Plugins installed: []
JVM version (java -version):
OS version (uname -a if on a Unix-like system):
**Filebeat version** (while the issue seems Elasticsearch centric, noting the version of filebeat for repro): 7.3.1
Description of the problem including expected versus actual behaviour:
When running filebeat locally on master-only nodes with version 6.7.1 of both Elasticsearch and filebeat, metrics and log events were sent into Elasticsearch without issue.
After deploying version 7.3.1 of both Elasticsearch and filebeat, we noted that master-only nodes throw java.lang.NullPointerException exceptions for _BULK payloads sent from filebeat into its logging ingestion pipeline. If we configure the filebeat running on the master node to send its events to any other non-master node in the cluster, the issue is avoided. The clusters above are configured to use dedicated ingestion nodes (so masters cannot execute pipelines directly).
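For reference, node roles can be checked with the _cat nodes API; on a 7.x cluster a master-only node should show only m in the node.role column (the localhost:9200 target mirrors the endpoint mentioned above):

```
# List each node's name and roles; an "i" in node.role indicates the ingest role.
curl -s 'localhost:9200/_cat/nodes?v&h=name,node.role,master'
```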
One key point of interest: if we ENABLE the master node to also act as an ingestion node, the java.lang.NullPointerException does NOT occur. Hopefully this helps narrow down the bug.

Problem Statement Summary
Filebeat 7.3.1 sending logging events to the _BULK ingestion pipeline named filebeat-7.3.1-elasticsearch-server-pipeline causes the master node filebeat is pointed at to throw a java.lang.NullPointerException and return an HTTP 500 error. This did not occur with identical deployments of version 6.7.1 of both Elasticsearch and filebeat. And as noted above, if we enable ingestion on the masters, this does not occur. Other nodes in our cluster also do NOT have ingestion enabled, yet do not have this issue. We have only found this on MASTER-only nodes thus far, and only with version 7.3.1 versus our prior 6.7.1.
Steps to reproduce:
1. Configure a cluster with dedicated ingestion nodes, such that master nodes cannot execute ingest pipelines. We have confirmed this issue with version 7.3.1, and confirmed it does not occur with version 6.7.1 running the same configuration previously.
2. Configure filebeat to run on the master nodes and send events via localhost:9200.
3. You should see the following errors as we have.
4. Workaround: if we send events to non-master nodes, _BULK ingestion from the filebeat module works.

Example Configurations used:
Filebeat
Elasticsearch Cluster
Provide logs (if relevant):
Example exception log from a master receiving the _BULK payload from the localhost instance of filebeat.
The localhost instance of filebeat reports the same error on its side as per
A packet capture of the same failures above show similar as
And the corresponding response back from ES to Filebeat being
We can see that we are able to receive x-pack metrics from the filebeat instance running locally on this master-only node, but note the failure rate for the logging events:

I have done further tests to try to isolate and reproduce the issue, but it seems very specific to the filebeat content/payload (or else related to gzip compression or content size).
I could not reproduce the issue by hand using non-filebeat data sent to the same ingestion pipeline - see curl example below.
I can provide a localhost:9200 pcap of the data from filebeat that causes the NullPointers to fire if required.
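For anyone wanting to take a similar capture, something along these lines should work on the master node itself (the interface name and output filename are illustrative):

```
# Capture the local filebeat -> Elasticsearch bulk traffic on the loopback interface.
sudo tcpdump -i lo -s 0 -w filebeat-bulk-9200.pcap 'tcp port 9200'
```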