
7.3.1 master-only nodes (not acting as ingestion nodes) receive java.lang.NullPointerException / HTTP 500 error during _BULK operations from filebeat #46678


Closed · Opened by gregwjacobs on Sep 12, 2019 · 6 comments · Fixed by #46773
Assignees: dnhatn, ywelsch
Labels: >bug, :Distributed Indexing/CRUD (a catch-all label for issues around indexing, updating and getting a doc by id; not search)

@gregwjacobs commented Sep 12, 2019

Elasticsearch version (bin/elasticsearch --version):

OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Version: 7.3.1, Build: default/deb/4749ba6/2019-08-19T20:19:25.651794Z, JVM: 12.0.2

Plugins installed: []

# curl -sS http://localhost:9200/_cat/plugins |awk '{print $2 " " $3}' |sort |uniq |sort
discovery-ec2 7.3.1
repository-s3 7.3.1

JVM version (java -version):

/usr/share/elasticsearch/jdk/bin/java --version
openjdk 12.0.2 2019-07-16
OpenJDK Runtime Environment (build 12.0.2+10)
OpenJDK 64-Bit Server VM (build 12.0.2+10, mixed mode, sharing)

OS version (uname -a if on a Unix-like system):

Linux obs6esmaster-i-YYYYYYYobsstore01.XXXXX.awn 4.4.0-1079-aws #89-Ubuntu SMP Tue Mar 26 15:25:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Filebeat version (while the issue seems Elasticsearch-centric, noting the filebeat version used):

root@obs6esmaster-i-094de1714b4abf10f:/usr/share/java# filebeat version
filebeat version 7.3.1 (amd64), libbeat 7.3.1 [a4be71b90ce3e3b8213b616adfcd9e455513da45 built 2019-08-19 19:30:50 +0000 UTC]

Description of the problem including expected versus actual behaviour:

With version 6.7.1 of both Elasticsearch and filebeat, filebeat running locally on master-only nodes was able to send metrics and log events into Elasticsearch itself without issue.
After deploying version 7.3.1 of both Elasticsearch and filebeat, we noted that master-only nodes throw java.lang.NullPointerException on _BULK payloads sent from filebeat into its logging ingestion pipeline.

If we configure the filebeat running on the master node to send its events to any other non-master node in the cluster, the issue is avoided. The clusters above are configured with dedicated ingestion nodes (so masters cannot execute pipelines directly).

One key point of interest: if we ENABLE the master node to also act as an ingestion node, the java.lang.NullPointerException does NOT occur. Hopefully this helps narrow down the bug.

Problem Statement Summary
Filebeat 7.3.1 sending logging events via _BULK to the ingestion pipeline named filebeat-7.3.1-elasticsearch-server-pipeline causes the master node that filebeat is pointed at to return a java.lang.NullPointerException / HTTP 500 error. This did not occur with identical deployments of version 6.7.1 of both Elasticsearch and filebeat. And as noted above, if we enable ingestion on the masters, this does not occur.

Other (non-master) nodes in our cluster also do NOT have ingestion enabled, yet they do not exhibit this issue. We have only found this on MASTER-only nodes thus far, and only with version 7.3.1 versus our prior 6.7.1.

Steps to reproduce:

1. Configure a cluster with dedicated ingestion nodes, such that master nodes cannot execute ingest pipelines. We have confirmed this issue with version 7.3.1, and confirmed it does not occur with version 6.7.1, where we ran the same configuration previously.
2. Configure filebeat to run on the master nodes and send events via localhost:9200.
3. You should see errors like the ones we include below.
4. Workaround: if we send events to non-master nodes, _BULK ingestion from the filebeat module works.

Example Configurations used:

Filebeat

root@obs6esmaster-i-XXXXXXXXXX:/etc/filebeat# cat filebeat.yml |grep -v "#\|^$"
filebeat.inputs:
  - type: log
    enabled: false
    paths:
      - /var/log/*.log
filebeat.modules:
  - module: system
  - module: elasticsearch
    gc:
      enabled: false
setup.template.settings:
  index.number_of_shards: 3
  index.codec: best_compression
  _source.enabled: true
output.elasticsearch:
  hosts: ["localhost:9200"]
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch:

Elasticsearch Cluster

root@obs6esmaster-i-XXXXXXXXXX:/etc/filebeat# cat /etc/elasticsearch/elasticsearch.yml |grep -v "#\|^$"
cluster.name: obsstore01.YYYYY.ZZZZZ
cluster.initial_master_nodes: [ larry, curly, moe ]
node.master: true
node.data: false
node.ingest: false
node.ml: false
path:
  data: /mnt/disk1/elasticsearch/data
  logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: [ 10.113.89.170, _local_ ]
http.port: 9200
transport.port: 9300
transport.publish_port: 9300
xpack.monitoring.enabled: true
xpack.monitoring.collection.enabled: true
xpack.monitoring.collection.cluster.stats.timeout : 20s
xpack.monitoring.collection.node.stats.timeout: 20s
xpack.monitoring.collection.index.stats.timeout: 30s
xpack.watcher.enabled: false
xpack.graph.enabled: false
xpack.ml.enabled: false
xpack.security.enabled: false
discovery:
  seed_providers: ec2
  ec2:
    any_group: false
    endpoint: ec2.<some_region_seeeecret>.amazonaws.com
    tag:
      Cell: not_telling_you_sorry
      Environment: some_super_secret_env_name
gateway.expected_nodes: 1
gateway.recover_after_time: 2m
gateway.recover_after_nodes: 1
cluster.routing.allocation.node_initial_primaries_recoveries: 4
cluster.routing.allocation.node_concurrent_recoveries: 2
reindex.remote.whitelist: "obs7esingest*:*, obs7esquery*:*"
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 95%
cluster.routing.allocation.disk.watermark.high: 98%
cluster.routing.allocation.disk.watermark.flood_stage: 99.5%
action.destructive_requires_name: false
action.auto_create_index: .security,.monitoring*,.watches,.triggered_watches,.watcher-history-*,.ml*, filebeat-*
http.cors.enabled: true
http.cors.allow-origin: /.*/
http.cors.allow-credentials: true
http.cors.allow-headers: X-Requested-With, Content-Type, Content-Length
indices.memory.index_buffer_size: 10%

Provide logs (if relevant):

Example exception log from a master receiving the _BULK payload from the localhost instance of filebeat.

[2019-09-11T20:24:10,815][WARN ][r.suppressed             ] [obs6esmaster-i-0ee5f1df872af9b2f.obsstore01.gjacobs.dev2.awn] path: /_bulk, params: {}
org.elasticsearch.transport.RemoteTransportException: [obs6esingest-i-0257e445e9599463d.obsstore01.gjacobs.dev2.awn][10.113.147.73:9300][indices:data/write/bulk]
Caused by: java.lang.NullPointerException
	at org.elasticsearch.common.io.stream.StreamOutput.writeString(StreamOutput.java:403) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.DocWriteResponse.writeTo(DocWriteResponse.java:283) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.update.UpdateResponse.writeTo(UpdateResponse.java:82) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.bulk.BulkItemResponse.writeTo(BulkItemResponse.java:510) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.bulk.BulkResponse.writeTo(BulkResponse.java:148) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.OutboundMessage.writeMessage(OutboundMessage.java:70) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.OutboundMessage.serialize(OutboundMessage.java:53) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.OutboundHandler$MessageSerializer.get(OutboundHandler.java:169) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.OutboundHandler$MessageSerializer.get(OutboundHandler.java:155) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.OutboundHandler$SendContext.get(OutboundHandler.java:202) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.OutboundHandler.internalSend(OutboundHandler.java:132) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.OutboundHandler.sendMessage(OutboundHandler.java:127) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.OutboundHandler.sendResponse(OutboundHandler.java:107) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:64) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:47) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:30) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkRequestModifier.lambda$wrapActionListenerIfNeeded$1(TransportBulkAction.java:675) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:112) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.finishHim(TransportBulkAction.java:473) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onResponse(TransportBulkAction.java:454) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onResponse(TransportBulkAction.java:443) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:68) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.finishOnSuccess(TransportReplicationAction.java:846) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$1.handleResponse(TransportReplicationAction.java:765) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$1.handleResponse(TransportReplicationAction.java:749) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1101) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:224) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) ~[elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:216) [elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:141) [elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) [elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) [elasticsearch-7.3.1.jar:7.3.1]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-client-7.3.1.jar:7.3.1]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.36.Final.jar:4.1.36.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.36.Final.jar:4.1.36.Final]
	at java.lang.Thread.run(Thread.java:835) [?:?]

The localhost instance of filebeat reports the same error on its side:

2019-09-11T20:33:03 obs6esmaster-i-0ee5f1df872af9b2f.obsstore01.gjacobs.dev2.awn filebeat[20238]: 2019-09-11T20:33:03.717Z#011ERROR#011elasticsearch/client.go:343#011Failed to perform any bulk index operations: 500 Internal Server Error: {"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[obs6esingest-i-0257e445e9599463d.obsstore01.gjacobs.dev2.awn][10.113.147.73:9300][indices:data/write/bulk]"}],"type":"null_pointer_exception","reason":null},"status":500}

A packet capture of the same failures shows the following:

POST /_bulk HTTP/1.1
Host: localhost:9200
User-Agent: Go-http-client/1.1
Content-Length: 71073
Accept: application/json
Content-Type: application/json; charset=UTF-8
Accept-Encoding: gzip

{"index":{"_index":"filebeat-7.3.1","pipeline":"filebeat-7.3.1-elasticsearch-server-pipeline"}}
{"@timestamp":"2019-09-11T20:32:23.520Z","message":"[2019-09-11T20:23:07.523+0000][18816][safepoint    ] Application time: 0.0326183 seconds","fileset":{"name":"server"},"input":{"type":"log"},"agent":{"type":"filebeat","ephemeral_id":"ddc04c98-fe46-41bf-934e-165abc3b2a25","hostname":"obs6esmaster-i-0ee5f1df872af9b2f","id":"5de46c1a-080a-4c23-8962-ec876ecfdee6","version":"7.3.1"},"ecs":{"version":"1.0.1"},"cloud":{"provider":"aws","instance":{"id":"i-0ee5f1df872af9b2f"},"machine":{"type":"m5.large"},"region":"us-west-2","availability_zone":"us-west-2a","account":{"id":"099078926092"},"image":{"id":"ami-0eb23bc37ff79dad8"}},"log":{"file":{"path":"/var/log/elasticsearch/gc.log"},"offset":68058},"event":{"module":"elasticsearch","dataset":"elasticsearch.server","timezone":"+00:00"},"host":{"name":"obs6esmaster-i-0ee5f1df872af9b2f","containerized":false,"hostname":"obs6esmaster-i-0ee5f1df872af9b2f","architecture":"x86_64","os":{"version":"16.04.6 LTS (Xenial Xerus)","family":"debian","name":"Ubuntu","kernel":"4.4.0-1079-aws","codename":"xenial","platform":"ubuntu"},"id":"95d49d0d589d4c9d9923414a4fb3d908"},"service":{"type":"elasticsearch"}}
---- Omitted rest of events to reduce size of this GitHub issue ----

And the corresponding response back from ES to filebeat:

HTTP/1.1 500 Internal Server Error
content-type: application/json; charset=UTF-8
access-control-allow-credentials: true
content-encoding: gzip
content-length: 204

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[obs6esingest-i-0f6a4745df63266f5.obsstore01.gjacobs.dev2.awn][10.113.123.205:9300][indices:data/write/bulk]"}],"type":"null_pointer_exception","reason":null},"status":500}

We can see that we are able to receive x-pack metrics from the filebeat running locally on this master-only node, but the event rates show the failure for the logging events.
(screenshot of event rates omitted)

I have done further tests to try to isolate and reproduce the issue, but it seems very specific to the filebeat content/payload (or perhaps related to gzip or content size).

I could not reproduce the issue by hand using non-filebeat data sent to the same ingestion pipeline; see the curl example below.

root@obs6esmaster-i-XXXXXXX:/etc/filebeat# curl -H 'Content-Type: application/json' -vvvv -XPUT http://localhost:9200/_bulk -d $'{"index":{"_index":"filebeat-7.3.1","pipeline":"filebeat-7.3.1-elasticsearch-server-pipeline"}}\n{"field": "value"}\n'
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9200 (#0)
> PUT /_bulk HTTP/1.1
> Host: localhost:9200
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 118
>
* upload completely sent off: 118 out of 118 bytes
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 281
< access-control-allow-credentials: true
<
* Connection #0 to host localhost left intact
{"took":169,"ingest_took":0,"errors":false,"items":[{"index":{"_index":"filebeat-7.3.1-2019.09.11-000001","_type":"_doc","_id":"2ZlCIm0BZsr1j6UGjRqc","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":333417,"_primary_term":1,"status":201}}]}

I can provide a localhost:9200 pcap of the data from filebeat that causes the NullPointers to fire if required.

@polyfractal added the >bug and :Distributed Indexing/CRUD labels on Sep 12, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@dnhatn self-assigned this on Sep 12, 2019
@rtkgjacobs

Don't hesitate to reach out to me if you need more data, material or assistance in investigation of this reported issue. Thanks.

@dnhatn (Member) commented Sep 12, 2019

> Don't hesitate to reach out to me if you need more data, material or assistance in investigation of this reported issue.

Thanks @gregwjacobs :).

@ywelsch (Contributor) commented Sep 16, 2019

The stack trace suggests that the problem is with an update request whose response has somehow lost the id field.

@rtkgjacobs can you share the full input with me? Please send an e-mail to yannick AT elastic co

Thank you.

@ywelsch self-assigned this on Sep 16, 2019
@rtkgjacobs

(emailed pcap of above captured issue directly to yannick)

@ywelsch (Contributor) commented Sep 16, 2019

Quick update here. While I have not tried yet to reproduce this with the provided inputs, I was able to identify an issue when using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat as well) where such a NullPointerException could be produced.

The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when the request contains auto-generated IDs. The response serialization is lenient for our REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for communication between nodes) assumes for this field to be non-null, which means that it can't be serialized between nodes. Bulk requests with ingest functionality are processed on the coordinating node if the node has the ingest capability, and only otherwise sent to a different node. This means that, in order to reproduce this, one needs two nodes, with the coordinating node not having the ingest functionality, just as @rtkgjacobs has outlined in the opening post.

I'm looking into our options on how to patch this.
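
To make the failure mode concrete, here is a minimal, self-contained Java sketch (an editor's illustration, not Elasticsearch source; all class and method names are invented) of why a strict wire format that assumes a non-null document id throws a NullPointerException, while a lenient REST-style rendering of the same response does not. In the real stack trace above, the strict path corresponds to DocWriteResponse.writeTo calling StreamOutput.writeString on the missing id.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Simplified model of the two serialization paths described above.
// writeTransport mimics the strict node-to-node wire format, which assumes
// the document id is non-null; writeRest mimics the lenient REST/XContent
// path that happily renders "id": null.
public class NullIdSerializationDemo {

    // Strict path: throws NullPointerException if id is null, analogous to
    // StreamOutput.writeString(...) in the stack trace above.
    static byte[] writeTransport(String index, String id) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(index);
            out.writeUTF(id); // writeUTF(null) -> NullPointerException
        }
        return bytes.toByteArray();
    }

    // Lenient path: a null id is simply rendered as "_id":null.
    static String writeRest(String index, String id) {
        return "{\"_index\":\"" + index + "\",\"_id\":"
                + (id == null ? "null" : "\"" + id + "\"") + "}";
    }

    public static void main(String[] args) throws IOException {
        // A dropped bulk item with an auto-generated id: no id was ever assigned.
        String index = "filebeat-7.3.1";
        String id = null;

        // The REST response can still be produced...
        System.out.println("REST body: " + writeRest(index, id));

        // ...but serializing the same response for node-to-node transport fails,
        // which is what happens when the coordinating node has no ingest role
        // and the bulk response has to travel back from the ingest node.
        try {
            writeTransport(index, id);
        } catch (NullPointerException e) {
            System.out.println("transport serialization failed: " + e);
        }
    }
}
```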

ywelsch added a commit that referenced this issue Sep 19, 2019
When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat
as well) + coordinating nodes that do not have the ingest processor functionality, this can lead
to a NullPointerException.

The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when
the request contains auto-generated IDs. The response serialization is lenient for our
REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for
communication between nodes) assumes for this field to be non-null, which means that it can't
be serialized between nodes. Bulk requests with ingest functionality are processed on the
coordinating node if the node has the ingest capability, and only otherwise sent to a different
node. This means that, in order to reproduce this, one needs two nodes, with the coordinating
node not having the ingest functionality.

Closes #46678
ywelsch added three further commits referencing this issue on Sep 19 and Sep 20, 2019, each with the same commit message as above.
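
For readers curious what a fix could look like, below is a hedged sketch of one conceivable mitigation (not necessarily what #46773 actually implements): when a bulk item whose request relied on an auto-generated id is dropped by the ingest pipeline, substitute a non-null placeholder id before building its response, so the transport serialization precondition holds. All class, method, and constant names here are hypothetical.

```java
// Hedged sketch of one possible mitigation; names are hypothetical and this is
// not necessarily the change made in #46773. The idea: never let a dropped
// bulk item's response carry a null id, since the internal transport format
// requires the id to be a non-null string.
final class DroppedItemResponseFactory {

    // Placeholder used when the original index request relied on an
    // auto-generated id that was never assigned, because the item was
    // dropped by the ingest pipeline before reaching a shard.
    static final String DROPPED_AUTO_GENERATED_ID = "_dropped_auto_generated_id";

    // Minimal stand-in for the real response object built for dropped items.
    static final class DroppedItemResponse {
        final String index;
        final String id;

        DroppedItemResponse(String index, String id) {
            this.index = index;
            this.id = id;
        }
    }

    // Substitute a non-null placeholder id when the request had none.
    static DroppedItemResponse createForDroppedItem(String index, String requestId) {
        String id = (requestId != null) ? requestId : DROPPED_AUTO_GENERATED_ID;
        return new DroppedItemResponse(index, id);
    }
}
```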