ES 6.8.1/7.2.0 varying responses on /<index>/_analyze request in a two node cluster #44078

Closed
tanya-f opened this issue Jul 8, 2019 · 5 comments · Fixed by #44284
Labels
>bug :Search Relevance/Analysis How text is split into tokens Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@tanya-f

tanya-f commented Jul 8, 2019

Windows 10, JRE version 1.8.0_201

  1. Download and unpack elasticsearch zip archive
  2. Add a row to the configuration file:
    node.max_local_storage_nodes: 2
  3. Run elasticsearch.bat in two separate powershell windows
  4. Create an index: PUT /foo
  5. Analysis request (against index) of text containing only punctuation returns different responses.
  • Elasticsearch 6.8.1

Request:
GET /foo/_analyze { "text": "." }
Response: { "tokens": [] } alternates with empty response {}

Request against index with explain option:
GET /foo/_analyze { "text": ".", "explain": true }
Normal response

{
   "detail": {
      "custom_analyzer": false,
      "analyzer": {
         "name": "default",
         "tokens": []
      }
   }
}

alternates with error response

{
   "error": {
      "root_cause": [
         {
            "type": "null_pointer_exception",
            "reason": null
         }
      ],
      "type": "null_pointer_exception",
      "reason": null
   },
   "status": 500
}

Logs of error response:

[2019-07-08T17:47:36,979][WARN ][r.suppressed             ] [S6SaS48] path: /foo/_analyze, params: {index=foo}
java.lang.NullPointerException: null
	at org.elasticsearch.action.admin.indices.analyze.DetailAnalyzeResponse$AnalyzeTokenList.toXContentWithoutObject(DetailAnalyzeResponse.java:299) ~[elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.action.admin.indices.analyze.DetailAnalyzeResponse.toXContent(DetailAnalyzeResponse.java:140) ~[elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse.toXContent(AnalyzeResponse.java:261) ~[elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.rest.action.RestToXContentListener.buildResponse(RestToXContentListener.java:47) ~[elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.rest.action.RestToXContentListener.buildResponse(RestToXContentListener.java:42) ~[elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.rest.action.RestToXContentListener.buildResponse(RestToXContentListener.java:34) ~[elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37) ~[elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:85) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:81) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleResponse(TransportSingleShardAction.java:268) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction$2.handleResponse(TransportSingleShardAction.java:252) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1104) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.transport.TcpTransport$1.doRun(TcpTransport.java:985) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:193) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.transport.TcpTransport.handleResponse(TcpTransport.java:977) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:952) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:763) [elasticsearch-6.8.1.jar:6.8.1]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.8.1.jar:6.8.1]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Unknown Source) [?:1.8.0_201]
  • Elasticsearch 7.2.0

Request against empty foo index:
GET /foo/_analyze { "text": "." }
Response: { "tokens": [] } alternates with {}
But request with explain option always returns normal response.

@markharwood markharwood added :Search Relevance/Analysis How text is split into tokens >bug labels Jul 9, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search

@cbuescher cbuescher self-assigned this Jul 12, 2019
@cbuescher
Member

I can also reproduce the behaviour on 7.2.0 with two local nodes. Will focus on that version first and see if we can reproduce in an integration test.

@cbuescher
Member

The problem is still present on master as well. It is caused by the way we serialize and deserialize the analyzed token lists on the transport layer. We allow token lists to be null if the response contains the "explain" details. When writing to the stream, we write a "0" value if the token list is null or if it is an empty list of size 0, so we cannot distinguish those two cases when reading and deserialize both back to a null value. That means we effectively cannot send a zero-sized token list through the transport layer.
This requires a fix on the serialization layer. I will open a PR on master; we will then see how far we can backport it and whether there are other ways to mitigate the effects of this edge case on 6.8.
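
To make the ambiguity described above concrete, here is a minimal sketch using plain `java.io` streams. It is not the actual Elasticsearch code: the class `TokenListWire` and its methods are hypothetical stand-ins, and tokens are simplified to strings. The "old" pair writes `0` for both a null and an empty list, while the "fixed" pair writes a presence flag first, as the proposed fix describes. It would also be consistent with the alternating responses in the report if only responses from the other node cross the transport layer, though that part is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the Elasticsearch StreamInput/StreamOutput API.
public class TokenListWire {

    // Old scheme: a null list and an empty list both serialize as size 0.
    public static void writeOld(DataOutputStream out, List<String> tokens) throws IOException {
        int size = (tokens == null) ? 0 : tokens.size();
        out.writeInt(size);
        for (int i = 0; i < size; i++) {
            out.writeUTF(tokens.get(i));
        }
    }

    // Old scheme: size 0 is read back as null, so an empty list is lost.
    public static List<String> readOld(DataInputStream in) throws IOException {
        int size = in.readInt();
        if (size == 0) {
            return null;
        }
        List<String> tokens = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            tokens.add(in.readUTF());
        }
        return tokens;
    }

    // Fixed scheme: a boolean flag records whether the list is present at all.
    public static void writeFixed(DataOutputStream out, List<String> tokens) throws IOException {
        out.writeBoolean(tokens != null);
        if (tokens == null) {
            return;
        }
        out.writeInt(tokens.size());
        for (String token : tokens) {
            out.writeUTF(token);
        }
    }

    // Fixed scheme: only a false flag yields null; size 0 yields an empty list.
    public static List<String> readFixed(DataInputStream in) throws IOException {
        if (!in.readBoolean()) {
            return null;
        }
        int size = in.readInt();
        List<String> tokens = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            tokens.add(in.readUTF());
        }
        return tokens;
    }

    // Serialize and immediately deserialize with the old scheme.
    public static List<String> roundTripOld(List<String> tokens) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeOld(new DataOutputStream(bytes), tokens);
            return readOld(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serialize and immediately deserialize with the fixed scheme.
    public static List<String> roundTripFixed(List<String> tokens) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeFixed(new DataOutputStream(bytes), tokens);
            return readFixed(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripOld(new ArrayList<>()));   // prints "null"
        System.out.println(roundTripFixed(new ArrayList<>())); // prints "[]"
    }
}
```

Note that the extra flag changes the byte layout, so a reader using the old scheme cannot parse a stream written by the fixed one.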

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Jul 12, 2019
Currently we lose information about whether a token list in an AnalyzeAction
response is null or an empty list, because we write a 0 value to the stream in
both cases and deserialize to a null value on the receiving side. This change
fixes this so we write an additional flag indicating whether the value is null
or not, followed by the size of the list and its content.

Closes elastic#44078
cbuescher pushed a commit that referenced this issue Jul 12, 2019
cbuescher pushed a commit that referenced this issue Jul 14, 2019
cbuescher pushed a commit that referenced this issue Jul 14, 2019
@cbuescher
Member

The serialization root cause of this is fixed in 7.3 and upcoming branches by #44284, but backporting the serialization protocol change to 6.8 isn't straightforward because we already released other 7.x versions that would not follow the fixed protocol. I'm looking into patching the differing outputs on 6.8 through changes to the x-content rendering instead.

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Jul 15, 2019
Currently we lose information about whether a token list in an AnalyzeAction
response is null or an empty list, because we write a 0 value to the stream in
both cases and deserialize to a null value on the receiving side. This was fixed
in elastic#44284 by a change in the serialization protocol starting in 7.3. However,
this PR fixes the symptoms without changing the wire protocol, which we cannot do
easily on 6.8 because we already released incompatible versions in the 7.x line.
This change adds special handling on xcontent output and if getToken() is
called on either AnalyzeResponse or DetailAnalyzeResponse to always return
empty lists instead of null values.

Relates to elastic#44078
cbuescher pushed a commit that referenced this issue Jul 16, 2019
@cbuescher
Member

Also fixed on the 6.8 branch with #44342, so if we release a 6.8.2 it should be fixed there.
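
The 6.8 mitigation described in the commit message above (return empty lists instead of null, both from the getter and when rendering output) can be sketched roughly as follows. This is an illustration only: `NullSafeTokens`, `getTokens()`, and `toJson()` are hypothetical names, tokens are simplified to plain strings without JSON escaping, and the real change lives in the Elasticsearch response classes.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a response object whose token list may arrive as null.
public class NullSafeTokens {

    // May be null after deserialization with the unchanged wire protocol.
    private final List<String> tokens;

    public NullSafeTokens(List<String> tokens) {
        this.tokens = tokens;
    }

    // Callers never see null: a missing list is reported as an empty one.
    public List<String> getTokens() {
        return tokens == null ? Collections.<String>emptyList() : tokens;
    }

    // Rendering goes through the null-safe getter, so it cannot throw a
    // NullPointerException and always emits a "tokens" array.
    public String toJson() {
        StringBuilder json = new StringBuilder("{\"tokens\":[");
        List<String> safe = getTokens();
        for (int i = 0; i < safe.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append('"').append(safe.get(i)).append('"');
        }
        return json.append("]}").toString();
    }

    public static void main(String[] args) {
        System.out.println(new NullSafeTokens(null).toJson()); // prints {"tokens":[]}
    }
}
```

With this approach both nodes produce the same output for an empty result, at the cost of no longer being able to tell a null list from an empty one on the receiving side.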

@javanna javanna added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 16, 2024