Skip to content

Occassionally, queries return results from the wrong index #198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
clintongormley opened this issue May 28, 2010 · 12 comments
Closed

Occassionally, queries return results from the wrong index #198

clintongormley opened this issue May 28, 2010 · 12 comments

Comments

@clintongormley
Copy link
Contributor

Hiya

See the query below - I query the alias ia_object which points to the index ia_object_1274409377, for objects of type notice, but the resuts returned include results from my other index iannounce_object_1274399922 of types other than notice.

I'm wondering if one node has dispatched the query, then picked up the wrong resultset?

curl -XGET 'http://idb2:9200/ia_object/notice/_search?search_type=dfs_query_then_fetch'  -d '
{
   "sort" : [
      {
         "score" : "desc"
      }
   ],
   "fields" : [],
   "from" : 0,
   "query" : {
      "filtered" : {
         "filter" : {
            "bool" : {
               "must" : [
                  {
                     "term" : {
                        "sub_type" : "engagement"
                     }
                  },
                  {
                     "term" : {
                        "status" : "active"
                     }
                  },
                  {
                     "term" : {
                        "location_id" : "40"
                     }
                  }
               ]
            }
         },
         "query" : {
            "dis_max" : {
               "queries" : [
                  {
                     "query_string" : {
                        "fields" : [
                           "name"
                        ],
                        "boost" : 1,
                        "query" : "\"David Mcgeoch\"~7"
                     }
                  },
                  {
                     "filtered" : {
                        "filter" : {
                           "term" : {
                              "has_name" : "0"
                           }
                        },
                        "query" : {
                           "query_string" : {
                              "fields" : [
                                 "text"
                              ],
                              "boost" : 1.3,
                              "query" : "\"David Mcgeoch\"~7"
                           }
                        }
                     }
                  },
                  {
                     "query_string" : {
                        "boost" : 1,
                        "query" : "David Mcgeoch"
                     }
                  }
               ],
               "tie_breaker" : 0.7
            }
         }
      }
   },
   "size" : 300
}
'
# {
#    "hits" : {
#       "hits" : [
#          {
#             "_index" : "iannounce_object_1274399922",
#             "_id" : "3686897",
#             "_type" : "memorial"
#          },
#          {
#             "_index" : "ia_object_1274409377",
#             "_id" : "1336845",
#             "_type" : "notice"
#          },
#          {
#             "_index" : "iannounce_object_1274399922",
#             "_id" : "2081719",
#             "_type" : "memorial"
#          },
#          {
#             "_index" : "iannounce_object_1274399922",
#             "_id" : "384424",
#             "_type" : "memorial"
#          },
#          {
#             "_index" : "iannounce_object_1274399922",
#             "_id" : "234160",
#             "_type" : "memorial"
#          },
#          {
#             "_index" : "iannounce_object_1274399922",
#             "_id" : "238852",
#             "_type" : "memorial"
#          },

[ SNIPPED ]

#       ],
#       "total" : 7390
#    },
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    }
# }

There were some errors in the ES log around that time:

[07:14:35,985][DEBUG][action.search.type       ] [Ringleader] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Hulkling][inet[/192.168.10.50:9300]][search/phase/fetch/id]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [9785964], timed out
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:444)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:435)
        at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
[07:14:35,986][DEBUG][action.search.type       ] [Ringleader] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Hulkling][inet[/192.168.10.50:9300]][search/phase/fetch/id]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [9785961], timed out
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:444)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:435)
        at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
[07:14:35,986][DEBUG][action.search.type       ] [Ringleader] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Hulkling][inet[/192.168.10.50:9300]][search/phase/fetch/id]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [9785965], timed out
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:444)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:435)
        at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
[07:14:35,991][DEBUG][action.search.type       ] [Ringleader] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Hulkling][inet[/192.168.10.50:9300]][search/phase/fetch/id]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [9785963], timed out
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:444)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:435)
        at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
[07:14:36,212][DEBUG][action.search.type       ] [Ringleader] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [4594371], timed out
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
        at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:309)
        at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.executeFetch(TransportSearchDfsQueryThenFetchAction.java:226)
        at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.access$700(TransportSearchDfsQueryThenFetchAction.java:64)
        at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$4.run(TransportSearchDfsQueryThenFetchAction.java:199)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
@kimchy
Copy link
Member

kimchy commented May 28, 2010

Hey,

Very, very strange. Lets start with the second failures. Is there a chance that you get a long search requests or load that is higher than 5 minutes? The default timeout between a query phase and a fetch phase is 5 minutes (or controlled by setting the keep alive parameter).

Second, regarding the first problem. This is very very strange. I think your assumption is correct, and for some reason, requests got mixed up. I don't see where it can happen in elasticsearch, but I will investigate further. It must be in the HTTP layer, and thats pretty straightforward. It might also be on the http client side..., trying to think how best to nail down where and why this happens.... (well, it can also be in the logging layer :) )

@clintongormley
Copy link
Contributor Author

Re long search requests - I don't honestly know. I haven't seen that happening, but my app times out requests after 15 seconds, then retries, so it may well be hidden.

I'm certainly not seeing anything unusual regarding CPU or memory use in jvisualvm.

I think you'll need to add some more logging to nail it down. Also, this is v 0.7.1 - I haven't upgraded yet.

@kimchy
Copy link
Member

kimchy commented May 28, 2010

When I said client side, I meant the perl http client, which might mix messages up, if that was not obvious :). I am trying to think of the best logging to add. I can generate a unique id for each rest request when it is received, and log the request, and then, log the response sent under that unique id, what do you think?

@clintongormley
Copy link
Contributor Author

I think it is highly unlikely that it is happening on the Perl side. I'm using a core module that has been stable for many years - something as obvious as this would have been noticed before. Also, it is not an async request, so nothing to mix up there.

Re logging - you can add that if you like, but i think you're probably looking in the wrong place. The reason I posted the errors was that they happened at the same time as the bad results (i can't be certain that they were generated by the same request or not). I think it is likelier that the problem is node communication getting mixed up, while retrieving results from the other shards.

but what do I know :)

@kimchy
Copy link
Member

kimchy commented May 29, 2010

I agree with you, the perl side is probably not to blame.

So, some background. You execute a dfs query then fetch. Thats three phases, first, the dfs, that goes to all shards. Then, query, that goes to all shards with the dfs result. Last, the fetch, that goes to the relevant shards.

A certain shard communication might have got messes up, which I am trying to understand if can happen. In your response (if you still have all of it) does it look like some of the response are correct, while others are wrong?

If the response looks like simply a response for a another query, then its more high level then that, if its a mixed result, with some good and some bad, then its a messed up specific shard communication in the search.

When do you plan to upgrade to 0.8? It will be easier to help with that version.

@clintongormley
Copy link
Contributor Author

I don't have the full results any more, but looking at the above, the second hit matches the query correctly, but all the others (i think it was all) were from the wrong index.

Hopefully I'll be on 0.8 next weekend. It won't be this weekend.

@kimchy
Copy link
Member

kimchy commented May 29, 2010

Yea, missed the second hit. Ok, I will look into this more try and see why this might happen. Ping me when you upgrade to 0.8.

@clintongormley
Copy link
Contributor Author

Bad news, kimchy - i'm still getting this error on v 0.8

@kimchy
Copy link
Member

kimchy commented Jun 11, 2010

Do you still get the exceptions in the logs?

@clintongormley
Copy link
Contributor Author

There are still exceptions in the logs, but the times don't coincide, so I'm not sure that they're related:

[02:20:53,165][DEBUG][action.search.type       ] [Fontanelle] [12798594] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Blue Bullet][inet[/192.168.10.41:9300]][search/phase/fetch/id]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [12798594], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:444)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:435)
    at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[02:41:38,114][DEBUG][action.search.type       ] [Fontanelle] [6167654] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [6167654], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:309)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.executeFetch(TransportSearchDfsQueryThenFetchAction.java:226)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.access$700(TransportSearchDfsQueryThenFetchAction.java:64)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$4.run(TransportSearchDfsQueryThenFetchAction.java:199)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[02:41:38,114][DEBUG][action.search.type       ] [Fontanelle] [6167652] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [6167652], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:309)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.executeFetch(TransportSearchDfsQueryThenFetchAction.java:226)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.access$700(TransportSearchDfsQueryThenFetchAction.java:64)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$4.run(TransportSearchDfsQueryThenFetchAction.java:199)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[02:41:38,115][DEBUG][action.search.type       ] [Fontanelle] [6167653] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [6167653], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:309)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.executeFetch(TransportSearchDfsQueryThenFetchAction.java:226)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.access$700(TransportSearchDfsQueryThenFetchAction.java:64)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$4.run(TransportSearchDfsQueryThenFetchAction.java:199)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[02:41:38,160][DEBUG][action.search.type       ] [Fontanelle] [12820997] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Blue Bullet][inet[/192.168.10.41:9300]][search/phase/fetch/id]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [12820997], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:444)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:435)
    at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[03:38:57,733][DEBUG][action.search.type       ] [Fontanelle] [12872042] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Blue Bullet][inet[/192.168.10.41:9300]][search/phase/fetch/id]
Caused by: java.lang.NullPointerException
[03:40:05,941][DEBUG][action.search.type       ] [Fontanelle] [6193567] Failed to execute query phase
org.elasticsearch.search.query.QueryPhaseExecutionException: [iannounce_object_1275333493][0]: query[filtered((name:"pamela goody"~7 | filtered(short_text:"pamela goody"~7^1.3)->FilterCacheFilterWrapper(org.elasticsearch.util.lucene.search.TermFilter@3e3f4f7) | (_all:pamela _all:goody))~0.7)->FilterCacheFilterWrapper(BooleanFilter( +FilterCacheFilterWrapper(org.elasticsearch.util.lucene.search.TermFilter@33d4352b) +FilterCacheFilterWrapper(org.elasticsearch.util.lucene.search.TermFilter@3a1bb115) +FilterCacheFilterWrapper(org.elasticsearch.util.lucene.search.TermFilter@40ea8cf6) +FilterCacheFilterWrapper(parent_id:[1 TO 1])))],from[0],size[300],sort[<score>]: Query Failed []
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:103)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:193)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:159)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.executeQuery(TransportSearchDfsQueryThenFetchAction.java:141)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.access$200(TransportSearchDfsQueryThenFetchAction.java:64)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$1.run(TransportSearchDfsQueryThenFetchAction.java:114)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IllegalArgumentException: df for term pamela not available
    at org.elasticsearch.search.dfs.CachedDfSource.docFreq(CachedDfSource.java:51)
    at org.apache.lucene.search.Similarity.idfExplain(Similarity.java:769)
    at org.apache.lucene.search.PhraseQuery$PhraseWeight.<init>(PhraseQuery.java:122)
    at org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:264)
    at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.<init>(DisjunctionMaxQuery.java:107)
    at org.apache.lucene.search.DisjunctionMaxQuery.createWeight(DisjunctionMaxQuery.java:184)
    at org.apache.lucene.search.FilteredQuery.createWeight(FilteredQuery.java:63)
    at org.apache.lucene.search.FilteredQuery.createWeight(FilteredQuery.java:63)
    at org.apache.lucene.search.Query.weight(Query.java:101)
    at org.elasticsearch.search.internal.ContextIndexSearcher.createWeight(ContextIndexSearcher.java:78)
    at org.apache.lucene.search.Searcher.search(Searcher.java:49)
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:97)
    ... 8 more
[03:52:03,125][DEBUG][action.search.type       ] [Fontanelle] [12881061] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Blue Bullet][inet[/192.168.10.41:9300]][search/phase/fetch/id]
Caused by: java.lang.ArrayIndexOutOfBoundsException
[03:52:03,142][DEBUG][action.search.type       ] [Fontanelle] [12881063] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Blue Bullet][inet[/192.168.10.41:9300]][search/phase/fetch/id]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [12881063], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:444)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:435)
    at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[13:15:55,961][DEBUG][index.gateway            ] [Fontanelle][iannounce_object_1275333493][3] Snapshotting on close ...
[13:15:56,470][DEBUG][index.shard.recovery     ] [Fontanelle][iannounce_object_1275333493][3] Starting recovery from [Blue Bullet][b6b87b7e-89da-4644-b17a-2fb638b16041][inet[/192.168.10.41:9300]]
[13:16:50,518][DEBUG][index.shard.recovery     ] [Fontanelle][iannounce_object_1275333493][3] Recovery completed from [Blue Bullet][b6b87b7e-89da-4644-b17a-2fb638b16041][inet[/192.168.10.41:9300]], took[54s], throttling_wait [0s]
   Phase1: recovered [27] files with total size of [1.2g], took [53.5s], throttling_wait [0s]
   Phase2: recovered [1644] transaction log operations, took [390ms]
   Phase3: recovered [0] transaction log operations, took [65ms]
[22:34:33,612][DEBUG][action.search.type       ] [Fontanelle] [14161052] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [Blue Bullet][inet[/192.168.10.41:9300]][search/phase/query/id]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [14161052], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:185)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:388)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:379)
    at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[22:34:33,630][DEBUG][action.search.type       ] [Fontanelle] [6837580] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [6837580], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:309)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.executeFetch(TransportSearchDfsQueryThenFetchAction.java:226)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.access$700(TransportSearchDfsQueryThenFetchAction.java:64)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$4.run(TransportSearchDfsQueryThenFetchAction.java:199)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[22:34:33,630][DEBUG][action.search.type       ] [Fontanelle] [6837581] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [6837581], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:309)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.executeFetch(TransportSearchDfsQueryThenFetchAction.java:226)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.access$700(TransportSearchDfsQueryThenFetchAction.java:64)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$4.run(TransportSearchDfsQueryThenFetchAction.java:199)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[22:34:33,631][DEBUG][action.search.type       ] [Fontanelle] [6837579] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [6837579], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:309)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.executeFetch(TransportSearchDfsQueryThenFetchAction.java:226)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction.access$700(TransportSearchDfsQueryThenFetchAction.java:64)
    at org.elasticsearch.action.search.type.TransportSearchDfsQueryThenFetchAction$AsyncAction$4.run(TransportSearchDfsQueryThenFetchAction.java:199)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[22:34:33,720][DEBUG][action.search.type       ] [Fontanelle] [14161053] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Blue Bullet][inet[/192.168.10.41:9300]][search/phase/fetch/id]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [14161053], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:444)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:435)
    at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[22:34:33,722][DEBUG][action.search.type       ] [Fontanelle] [14161054] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [Blue Bullet][inet[/192.168.10.41:9300]][search/phase/fetch/id]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [14161054], timed out
    at org.elasticsearch.search.SearchService.findContext(SearchService.java:274)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:444)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchFetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:435)
    at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

@clintongormley
Copy link
Contributor Author

In 0.16.2, an occurrence of this problem (ie ES returning the wrong results) coincided with this error in the logs:

Caught exception while handling client http traffic, closing connection
java.lang.IllegalStateException: cannot send more responses than requests

Full logs https://gist.github.com/85a707a6edf8e8534b4a

@clintongormley
Copy link
Contributor Author

Fixed in issue #1152

costin pushed a commit that referenced this issue Dec 6, 2022
…198)

This PR adds the basic infrastructure for turning a physical plan into a list of drivers that can (locally) execute the given physical plan. It adds the PlanNode class to represent a physical plan (which is just a tree / digraph of PlanNode objects). The PR assumes that this physical plan makes sense (i.e. it does not do any kind of extra validation). It then implements a LocalExecutionPlanner to turn the given plan into a list of drivers, allowing parallel execution of the given plan (in-so-far as parallelism has been designed into the plan). It covers all the parallel executions explored as part of #189, showing the flexibility of the planner.
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Oct 2, 2023
Since February 7 we noticed an increase in the 100th percentile service
time of the `country_agg_cached` query in geonames when the trial
license is enabled. In order to dig whether this is related to JVM
activity (GC, safepoints) we temporarily enable GC logs. This is
expected to be low-impact on performance but allows us to dig deeper.

Relates elastic#181
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants