Occasionally, queries return results from the wrong index #198
Comments
Hey, very, very strange. Let's start with the second failures. Is there a chance that you have long search requests, or load, lasting more than 5 minutes? The default timeout between the query phase and the fetch phase is 5 minutes (it can be controlled by setting the keep alive parameter). Second, regarding the first problem: this is very, very strange. I think your assumption is correct and, for some reason, requests got mixed up. I don't see where it can happen in elasticsearch, but I will investigate further. It must be in the HTTP layer, and that's pretty straightforward. It might also be on the HTTP client side... I'm trying to think how best to nail down where and why this happens... (well, it can also be in the logging layer :) )
Re long search requests - I honestly don't know. I haven't seen that happening, but my app times out requests after 15 seconds, then retries, so it may well be hidden. I'm certainly not seeing anything unusual regarding CPU or memory use in jvisualvm. I think you'll need to add some more logging to nail it down. Also, this is v 0.7.1 - I haven't upgraded yet.
When I said client side, I meant the Perl HTTP client, which might mix messages up, if that was not obvious :). I am trying to think of the best logging to add. I can generate a unique id for each REST request when it is received and log the request, and then log the response sent under that same unique id. What do you think?
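A minimal sketch of the logging idea above, written in Python purely for illustration (it is not elasticsearch code). The logger name `rest.trace` and the `handle_with_trace` wrapper are hypothetical; the point is only that the request and the response are logged under the same id, so a mismatched pair would show up in the logs.

```python
import itertools
import logging

# Hypothetical logger name, chosen for this sketch only.
log = logging.getLogger("rest.trace")

# Process-wide counter used to hand out one id per incoming request.
_request_ids = itertools.count(1)


def handle_with_trace(handler, method, path, body):
    """Call `handler` and log the request and its response under one shared id.

    `handler` is any callable taking (method, path, body) and returning
    the response; it stands in for the real REST handler.
    """
    request_id = next(_request_ids)
    log.info("[%d] request %s %s %s", request_id, method, path, body)
    response = handler(method, path, body)
    log.info("[%d] response %s", request_id, response)
    return request_id, response
```

If a response for one index ever appears in the log under the id of a request for another index, the mix-up is on the server side of the HTTP layer; if the log pairs are always consistent, the client or the layers below the handler become the suspects.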
I think it is highly unlikely that it is happening on the Perl side. I'm using a core module that has been stable for many years - something as obvious as this would have been noticed before. Also, it is not an async request, so nothing to mix up there. Re logging - you can add that if you like, but I think you're probably looking in the wrong place. The reason I posted the errors was that they happened at the same time as the bad results (I can't be certain whether they were generated by the same request or not). I think it is likelier that the problem is node communication getting mixed up while retrieving results from the other shards. But what do I know :)
I agree with you, the Perl side is probably not to blame. So, some background. You execute a dfs query then fetch. That's three phases: first, the dfs, which goes to all shards. Then the query, which goes to all shards with the dfs result. Last, the fetch, which goes to the relevant shards. A certain shard communication might have got messed up, and I am trying to understand whether that can happen. In your response (if you still have all of it), does it look like some of the hits are correct while others are wrong? If the response simply looks like a response to another query, then the problem is at a higher level than that; if it's a mixed result, with some good and some bad hits, then it's a messed-up communication with a specific shard during the search. When do you plan to upgrade to 0.8? It will be easier to help with that version.
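The distinction drawn above (a wholly wrong response versus a mixed one) can be checked mechanically. This is a rough client-side sketch in Python, assuming the response has been parsed into a dict with the usual `hits.hits[]._index` layout; the function name `classify_hits` and its return labels are made up for this example.

```python
def classify_hits(response, expected_indices):
    """Label a parsed search response by where its hits came from.

    Returns "empty", "all_correct", "all_wrong" or "mixed", depending on
    how many hits carry an `_index` outside `expected_indices`.
    """
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return "empty"
    wrong = [h for h in hits if h.get("_index") not in expected_indices]
    if not wrong:
        return "all_correct"
    if len(wrong) == len(hits):
        return "all_wrong"   # looks like the response to a different query
    return "mixed"           # points at a single shard's results being swapped


# Example with the index names from this issue (hit contents are invented).
sample = {
    "hits": {
        "hits": [
            {"_index": "iannounce_object_1274399922", "_type": "other"},
            {"_index": "ia_object_1274409377", "_type": "notice"},
        ]
    }
}
print(classify_hits(sample, {"ia_object_1274409377"}))  # mixed
```

Running a check like this on every response and logging the non-`all_correct` ones would capture the full bad response the next time it happens.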
I don't have the full results any more, but looking at the above, the second hit matches the query correctly, while all the others (I think it was all) were from the wrong index. Hopefully I'll be on 0.8 next weekend. It won't be this weekend.
Yeah, I missed the second hit. OK, I will look into this more and try to see why this might happen. Ping me when you upgrade to 0.8.
Bad news, kimchy - I'm still getting this error on v 0.8
Do you still get the exceptions in the logs?
There are still exceptions in the logs, but the times don't coincide, so I'm not sure that they're related:
In 0.16.2, an occurrence of this problem (i.e. ES returning the wrong results) coincided with this error in the logs: Caught exception while handling client http traffic, closing connection
Fixed in issue #1152
Hiya
See the query below - I query the alias `ia_object`, which points to the index `ia_object_1274409377`, for objects of type `notice`, but the results returned include results from my other index `iannounce_object_1274399922`, of types other than `notice`.

I'm wondering if one node has dispatched the query, then picked up the wrong resultset?
There were some errors in the ES log around that time: