
Remove _field_stats endpoint #25577


Closed
jimczi opened this issue Jul 6, 2017 · 24 comments
Labels
blocker >breaking :Search/Search Search-related issues that do not fall into other categories

Comments

jimczi (Contributor) commented Jul 6, 2017

_field_stats endpoint has been deprecated in 5 and should be removed in 6.

jimczi added the :Search/Search, blocker, and >breaking labels on Jul 6, 2017
jimczi (Contributor, Author) commented Jul 6, 2017

@Bargs @spalger can you confirm that Kibana 6 will not use _field_stats at all and rely only on _field_caps and ES request cache for aggs ?

Bargs commented Jul 6, 2017

We no longer use field_stats to get field info at index pattern creation time, but we do still use it prior to executing searches for some index patterns. There are plans to remove that option (it's already deprecated) but it does not look like that's happened yet. I'm not sure if it's targeted for 6.0. @epixa ?

spalger (Contributor) commented Jul 6, 2017

Pretty sure we are still planning to ship the "expand indices" option with Kibana 6.0 (where we convert index* into an index list using the _field_stats API). IIRC it was intended as a backup plan in case the wildcard optimizations that ES has implemented aren't sufficient for some users.

If we are confident that removing this optimization from Kibana won't result in worse, or specifically unusable, performance then I'm not opposed to removing it now.

jimczi (Contributor, Author) commented Jul 6, 2017

> If we are confident that removing this optimization from Kibana won't result in worse, or specifically unusable, performance then I'm not opposed to removing it now.

If the optimization is to use _field_stats to check whether an index has documents in a specific timeframe, then I think it's safe to remove. The only difference from the _field_stats solution is that the optimization will happen locally on each shard rather than on the Kibana side. If you have 2,000 shards you'll send 2,000 queries, but they should return very fast for shards whose min/max timeframe doesn't overlap the query.
@colings86 should confirm, but the min/max rewriting for time range queries plus the request cache should be enough to get decent performance even when doing a gazillion-shard query.
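The shard-local rewrite described here can be sketched in a few lines of Python. This is purely an illustration: `Shard`, `rewrite_range`, and the returned labels are made-up names for this sketch, not the actual Lucene/Elasticsearch implementation.

```python
from dataclasses import dataclass

@dataclass
class Shard:
    """Per-shard min/max for the @timestamp field (Lucene keeps such
    statistics per segment; this class is illustrative only)."""
    min_ts: int
    max_ts: int

def rewrite_range(shard, gte, lte):
    """Sketch of the shard-local rewrite: a range query whose window
    misses the shard's value range becomes match_none, which returns
    instantly and is trivially cacheable by the request cache."""
    if lte < shard.min_ts or gte > shard.max_ts:
        return "match_none"
    if gte <= shard.min_ts and lte >= shard.max_ts:
        return "match_all"      # every doc matches; no per-doc check needed
    return "range"              # must actually evaluate docs in the window

# A "gazillion shard" query: only shards overlapping the window do real work.
shards = [Shard(0, 100), Shard(100, 200), Shard(200, 300)]
print([rewrite_range(s, gte=150, lte=250) for s in shards])
# ['match_none', 'range', 'range']
```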

colings86 (Contributor) commented

> @colings86 should confirm but the min/max rewriting for time range queries plus the request cache should be enough to have decent perf even when doing a gazillion shard query.

I agree

trevan commented Jul 10, 2017

With _field_stats gone, how does one easily calculate "old indices"? https://www.elastic.co/blog/managing-time-based-indices-efficiently has us using _field_stats, and I don't think a min/max aggregation will be performant against lots of indices. I mentioned this at #23914 (comment) as well, but no one seemed to respond.

A problem with the min/max rewriting that I've just noticed is that it appears to require some warming up that is painful on our cluster. Our Kibana currently uses _field_stats for index calculation, and requests usually take <10 seconds. But if I run the same query without the index subset, the first run takes several minutes. After that, runs are <10 seconds (though a few seconds slower than before). This warm-up seems to happen if I haven't run a query in an hour, and it increases the load average on my cluster fairly substantially.
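On the "old indices" question: one common replacement for the _field_stats approach in that blog post is to derive index age from the date embedded in time-based index names, the way curator-style tooling does. A minimal sketch, assuming logstash-YYYY.MM.DD naming (the function and parameter names are illustrative, not part of any API):

```python
from datetime import datetime, timedelta

def old_indices(names, now, keep_days, pattern="logstash-%Y.%m.%d"):
    """Return time-based indices older than keep_days, judging age by
    the date embedded in the name rather than by field statistics."""
    cutoff = now - timedelta(days=keep_days)
    old = []
    for name in names:
        try:
            if datetime.strptime(name, pattern) < cutoff:
                old.append(name)
        except ValueError:
            pass  # skip indices that don't follow the naming scheme
    return old

names = ["logstash-2017.07.01", "logstash-2017.07.09", ".kibana"]
print(old_indices(names, now=datetime(2017, 7, 10), keep_days=7))
# ['logstash-2017.07.01']
```

This only works when document timestamps line up with the index naming scheme, which is the usual case for daily logstash indices.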

spalger (Contributor) commented Jul 10, 2017

@trevan sounds like you would benefit from #14835

trevan commented Jul 10, 2017

@spalger, thanks for pointing that out. Unfortunately, we do have deletions and updates. Not a lot (<5% of the data), though.

jimczi (Contributor, Author) commented Jul 10, 2017

@trevan the "expand time based indices" feature should be handled transparently on the ES side. That's what we're trying to achieve. Kibana does not replace _field_stats with a min/max aggregation; it simply removes the optimization and lets ES do the right thing with the time range filters.
Now, what you're describing is not expected. Can you share the query that takes several minutes "without the index subset"?

trevan commented Jul 10, 2017

@jimczi, I know Kibana is trying to let ES handle the time range filters. I guess that must have been lost in what I said.

Here's the query that can take several minutes:

curl 'localhost:9200/logstash-*/_search?pretty' -d '{"size":0,"aggs":{"suggestions":{"terms":{"field":"ip.raw"}}},"query":{"range":{"@timestamp":{"gte":1499640596171,"lte":1499726996171,"format":"epoch_millis"}}}}'

I just did a test to grab the times using both the old _field_stats method and the new "let ES do it all". I did the following 3 steps in order.

  1. _field_stats call took <1s
  2. Above request using the index list took 6 seconds the first time and then 2 seconds the next two
  3. Above request using the wildcard pattern took 47 seconds the first time and then 3 seconds the next two

This is using version 5.4.1.

Should this be moved to a separate issue?

epixa (Contributor) commented Jul 13, 2017

An update for those following along: the field_stats API was removed from master today, and the corresponding "expand indices" option in Kibana was removed as well.

This was made possible by a new change to the search/msearch APIs that automatically optimizes requests to hit only the subset of shards that could actually have documents matching the given filters, when a non-trivial number of shards match the given index pattern. This is a pretty naive summary of the change, so I encourage folks to look at the PR that added this improvement for more details: #25658
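The pre-filter phase this summary describes can be sketched roughly as follows. All names and the threshold value are illustrative, not the actual Elasticsearch implementation (see #25658 for the real one):

```python
def prefilter_shards(shards, gte, lte, threshold=128):
    """Sketch of the coordinating-node optimization: when an index
    pattern expands to many shards, first check (cheaply, from each
    shard's min/max stats) which shards *could* match the time range,
    and fan the real search out only to those. The threshold models
    "non-trivial amount of shards"; its value here is made up."""
    if len(shards) < threshold:
        return shards                      # not worth the extra round-trip
    return [s for s in shards
            if not (lte < s["min_ts"] or gte > s["max_ts"])]

# 200 daily shards, each covering a 100ms-wide slice of time.
shards = [{"min_ts": i * 100, "max_ts": i * 100 + 99} for i in range(200)]
hit = prefilter_shards(shards, gte=450, lte=520)
print(len(hit))  # 2
```

With a narrow time window, the actual search touches only the handful of shards whose ranges overlap it, which is what made the Kibana-side "expand indices" workaround unnecessary.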

jimczi (Contributor, Author) commented Jul 13, 2017

@trevan we fixed a long list of issues in the last few days to make _field_stats completely obsolete: #25658 and #25632 are the main ones.
They are all merged in master (v6) and should bring significant improvements to Kibana's handling of time range filters. The issue you described should also be fixed. _field_stats is effectively gone, but much more has been added to ES core in the meantime to make the transition transparent in v6.

trevan commented Jul 13, 2017

@jimczi, can these changes be backported to 5.x? I'd like to be able to see if these changes will actually help while staying on 5.x, where _field_stats is still available as a backup option.

s1monw (Contributor) commented Jul 13, 2017

> @jimczi, can these changes be backported to 5.x? I'd like to be able to see if these changes will actually help while staying in 5.x where _field_stats is still available as a back up option.

we don't plan to backport these changes into 5.x

s1monw (Contributor) commented Jul 13, 2017

> we don't plan to backport these changes into 5.x

That said, I will spend some time looking into backporting it, since I see the benefit here for a broader audience. So bear with me; I will update this issue. @trevan

trevan commented Jul 13, 2017

@s1monw, are all the changes basically on the coordinating node? So if I have a 6.x client node talking to my 5.x cluster, would I be able to see if this is performant enough? My worry is that I currently have to use _field_stats in 5.x because of the performance hit, and I won't be able to know whether you've made it performant enough in 6.x until after I upgrade, and by that time I'm stuck. That's why I'm asking about a backport. I would like some way to test these changes on my cluster before losing _field_stats.

epixa (Contributor) commented Jul 13, 2017

@trevan I can't answer your question in terms of Elasticsearch, but Kibana 5.5 will fail with a red status if it targets an ES cluster that has a 6.x node in it. This requirement will be relaxed in the final version of 5.x that is released to support a migration/rolling upgrade scenario, but you'll need to test the performance of this change with a raw query to Elasticsearch in that scenario.

trevan commented Jul 13, 2017

@epixa, yeah, I was planning on doing a raw query. That's how I've been testing the non-_field_stats queries so far, since otherwise Kibana would kill our cluster.

s1monw (Contributor) commented Jul 13, 2017

@trevan that is correct: you'll only see these changes once you've upgraded to the upcoming 5.6 and use a 6.0 client node. I will nevertheless look into backporting this feature; it seems low risk at this point.

s1monw (Contributor) commented Jul 15, 2017

@trevan FYI, I back-ported the change that went into master to 5.x; it will be released with 5.6.

s1monw (Contributor) commented Jul 17, 2017

@trevan it would be very much appreciated if you could report back, once you've upgraded to 5.6, on whether this helps you or not. The sooner the better; we might still have time to fix things if needed. Note that all of your nodes must run 5.6 or higher in order to make use of the optimization. If you have questions about how you should structure your query, please feel free to share your current setup. Ideally we would do this in a discuss forum instead of here, but you are more than welcome to paste a link to the discuss thread once you've created it so we can follow up.

trevan commented Jul 17, 2017

@s1monw, thanks for the backport. I'll try to get us upgraded in the next month or two and report back.

s1monw (Contributor) commented Sep 11, 2017

@trevan 5.6.0 is out. It would be fantastic if you could report back on this.

trevan commented Oct 30, 2017

@s1monw, we just upgraded to 5.6.x. I ran the logstash-* version and then waited 5 hours before running the index-list version. Both took about 16 seconds the first time. So it is looking good. Thanks.
