random_sort on query with has_child eating insane amounts of memory in field data #20141


Closed
AndreCimander opened this issue Aug 24, 2016 · 1 comment

@AndreCimander

Elasticsearch version: 2.3.4

Plugins installed: [license, marvel, kibana, kopf, elastic-hq]

JVM version: Java(TM) SE Runtime Environment (build 1.8.0_101-b13)

OS version: Ubuntu 14.04 with kernel 4.2

Description of the problem including expected versus actual behavior:

Hey everyone,

first of all, thanks for this fine piece of software! 👍

I noticed extremely high memory usage and cache thrashing in our Elasticsearch cluster; after some digging I pinned it down to a single random-sort query with a has_child filter. We currently have about 70 million parents with 1.3 billion children.

Applying the query without the random_score function uses just a few GB of field data memory; with random_score, field data usage skyrockets to 60 GB per query, which is a little... unsettling.

The query:
GET /instagram-user/instagram-user/_search
{
  "query": {
    "function_score": {
      "filter": {
        "bool": {
          "must_not": [
            { "exists": { "field": "calculated" } }
          ],
          "filter": [
            { "term": { "private": false } },
            {
              "has_child": {
                "query": {
                  "bool": {
                    "filter": [
                      { "term": { "instagram_user_id": "1397123079" } }
                    ]
                  }
                },
                "score_mode": "none",
                "type": "instagram-like"
              }
            }
          ]
        }
      },
      "functions": [
        { "random_score": { "seed": 12345 } }
      ]
    }
  }
}

Did I miss some config parameter for random_score that results in all child documents being included and the filter bitsets being bloated?

Happy to supply additional logs.
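
For anyone trying to reproduce the measurement: the per-field numbers behind figures like the 60 GB above can be pulled from the stock fielddata stats endpoints (nothing here is specific to our cluster):

# Fielddata memory per node, broken down by field
GET /_cat/fielddata?v

# Total fielddata memory per node via the node stats API
GET /_nodes/stats/indices/fielddata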

@clintongormley
Contributor

No, you're not missing anything. Unfortunately, this is the way it works at the moment. The problem is that random scoring uses the _uid field, which currently doesn't have doc values (see #11887). That means the UID has to be loaded into fielddata (on the heap), which, considering the number of docs you have, is going to be costly.

Sorry I can't give you a better answer at the moment.
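
One possible workaround, sketched here rather than confirmed in this thread: index a precomputed random value on each parent and score on that with field_value_factor instead of random_score. random_key is a made-up field name, and this assumes every parent document gets a value for it at index time; numeric fields get doc values by default in 2.x, so scoring on it avoids loading _uid fielddata onto the heap.

# Add a numeric field to hold a per-document random value (doc values by default)
PUT /instagram-user/_mapping/instagram-user
{
  "properties": {
    "random_key": { "type": "float" }
  }
}

# Same filter as the original query, but the score comes from the indexed random value
GET /instagram-user/instagram-user/_search
{
  "query": {
    "function_score": {
      "filter": {
        "bool": {
          "must_not": [
            { "exists": { "field": "calculated" } }
          ],
          "filter": [
            { "term": { "private": false } },
            {
              "has_child": {
                "query": {
                  "bool": {
                    "filter": [
                      { "term": { "instagram_user_id": "1397123079" } }
                    ]
                  }
                },
                "score_mode": "none",
                "type": "instagram-like"
              }
            }
          ]
        }
      },
      "functions": [
        { "field_value_factor": { "field": "random_key" } }
      ]
    }
  }
}

The trade-off is that the ordering stays fixed until random_key is re-rolled, whereas random_score with a different seed gives a different ordering per request.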
