No, you're not missing anything. Unfortunately, this is the way it works at the moment. The problem is that random scoring uses the _uid field, which currently doesn't have doc values (see #11887). That means the UID has to be loaded into fielddata (on the heap), which, given the number of docs you have, is going to be costly.
Sorry I can't give you a better answer at the moment.
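If reindexing is an option, one workaround (a sketch only; the `random_value` field name and mapping are hypothetical, not part of this issue) is to store a random number with doc values at index time and score on that field instead of on _uid, so no fielddata needs to be loaded:

```json
PUT /instagram-user/_mapping/instagram-user
{
  "properties": {
    "random_value": { "type": "float", "doc_values": true }
  }
}

GET /instagram-user/instagram-user/_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        { "field_value_factor": { "field": "random_value" } }
      ]
    }
  }
}
```

The trade-off is that the ordering is fixed per document rather than per-seed, so you lose the ability to vary the shuffle with a `seed` parameter unless you reindex the random values.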
Elasticsearch version: 2.3.4
Plugins installed: [license, marvel, kibana, kopf, elastic-hq]
JVM version: Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
OS version: Ubuntu 14.04 with kernel 4.2
Description of the problem including expected versus actual behavior:
Hey everyone,
first of all, thanks for this fine piece of software! 👍
I noticed extremely high memory usage and cache thrashing in our Elasticsearch cluster; after some digging I pinned it down to a single random-sort query with a has_child filter. We currently have about 70 million parents with 1.3 billion children.
Running the query without the random_score function uses just a few GB of fielddata memory; with random_score, fielddata usage skyrockets to 60 GB per query, which is a little... unsettling.
The query:
```json
GET /instagram-user/instagram-user/_search
{
  "query": {
    "function_score": {
      "filter": {
        "bool": {
          "must_not": [
            { "exists": { "field": "calculated" } }
          ],
          "filter": [
            { "term": { "private": false } },
            {
              "has_child": {
                "query": {
                  "bool": {
                    "filter": [
                      { "term": { "instagram_user_id": "1397123079" } }
                    ]
                  }
                },
                "score_mode": "none",
                "type": "instagram-like"
              }
            }
          ]
        }
      },
      "functions": [
        { "random_score": { "seed": 12345 } }
      ]
    }
  }
}
```
Am I missing some configuration parameter for random_score that causes all child documents to be included, bloating the filter bitsets?
Happy to supply additional logs.