Query DSL: Terms filter to allow for terms lookup from another document #2674

kimchy · 2013-02-22T12:55:26Z

The terms filter requires providing all the terms as part of the filter itself. Allow them to be extracted automatically from an external document.

Here is an example:

# index the information for user with id 2, specifically, its friends
curl -XPUT localhost:9200/users/user/2 -d '{
   "friends" : ["1", "3"]
}'

# index a tweet, from user with id 2
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
   "user" : "2"
}'

# search on all the tweets that match the friends of user 2
curl -XGET localhost:9200/tweets/_search -d '{
  "query" : {
    "filtered" : {
        "filter" : {
            "terms" : {
                "user" : {
                    "index" : "users",
                    "type" : "user",
                    "id" : "2",
                    "path" : "friends"
                },
                "_cache_key" : "user_2_friends"
            }
        }
    }
  }
}'

The above is highly optimized: the list of friends is not fetched if the filter is already in the filter cache, and an internal LRU cache holds the external values fetched for the terms filter. Also, the entry in the filter cache does not hold all the terms, which reduces the memory it requires.

Setting _cache_key is recommended, so that it is simple to clear the cache entry associated with it using the clear cache API. For example:

curl -XPOST 'localhost:9200/tweets/_cache/clear?filter_keys=user_2_friends'
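
As a rough sketch of how the cache key ties the two requests together, the following Python builds the search body from the example above and the matching clear-cache path from a single key. The helper names are made up for illustration; only the JSON shape and the `filter_keys` parameter come from this issue.

```python
import json
from urllib.parse import urlencode


def lookup_filter_body(field, index, doc_type, doc_id, path, cache_key=None):
    """Build the search body for a terms filter that looks its terms up
    in another document (hypothetical helper mirroring the example above)."""
    terms = {field: {"index": index, "type": doc_type, "id": doc_id, "path": path}}
    if cache_key is not None:
        terms["_cache_key"] = cache_key
    return {"query": {"filtered": {"filter": {"terms": terms}}}}


def clear_cache_path(index, cache_key):
    """Build the clear-cache request path for a given filter cache key."""
    return "/%s/_cache/clear?%s" % (index, urlencode({"filter_keys": cache_key}))


key = "user_2_friends"
body = lookup_filter_body("user", "users", "user", "2", "friends", cache_key=key)
print(json.dumps(body, indent=2))
print(clear_cache_path("tweets", key))  # path to POST to in order to clear that entry
```

Deriving both strings from one variable keeps the search request and the later cache clear from drifting apart.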

The external terms document can also contain an array of inner objects, for example:

curl -XPUT localhost:9200/users/user/2 -d '{
   "friends" : [
     {
       "id" : "1"
     },
     {
       "id" : "2"
     }
   ]
}'

In which case, the lookup path will be friends.id.
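
To illustrate what a dotted path such as friends.id means against the two document shapes shown here, below is a small Python sketch. It only models the path semantics described in this issue; `extract_terms` is a hypothetical name and this is not how Elasticsearch implements the lookup internally.

```python
def extract_terms(source, path):
    """Collect the values found at a dotted path, descending into lists."""
    nodes = [source]
    for key in path.split("."):
        next_nodes = []
        for node in nodes:
            # A list of inner objects is searched element by element.
            candidates = node if isinstance(node, list) else [node]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    next_nodes.append(item[key])
        nodes = next_nodes
    # Flatten a final list level so "friends" : ["1", "3"] yields ["1", "3"].
    terms = []
    for node in nodes:
        terms.extend(node if isinstance(node, list) else [node])
    return terms


flat_doc = {"friends": ["1", "3"]}
nested_doc = {"friends": [{"id": "1"}, {"id": "2"}]}

print(extract_terms(flat_doc, "friends"))
print(extract_terms(nested_doc, "friends.id"))
```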

There is an additional cache involved, which maps each lookup document to the actual terms extracted from it. It is an LRU cache sized at 10mb by default, and the size can be set explicitly using indices.cache.filter.terms.size.
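
For readers unfamiliar with size-bounded LRU caches, here is a minimal Python sketch of the idea. The class name, the key layout, and the byte-counting rule are all assumptions made for illustration; the issue does not describe how Elasticsearch weighs or stores entries.

```python
from collections import OrderedDict


class TermsLookupCache:
    """Size-bounded LRU cache from a lookup key to a list of terms."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()

    @staticmethod
    def _weight(terms):
        # Crude size estimate (assumption): total UTF-8 length of the terms.
        return sum(len(term.encode("utf-8")) for term in terms)

    def get(self, key, loader):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        terms = loader(key)  # stands in for a GET on the lookup document
        self._entries[key] = terms
        self.used_bytes += self._weight(terms)
        # Evict least recently used entries until the cache fits again,
        # always keeping the entry that was just loaded.
        while self.used_bytes > self.max_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= self._weight(evicted)
        return terms


calls = []


def load_friends(key):
    calls.append(key)
    return ["1", "3"]


cache = TermsLookupCache(max_bytes=10 * 1024 * 1024)  # 10mb, the stated default
lookup_key = ("users", "user", "2", "friends")
cache.get(lookup_key, load_friends)
cache.get(lookup_key, load_friends)  # served from the cache, no second load
print(len(calls))
```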

Also, consider using an index with a single shard and fully replicated across all nodes if the "reference" terms data is not large. The lookup terms filter will prefer to execute the get request on a local node if possible, reducing the need for networking.
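
One way to express "single shard, replicated to all nodes" is through index settings at creation time. The Python sketch below builds such a settings body. The use of `auto_expand_replicas` with the value `0-all` is an assumption on my part, since the issue does not name a setting, so check it against the documentation for your version.

```python
import json


def reference_index_settings():
    """Settings body for a small lookup index (hypothetical helper)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                # Assumption: "0-all" asks for a replica on every other node.
                "auto_expand_replicas": "0-all",
            }
        }
    }


# This body would be sent when creating the lookup index, e.g. PUT /users.
print(json.dumps(reference_index_settings(), indent=2))
```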


Downchuck · 2013-02-22T17:16:38Z

This nearly finishes/fixes the feature issue #2671

telvis07 · 2013-04-26T18:46:02Z

In this example, shouldn't tweet/1 have user "1" or "3"? This example doesn't return hits for me but it does when I change it to 1 or 3. I have a gist here: https://gist.github.com/telvis07/5469479
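
The mismatch described here can be reproduced without a cluster. The Python sketch below models a terms filter as plain set membership, which is a simplification, and `terms_filter` is a hypothetical helper.

```python
def terms_filter(docs, field, terms):
    """Keep the documents whose `field` value is one of `terms`."""
    wanted = set(terms)
    return [doc for doc in docs if doc.get(field) in wanted]


friends_of_user_2 = ["1", "3"]            # users/user/2 from the example
tweet_by_user_2 = {"_id": "1", "user": "2"}  # tweets/tweet/1 as indexed in the example
tweet_by_user_1 = {"_id": "1", "user": "1"}  # the same tweet with the suggested change

print(terms_filter([tweet_by_user_2], "user", friends_of_user_2))  # no hits
print(terms_filter([tweet_by_user_1], "user", friends_of_user_2))  # one hit
```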

loris · 2013-05-17T19:50:58Z

@kimchy Quick question about using this feature vs using the IDs filter.
I have a use case where I would need to fetch IDs from an external datastore (mysql and redis) and then run a get (with multi get) or a search (with the IDs filter) in ElasticSearch against the list of documents matching those IDs.
The number of IDs per search can vary from a few dozen to a few thousand.
That said, will this perform poorly? Should I use the lookup terms feature instead (which would also mean that I need to index the IDs and keep them in sync with the primary datastores)?

I will probably implement both for benchmarking purposes, but would love to hear your feedback!

clintongormley · 2013-05-18T11:34:05Z

If you index the terms into ES, and use the "external terms filter", you will get significantly better performance, because:

- you greatly reduce the amount of network traffic
- you greatly reduce the amount of query parsing
- your filter will be cached after the first use, and thus very fast on subsequent uses

clint
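
To make the first two points concrete, here is a rough Python sketch comparing the request bodies of the two approaches. The `lookups` index, `idlist` type, `search-42` id, `ids` path, and `external_id` field are hypothetical names chosen for illustration.

```python
import json


def ids_filter_body(doc_ids):
    """Search body that ships every ID with the request."""
    return {"query": {"filtered": {"filter": {"ids": {"values": list(doc_ids)}}}}}


def lookup_filter_body(field, index, doc_type, doc_id, path):
    """Search body that only names the document holding the IDs."""
    lookup = {"index": index, "type": doc_type, "id": doc_id, "path": path}
    return {"query": {"filtered": {"filter": {"terms": {field: lookup}}}}}


doc_ids = [str(n) for n in range(2000)]  # "a few thousand" IDs
inline = json.dumps(ids_filter_body(doc_ids))
# Hypothetical lookup document lookups/idlist/search-42 with the IDs under "ids".
lookup = json.dumps(
    lookup_filter_body("external_id", "lookups", "idlist", "search-42", "ids")
)

# The inline body grows with the ID count; the lookup body stays the same size.
print(len(inline), len(lookup))
```

The trade-off raised in the question still applies: the lookup document has to be indexed and kept in sync with the primary datastore.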

junjun-zhang · 2014-01-12T23:07:31Z

This is a very useful feature. Just curious whether it's possible to generalize this to support JOIN. In the case of join, the list of lookup terms is not fetched from another document, but rather it's the result of a query from a related document. Replicating this related document in all nodes can also eliminate networking.

I have a use case where I need to embed a particular document under another related document as a nested doc. As it is a many-to-many relationship, this embedding introduces a huge number of redundant docs. If JOIN were supported, I would not need to embed the actual doc; including a field that keeps the related doc IDs would be sufficient.

It seems Solr supports join in a similar fashion: http://wiki.apache.org/solr/Join. It is somewhat limited, but if used properly, it can be very helpful.

mattweber · 2014-01-13T16:17:31Z

@junjun-zhang see #3278. Hopefully @martijnvg and @kimchy will get a chance to have a look at this soon.

brupm · 2015-04-17T22:09:46Z

In this example:

curl -XGET localhost:9200/tweets/_search -d '{
  "query" : {
    "filtered" : {
        "filter" : {
            "terms" : {
                "user" : {
                    "index" : "users",
                    "type" : "user",
                    "id" : "2",
                    "path" : "friends"
                }
            }
        }
    }
  }
}'

Say I wanted to pass an array of ids instead of a single id, as shown with "id" : "2".

The reason is that I have several documents I want to combine.

clintongormley · 2015-04-25T14:41:31Z

@brupm then just use several terms lookup filters, wrapped in a bool.should filter. Doing this lookup is not cheap, so I would prefer not to add syntax that makes it look cheap to the naive user.
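
A minimal Python sketch of that suggestion, generating one terms lookup filter per lookup document and wrapping them in a bool filter's should clause. The helper names and the extra ids "5" and "9" are made up for illustration.

```python
import json


def lookup_filter(field, index, doc_type, doc_id, path):
    """One terms lookup filter pointing at a single lookup document."""
    lookup = {"index": index, "type": doc_type, "id": doc_id, "path": path}
    return {"terms": {field: lookup}}


def any_of_lookups(field, index, doc_type, doc_ids, path):
    """OR together one terms lookup filter per lookup document id."""
    should = [lookup_filter(field, index, doc_type, doc_id, path) for doc_id in doc_ids]
    return {"query": {"filtered": {"filter": {"bool": {"should": should}}}}}


body = any_of_lookups("user", "users", "user", ["2", "5", "9"], "friends")
print(json.dumps(body, indent=2))
```

Each clause still names its own lookup document, so the cost noted above scales with the number of ids passed in.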

brupm · 2015-04-25T18:09:16Z

Is there an upper limit on how many terms filters I can have wrapped in a bool.should? @clintongormley - thank you!

clintongormley · 2015-04-26T17:52:05Z

Probably 1024, which should be more than enough...

banupriya20 · 2016-10-05T06:24:42Z

Please provide a suggestion on this: index 1 and index 2 have a common entity (e.g. employee number).
How can I create a join query that searches index 1 and gets the documents from index 2 based on the common entity (employee number)?

saralamuralikrishna · 2017-09-18T11:32:46Z

I am getting only 400 results even though the lookup type has 524 documents. Any suggestion on what could be wrong? Below is the query:
{"from":0,"size":1000,"sort":[{"Id":{"order":"asc"}}],"query":{"terms":{"Id.Raw":{"index":"myindex","type":"Infos","id":32939,"path":"ArticleNumbers"}}}}

kimchy closed this as completed in 03fdc6a Feb 22, 2013

Downchuck mentioned this issue Feb 22, 2013

Support autocomplete indexes on select fields #2671

Closed

Mpdreamz mentioned this issue Jun 10, 2013

0.11.0.0 elastic/elasticsearch-net#283

Closed
