Using inner_hits on nested query causes an index_out_of_bounds_exception #25315

Closed
BLZB0B opened this issue Jun 20, 2017 · 6 comments
Labels
>bug :Search/Search Search-related issues that do not fall into other categories


BLZB0B commented Jun 20, 2017

Raised here following this conversation with Martijn.

Config: ES 5.4, AWS Linux, single node (test server), 3 shards, 0 replicas.

OS version (uname -a if on a Unix-like system):
Linux 4.9.27-14.31.amzn1.x86_64 #1 SMP Wed May 10 01:58:40 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Plugins installed: [none]

JVM version (java -version): build 1.8.0_131-b11

Description of the problem including expected versus actual behavior:

We have a set of documents which are replicated to Elasticsearch from Couchbase.

In the document we have a "holiday product". This product has a nested array of "prices", and each price object contains fields such as number of passengers, date, promo code and price. For each holiday product this list of prices can be very long, as there can be hundreds or thousands of permutations.

When running a query against the data, we don't want to carry thousands of lines of data over the wire (this can be more than 5MB), so we are trying to use inner_hits to return only the rows that match (approx. 10KB).

We have a query such as:

{
  "_source": false,
  "query": {
    "nested": {
      "path": "doc.products.calculatedPrices",
      "score_mode": "avg",
      "query": {
        "bool": {
          "must": [
            { "match": { "doc.products.calculatedPrices.promoCode": "DRH725C" } },
            { "match": { "doc.products.calculatedPrices.pax": 2 } },
            { "range": { "doc.products.calculatedPrices.now": { "gte": 100, "lte": 120 } } }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}

This generates the error:

{
    "took": 41,
    "timed_out": false,
    "_shards": {
        "total": 3,
        "successful": 2,
        "failed": 1,
        "failures": [
            {
                "shard": 2,
                "index": "testavail5",
                "node": "AfXKKfqrSPqOrNO7mLreEQ",
                "reason": {
                    "type": "index_out_of_bounds_exception",
                    "reason": "Index: 7290, Size: 134"
                }
            }
        ]
    },
    "hits": {
        "total": 62,
        "max_score": 5.845086,
        "hits": []
    }
}

On this test server we are running a single node with 3 shards, so we're not sure why one shard reports as failed.

If we change "inner_hits": {} to "inner_hits": {"_source": false}, we get results, but obviously without any useful information in the output!

Trace log:

[2017-06-20T11:46:58,972][TRACE][o.e.s.SearchService      ] [AfXKKfq] Fetch phase failed
java.lang.IndexOutOfBoundsException: Index: 7290, Size: 134
        at java.util.ArrayList.rangeCheck(ArrayList.java:653) ~[?:1.8.0_131]
        at java.util.ArrayList.get(ArrayList.java:429) ~[?:1.8.0_131]
        at org.elasticsearch.search.fetch.FetchPhase.createNestedSearchHit(FetchPhase.java:256) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:150) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.search.fetch.subphase.InnerHitsFetchSubPhase.hitExecute(InnerHitsFetchSubPhase.java:65) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:161) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:417) [elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.action.search.SearchTransportService$12.messageReceived(SearchTransportService.java:394) [elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.action.search.SearchTransportService$12.messageReceived(SearchTransportService.java:391) [elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:627) [elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.4.0.jar:5.4.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2017-06-20T11:46:58,973][DEBUG][o.e.a.s.TransportSearchAction] [AfXKKfq] [6] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [AfXKKfq][172.31.35.8:9300][indices:data/read/search[phase/fetch/id]]
Caused by: java.lang.IndexOutOfBoundsException: Index: 7290, Size: 134
        at java.util.ArrayList.rangeCheck(ArrayList.java:653) ~[?:1.8.0_131]
        at java.util.ArrayList.get(ArrayList.java:429) ~[?:1.8.0_131]
        at org.elasticsearch.search.fetch.FetchPhase.createNestedSearchHit(FetchPhase.java:256) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:150) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.search.fetch.subphase.InnerHitsFetchSubPhase.hitExecute(InnerHitsFetchSubPhase.java:65) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:161) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:417) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.action.search.SearchTransportService$12.messageReceived(SearchTransportService.java:394) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.action.search.SearchTransportService$12.messageReceived(SearchTransportService.java:391) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:627) [elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.4.0.jar:5.4.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.4.0.jar:5.4.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

martijnvg commented

@BLZB0B Are you able to share the 62 documents that matched that query? (Sending them to me privately is fine too.) This would make it easier to reproduce the error. I haven't yet been able to figure out the cause.

BLZB0B commented Jun 21, 2017

Hi Martijn,

I've sent a link to your Gmail account.

Regards

Phil

martijnvg commented Jun 21, 2017

@BLZB0B I've taken a look at the document that causes this error and at your mapping. The error happens because the logic that extracts the relevant nested part from the _source doesn't work well when a nested field is wrapped inside an object field.

In your case, the calculatedPrices nested field is wrapped in the products object field, which is in turn wrapped in the doc object field. The nested source extraction logic wrongly assumes that all calculatedPrices elements are at the first level, while they are actually at the third level.
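The failure mode can be illustrated with a simplified Python model. This is not Elasticsearch's FetchPhase code: the functions naive_extract and flattened_extract and the toy document are hypothetical, and the sketch assumes the nested hit's offset counts calculatedPrices entries across all products. Under that assumption, a lookup that only sees the first product's list overruns it (much like "Index: 7290, Size: 134" above), while flattening every non-nested level finds the right entry.

```python
from __future__ import annotations

from typing import Any

# Hypothetical toy document shaped like the one in the issue:
# doc (object) -> products (object, array in _source) -> calculatedPrices (nested).
SOURCE: dict[str, Any] = {
    "doc": {
        "products": [
            {"calculatedPrices": [{"pax": 1}, {"pax": 2}, {"pax": 3}]},
            {"calculatedPrices": [{"pax": 4}, {"pax": 5}]},
        ]
    }
}
PATH = ["doc", "products", "calculatedPrices"]


def naive_extract(source: dict[str, Any], path: list[str], offset: int) -> Any:
    """Treat every level as a single object; an array collapses to its first element."""
    node: Any = source
    for key in path:
        if isinstance(node, list):
            node = node[0]  # only the first product is ever looked at
        node = node[key]
    # The offset counts entries across *all* products, so it can overrun this list.
    return node[offset]


def flattened_extract(source: dict[str, Any], path: list[str], offset: int) -> Any:
    """Flatten every non-nested level so the offset indexes the full sequence."""
    nodes: list[Any] = [source]
    for key in path:
        next_nodes: list[Any] = []
        for node in nodes:
            value = node[key]
            # Arrays are spliced in; a single object is kept as one entry.
            if isinstance(value, list):
                next_nodes.extend(value)
            else:
                next_nodes.append(value)
        nodes = next_nodes
    return nodes[offset]


if __name__ == "__main__":
    print(flattened_extract(SOURCE, PATH, 3))  # {'pax': 4}
    try:
        naive_extract(SOURCE, PATH, 3)
    except IndexError as exc:
        # Offset 3 is valid overall, but the first product's list has size 3.
        print("naive lookup failed:", exc)
```

The flattened walk is one way to picture "flattening the levels that don't use a nested field mapper"; it is shown only to make the offset mismatch visible.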

I think the nested source extraction logic could be fixed to flatten all the levels that don't use a nested field mapper, but that would make this logic more complicated, and it is already complicated. I currently lean towards throwing a descriptive error (including a hint with a workaround) when nested fields are wrapped by regular object fields. Also, if we decide in the future to change how the _source is stored (as described in #9034), extracting the nested source will no longer be needed.

There are two workarounds:

  • Change the doc and products fields to type nested; the nested source extraction logic should then work. This also means you need to nest your nested queries: for the query you shared, first use a nested query at the doc level, then one at the doc.products level, and then one at the doc.products.calculatedPrices level.
  • The workaround you're already using: disable fetching the nested source (set _source to false inside the inner hits definition) and enable fetching nested doc values fields (using the docvalue_fields parameter in the inner hits definition).
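Both workarounds can be sketched as Python dicts that serialize to the request bodies. This is an illustration under assumptions: the nested helper is hypothetical; mapping_properties shows only the nesting structure (the issue doesn't give the types of promoCode, pax and now, so they are omitted, and in ES 5.x this fragment would sit under your mapping type); and workaround 2 assumes pax and now have doc values (numeric fields do by default, while promoCode would need to be a keyword field to be listed).

```python
from __future__ import annotations

import json
from typing import Any

PRICES = "doc.products.calculatedPrices"

# Conditions copied from the query in the issue.
price_conditions: dict[str, Any] = {
    "bool": {
        "must": [
            {"match": {f"{PRICES}.promoCode": "DRH725C"}},
            {"match": {f"{PRICES}.pax": 2}},
            {"range": {f"{PRICES}.now": {"gte": 100, "lte": 120}}},
        ]
    }
}


def nested(path: str, query: dict[str, Any], **extra: Any) -> dict[str, Any]:
    """Hypothetical helper: wrap a query in a nested query for the given path."""
    return {"nested": {"path": path, "query": query, **extra}}


# Workaround 1: remap doc and doc.products as nested (structure only, no field types)
# and use one nested query per nested level, with inner_hits on the innermost one.
mapping_properties = {
    "doc": {
        "type": "nested",
        "properties": {
            "products": {
                "type": "nested",
                "properties": {"calculatedPrices": {"type": "nested"}},
            }
        },
    }
}
workaround_1 = {
    "_source": False,
    "query": nested(
        "doc",
        nested(
            "doc.products",
            nested(PRICES, price_conditions, score_mode="avg", inner_hits={}),
        ),
    ),
}

# Workaround 2: keep the original object mapping, skip nested _source extraction
# and read the matching rows from doc values instead.
workaround_2 = {
    "_source": False,
    "query": nested(
        PRICES,
        price_conditions,
        score_mode="avg",
        inner_hits={
            "_source": False,
            "docvalue_fields": [f"{PRICES}.pax", f"{PRICES}.now"],
        },
    ),
}

if __name__ == "__main__":
    print(json.dumps(workaround_1, indent=2))
```

Workaround 1 changes the mapping and therefore needs a reindex; workaround 2 leaves the data as-is but only returns fields that have doc values.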

BLZB0B commented Jun 22, 2017

Thanks Martijn,

I've updated our mapping as suggested, and we now get results back with "inner_hits": {} instead of the error mentioned above.

A more useful error message would help, as you suggested.

I have a question related to the above but not part of the bug, so I'll switch to the forum rather than ask here, if that's OK.

Regards

Phil

martijnvg removed the discuss label Jun 23, 2017
martijnvg commented

This was discussed in the Fix it Friday meeting, and there was agreement not to make the nested source extraction more complicated than it already is. So we should throw a descriptive error and document the two possible workarounds.

BLZB0B commented Jun 23, 2017

Thanks @martijnvg

martijnvg added a commit that referenced this issue Sep 19, 2017
…has a object field

as parent field and that parent field is defined as an array field in the _source
of the document inner hits are being computed for.

Closes #25315
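The check described in this commit message can be sketched as follows. This is a hypothetical Python illustration, not the actual Elasticsearch change: check_object_parents, NestedSourceError and the error text are invented for the example. It walks the nested path in _source and raises a descriptive error, with the two workarounds as a hint, when an object-mapped parent of the nested field holds an array.

```python
from __future__ import annotations

from typing import Any


class NestedSourceError(ValueError):
    """Hypothetical stand-in for the descriptive error Elasticsearch would raise."""


def check_object_parents(
    source: dict[str, Any], nested_path: str, nested_paths: set[str]
) -> None:
    """Raise if an object-mapped parent of nested_path is an array in _source.

    nested_paths holds the full dotted paths that are mapped as nested.
    Simplification: the walk stops at the first nested parent, assuming
    nested levels are resolved with their own offsets.
    """
    parts = nested_path.split(".")
    node: Any = source
    for depth, key in enumerate(parts[:-1]):
        if not isinstance(node, dict) or key not in node:
            return  # parent missing from _source: nothing to extract here
        node = node[key]
        parent = ".".join(parts[: depth + 1])
        if parent in nested_paths:
            return
        if isinstance(node, list):
            raise NestedSourceError(
                f"cannot extract nested source for [{nested_path}]: its parent object "
                f"field [{parent}] is an array in _source. Map [{parent}] as nested, or "
                f"set _source to false in inner_hits and use docvalue_fields instead."
            )


if __name__ == "__main__":
    doc = {"doc": {"products": [{"calculatedPrices": []}, {"calculatedPrices": []}]}}
    try:
        check_object_parents(doc, "doc.products.calculatedPrices", {"doc.products.calculatedPrices"})
    except NestedSourceError as exc:
        print(exc)
```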
clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Inner Hits labels Feb 14, 2018