
Fix missing index exception handling #126738

Merged: 37 commits merged into elastic:main on Apr 24, 2025

smalyshev
Contributor

@smalyshev commented Apr 12, 2025

This fixes the issue with missing indices being ignored (see #126275).

I am not in love with this because it looks like a hack. The proper fix would be to move this handling into the verifier at the planning stage, but given how field caps works now, that doesn't seem possible without protocol changes.

Also, it doesn't fix the case of

FROM transform-es-search-*,missing | LIMIT 0

which now seems to be broken regardless of whether partial results are enabled (see #114495).

I think this patch makes sense in any case: from what I understand, we never want to ignore a missing concrete index. With a proper index resolution fix, the only way we should ever encounter this would be if the index disappears between the planning and execution stages.
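The rule this patch enforces can be sketched in a few lines of Java. This is a simplified, self-contained model, not the actual ComputeService or ClusterComputeHandler code: `IndexNotFoundException`, `RemoteWrapperException`, `unwrapCause`, and `canIgnoreFailure` are hypothetical stand-ins (the real check is `(ExceptionsHelper.unwrapCause(e) instanceof IndexNotFoundException) == false`). It assumes that with partial results enabled, any failure may be dropped unless its unwrapped cause is a missing index.

```java
public class PartialFailureSketch {

    // Hypothetical stand-in for Elasticsearch's IndexNotFoundException.
    static class IndexNotFoundException extends RuntimeException {
        IndexNotFoundException(String index) {
            super("no such index [" + index + "]");
        }
    }

    // Hypothetical stand-in for a wrapper exception added by the transport layer.
    static class RemoteWrapperException extends RuntimeException {
        RemoteWrapperException(Throwable cause) {
            super(cause);
        }
    }

    // Simplified stand-in for ExceptionsHelper.unwrapCause: peel off wrapper layers
    // until the underlying cause is reached.
    static Throwable unwrapCause(Throwable e) {
        Throwable current = e;
        while (current instanceof RemoteWrapperException && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Returns true if the failure may be dropped in favor of returning partial results.
    // A missing index is never dropped, even when partial results are allowed.
    static boolean canIgnoreFailure(boolean allowPartialResults, Exception e) {
        return allowPartialResults && (unwrapCause(e) instanceof IndexNotFoundException) == false;
    }

    public static void main(String[] args) {
        Exception shardTimeout = new RuntimeException("shard timed out");
        Exception missingIndex = new RemoteWrapperException(new IndexNotFoundException("missing"));

        System.out.println(canIgnoreFailure(true, shardTimeout));  // true
        System.out.println(canIgnoreFailure(true, missingIndex));  // false
        System.out.println(canIgnoreFailure(false, shardTimeout)); // false
    }
}
```

Unwrapping first matters because a missing-index error raised on another node can arrive wrapped in a transport-level exception, and a plain `instanceof` check on the outer exception would miss it.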

@smalyshev added the :Analytics/ES|QL, >bug, v8.19.0, and auto-backport labels Apr 12, 2025
@smalyshev requested a review from Copilot April 12, 2025 00:23

Copilot AI left a comment

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java:324

  • [nitpick] Consider using the negation operator instead of '== false' (i.e. '&& !(ExceptionsHelper.unwrapCause(e) instanceof IndexNotFoundException)') to improve readability.
&& (ExceptionsHelper.unwrapCause(e) instanceof IndexNotFoundException) == false) {

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ClusterComputeHandler.java:94

  • [nitpick] Consider replacing '== false' with a negation operator (i.e. '&& !(ExceptionsHelper.unwrapCause(e) instanceof IndexNotFoundException)') to enhance clarity.
&& (ExceptionsHelper.unwrapCause(e) instanceof IndexNotFoundException) == false) {

@elasticsearchmachine
Collaborator

Hi @smalyshev, I've created a changelog YAML for you.

@smalyshev changed the title from "Fix missing index exception" to "Fix missing index exception handling" Apr 12, 2025
@smalyshev marked this pull request as ready for review April 23, 2025 15:32
@smalyshev requested a review from quux00 April 23, 2025 15:32
@elasticsearchmachine added the Team:Analytics and Team:Search Foundations labels Apr 23, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@smalyshev requested a review from dnhatn April 23, 2025 15:32
Contributor

@quux00 left a comment

LGTM

@smalyshev enabled auto-merge (squash) April 23, 2025 17:33
@smalyshev disabled auto-merge April 23, 2025 17:33
@smalyshev enabled auto-merge (squash) April 23, 2025 17:40
@smalyshev disabled auto-merge April 23, 2025 17:43
@@ -138,32 +138,6 @@ public void testFailed() throws Exception {
assertThat(telemetry.getByRemoteCluster().size(), equalTo(0));
}

// TODO: enable when skip-un patch is merged
Contributor Author

This is just an old test that we never enabled, and we don't really need it anymore, so I'm cleaning it up.

@dnhatn
Member

dnhatn commented Apr 23, 2025

Thanks @smalyshev.

I took a look at the issue and I think there are several problems:

  1. There is a disparity in the indices option between the field-caps API (planning time) and the search-shards API (runtime). We use ALLOW_UNAVAILABLE_TARGETS for field-caps and ERROR_WHEN_UNAVAILABLE_TARGETS for search-shards. This leads to cases where field-caps does not return failures, but the runtime does. With allow_partial_results, we then ignore the runtime failures and return partial results instead of failing the request.

  2. We do not strictly check the index failures returned by the field-caps API.

  3. Another issue is related to security exceptions. Since we use ALLOW_UNAVAILABLE_TARGETS in the field-caps API, it reports an unknown index if the user lacks the privilege to access it. However, if multiple index patterns are specified, we instead return an unauthorized error from the runtime (see EsqlSecurityIT).

  4. There are cases where we return a 400 error, and others where we return a 404.
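The first point can be illustrated with a toy model of the two resolution modes. The `UnavailableTargets` enum, the `resolve` method, and the in-memory index set are hypothetical; they only mimic the effect of ALLOW_UNAVAILABLE_TARGETS versus ERROR_WHEN_UNAVAILABLE_TARGETS and do not use the real `IndicesOptions` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class IndicesOptionsDisparitySketch {

    // Hypothetical stand-in for the two unavailable-target behaviors named in the discussion.
    enum UnavailableTargets { ALLOW_UNAVAILABLE_TARGETS, ERROR_WHEN_UNAVAILABLE_TARGETS }

    // Resolves requested concrete index names against the indices that exist.
    // Lenient mode silently drops missing names; strict mode fails on the first one.
    static List<String> resolve(List<String> requested, Set<String> existing, UnavailableTargets mode) {
        List<String> resolved = new ArrayList<>();
        for (String name : requested) {
            if (existing.contains(name)) {
                resolved.add(name);
            } else if (mode == UnavailableTargets.ERROR_WHEN_UNAVAILABLE_TARGETS) {
                throw new IllegalStateException("no such index [" + name + "]");
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> requested = List.of("logs", "missing");
        Set<String> existing = Set.of("logs");

        // Planning time (field-caps, lenient): the missing index is quietly dropped.
        System.out.println(resolve(requested, existing, UnavailableTargets.ALLOW_UNAVAILABLE_TARGETS)); // [logs]

        // Runtime (search-shards, strict): the same request fails.
        try {
            resolve(requested, existing, UnavailableTargets.ERROR_WHEN_UNAVAILABLE_TARGETS);
        } catch (IllegalStateException e) {
            // If this failure is then treated as ignorable under allow_partial_results,
            // the request returns partial results instead of an error.
            System.out.println("runtime failure: " + e.getMessage()); // runtime failure: no such index [missing]
        }
    }
}
```

Because the two phases disagree, planning succeeds while execution fails, which is exactly the gap partial results can then hide.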

I wonder if we should spend more time addressing these inconsistencies rather than fixing the surface issues. I am concerned that more surface fixes could accumulate over time. (@astefan WDYT?)

I can try to address these.

@smalyshev
Contributor Author

@dnhatn You are right: to make a proper and comprehensive fix, we need to change how we call field-caps, and maybe change the field-caps API itself. This patch is not that fix; it's just a quick plug to unblock progress on partial results. I agree that we should work on fixing this, but I am not sure how long it could take. If the partial results work remains blocked until that fix is finished, and it takes a long time, we may have trouble wrapping it up in time for 8.19. But if you can think of a fix that we can implement reasonably quickly, then of course that's better.

That said, I think that even after that fix, if we get IndexNotFound at runtime, we probably need to produce an error. I can't think of any scenario where we encounter a missing index and don't want an error.

The discrepancy between 400 and 404 exists because VerificationError returns 400, while the runtime not-found error returns 404. If we always want a 404 for an unknown index, we'd probably need to make the verification error more flexible.
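The status-code split can be sketched as a mapping from failure type to HTTP status, showing how the same missing index yields different codes depending on the stage that detects it. `VerificationFailure` (standing in for what is called VerificationError above), `IndexNotFoundException`, and the 500 fallback are hypothetical placeholders, not the actual ES|QL exception hierarchy.

```java
public class StatusMappingSketch {

    // Hypothetical stand-in for a planning-time verification failure.
    static class VerificationFailure extends RuntimeException {
        VerificationFailure(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for a runtime missing-index failure.
    static class IndexNotFoundException extends RuntimeException {
        IndexNotFoundException(String index) {
            super("no such index [" + index + "]");
        }
    }

    // Maps each failure type to the HTTP status described in the discussion.
    static int httpStatus(RuntimeException e) {
        if (e instanceof VerificationFailure) {
            return 400; // Bad Request: the query failed verification during planning
        }
        if (e instanceof IndexNotFoundException) {
            return 404; // Not Found: the index was missing at execution time
        }
        return 500; // assumed fallback for anything else in this sketch
    }

    public static void main(String[] args) {
        // The same missing index, detected at two different stages.
        System.out.println(httpStatus(new VerificationFailure("unknown index [missing]"))); // 400
        System.out.println(httpStatus(new IndexNotFoundException("missing")));           // 404
    }
}
```

Making the planning-time path return 404 for unknown indices would, in this model, mean letting the verification failure carry its own status instead of a fixed 400.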

@dnhatn
Member

dnhatn commented Apr 24, 2025

@smalyshev I agree. Let's try to find a proper fix until the end of the week. If we can't find one, we should proceed with this to unblock the partial results work.

Member

@dnhatn left a comment

Thinking about this more, I am okay with getting this in now. We can make the proper changes later. Thanks @smalyshev

@smalyshev merged commit 54ef165 into elastic:main Apr 24, 2025
17 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Branch 8.x: Commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 126738

@smalyshev
Contributor Author

💚 All backports created successfully

Branch 8.x: backport created

Questions?

Please refer to the Backport tool documentation

smalyshev added a commit to smalyshev/elasticsearch that referenced this pull request Apr 24, 2025
* Fix missing index handling for partial-enabled queries

(cherry picked from commit 54ef165)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ClusterComputeHandler.java
elasticsearchmachine pushed a commit that referenced this pull request Apr 24, 2025
* Fix missing index handling for partial-enabled queries

(cherry picked from commit 54ef165)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ClusterComputeHandler.java
@astefan
Contributor

astefan commented Apr 24, 2025

@dnhatn I agree. The difference between field_caps and _search time behavior surfaced previously in one of your earlier PRs, but with partial results the impact is more severe. I agree that we need to take a clean look at the way we use ignore_unavailable in these two cases; it is likely a leftover from ES SQL, where ignore_unavailable=true for field_caps might have had a very specific use (one that is, so far, not needed in ES|QL).

@smalyshev
Contributor Author

@dnhatn @astefan I tried switching ignore_unavailable in #126737 to see what happens, but it causes a ton of test failures, some of them related to authorization, and I haven't been able to triage them properly yet.

Labels
:Analytics/ES|QL, auto-backport, >bug, >non-issue, :Search Foundations/CCS, Team:Analytics, Team:Search Foundations, v8.19.0, v9.1.0
5 participants