Improve error messaging on allocation explain for IndexNotFoundException #53142

Closed
aliciascott opened this issue Mar 4, 2020 · 2 comments
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

aliciascott commented Mar 4, 2020

Elasticsearch version (bin/elasticsearch --version): 7.6.0

Plugins installed: [] ESS

JVM version (java -version): ESS

OS version (uname -a if on a Unix-like system): ESS

Description of the problem including expected versus actual behavior:

Related to: #44054

Steps to reproduce:

  1. Add a setting to IndexA so that its allocation filter forces it onto a single node:
PUT IndexA/_settings
{
    "index.routing.allocation.include._ip" : "xxx"
}
  2. Shrink IndexA to IndexB.
  3. Remove IndexA's allocation filter before IndexB has been allocated.
  4. Note the error in _cluster/allocation/explain below. It leads you to believe the original source index is missing, when in fact the shrink is supposed to take place on the node defined in step 1, but all the shards of the source index IndexA are now on other nodes.
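For reference, the steps above can be written out as REST calls. This is only a sketch: the index names and IP are hypothetical placeholders, the function just builds the requests rather than sending them, and it leaves out the shrink prerequisite of making the source index read-only, which the report does not show.

```python
import json

def reproduction_requests(source, target, node_ip):
    """Return the REST calls for the steps above as (method, path, body) tuples."""
    filter_key = "index.routing.allocation.include._ip"
    return [
        # Step 1: pin the source index to a single node.
        ("PUT", f"/{source}/_settings", {filter_key: node_ip}),
        # Step 2: shrink the source index into the target index.
        ("POST", f"/{source}/_shrink/{target}", {}),
        # Step 3: clear the filter (a null value resets the setting)
        # before the target has been allocated.
        ("PUT", f"/{source}/_settings", {filter_key: None}),
        # Step 4: ask why the target's primary shard is unassigned.
        ("GET", "/_cluster/allocation/explain",
         {"index": target, "shard": 0, "primary": True}),
    ]

# Hypothetical names, for illustration only.
for method, path, body in reproduction_requests("index-a", "index-b", "10.0.0.1"):
    print(method, path, json.dumps(body))
```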

Issue:
If an allocation filter on IndexA previously forced it onto a specific node, and that filter is removed before the shrunken index has been allocated, the target shrunken index will never allocate.
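Until the message itself is improved, the useful parts of the explain output can be pulled out by hand. The sketch below is an assumption-laden helper, not part of Elasticsearch: it takes the parsed JSON of an allocation explain response and reports which node the shrink "filter" decider requires, which nodes are rejected only by "max_retry", and the last failure detail. The function name and the returned keys are made up for illustration.

```python
import re

def summarize_explain(explain):
    """Summarise a parsed _cluster/allocation/explain response (a dict)."""
    required = set()   # node ids named by the shrink "filter" decider
    blockers = {}      # node id -> names of the deciders that said NO
    for node in explain.get("node_allocation_decisions", []):
        no_deciders = [d for d in node.get("deciders", [])
                       if d.get("decision") == "NO"]
        blockers[node["node_id"]] = [d["decider"] for d in no_deciders]
        for d in no_deciders:
            if d["decider"] == "filter":
                # The explanation text embeds the node as [_id:"<node id>"].
                required.update(
                    re.findall(r'_id:"([^"]+)"', d.get("explanation", "")))
    return {
        "required_nodes": sorted(required),
        # Nodes rejected by max_retry alone: a retry would be attempted here.
        "retry_only_nodes": [n for n, names in blockers.items()
                             if names == ["max_retry"]],
        "last_failure": explain.get("unassigned_info", {}).get("details"),
    }

# Trimmed-down sample shaped like the output in this report.
sample = {
    "unassigned_info": {"details": "failure IndexNotFoundException[no such index [records-xmode-000033]]"},
    "node_allocation_decisions": [
        {"node_id": "ox_wykRmSba7gFAtcBJWnw", "deciders": [
            {"decider": "max_retry", "decision": "NO", "explanation": ""},
            {"decider": "filter", "decision": "NO",
             "explanation": 'only allowed on nodes [_id:"UynZBdX-SdGJIkSsinn71Q"]'}]},
        {"node_id": "UynZBdX-SdGJIkSsinn71Q", "deciders": [
            {"decider": "max_retry", "decision": "NO", "explanation": ""}]},
    ],
}
print(summarize_explain(sample))
```

When the required node and the retry-only node are the same and the failure detail is an IndexNotFoundException for the source index, that matches the situation described here: the shrink is pinned to a node that no longer holds the source shards.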

Workaround:

  1. Reinstate the allocation filter on the source index
  2. Wait for the shards to finish moving
  3. POST _cluster/reroute?retry_failed
  4. Remove allocation filter on source index
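The workaround can be sketched as a small driver. Everything here is illustrative: `send` is a hypothetical callable that performs one REST request and returns the parsed JSON response, the index names and IP are placeholders, and the extra health check on the target index before lifting the filter is my assumption rather than something stated in the steps. Note that `wait_for_no_relocating_shards` is a crude check, since it can return before relocation has started.

```python
def run_workaround(send, source, target, node_ip):
    """Apply the workaround via send(method, path, body) -> parsed JSON dict."""
    filter_key = "index.routing.allocation.include._ip"
    # 1. Reinstate the allocation filter on the source index.
    send("PUT", f"/{source}/_settings", {filter_key: node_ip})
    # 2. Wait for the source shards to finish moving.
    moved = send(
        "GET",
        "/_cluster/health?wait_for_no_relocating_shards=true&timeout=60s",
        None)
    if moved.get("timed_out"):
        return False
    # 3. Retry the allocations that hit the max_retry limit.
    send("POST", "/_cluster/reroute?retry_failed=true", None)
    # Assumption: only lift the filter once the target primary is allocated.
    ready = send(
        "GET",
        f"/_cluster/health/{target}?wait_for_status=yellow&timeout=60s",
        None)
    if ready.get("timed_out"):
        return False
    # 4. Remove the allocation filter on the source index (null resets it).
    send("PUT", f"/{source}/_settings", {filter_key: None})
    return True

# Dry run with a fake transport that records calls instead of sending them.
calls = []

def fake_send(method, path, body):
    calls.append((method, path))
    return {"timed_out": False}

run_workaround(fake_send, "index-a", "index-b", "10.0.0.1")
for call in calls:
    print(*call)
```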

Provide logs (if relevant):

Full _cluster/allocation/explain output:

{
  "index": "shrink-records-xmode-000033",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2020-03-04T20:09:58.132Z",
    "failed_allocation_attempts": 5,
    "details": "failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]",
    "last_allocation_status": "no"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "ox_wykRmSba7gFAtcBJWnw",
      "node_name": "instance-0000000044",
      "transport_address": "xxxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-0",
        "server_name": "instance-0000000044.xxxxxx",
        "availability_zone": "us-west1-a",
        "xpack.installed": "true",
        "data": "warm",
        "instance_configuration": "gcp.data.highstorage.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 1,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"UynZBdX-SdGJIkSsinn71Q\"] that hold a copy of every shard in the index"
        }
      ]
    },
    {
      "node_id": "9fT8-dGwTD-Ody3Whzb25Q",
      "node_name": "instance-0000000045",
      "transport_address": "xxxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-1",
        "server_name": "instance-0000000045.xxxx",
        "availability_zone": "us-west1-c",
        "xpack.installed": "true",
        "data": "warm",
        "instance_configuration": "gcp.data.highstorage.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 2,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"UynZBdX-SdGJIkSsinn71Q\"] that hold a copy of every shard in the index"
        }
      ]
    },
    {
      "node_id": "UynZBdX-SdGJIkSsinn71Q",
      "node_name": "instance-0000000054",
      "transport_address": "xxxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-2",
        "server_name": "instance-0000000054.xxxx",
        "availability_zone": "us-west1-b",
        "xpack.installed": "true",
        "data": "warm",
        "instance_configuration": "gcp.data.highstorage.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 3,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        }
      ]
    },
    {
      "node_id": "BiEo_vWcQ--x1zblcO7Fwg",
      "node_name": "instance-0000000053",
      "transport_address": "xxxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-2",
        "server_name": "instance-0000000053.xxxxx",
        "availability_zone": "us-west1-b",
        "xpack.installed": "true",
        "data": "hot",
        "instance_configuration": "gcp.data.highio.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 4,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"UynZBdX-SdGJIkSsinn71Q\"] that hold a copy of every shard in the index"
        }
      ]
    },
    {
      "node_id": "ywuaR0Z4TZG5gqt_r_qF3w",
      "node_name": "instance-0000000052",
      "transport_address": "xxxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-1",
        "server_name": "instance-0000000052.xxx",
        "availability_zone": "us-west1-c",
        "xpack.installed": "true",
        "data": "hot",
        "instance_configuration": "gcp.data.highio.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 5,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"UynZBdX-SdGJIkSsinn71Q\"] that hold a copy of every shard in the index"
        }
      ]
    },
    {
      "node_id": "xISVwgRRTwCMTbgjhiWL9g",
      "node_name": "instance-0000000051",
      "transport_address": "xxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-0",
        "server_name": "instance-xxx",
        "availability_zone": "us-west1-a",
        "xpack.installed": "true",
        "data": "hot",
        "instance_configuration": "gcp.data.highio.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 6,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"UynZBdX-SdGJIkSsinn71Q\"] that hold a copy of every shard in the index"
        }
      ]
    }
  ]
}

@aliciascott aliciascott added the :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) label Mar 4, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Allocation)

@rjernst rjernst added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 4, 2020
@DaveCTurner
Contributor

I'm not sure if anything has changed in this area since this issue was opened, but it's been a few years and we haven't seen any further reports of confusion in this area, so I'm closing this to reflect that it's unlikely we'll address it in the foreseeable future.

@DaveCTurner DaveCTurner closed this as not planned May 30, 2024

4 participants