Improve error messaging on allocation explain for IndexNotFoundException #53142

aliciascott · 2020-03-04T23:33:31Z

Elasticsearch version (bin/elasticsearch --version): 7.6.0

Plugins installed: [] ESS

JVM version (java -version): ESS

OS version (uname -a if on a Unix-like system): ESS

Description of the problem including expected versus actual behavior:

Related to: #44054

Steps to reproduce:

1. Add settings to IndexA so that its allocation filter forces it onto a single node:

PUT IndexA/_settings
{
    "index.routing.allocation.include._ip" : "xxx"
}

2. Shrink IndexA to IndexB.
3. Remove IndexA's allocation filter before IndexB has been allocated.
4. Note the error in _cluster/allocation/explain below. It leads you to believe that the original source index is missing, when in fact the shrink is supposed to take place on the node defined in step 1, but all the shards of the source index IndexA are on other nodes.
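The reproduction steps above can be sketched as a request sequence. This is only an illustration: the helper name `repro_requests`, the index names, and the IP are hypothetical placeholders, nothing is sent to a cluster, and the `index.blocks.write` setting is an addition not mentioned in the report (a shrink needs the source index to be write-blocked). Whether the failure appears may also depend on how quickly the source shards move after the filter is cleared.

```python
import json


def repro_requests(source, target, node_ip):
    """Return the reproduction steps as (method, path, body) tuples.

    Nothing is sent; the caller decides how to issue the requests.
    """
    filter_key = "index.routing.allocation.include._ip"
    return [
        # Step 1: pin the source index to one node. The write block is an
        # assumption added here because a shrink requires it.
        ("PUT", f"/{source}/_settings",
         {filter_key: node_ip, "index.blocks.write": True}),
        # Step 2: shrink the source index into the target index.
        ("POST", f"/{source}/_shrink/{target}", {}),
        # Step 3: clear the filter (null resets a setting) before the
        # target index has been allocated.
        ("PUT", f"/{source}/_settings", {filter_key: None}),
        # Step 4: ask why the target's primary shard is unassigned.
        ("GET", "/_cluster/allocation/explain",
         {"index": target, "shard": 0, "primary": True}),
    ]


# Placeholder names and address, for illustration only.
for method, path, body in repro_requests("index-a", "index-b", "10.0.0.1"):
    print(method, path, json.dumps(body))
```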

Issue:
If IndexA had an allocation filter forcing it onto a specific node and that filter is removed before the shrink target has been allocated, the target shrunken index will never allocate.

Workaround:

Reinstate the allocation filter on the source index
Wait for the shards to finish moving
POST _cluster/reroute?retry_failed
Remove allocation filter on source index
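The "wait for the shards to finish moving" step is the only part of the workaround without an explicit request. A minimal sketch of one way to check it, assuming rows shaped like the output of `GET _cat/shards/<index>?format=json` (the `shard`, `state`, and `node` columns); the function name and the sample rows are hypothetical:

```python
def every_shard_on_node(cat_shards_rows, node_name):
    """True when node_name holds a STARTED copy of every shard number listed."""
    all_shards = {row["shard"] for row in cat_shards_rows}
    on_node = {
        row["shard"]
        for row in cat_shards_rows
        # Copies that are still RELOCATING or INITIALIZING do not count yet.
        if row["node"] == node_name and row["state"] == "STARTED"
    }
    return bool(all_shards) and on_node == all_shards


# Hypothetical rows: shard 1 still has no started copy on node-a.
rows = [
    {"shard": "0", "state": "STARTED", "node": "node-a"},
    {"shard": "1", "state": "STARTED", "node": "node-b"},
    {"shard": "1", "state": "INITIALIZING", "node": "node-a"},
]
print(every_shard_on_node(rows, "node-a"))  # False
```

Once a check like this returns True for the node named in the filter, retrying the failed allocation should have a source copy of every shard to work from.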

Provide logs (if relevant):

Full _cluster/allocation/explain output:

{
  "index": "shrink-records-xmode-000033",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2020-03-04T20:09:58.132Z",
    "failed_allocation_attempts": 5,
    "details": "failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]",
    "last_allocation_status": "no"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "ox_wykRmSba7gFAtcBJWnw",
      "node_name": "instance-0000000044",
      "transport_address": "xxxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-0",
        "server_name": "instance-0000000044.xxxxxx",
        "availability_zone": "us-west1-a",
        "xpack.installed": "true",
        "data": "warm",
        "instance_configuration": "gcp.data.highstorage.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 1,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"UynZBdX-SdGJIkSsinn71Q\"] that hold a copy of every shard in the index"
        }
      ]
    },
    {
      "node_id": "9fT8-dGwTD-Ody3Whzb25Q",
      "node_name": "instance-0000000045",
      "transport_address": "xxxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-1",
        "server_name": "instance-0000000045.xxxx",
        "availability_zone": "us-west1-c",
        "xpack.installed": "true",
        "data": "warm",
        "instance_configuration": "gcp.data.highstorage.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 2,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"UynZBdX-SdGJIkSsinn71Q\"] that hold a copy of every shard in the index"
        }
      ]
    },
    {
      "node_id": "UynZBdX-SdGJIkSsinn71Q",
      "node_name": "instance-0000000054",
      "transport_address": "xxxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-2",
        "server_name": "instance-0000000054.xxxx",
        "availability_zone": "us-west1-b",
        "xpack.installed": "true",
        "data": "warm",
        "instance_configuration": "gcp.data.highstorage.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 3,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        }
      ]
    },
    {
      "node_id": "BiEo_vWcQ--x1zblcO7Fwg",
      "node_name": "instance-0000000053",
      "transport_address": "xxxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-2",
        "server_name": "instance-0000000053.xxxxx",
        "availability_zone": "us-west1-b",
        "xpack.installed": "true",
        "data": "hot",
        "instance_configuration": "gcp.data.highio.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 4,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"UynZBdX-SdGJIkSsinn71Q\"] that hold a copy of every shard in the index"
        }
      ]
    },
    {
      "node_id": "ywuaR0Z4TZG5gqt_r_qF3w",
      "node_name": "instance-0000000052",
      "transport_address": "xxxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-1",
        "server_name": "instance-0000000052.xxx",
        "availability_zone": "us-west1-c",
        "xpack.installed": "true",
        "data": "hot",
        "instance_configuration": "gcp.data.highio.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 5,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"UynZBdX-SdGJIkSsinn71Q\"] that hold a copy of every shard in the index"
        }
      ]
    },
    {
      "node_id": "xISVwgRRTwCMTbgjhiWL9g",
      "node_name": "instance-0000000051",
      "transport_address": "xxxx",
      "node_attributes": {
        "logical_availability_zone": "zone-0",
        "server_name": "instance-xxx",
        "availability_zone": "us-west1-a",
        "xpack.installed": "true",
        "data": "hot",
        "instance_configuration": "gcp.data.highio.1",
        "region": "unknown-region"
      },
      "node_decision": "no",
      "weight_ranking": 6,
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-03-04T20:09:58.132Z], failed_attempts[5], failed_nodes[[UynZBdX-SdGJIkSsinn71Q]], delayed=false, details[failed shard on node [UynZBdX-SdGJIkSsinn71Q]: failed to create shard, failure IndexNotFoundException[no such index [records-xmode-000033]]], allocation_status[deciders_no]]]"
        },
        {
          "decider": "filter",
          "decision": "NO",
          "explanation": "initial allocation of the shrunken index is only allowed on nodes [_id:\"UynZBdX-SdGJIkSsinn71Q\"] that hold a copy of every shard in the index"
        }
      ]
    }
  ]
}
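Because every node reports `max_retry`, the useful signal in the output above is easy to miss. A small sketch of how the response could be summarized per node, using only fields that appear in the output (`node_allocation_decisions`, `node_name`, `deciders`, `decider`, `decision`); the function names are hypothetical and the sample is a trimmed copy of two of the nodes:

```python
def blocking_deciders(explain):
    """Map each node name to the deciders that answered NO for it."""
    return {
        node["node_name"]: [
            d["decider"] for d in node.get("deciders", [])
            if d["decision"] == "NO"
        ]
        for node in explain.get("node_allocation_decisions", [])
    }


def only_retry_blocked(explain):
    """Names of nodes where max_retry is the sole NO decider."""
    return [
        name for name, deciders in blocking_deciders(explain).items()
        if deciders == ["max_retry"]
    ]


# Trimmed copy of two nodes from the explain output above.
sample = {
    "node_allocation_decisions": [
        {"node_name": "instance-0000000044",
         "deciders": [{"decider": "max_retry", "decision": "NO"},
                      {"decider": "filter", "decision": "NO"}]},
        {"node_name": "instance-0000000054",
         "deciders": [{"decider": "max_retry", "decision": "NO"}]},
    ]
}
print(only_retry_blocked(sample))  # ['instance-0000000054']
```

On the output above this singles out instance-0000000054 (node UynZBdX-SdGJIkSsinn71Q), the node named by the filter decider and the one where shard creation failed.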

elasticmachine · 2020-03-04T23:33:33Z

Pinging @elastic/es-distributed (:Distributed/Allocation)

DaveCTurner · 2024-05-30T21:14:05Z

I'm not sure if anything has changed in this area since this issue was opened, but it's been a few years and we haven't seen any further reports of confusion in this area, so I'm closing this to reflect that it's unlikely we'll address it in the foreseeable future.

aliciascott added the :Distributed Coordination/Allocation label (All issues relating to the decision making around placing a shard, both master logic & on the nodes) Mar 4, 2020

rjernst added the Team:Distributed (Obsolete) label (Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.) May 4, 2020

DaveCTurner closed this as not planned May 30, 2024
