Monitoring plugin indices stuck in failed recovery status #19275
Comments
There is no official Docker image affiliated with Elastic. To be clear, the "official" Docker image on Docker Hub is not affiliated with Elastic.
Was this a fresh install? Was the monitoring index created the first time you started Elasticsearch? Was there a disk full event?
Also, I don't understand why the log message says "from null".
@krystalcode Any chance we can have the logs from the first startup through shutdown, and the second startup up to the point where you start seeing this message?
@krystalcode I've just failed to replicate this without Docker. Any chance you could try without Docker to see if it still happens?
@clintongormley Yes, this was a fresh install (or rather, a fresh launch of the Docker image), with the Elasticsearch data volume mounted on a clean directory. The monitoring index must have been created when I first started Elasticsearch, since it certainly did not exist before, and Kibana did not report any problems with the monitoring plugin the first time, so it must have been created successfully. I will look for more logs tomorrow, even though I don't think there is more information apart from the 1+ gigabyte of the log provided, repeated. I will also try to recreate the situation with and without Docker and report back.
Also, no, there was no disk full event, and I do not know what "from null" should normally be, so I can't interpret it.
@clintongormley The null is not important; it just means the recovery has no source node, because the shard is being recovered from its local store (the gateway) rather than from a peer.
@krystalcode So far we have been unable to reproduce this. Any chance you can check again starting from a clean Docker instance and see if you can reliably reproduce?
I tried to reproduce this two more times, but I haven't been able to either. Unfortunately I haven't kept the logs from the two times I had this failure, because they were larger than 2 gigabytes and I deleted them. I think we can assume that there was a disk failure that caused data corruption. The fact that it happened consistently on the monitoring plugin indices, which is why I raised this issue, might be because the monitoring indices had more data than the other indices, so they were more likely to get corrupted. Is there any need to discuss the fact that Elasticsearch failed to stop reallocating the shard (as pointed out in #19164 (comment))? If not, or if that discussion belongs in #19164, we can close this issue.
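For context on the retry behaviour mentioned above: the 5.x line gained an index.allocation.max_retries setting that caps how many times a failed allocation is retried, and allocations that have exhausted their retries can be retried manually via the reroute API. A minimal sketch, not part of the original thread, assuming an unauthenticated node at localhost:9200 (adjust for your cluster):

```python
# Hedged sketch, not from the issue: retry shard allocations that have hit the
# index.allocation.max_retries limit (a 5.x setting). Host and port are
# assumptions; adjust for your cluster.
import urllib.request

ES = "http://localhost:9200"

req = urllib.request.Request(
    ES + "/_cluster/reroute?retry_failed=true",
    data=b"{}",  # empty body; no explicit reroute commands
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```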
thanks for the feedback @krystalcode |
Elasticsearch version:
5.0.0-alpha3
JVM version:
The Docker image inherits from the official java:8-jre image.
OS version:
Official docker image (https://hub.docker.com/_/elasticsearch/, Debian Jessie) running on a Fedora 23 host.
Description of the problem including expected versus actual behavior:
Shards created by the Monitoring X-Pack plugin are stuck trying to recover. Kibana reports "Elasticsearch is still initializing the Monitoring indices". The logs are given below, notably "marking and sending shard failed due to [failed recovery]" and "IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine];".
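For anyone debugging a similar state, here is a minimal diagnostic sketch (not part of the original report) that inspects cluster health, ongoing recoveries, and the monitoring shards. It assumes an unauthenticated node on localhost:9200 and that the monitoring indices match the .monitoring-* pattern:

```python
# Hedged diagnostic sketch; the host, port, and .monitoring-* index pattern
# are assumptions, not details taken from the original report.
import urllib.request

ES = "http://localhost:9200"

def get(path):
    # Return the raw body of an Elasticsearch HTTP API response.
    with urllib.request.urlopen(ES + path) as resp:
        return resp.read().decode("utf-8")

# Red health means at least one primary shard is unassigned.
print(get("/_cluster/health?pretty"))

# Per-shard recovery progress; a shard stuck in failed recovery keeps
# reappearing here without ever completing.
print(get("/_cat/recovery?v"))

# Shard states for the monitoring indices only.
print(get("/_cat/shards/.monitoring-*?v"))
```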
This has happened twice (out of 2 times tried) in a testing environment following the steps given below. A few things to note:
I do not recall having the same problem with Marvel and Elasticsearch 2 when following the same steps.
Any suggestions on how to figure out whether this is due to data corruption, and why it affects only the monitoring plugin indices? Not sure if it is related, but the other indices have a very low write rate since this is a test environment; I am not sure how many writes per second the monitoring plugin performs.
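One way to approach the data-corruption question (a sketch under the same assumptions as above, not an authoritative answer): the unassigned.reason column of _cat/shards and the cluster allocation explain API, available in 5.x, report why a shard keeps failing to allocate, which should surface the underlying exception (for example a CorruptIndexException) if the on-disk files are damaged.

```python
# Hedged sketch: check why shards are unassigned and what the last allocation
# failure was. Host/port are assumptions; the allocation explain API is a 5.x
# addition and may not be present on every alpha build.
import urllib.request

ES = "http://localhost:9200"

def get(path):
    with urllib.request.urlopen(ES + path) as resp:
        return resp.read().decode("utf-8")

# unassigned.reason distinguishes e.g. ALLOCATION_FAILED from INDEX_CREATED.
print(get("/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason"))

# With no request body, allocation explain picks the first unassigned shard
# and includes the failure details from the last allocation attempt.
print(get("/_cluster/allocation/explain?pretty"))
```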
Steps to reproduce:
Provide logs (if relevant):