Refresh should not fail already closed engine #51281

Closed

dnhatn
Member

@dnhatn dnhatn commented Jan 22, 2020

We do not hold any lock during refresh. If the engine was closed, then AlreadyClosedException is expected. In this case, we should not fail the engine.

Relates #48414
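
The diff itself is not shown in this conversation, so the following is only a minimal sketch of the idea described above, not the PR's code. `SketchEngine` and its methods are hypothetical, and the nested `AlreadyClosedException` is a stand-in for `org.apache.lucene.store.AlreadyClosedException` so the example compiles without Lucene. Refresh rethrows the exception either way, but records an engine failure only if the engine is still open.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RefreshSketch {
    /** Stand-in for org.apache.lucene.store.AlreadyClosedException (assumption: keeps the sketch Lucene-free). */
    static class AlreadyClosedException extends IllegalStateException {
        AlreadyClosedException(String message) {
            super(message);
        }
    }

    /** Hypothetical, heavily simplified engine; not the real InternalEngine. */
    static class SketchEngine {
        private final AtomicBoolean closed = new AtomicBoolean();
        private volatile Exception failure; // set only when the engine is failed

        void close() {
            closed.set(true);
        }

        Exception failure() {
            return failure;
        }

        // Stand-in for the reader refresh, which throws once the writer is closed.
        private void refreshReader() {
            if (closed.get()) {
                throw new AlreadyClosedException("this IndexWriter is closed");
            }
        }

        void refresh() {
            try {
                refreshReader();
            } catch (AlreadyClosedException e) {
                // No lock is held during refresh, so a concurrent close can surface here.
                if (closed.get() == false) {
                    failure = e; // engine still open: unexpected, so record a failure
                }
                throw e;
            } catch (RuntimeException e) {
                failure = e; // any other error still fails the engine
                throw e;
            }
        }
    }

    public static void main(String[] args) {
        SketchEngine engine = new SketchEngine();
        engine.refresh(); // open engine: succeeds
        engine.close();
        try {
            engine.refresh();
        } catch (AlreadyClosedException e) {
            // prints "refresh after close: this IndexWriter is closed, failed=false"
            System.out.println("refresh after close: " + e.getMessage() + ", failed=" + (engine.failure() != null));
        }
    }
}
```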

@dnhatn dnhatn added the >bug, :Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard.), v8.0.0, and v7.7.0 labels Jan 22, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Engine)

@dnhatn dnhatn added the v7.6.1 label Jan 22, 2020
Contributor

@henningandersen henningandersen left a comment

LGTM. Thanks Nhat, I left a small comment to consider.

applyOperations(engine, generateHistoryOnReplica(between(20, 200), randomBoolean(), randomBoolean(), randomBoolean()));
Phaser phaser = new Phaser(2);
Thread refresh = new Thread(() -> {
phaser.arriveAndAwaitAdvance();
Contributor

I am not sure the additional thread is necessary here. AFAICS, if we close the engine and then do the refresh, we should get the same AlreadyClosedException, but we could then strengthen the test to ensure we get the exception and do not allow the refresh to succeed.
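
To make the two test shapes concrete, here is a rough sketch under stated assumptions: `FakeEngine` is a hypothetical stand-in that throws a plain `IllegalStateException` once closed, not the real engine or the PR's test. The racy shape lets both threads meet at a `Phaser` before refresh and close run concurrently, so it can only tolerate the exception. The sequential shape closes first, so it can require the exception.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class RefreshTestShapes {
    /** Hypothetical stand-in for an engine whose refresh throws once it is closed. */
    static class FakeEngine {
        private final AtomicBoolean closed = new AtomicBoolean();

        void close() {
            closed.set(true);
        }

        void refresh() {
            if (closed.get()) {
                throw new IllegalStateException("engine is closed");
            }
        }
    }

    /** Racy shape: both threads meet at the phaser, then refresh and close run concurrently. */
    static boolean racyRefreshThrew() throws InterruptedException {
        FakeEngine engine = new FakeEngine();
        Phaser phaser = new Phaser(2);
        AtomicReference<RuntimeException> error = new AtomicReference<>();
        Thread refresh = new Thread(() -> {
            phaser.arriveAndAwaitAdvance();
            try {
                engine.refresh();
            } catch (RuntimeException e) {
                error.set(e); // either outcome is legal here, so the test can only tolerate it
            }
        });
        refresh.start();
        phaser.arriveAndAwaitAdvance();
        engine.close();
        refresh.join();
        return error.get() != null;
    }

    /** Sequential shape: close first, so refresh must throw and the test can assert on it. */
    static boolean sequentialRefreshThrew() {
        FakeEngine engine = new FakeEngine();
        engine.close();
        try {
            engine.refresh();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The racy result depends on thread scheduling and may be true or false.
        System.out.println("racy refresh threw: " + racyRefreshThrew());
        System.out.println("sequential refresh threw: " + sequentialRefreshThrew());
    }
}
```

The sequential shape gives a stronger assertion, at the cost of no longer exercising a refresh that is already in flight when the close happens.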

Contributor

@ywelsch ywelsch left a comment

I consider this a bug in Lucene, which registers the AlreadyClosedException as a tragic exception (i.e. in failOnTragicEvent we have indexWriter.getTragicException() != null with indexWriter.getTragicException() instanceof AlreadyClosedException). IMO this is bogus behavior of Lucene when IndexWriter.getReader() is called concurrently with IndexWriter.close.

I'm concerned that the "fix" here will hide proper tragedies on the Lucene IndexWriter.

Stack trace showing the issue:

WARN ][o.e.i.c.IndicesClusterStateService] [node-0] [test][0] marking and sending shard failed due to [shard failure, reason [already closed by tragic event on the index writer]]
»  org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
»       at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:681) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.IndexFileDeleter.ensureOpen(IndexFileDeleter.java:346) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.IndexFileDeleter.deleteFiles(IndexFileDeleter.java:669) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.IndexFileDeleter.decRef(IndexFileDeleter.java:589) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.FrozenBufferedUpdates.finishApply(FrozenBufferedUpdates.java:383) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.FrozenBufferedUpdates.lambda$forceApply$0(FrozenBufferedUpdates.java:246) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.FrozenBufferedUpdates.forceApply(FrozenBufferedUpdates.java:251) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.FrozenBufferedUpdates.tryApply(FrozenBufferedUpdates.java:159) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.IndexWriter.lambda$publishFrozenUpdates$3(IndexWriter.java:2592) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5116) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:507) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:297) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:262) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:165) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:66) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»       at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:40) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»       at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:332) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»       at org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:314) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»       at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:225) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
»       at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1581) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»       at org.elasticsearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1560) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»       at org.elasticsearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3208) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»       at org.elasticsearch.index.IndexService.maybeRefreshEngine(IndexService.java:823) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»       at org.elasticsearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:955) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»       at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:144) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»       at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:629) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
»       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
»       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
»       at java.lang.Thread.run(Thread.java:830) [?:?]
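
The concern about hiding real tragedies can be illustrated with a small decision function. This is a sketch only: `blanketRule`, `narrowRule`, and `Action` are hypothetical names, the nested `AlreadyClosedException` is a stand-in for the Lucene class, and the example assumes that a writer closed by a genuine tragedy (here an `IOException`) also surfaces `AlreadyClosedException` on later calls. Neither rule is claimed to be what this PR or Lucene implements.

```java
import java.io.IOException;

public class TragicEventSketch {
    /** Stand-in for org.apache.lucene.store.AlreadyClosedException (assumption: keeps the sketch Lucene-free). */
    static class AlreadyClosedException extends IllegalStateException {
        AlreadyClosedException(String message) {
            super(message);
        }
    }

    enum Action { IGNORE, FAIL_ENGINE }

    /** Blanket rule: never fail the engine when refresh throws AlreadyClosedException. */
    static Action blanketRule(Throwable refreshError) {
        return refreshError instanceof AlreadyClosedException ? Action.IGNORE : Action.FAIL_ENGINE;
    }

    /** Narrower rule: also look at the recorded tragic exception before ignoring. */
    static Action narrowRule(Throwable refreshError, Throwable tragic) {
        boolean realTragedy = tragic != null && (tragic instanceof AlreadyClosedException) == false;
        if (refreshError instanceof AlreadyClosedException && realTragedy == false) {
            return Action.IGNORE;
        }
        return Action.FAIL_ENGINE;
    }

    public static void main(String[] args) {
        Throwable refreshError = new AlreadyClosedException("this IndexWriter is closed");
        Throwable diskFull = new IOException("disk full"); // hypothetical genuine tragedy

        // Case from the comment: the tragic exception is the AlreadyClosedException itself.
        System.out.println(narrowRule(refreshError, refreshError)); // prints IGNORE

        // A genuine tragedy closed the writer earlier: the blanket rule hides it.
        System.out.println(blanketRule(refreshError));          // prints IGNORE
        System.out.println(narrowRule(refreshError, diskFull)); // prints FAIL_ENGINE
    }
}
```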

Member Author

dnhatn commented Jan 23, 2020

Member Author

dnhatn commented Feb 21, 2020

Closing in favor of the Lucene issue. @henningandersen @ywelsch Thanks for the reviews!

@dnhatn dnhatn closed this Feb 21, 2020
@dnhatn dnhatn deleted the refresh-not-fail branch February 21, 2020 02:42
Labels
>bug :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard.
4 participants