Refresh should not fail already closed engine #51281
Pinging @elastic/es-distributed (:Distributed/Engine)
LGTM. Thanks Nhat, I left a smaller comment to consider.
applyOperations(engine, generateHistoryOnReplica(between(20, 200), randomBoolean(), randomBoolean(), randomBoolean()));
Phaser phaser = new Phaser(2);
Thread refresh = new Thread(() -> {
    phaser.arriveAndAwaitAdvance();
I am not sure the additional thread is necessary here? AFAICS, if we close the engine and then do a refresh, we should also get the same AlreadyClosedException, but we could then strengthen the test to ensure we get the exception and do not allow the refresh to succeed.
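The single-threaded variant suggested above could look roughly like the following. This is a minimal sketch only: `FakeEngine` and `ClosedException` are hypothetical stand-ins for the engine under test and for Lucene's AlreadyClosedException, not the actual Elasticsearch test fixtures.

```java
// Hypothetical stand-ins, used only to illustrate the shape of the test.
final class ClosedEngineRefreshCheck {

    /** Stand-in for Lucene's AlreadyClosedException. */
    static final class ClosedException extends IllegalStateException {
        ClosedException(String message) {
            super(message);
        }
    }

    /** Stand-in for an engine whose refresh fails once it has been closed. */
    static final class FakeEngine {
        private volatile boolean closed;

        void close() {
            closed = true;
        }

        void refresh() {
            if (closed) {
                throw new ClosedException("engine is closed");
            }
        }
    }

    /** Closes the engine first, then refreshes; returns true only if the refresh throws. */
    static boolean refreshFailsAfterClose() {
        FakeEngine engine = new FakeEngine();
        engine.refresh(); // refreshing an open engine succeeds
        engine.close();
        try {
            engine.refresh();
            return false; // a successful refresh after close is not allowed
        } catch (ClosedException e) {
            return true;
        }
    }
}
```

Because the close happens strictly before the refresh, the outcome is deterministic and the test can require the exception instead of tolerating either result.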
I consider this a bug in Lucene, which registers the AlreadyClosedException as a tragic exception (i.e. in failOnTragicEvent we have indexWriter.getTragicException() != null with indexWriter.getTragicException() instanceof AlreadyClosedException). IMO this is bogus behavior of Lucene when IndexWriter.getReader() is called concurrently with IndexWriter.close.
I'm concerned that the "fix" here will hide proper tragedies on the Lucene IndexWriter.
Stack trace showing the issue:
WARN ][o.e.i.c.IndicesClusterStateService] [node-0] [test][0] marking and sending shard failed due to [shard failure, reason [already closed by tragic event on the index writer]]
» org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
» at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:681) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.IndexFileDeleter.ensureOpen(IndexFileDeleter.java:346) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.IndexFileDeleter.deleteFiles(IndexFileDeleter.java:669) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.IndexFileDeleter.decRef(IndexFileDeleter.java:589) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.FrozenBufferedUpdates.finishApply(FrozenBufferedUpdates.java:383) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.FrozenBufferedUpdates.lambda$forceApply$0(FrozenBufferedUpdates.java:246) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.FrozenBufferedUpdates.forceApply(FrozenBufferedUpdates.java:251) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.FrozenBufferedUpdates.tryApply(FrozenBufferedUpdates.java:159) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.IndexWriter.lambda$publishFrozenUpdates$3(IndexWriter.java:2592) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:5116) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:507) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:297) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:262) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:165) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:66) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:40) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:332) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:314) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:225) ~[lucene-core-8.4.0-snapshot-08b8d116f8f.jar:8.4.0-snapshot-08b8d116f8f 08b8d116f8ffacf35a6b05ff4d37f2263b712347 - ivera - 2019-12-12 10:52:52]
» at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1581) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1560) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3208) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.index.IndexService.maybeRefreshEngine(IndexService.java:823) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:955) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:144) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:629) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
» at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
» at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
» at java.lang.Thread.run(Thread.java:830) [?:?]
I've opened https://issues.apache.org/jira/browse/LUCENE-9164.
Closing in favor of the Lucene issue. @henningandersen @ywelsch Thanks for the reviews!
We do not hold any lock during refresh. If the engine was closed, then AlreadyClosedException is expected. In this case, we should not fail the engine.
Relates #48414
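The idea in the description above can be sketched as follows. This is an illustration under assumptions, not the diff from this PR: `RefreshSketch`, `Refresher`, and `ClosedException` are hypothetical names, and the real engine tracks its closed and failed state differently.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-ins; these are not the Elasticsearch or Lucene classes.
final class RefreshSketch {

    /** Stand-in for Lucene's AlreadyClosedException. */
    static final class ClosedException extends IllegalStateException {
        ClosedException(String message) {
            super(message);
        }
    }

    /** Stand-in for a reader manager whose refresh may hit a closed writer. */
    interface Refresher {
        void maybeRefresh();
    }

    private final AtomicBoolean closed = new AtomicBoolean();
    private final AtomicBoolean failed = new AtomicBoolean();
    private final Refresher refresher;

    RefreshSketch(Refresher refresher) {
        this.refresher = refresher;
    }

    void close() {
        closed.set(true);
    }

    boolean isFailed() {
        return failed.get();
    }

    /** Refreshes without holding a lock, so a concurrent close can surface here. */
    void refresh() {
        try {
            refresher.maybeRefresh();
        } catch (ClosedException e) {
            // Expected if the engine was closed in the meantime: rethrow, but do not fail the engine.
            if (closed.get() == false) {
                failed.set(true);
            }
            throw e;
        } catch (RuntimeException e) {
            // Any other failure still marks the engine as failed.
            failed.set(true);
            throw e;
        }
    }
}
```

As raised in the review, a check of this kind cannot tell a benign close race apart from a genuine tragic event that also surfaces as an already-closed exception, which is why the discussion moved to the Lucene issue.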