[CI] CreateIndexIT testCreateAndDeleteIndexConcurrently failing #87094

masseyke · 2022-05-24T22:00:01Z

Build scan:
https://gradle-enterprise.elastic.co/s/dbdgtxjsl3u5y/tests/:server:internalClusterTest/org.elasticsearch.action.admin.indices.create.CreateIndexIT/testCreateAndDeleteIndexConcurrently

Reproduction line:
./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.action.admin.indices.create.CreateIndexIT.testCreateAndDeleteIndexConcurrently" -Dtests.seed=96A4BBE39F241797 -Dtests.locale=ro-RO -Dtests.timezone=Australia/Yancowinna -Druntime.java=17

Applicable branches:
master

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.action.admin.indices.create.CreateIndexIT&tests.test=testCreateAndDeleteIndexConcurrently

Failure excerpt:

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=694, name=Thread-5, state=RUNNABLE, group=TGRP-CreateIndexIT]

  at __randomizedtesting.SeedInfo.seed([96A4BBE39F241797:AE5DDD9C7EB982FD]:0)

  Caused by: java.lang.AssertionError: Expected current thread [Thread[elasticsearch[node_t3][transport_worker][T#3],5,TGRP-CreateIndexIT]] to not be a transport thread. Reason: [failEngine can block on IO]

    at __randomizedtesting.SeedInfo.seed([96A4BBE39F241797]:0)
    at org.elasticsearch.transport.Transports.assertNotTransportThread(Transports.java:56)
    at org.elasticsearch.index.engine.Engine.failEngine(Engine.java:1107)
    at org.elasticsearch.index.shard.IndexShard.failShard(IndexShard.java:1462)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.failShard(TransportReplicationAction.java:1134)
    at org.elasticsearch.action.support.replication.ReplicationOperation.updateCheckPoints(ReplicationOperation.java:314)
    at org.elasticsearch.action.support.replication.ReplicationOperation$2.onResponse(ReplicationOperation.java:225)
    at org.elasticsearch.action.support.replication.ReplicationOperation$2.onResponse(ReplicationOperation.java:220)
    at org.elasticsearch.action.support.RetryableAction$RetryingListener.onResponse(RetryableAction.java:144)
    at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43)
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1329)
    at org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:365)
    at org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:352)
    at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:142)
    at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:94)
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:790)
    at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:149)
    at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:121)
    at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:86)
    at org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:63)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:833)

elasticmachine · 2022-05-24T22:00:03Z

Pinging @elastic/es-distributed (Team:Distributed)

Failing a shard may block on IO so must not happen on a transport worker thread. With this commit we use a `WRITE` thread to handle shard failures caused by exceptions thrown within `updateCheckPoints`. Closes elastic#87094
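The fix described above can be illustrated with a minimal, self-contained sketch. This is not the code from the actual commit: the class `ForkBeforeFailSketch`, the helpers `isTransportThread` and `failShard`, and the thread names are hypothetical stand-ins, and a plain `ExecutorService` plays the role of the `WRITE` thread pool. It only shows the general pattern of handing possibly blocking work to another executor instead of running it inline on a transport worker.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ForkBeforeFailSketch {

    // Hypothetical stand-in for the check behind Transports.assertNotTransportThread:
    // here a thread counts as a transport thread if its name contains "transport_worker".
    static boolean isTransportThread(Thread t) {
        return t.getName().contains("transport_worker");
    }

    // Hypothetical stand-in for failing a shard: it may block on IO, so it refuses
    // to run on a transport thread and records which thread actually ran it.
    static void failShard(String reason, AtomicReference<String> ranOn) {
        if (isTransportThread(Thread.currentThread())) {
            throw new AssertionError("must not fail shard on a transport thread: " + reason);
        }
        ranOn.set(Thread.currentThread().getName());
    }

    // Simulates a response handler that runs on a transport worker and needs to
    // fail a shard. Instead of calling failShard inline, it forks to the write pool.
    static String failShardFromTransportThread() {
        // Assumption: a single-threaded executor stands in for the WRITE pool.
        ExecutorService writePool =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "sketch[write][T#1]"));
        AtomicReference<String> ranOn = new AtomicReference<>();

        Thread transportWorker = new Thread(
            () -> writePool.execute(() -> failShard("checkpoint update failed", ranOn)),
            "sketch[transport_worker][T#1]");

        try {
            transportWorker.start();
            transportWorker.join();      // the task has been submitted by now
            writePool.shutdown();        // already queued tasks still run
            writePool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return ranOn.get();
    }

    public static void main(String[] args) {
        System.out.println("failShard ran on: " + failShardFromTransportThread());
    }
}
```

Calling `failShard` directly inside the `transportWorker` runnable would trip the `AssertionError`, which mirrors the failure in the stack trace above; forking first keeps the transport worker free to continue handling network traffic.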

…ic#87458) Failing a shard may block on IO so must not happen on a transport worker thread. With this commit we use a `WRITE` thread to handle shard failures caused by exceptions thrown within `updateCheckPoints`. Closes elastic#87094

… (#87495) Failing a shard may block on IO so must not happen on a transport worker thread. With this commit we use a `WRITE` thread to handle shard failures caused by exceptions thrown within `updateCheckPoints`. Closes #87094

… (#87496) Failing a shard may block on IO so must not happen on a transport worker thread. With this commit we use a `WRITE` thread to handle shard failures caused by exceptions thrown within `updateCheckPoints`. Closes #87094

masseyke added the Team:Distributed (Obsolete) and >test-failure labels May 24, 2022

DaveCTurner self-assigned this Jun 7, 2022

DaveCTurner added the :Distributed Indexing/Engine label Jun 7, 2022

DaveCTurner mentioned this issue Jun 7, 2022

Fork to WRITE thread before failing shard in updateCheckPoints #87458

Merged

DaveCTurner closed this as completed in #87458 Jun 8, 2022
