[CI] CreateIndexIT testCreateAndDeleteIndexConcurrently failing #87094

Closed
masseyke opened this issue May 24, 2022 · 1 comment · Fixed by #87458
Labels
- `:Distributed Indexing/Engine`: Anything around managing Lucene and the Translog in an open shard.
- `Team:Distributed (Obsolete)`: Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
- `>test-failure`: Triaged test failures from CI

Comments

@masseyke
Member

Build scan:
https://gradle-enterprise.elastic.co/s/dbdgtxjsl3u5y/tests/:server:internalClusterTest/org.elasticsearch.action.admin.indices.create.CreateIndexIT/testCreateAndDeleteIndexConcurrently

Reproduction line:
./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.action.admin.indices.create.CreateIndexIT.testCreateAndDeleteIndexConcurrently" -Dtests.seed=96A4BBE39F241797 -Dtests.locale=ro-RO -Dtests.timezone=Australia/Yancowinna -Druntime.java=17

Applicable branches:
master

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.action.admin.indices.create.CreateIndexIT&tests.test=testCreateAndDeleteIndexConcurrently

Failure excerpt:

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=694, name=Thread-5, state=RUNNABLE, group=TGRP-CreateIndexIT]

  at __randomizedtesting.SeedInfo.seed([96A4BBE39F241797:AE5DDD9C7EB982FD]:0)

  Caused by: java.lang.AssertionError: Expected current thread [Thread[elasticsearch[node_t3][transport_worker][T#3],5,TGRP-CreateIndexIT]] to not be a transport thread. Reason: [failEngine can block on IO]

    at __randomizedtesting.SeedInfo.seed([96A4BBE39F241797]:0)
    at org.elasticsearch.transport.Transports.assertNotTransportThread(Transports.java:56)
    at org.elasticsearch.index.engine.Engine.failEngine(Engine.java:1107)
    at org.elasticsearch.index.shard.IndexShard.failShard(IndexShard.java:1462)
    at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.failShard(TransportReplicationAction.java:1134)
    at org.elasticsearch.action.support.replication.ReplicationOperation.updateCheckPoints(ReplicationOperation.java:314)
    at org.elasticsearch.action.support.replication.ReplicationOperation$2.onResponse(ReplicationOperation.java:225)
    at org.elasticsearch.action.support.replication.ReplicationOperation$2.onResponse(ReplicationOperation.java:220)
    at org.elasticsearch.action.support.RetryableAction$RetryingListener.onResponse(RetryableAction.java:144)
    at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43)
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1329)
    at org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:365)
    at org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:352)
    at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:142)
    at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:94)
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:790)
    at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:149)
    at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:121)
    at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:86)
    at org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:63)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:833)
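
The assertion at the top of this trace rejects the call because the current thread's name identifies it as a transport worker (`elasticsearch[node_t3][transport_worker][T#3]`). A minimal sketch of that kind of thread-name guard follows. It is not the actual `Transports.assertNotTransportThread` code: the class `ThreadGuardSketch`, its method names, and the choice of `[transport_worker]` as the marker substring are assumptions made for illustration, based only on the thread name and message shown above.

```java
final class ThreadGuardSketch {
    // Assumption: network event-loop threads carry this marker in their name,
    // as in "elasticsearch[node_t3][transport_worker][T#3]" from the trace above.
    static final String TRANSPORT_WORKER_MARKER = "[transport_worker]";

    // Returns true if the thread's name contains the transport-worker marker.
    static boolean isTransportThread(Thread thread) {
        return thread.getName().contains(TRANSPORT_WORKER_MARKER);
    }

    // Throws if called on a thread whose name marks it as a transport worker.
    // The reason string explains why the guarded operation must run elsewhere.
    static void ensureNotTransportThread(String reason) {
        Thread current = Thread.currentThread();
        if (isTransportThread(current)) {
            throw new AssertionError(
                "Expected current thread [" + current + "] to not be a transport thread. Reason: [" + reason + "]"
            );
        }
    }
}
```

Calling `ThreadGuardSketch.ensureNotTransportThread("failEngine can block on IO")` at the start of a potentially blocking method would surface this class of bug in tests in a similar way.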

@masseyke added the `Team:Distributed (Obsolete)` and `>test-failure` labels May 24, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner self-assigned this Jun 7, 2022
@DaveCTurner added the `:Distributed Indexing/Engine` label Jun 7, 2022
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jun 7, 2022
Failing a shard may block on IO so must not happen on a transport worker
thread. With this commit we use a `WRITE` thread to handle shard
failures caused by exceptions thrown within `updateCheckPoints`.

Closes elastic#87094
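
The idea in the commit message above can be sketched roughly as follows, using only `java.util.concurrent`. This is not the code from the actual change: the class `OffloadFailureSketch`, the method `updateThenFailElsewhere`, and its parameters are hypothetical, and a plain `ExecutorService` stands in for the `WRITE` thread pool. The update step runs on the calling thread, and if it throws, the potentially blocking failure handling is submitted to the other executor instead of running inline.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class OffloadFailureSketch {
    // Runs `update` on the calling thread (for example a network callback thread).
    // If it throws, the failure handler, which may block on IO, is handed to
    // `writeExecutor` rather than being invoked on the calling thread.
    static Future<?> updateThenFailElsewhere(
        Runnable update,
        Consumer<Exception> blockingFailureHandler,
        ExecutorService writeExecutor
    ) {
        try {
            update.run();
            // Nothing failed, so there is no follow-up work to wait for.
            return CompletableFuture.completedFuture(null);
        } catch (Exception e) {
            // Fork: the handler runs on a thread owned by writeExecutor.
            return writeExecutor.submit(() -> blockingFailureHandler.accept(e));
        }
    }
}
```

The returned `Future` exists only so that a caller or test can wait for the handler to finish. A real implementation would also need to decide what happens if the executor rejects the task, which this sketch does not cover.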
DaveCTurner added a commit that referenced this issue Jun 8, 2022
Failing a shard may block on IO so must not happen on a transport worker
thread. With this commit we use a `WRITE` thread to handle shard
failures caused by exceptions thrown within `updateCheckPoints`.

Closes #87094
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jun 8, 2022
…ic#87458)

elasticsearchmachine pushed a commit that referenced this issue Jun 8, 2022
… (#87495)

elasticsearchmachine pushed a commit that referenced this issue Jun 8, 2022
… (#87496)
