[CI] SmokeTestMultiNodeClientYamlTestSuiteIT class failing #118955

Closed
elasticsearchmachine opened this issue Dec 18, 2024 · 12 comments · Fixed by #118996
Assignees
Labels
blocker :StorageEngine/Logs You know, for Logs Team:StorageEngine >test-failure Triaged test failures from CI

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented Dec 18, 2024

Build Scans:

Reproduction Line:

./gradlew ":qa:smoke-test-multinode:yamlRestTest" --tests "org.elasticsearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT" -Dtests.method="test {yaml=indices.create/20_synthetic_source/create index with use_synthetic_source}" -Dtests.seed=EEF8314C3457C525 -Dtests.locale=bs -Dtests.timezone=Pacific/Pitcairn -Druntime.java=23

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.net.SocketTimeoutException: 60.000 milliseconds timeout on connection http-outgoing-1 [ACTIVE]

Issue Reasons:

  • [main] 23 failures in class org.elasticsearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT (4.6% fail rate in 502 executions)
  • [main] 15 failures in step part-1 (6.3% fail rate in 238 executions)
  • [main] 2 failures in step part1 (2.1% fail rate in 94 executions)
  • [main] 2 failures in pipeline elasticsearch-periodic (25.0% fail rate in 8 executions)
  • [main] 15 failures in pipeline elasticsearch-pull-request (6.3% fail rate in 239 executions)
  • [main] 2 failures in pipeline elasticsearch-intake (2.1% fail rate in 94 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

@elasticsearchmachine elasticsearchmachine added :Search Relevance/Analysis How text is split into tokens >test-failure Triaged test failures from CI labels Dec 18, 2024
@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 7 failures in class org.elasticsearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT (1.0% fail rate in 735 executions)
  • [main] 6 failures in step part-1 (2.1% fail rate in 287 executions)
  • [main] 6 failures in pipeline elasticsearch-pull-request (2.1% fail rate in 288 executions)

Build Scans:

elasticsearchmachine added a commit that referenced this issue Dec 18, 2024
…eIT org.elasticsearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT #118955
@elasticsearchmachine elasticsearchmachine added needs:risk Requires assignment of a risk label (low, medium, blocker) Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Dec 18, 2024
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-search-relevance (Team:Search Relevance)

rjernst pushed a commit to rjernst/elasticsearch that referenced this issue Dec 18, 2024
…eIT org.elasticsearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT elastic#118955
@benwtrent
Member

This is a pretty nasty failure. From what I can tell, a node crashed because of an unhandled assertion (stack trace below).

The assertion that tripped was added in #114618.

Relabeling to those teams and marking this as a blocker, since the whole suite is now muted.

[2024-12-18T08:30:09,645][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [test-cluster-1] fatal error in thread [elasticsearch[test-cluster-1][generic][T#5]], exiting
java.lang.AssertionError: null
	at org.elasticsearch.index.engine.LuceneSyntheticSourceChangesSnapshot.<init>(LuceneSyntheticSourceChangesSnapshot.java:80) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.engine.InternalEngine.newChangesSnapshot(InternalEngine.java:3171) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.shard.IndexShard.newChangesSnapshot(IndexShard.java:2662) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$recoverToTarget$15(RecoverySourceHandler.java:321) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:257) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:362) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler$1.onResponse(RecoverySourceHandler.java:408) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:362) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$runUnderPrimaryPermit$22(RecoverySourceHandler.java:437) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$runUnderPrimaryPermit$19(RecoverySourceHandler.java:427) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$runUnderPrimaryPermit$20(RecoverySourceHandler.java:405) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListenerImplementations$DelegatingFailureActionListener.onResponse(ActionListenerImplementations.java:219) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$38(IndexShard.java:3617) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListenerImplementations$DelegatingFailureActionListener.onResponse(ActionListenerImplementations.java:219) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:400) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.shard.IndexShardOperationPermits.innerAcquire(IndexShardOperationPermits.java:255) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:203) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:3588) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:3578) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.runUnderPrimaryPermit(RecoverySourceHandler.java:405) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.runUnderPrimaryPermit(RecoverySourceHandler.java:437) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$recoverToTarget$16(RecoverySourceHandler.java:312) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:257) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.support.SubscribableListener$SuccessResult.complete(SubscribableListener.java:387) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.support.SubscribableListener.tryComplete(SubscribableListener.java:307) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.support.SubscribableListener.setResult(SubscribableListener.java:336) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.support.SubscribableListener.onResponse(SubscribableListener.java:250) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.indices.recovery.RecoverySourceHandler.lambda$prepareTargetForTranslog$36(RecoverySourceHandler.java:1014) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:257) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListenerImplementations$MappedActionListener.onResponse(ActionListenerImplementations.java:97) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener.onResponse(ActionListenerImplementations.java:336) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:400) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:400) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.support.RetryableAction$RetryingListener.onResponse(RetryableAction.java:150) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListenerImplementations$RunBeforeActionListener.onResponse(ActionListenerImplementations.java:336) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:400) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:49) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1499) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:434) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.transport.InboundHandler$2.doRun(InboundHandler.java:391) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.0.0-SNAPSHOT.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1575) ~[?:?]
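
For readers unfamiliar with how an assertion on a pool thread can take a whole node down, here is a minimal sketch of the general JVM mechanism. It is not Elasticsearch's handler: `UncaughtAssertionSketch` and `runAndCapture` are hypothetical names, and the sketch only records the throwable, whereas the log above shows the real handler exiting the process.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration; NOT the Elasticsearch uncaught exception handler.
final class UncaughtAssertionSketch {

    // Runs the task on its own thread and returns whatever escaped it uncaught
    // (null if the task completed normally).
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread worker = new Thread(task, "generic-sketch");
        // The handler runs on the dying thread, before the thread terminates.
        worker.setUncaughtExceptionHandler((thread, error) -> seen.set(error));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Throwable fatal = runAndCapture(() -> {
            // Stand-in for the failed assertion in the snapshot constructor.
            throw new AssertionError("source mode is not synthetic");
        });
        System.out.println("uncaught on worker thread: " + fatal);
    }
}
```

Nothing on the call path catches an `AssertionError`, so it propagates to the thread's handler. A handler that treats such errors as fatal and exits, as the `fatal error in thread [...], exiting` line indicates, would also explain the client-side `SocketTimeoutException` in the original report, although the thread does not confirm that link explicitly.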

@benwtrent
Member

/cc @jimczi ^

@benwtrent benwtrent added blocker :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. :StorageEngine/Logs You know, for Logs and removed :Search Relevance/Analysis How text is split into tokens needs:risk Requires assignment of a risk label (low, medium, blocker) labels Dec 18, 2024
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine Team:Distributed Indexing Meta label for Distributed Indexing team labels Dec 18, 2024
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine elasticsearchmachine removed the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Dec 18, 2024
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@martijnvg martijnvg self-assigned this Dec 18, 2024
@martijnvg
Member

I think this started failing because of a YAML test that I added in #118924.
The assertion fails because the index's source mode isn't synthetic. I'm not sure how that can happen.

@martijnvg martijnvg removed the :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. label Dec 18, 2024
@elasticsearchmachine elasticsearchmachine removed the Team:Distributed Indexing Meta label for Distributed Indexing team label Dec 18, 2024
@martijnvg
Member

I haven't yet been able to figure out why this fails, or to reproduce it.

However, I'm confident that it is caused by a test added in the PR mentioned above, so I think we can mute just that test instead of the whole suite: #118989

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Dec 18, 2024
In case of synthetic recovery source the mapping is empty.

Closes elastic#118955
@breskeby breskeby removed the blocker label Dec 18, 2024
@elasticsearchmachine elasticsearchmachine added the needs:risk Requires assignment of a risk label (low, medium, blocker) label Dec 18, 2024
@elasticsearchmachine elasticsearchmachine removed the needs:risk Requires assignment of a risk label (low, medium, blocker) label Dec 18, 2024
martijnvg added a commit that referenced this issue Dec 19, 2024
)

In case of synthetic recovery source when the mapping is empty.

Adds a test that consistently reproduces the failure in #118955, together with a potential fix.

`MapperService#updateMapping(...)` doesn't set the mapper field if a mapping has no fields, and that field is what `InternalEngine#newChangesSnapshot(...)` uses. This happens when the `newMappingMetadata` variable in `MapperService#updateMapping(...)` is `null`, which causes an assertion to trip. This change adjusts that assertion to handle an empty index.

Closes #118955
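
The commit message above can be modelled with a small sketch. Every type and method here (`SourceModeSketch`, `MapperServiceSketch`, `ChangesSnapshotSketch`, `strictCheck`, `relaxedCheck`) is a hypothetical stand-in and not the Elasticsearch code. The early return on an empty mapping is an assumption drawn only from the description above, and the relaxed check is one plausible shape for "handle an empty index", not necessarily what #118996 does.

```java
import java.util.Map;

// Hypothetical stand-ins for illustration only; NOT the real Elasticsearch classes.
enum SourceModeSketch { STORED, SYNTHETIC }

final class MapperServiceSketch {
    // Assumption: stays null until a mapping with at least one field is applied.
    private SourceModeSketch mappedSourceMode;

    void updateMapping(Map<String, String> fields, SourceModeSketch configuredMode) {
        if (fields == null || fields.isEmpty()) {
            return; // models the case where the mapper field is never set
        }
        mappedSourceMode = configuredMode;
    }

    boolean hasMappings() {
        return mappedSourceMode != null;
    }

    boolean isSourceSynthetic() {
        return mappedSourceMode == SourceModeSketch.SYNTHETIC;
    }
}

final class ChangesSnapshotSketch {
    // Models the original assertion: always demands a synthetic source mode.
    static void strictCheck(MapperServiceSketch mapperService) {
        if (!mapperService.isSourceSynthetic()) {
            throw new AssertionError("expected synthetic source");
        }
    }

    // Models an adjusted assertion: an index without mappings is tolerated.
    static void relaxedCheck(MapperServiceSketch mapperService) {
        if (mapperService.hasMappings() && !mapperService.isSourceSynthetic()) {
            throw new AssertionError("expected synthetic source");
        }
    }

    public static void main(String[] args) {
        MapperServiceSketch emptyIndex = new MapperServiceSketch();
        emptyIndex.updateMapping(Map.of(), SourceModeSketch.SYNTHETIC);

        relaxedCheck(emptyIndex); // no error: nothing has been mapped yet
        try {
            strictCheck(emptyIndex);
            System.out.println("strict check passed");
        } catch (AssertionError e) {
            System.out.println("strict check tripped: " + e.getMessage());
        }
    }
}
```

In this model the index is configured for synthetic source, but because no field was ever mapped, the strict check cannot see that and trips. That matches the earlier observation in the thread that the assertion fails because the source mode does not appear to be synthetic.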
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Dec 19, 2024
…tic#118996)

elasticsearchmachine added a commit that referenced this issue Dec 19, 2024
…eIT org.elasticsearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT #118955
@elasticsearchmachine
Collaborator Author

This has been muted on branch 8.x

Mute Reasons:

  • [8.x] 2 consecutive failures in class org.elasticsearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT
  • [8.x] 13 failures in class org.elasticsearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT (8.0% fail rate in 162 executions)
  • [8.x] 2 failures in step part1 (11.1% fail rate in 18 executions)
  • [8.x] 7 failures in step part-1 (25.0% fail rate in 28 executions)
  • [8.x] 2 failures in pipeline elasticsearch-intake (11.1% fail rate in 18 executions)
  • [8.x] 2 failures in pipeline elasticsearch-periodic (50.0% fail rate in 4 executions)
  • [8.x] 7 failures in pipeline elasticsearch-pull-request (25.9% fail rate in 27 executions)

Build Scans:

martijnvg added a commit that referenced this issue Dec 19, 2024
…#119089)

Backport of #118996 to 8.x branch.

@martijnvg
Member

The reopening of this issue is fixed by #119089. The test was muted before that PR was merged.

navarone-feekery pushed a commit to navarone-feekery/elasticsearch that referenced this issue Dec 26, 2024
…eIT org.elasticsearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT elastic#118955
navarone-feekery pushed a commit to navarone-feekery/elasticsearch that referenced this issue Dec 26, 2024
…tic#118996)

@benwtrent benwtrent reopened this Jan 6, 2025
@benwtrent
Member

@martijnvg in main

- class: org.elasticsearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT
  issue: https://github.com/elastic/elasticsearch/issues/119191

This is still muted.

@benwtrent
Member

Ah, I see that this is now a separate issue. LOL, I will close this one and reopen the other!
