[CI] StoreRecoveryTests testAddIndices failing #124104

Open
elasticsearchmachine opened this issue Mar 5, 2025 · 14 comments
Labels
  • :Core/Infra/Core (Core issues without another label)
  • low-risk (An open issue or test failure that is a low risk to future releases)
  • Team:Core/Infra (Meta label for core/infra team)
  • >test-failure (Triaged test failures from CI)

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented Mar 5, 2025

Build Scans:

Reproduction Line:

gradlew ":server:test" --tests "org.elasticsearch.index.shard.StoreRecoveryTests.testAddIndices" -Dtests.seed=82669B0473091E08 -Dtests.locale=bem -Dtests.timezone=America/North_Dakota/New_Salem -Druntime.java=24

Applicable branches:
9.0

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.AssertionError: expected:<0.0> but was:<51.0>

Issue Reasons:

  • [9.0] 22 consecutive failures in step windows-2022_checkpart1_platform-support-windows
  • [9.0] 22 failures in test testAddIndices (3.1% fail rate in 705 executions)
  • [9.0] 22 failures in step windows-2022_checkpart1_platform-support-windows (100.0% fail rate in 22 executions)
  • [9.0] 21 failures in pipeline elasticsearch-periodic-platform-support (95.5% fail rate in 22 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

@elasticsearchmachine
Collaborator Author

This has been muted on branch 9.0

Mute Reasons:

  • [9.0] 2 failures in test testAddIndices (1.7% fail rate in 118 executions)

Build Scans:

elasticsearchmachine added the Team:StorageEngine and needs:risk labels Mar 5, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator Author

This has been muted on branch 8.18

Mute Reasons:

  • [8.18] 6 consecutive failures in step windows-2019_checkpart1_platform-support-windows
  • [8.18] 6 consecutive failures in step part-1-windows
  • [8.18] 5 consecutive failures in step windows-2022_checkpart1_platform-support-windows
  • [8.18] 17 failures in test testAddIndices (7.3% fail rate in 234 executions)
  • [8.18] 6 failures in step windows-2019_checkpart1_platform-support-windows (100.0% fail rate in 6 executions)
  • [8.18] 6 failures in step part-1-windows (100.0% fail rate in 6 executions)
  • [8.18] 5 failures in step windows-2022_checkpart1_platform-support-windows (100.0% fail rate in 5 executions)
  • [8.18] 6 failures in pipeline elasticsearch-periodic-platform-support (100.0% fail rate in 6 executions)
  • [8.18] 4 failures in pipeline elasticsearch-pull-request (12.9% fail rate in 31 executions)

Build Scans:

lkts added the :Distributed Indexing/Distributed label and removed the :StorageEngine/TSDB label Mar 7, 2025
elasticsearchmachine added the Team:Distributed Indexing label and removed the Team:StorageEngine label Mar 7, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@lkts
Contributor

lkts commented Mar 7, 2025

I believe this should go to Distributed Indexing. It's interesting that it seems to be Windows-only.

@elasticsearchmachine
Collaborator Author

This has been muted on branch 8.x

Mute Reasons:

  • [8.x] 9 consecutive failures in step windows-2022_checkpart1_platform-support-windows
  • [8.x] 9 consecutive failures in step part-1-windows
  • [8.x] 7 consecutive failures in step windows-2019_checkpart1_platform-support-windows
  • [8.x] 25 failures in test testAddIndices (6.3% fail rate in 396 executions)
  • [8.x] 9 failures in step windows-2022_checkpart1_platform-support-windows (100.0% fail rate in 9 executions)
  • [8.x] 9 failures in step part-1-windows (100.0% fail rate in 9 executions)
  • [8.x] 7 failures in step windows-2019_checkpart1_platform-support-windows (100.0% fail rate in 7 executions)
  • [8.x] 9 failures in pipeline elasticsearch-periodic-platform-support (100.0% fail rate in 9 executions)
  • [8.x] 5 failures in pipeline elasticsearch-pull-request (9.3% fail rate in 54 executions)

Build Scans:

@fcofdez
Contributor

fcofdez commented Mar 11, 2025

I wonder if this was fixed by #123676. I ran the test a few hundred times on my Windows desktop and it seems to pass. I'm going to unmute it.

fcofdez added a commit to fcofdez/elasticsearch that referenced this issue Mar 11, 2025
fcofdez added a commit that referenced this issue Mar 11, 2025
fcofdez added a commit that referenced this issue Mar 11, 2025
@fcofdez
Contributor

fcofdez commented Mar 11, 2025

Closed via #124548

@elasticsearchmachine
Collaborator Author

This has been muted on branch 8.x

Mute Reasons:

  • [8.x] 9 consecutive failures in step windows-2019_checkpart1_platform-support-windows
  • [8.x] 9 consecutive failures in step windows-2022_checkpart1_platform-support-windows
  • [8.x] 9 consecutive failures in step part-1-windows
  • [8.x] 27 failures in test testAddIndices (6.0% fail rate in 447 executions)
  • [8.x] 9 failures in step windows-2019_checkpart1_platform-support-windows (100.0% fail rate in 9 executions)
  • [8.x] 9 failures in step windows-2022_checkpart1_platform-support-windows (100.0% fail rate in 9 executions)
  • [8.x] 9 failures in step part-1-windows (100.0% fail rate in 9 executions)
  • [8.x] 10 failures in pipeline elasticsearch-periodic-platform-support (100.0% fail rate in 10 executions)
  • [8.x] 5 failures in pipeline elasticsearch-pull-request (8.2% fail rate in 61 executions)

Build Scans:

kingherc added the low-risk label Mar 12, 2025
elasticsearchmachine removed the needs:risk label Mar 12, 2025
kingherc added the needs:risk label and removed the low-risk label Mar 12, 2025
@tlrx
Member

tlrx commented Mar 19, 2025

As of today the test is still muted on 8.x and 8.18 and has recent failures on 9.0.

Looking at the test failure history, the test started to fail on 9.0 on March 5th and on 8.x/8.18 on March 6th. This coincides with the bump to JDK 24 RC on those branches, and all test failures are on Windows platforms with JDK 24.

I'm going to reassign to @elastic/es-core-infra for qualification of the risk and the next steps: it requires some knowledge of file entitlements that I don't have, and I'm not sure whether the method StoreRecoveryTests#hardLinksSupported needs adjustments or whether this is an issue with HardlinkCopyDirectoryWrapper on Windows/JDK 24.
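
For anyone picking this up, here is a minimal, hypothetical sketch of the kind of probe a helper such as StoreRecoveryTests#hardLinksSupported could perform. It is not the actual test code: the names HardLinkProbe, sameFileKey and hardLinksDetectable are made up for illustration, and it relies only on standard java.nio.file calls. It creates a hard link and checks whether both paths report the same non-null BasicFileAttributes#fileKey. The answer can depend on the platform, the file system and the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper for illustration only; not part of Elasticsearch.
final class HardLinkProbe {

    // True only if both paths report the same non-null file key.
    // fileKey() may legitimately return null on some platforms or file systems.
    static boolean sameFileKey(Path a, Path b) throws IOException {
        Object keyA = Files.readAttributes(a, BasicFileAttributes.class).fileKey();
        Object keyB = Files.readAttributes(b, BasicFileAttributes.class).fileKey();
        return keyA != null && keyA.equals(keyB);
    }

    // Creates probe.src and a hard link probe.link inside dir, then compares their file keys.
    static boolean hardLinksDetectable(Path dir) throws IOException {
        Path source = Files.createFile(dir.resolve("probe.src"));
        Path link = dir.resolve("probe.link");
        try {
            Files.createLink(link, source);
        } catch (UnsupportedOperationException e) {
            return false; // the file system cannot create hard links at all
        }
        return sameFileKey(source, link);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hardlink-probe");
        System.out.println("java.version=" + System.getProperty("java.version")
            + " hardLinksDetectable=" + hardLinksDetectable(dir));
    }
}
```

Running this on the same Windows machine under two different JDK versions and comparing the printed value might show whether the probe's answer changes with the JDK, which would help separate the two possibilities raised above.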

tlrx added the :Core/Infra/Core label and removed the :Distributed Indexing/Distributed and Team:Distributed Indexing labels Mar 19, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-core-infra (Team:Core/Infra)

elasticsearchmachine added the Team:Core/Infra label Mar 19, 2025
@ldematte
Contributor

ldematte commented Mar 19, 2025

Hi @tlrx, I've looked at this quickly, and all I can say is that I don't think this is related to entitlements: this is a unit test, and unit tests currently run without entitlements (a gap we still have to cover).
I will give it a more thorough look to see if this is indeed an issue around hard links and Windows/JDK 24.

@prdoyle
Contributor

prdoyle commented Mar 19, 2025

Is this supposed to be muted on main? Because I'm getting it there too.

prdoyle added a commit to prdoyle/elasticsearch that referenced this issue Mar 19, 2025
@prdoyle
Contributor

prdoyle commented Mar 19, 2025

I muted it on main in my PR.

ldematte added the low-risk label and removed the needs:risk label Mar 28, 2025
7 participants