[CI] DocsClientYamlTestSuiteIT test {yaml=reference/cat/nodes/line_361} failing #124103

elasticsearchmachine · 2025-03-05T14:36:06Z

Build Scans:

Reproduction Line:

gradlew ":docs:yamlRestTest" --tests "org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT.test {yaml=reference/cat/nodes/line_361}" -Dtests.seed=245967704EFCFB2C -Dtests.locale=rn-BI -Dtests.timezone=ACT -Druntime.java=24

Applicable branches:
8.x

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.AssertionError: Failure at [reference/cat/nodes:15]: field [$body] was expected to match the provided regex but didn't
Expected: ip        \s+heap.percent \s+ram.percent \s+cpu \s+load_1m \s+load_5m \s+load_15m \s+node.role \s+master \s+name\s* 127.0.0.1           \s+\d+ \s+\d+ \s+\d+    \s+(\d+\.\d+( \s+\d+\.\d+ \s+(\d+\.\d+)?)?)?                  \s+.+       \s+[*]      \s+.+\s*
     but: was "ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name\n127.0.0.1           59          39  -1                          cdfhilmrstw *      node-0\n"

Issue Reasons:

[8.x] 2 consecutive failures in test test {yaml=reference/cat/nodes/line_361}
[8.x] 8 consecutive failures in step windows-2019_checkpart1_platform-support-windows
[8.x] 9 consecutive failures in step windows-2022_checkpart1_platform-support-windows
[8.x] 9 consecutive failures in step part-1-windows
[8.x] 26 failures in test test {yaml=reference/cat/nodes/line_361} (6.5% fail rate in 401 executions)
[8.x] 8 failures in step windows-2019_checkpart1_platform-support-windows (100.0% fail rate in 8 executions)
[8.x] 9 failures in step windows-2022_checkpart1_platform-support-windows (100.0% fail rate in 9 executions)
[8.x] 9 failures in step part-1-windows (100.0% fail rate in 9 executions)
[8.x] 9 failures in pipeline elasticsearch-periodic-platform-support (100.0% fail rate in 9 executions)
[8.x] 5 failures in pipeline elasticsearch-pull-request (8.6% fail rate in 58 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine · 2025-03-05T14:36:10Z

This has been muted on branch 9.0

Mute Reasons:

[9.0] 2 failures in test test {yaml=reference/cat/nodes/line_361} (1.7% fail rate in 118 executions)

Build Scans:

elasticsearchmachine · 2025-03-05T14:36:29Z

Pinging @elastic/es-delivery (Team:Delivery)

elasticsearchmachine · 2025-03-07T22:15:13Z

This has been muted on branch 8.18

Mute Reasons:

[8.18] 6 consecutive failures in step windows-2019_checkpart1_platform-support-windows
[8.18] 6 consecutive failures in step part-1-windows
[8.18] 5 consecutive failures in step windows-2022_checkpart1_platform-support-windows
[8.18] 17 failures in test test {yaml=reference/cat/nodes/line_361} (7.2% fail rate in 236 executions)
[8.18] 6 failures in step windows-2019_checkpart1_platform-support-windows (100.0% fail rate in 6 executions)
[8.18] 6 failures in step part-1-windows (100.0% fail rate in 6 executions)
[8.18] 5 failures in step windows-2022_checkpart1_platform-support-windows (100.0% fail rate in 5 executions)
[8.18] 6 failures in pipeline elasticsearch-periodic-platform-support (100.0% fail rate in 6 executions)
[8.18] 4 failures in pipeline elasticsearch-pull-request (12.1% fail rate in 33 executions)

Build Scans:

elasticsearchmachine · 2025-03-08T22:24:27Z

This has been muted on branch 8.x

Mute Reasons:

[8.x] 9 consecutive failures in step windows-2022_checkpart1_platform-support-windows
[8.x] 9 consecutive failures in step part-1-windows
[8.x] 7 consecutive failures in step windows-2019_checkpart1_platform-support-windows
[8.x] 25 failures in test test {yaml=reference/cat/nodes/line_361} (6.3% fail rate in 400 executions)
[8.x] 9 failures in step windows-2022_checkpart1_platform-support-windows (100.0% fail rate in 9 executions)
[8.x] 9 failures in step part-1-windows (100.0% fail rate in 9 executions)
[8.x] 7 failures in step windows-2019_checkpart1_platform-support-windows (100.0% fail rate in 7 executions)
[8.x] 9 failures in pipeline elasticsearch-periodic-platform-support (100.0% fail rate in 9 executions)
[8.x] 5 failures in pipeline elasticsearch-pull-request (8.6% fail rate in 58 executions)

Build Scans:

The .security index is created asynchronously on cluster startup. As a result, some docs YAML tests must either account for the existence of the .security index or wait for it to be created and green. This PR disables the feature for docs YAML tests, which fixes the flakiness without affecting coverage.

Resolves elastic#122343 Resolves elastic#121748 Resolves elastic#121611 Resolves elastic#121345 Resolves elastic#121338 Resolves elastic#121337 Resolves elastic#121288 Resolves elastic#121287 Resolves elastic#121867 Resolves elastic#122335 Resolves elastic#122681 Resolves elastic#121976 Resolves elastic#123094 Resolves elastic#123192 Resolves elastic#122983 Resolves elastic#124671 Resolves elastic#124103

nielsbauman · 2025-03-13T16:10:10Z

Reopening this because the failure in the Failure Message is still relevant.

elasticsearchmachine · 2025-03-13T16:11:17Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

ldematte · 2025-03-13T16:32:56Z

This never worked with OpenJDK on Windows, but we bundle the Oracle JDK (at least on Windows), and we use that for tests launched via gradle.

This is the OpenJDK implementation:

// Windows doesn't provide a loadavg primitive so this is stubbed out for now.
// It does have primitives (PDH API) to get CPU usage and run queue length.
// "\\Processor(_Total)\\% Processor Time", "\\System\\Processor Queue Length"
// If we wanted to implement loadavg on Windows, we have a few options:
//
// a) Query CPU usage and run queue length and "fake" an answer by
//    returning the CPU usage if it's under 100%, and the run queue
//    length otherwise.  It turns out that querying is pretty slow
//    on Windows, on the order of 200 microseconds on a fast machine.
//    Note that on the Windows the CPU usage value is the % usage
//    since the last time the API was called (and the first call
//    returns 100%), so we'd have to deal with that as well.
//
// b) Sample the "fake" answer using a sampling thread and store
//    the answer in a global variable.  The call to loadavg would
//    just return the value of the global, avoiding the slow query.
//
// c) Sample a better answer using exponential decay to smooth the
//    value.  This is basically the algorithm used by UNIX kernels.
//
// Note that sampling thread starvation could affect both (b) and (c).
int os::loadavg(double loadavg[], int nelem) {
  return -1;
}
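For readers unfamiliar with option (c) in the quoted comment, here is a minimal sketch in Python of an exponentially decayed load average. It is not the OpenJDK or any kernel implementation: the 5-second sampling interval, the 1/5/15-minute windows, and the function name `update_loadavg` are assumptions chosen for illustration, and the run queue length is passed in rather than queried from the OS.

```python
import math

# Assumed sampling interval and averaging windows (seconds); illustrative only.
SAMPLE_INTERVAL_S = 5.0
WINDOWS_S = (60.0, 300.0, 900.0)  # 1, 5 and 15 minutes


def update_loadavg(loadavg, run_queue_len, interval=SAMPLE_INTERVAL_S):
    """Return new (1m, 5m, 15m) averages after one sample.

    Each average decays towards the current run queue length with a
    time constant equal to its window.
    """
    updated = []
    for load, window in zip(loadavg, WINDOWS_S):
        decay = math.exp(-interval / window)
        updated.append(load * decay + run_queue_len * (1.0 - decay))
    return tuple(updated)


if __name__ == "__main__":
    loads = (0.0, 0.0, 0.0)
    # Hypothetical samples: two runnable threads for one minute.
    for _ in range(12):
        loads = update_loadavg(loads, run_queue_len=2)
    print(loads)
```

A sampling thread, as in option (b), would call `update_loadavg` periodically and publish the result, so a `loadavg` call would only read the stored tuple.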

Apparently, the Oracle JDK has a different (more complete, proprietary?) management bean for Windows.
But there is no Oracle JDK 24 yet, so tests now always run on OpenJDK 24, and this test fails on Windows.

I think it's reasonable to adjust the test to allow -1. We wouldn't catch cases where the CPU load percentage is unexpectedly -1, but I don't think these CAT tests should be the ones asserting that.
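To make the mismatch concrete, here is a minimal sketch in Python that replays the regex from the failure message against the reported body. It assumes the test framework ignores whitespace inside the pattern, which `re.VERBOSE` approximates here, and the relaxed `(-1|\d+)` form for the cpu column is one hypothetical way to allow -1, not necessarily the change that was or will be made.

```python
import re

# Response body copied from the failure message (Windows, OpenJDK 24).
ACTUAL = (
    "ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name\n"
    "127.0.0.1           59          39  -1                          cdfhilmrstw *      node-0\n"
)

# Header part of the expected regex, unchanged from the failure message.
HEADER = r"""
    ip \s+heap.percent \s+ram.percent \s+cpu \s+load_1m \s+load_5m \s+load_15m
    \s+node.role \s+master \s+name\s*
"""

# Row part of the expected regex; {cpu} marks the only piece that varies.
ROW_TEMPLATE = r"""
    127.0.0.1 \s+\d+ \s+\d+ \s+{cpu}
    \s+(\d+\.\d+( \s+\d+\.\d+ \s+(\d+\.\d+)?)?)?
    \s+.+ \s+[*] \s+.+\s*
"""


def build_pattern(cpu):
    """Compile the full pattern with the given sub-pattern for the cpu column."""
    return re.compile(HEADER + ROW_TEMPLATE.format(cpu=cpu), re.VERBOSE)


ORIGINAL = build_pattern(r"\d+")       # as in the failing docs test
RELAXED = build_pattern(r"(-1|\d+)")   # hypothetical fix: also accept -1


def matches(pattern, body):
    """True if the whole body matches the pattern."""
    return pattern.fullmatch(body) is not None


if __name__ == "__main__":
    print(matches(ORIGINAL, ACTUAL))  # False
    print(matches(RELAXED, ACTUAL))   # True
```

The original pattern fails only because `\d+` cannot match the leading minus sign in the cpu column; the empty load columns are already covered by the optional group.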

The .security index is created asynchronously on cluster startup. As a result, some docs YAML tests must either account for the existence of the .security index or wait for it to be created and green. This PR disables the feature for docs YAML tests, which fixes the flakiness without affecting coverage.

Resolves elastic#122343 Resolves elastic#121748 Resolves elastic#121611 Resolves elastic#121345 Resolves elastic#121338 Resolves elastic#121337 Resolves elastic#121288 Resolves elastic#121287 Resolves elastic#121867 Resolves elastic#122335 Resolves elastic#122681 Resolves elastic#121976 Resolves elastic#123094 Resolves elastic#123192 Resolves elastic#122983 Resolves elastic#124671 Resolves elastic#124103

(cherry picked from commit cac356a)

# Conflicts: # muted-tests.yml

ldematte · 2025-03-14T12:00:52Z

This is not reproducible anymore, as we are transitioning to the new doc system; as we discussed on Slack, the fix would be in the test: -1 is an acceptable value, and we already accept it in related cat API tests (see for example rest-api-spec/test/cat.nodes/10_basic.yml)

nielsbauman · 2025-03-14T12:02:16Z

@ldematte should we add the -1 fix on the 8.x branches to prevent the bot from reopening this (and related) issues?

ldematte · 2025-03-14T13:49:29Z

The only branch that still has the old docs is 8.18. We can add the fix to that branch if we want.

elasticsearchmachine added :Delivery/Build Build or test infrastructure >test-failure Triaged test failures from CI labels Mar 5, 2025

elasticsearchmachine added a commit that referenced this issue Mar 5, 2025

Mute org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT test {yaml=reference/cat/nodes/line_361} #124103 (e47219e)

elasticsearchmachine added Team:Delivery Meta label for Delivery team needs:risk Requires assignment of a risk label (low, medium, blocker) labels Mar 5, 2025

elasticsearchmachine added a commit that referenced this issue Mar 7, 2025

Mute org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT test {yaml=reference/cat/nodes/line_361} #124103 (7baf78c)

ldematte mentioned this issue Mar 8, 2025

[8.x] [9.0] Remove duplicate paths (including exclusive) in FileAccessTree (#123776 and #124023) (#123924) #124332

Merged

elasticsearchmachine added a commit that referenced this issue Mar 8, 2025

Mute org.elasticsearch.smoketest.DocsClientYamlTestSuiteIT test {yaml=reference/cat/nodes/line_361} #124103 (67adc88)

slobodanadamovic self-assigned this Mar 12, 2025

slobodanadamovic mentioned this issue Mar 13, 2025

Disable queryable built-in feature in docs YAML tests #124684

Merged

elasticsearchmachine closed this as completed in #124684 Mar 13, 2025

elasticsearchmachine closed this as completed in cac356a Mar 13, 2025

nielsbauman added :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team and removed :Delivery/Build Build or test infrastructure Team:Delivery Meta label for Delivery team labels Mar 13, 2025

nielsbauman assigned ldematte and unassigned slobodanadamovic Mar 13, 2025

nielsbauman reopened this Mar 13, 2025

nielsbauman mentioned this issue Mar 13, 2025

[CI] DocsClientYamlTestSuiteIT class failing #124671

Closed

ldematte added the low-risk An open issue or test failure that is a low risk to future releases label Mar 13, 2025

elasticsearchmachine removed the needs:risk Requires assignment of a risk label (low, medium, blocker) label Mar 13, 2025

ldematte closed this as completed Mar 14, 2025
