Skip to content

[CI] SmokeTestWatcherWithSecurityIT.testSearchTransformInsufficientPermissions Failure #33291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
talevy opened this issue Aug 31, 2018 · 6 comments · Fixed by #35271
Closed
Assignees
Labels
:Data Management/Watcher >test-failure Triaged test failures from CI

Comments

@talevy
Copy link
Contributor

talevy commented Aug 31, 2018

REPRODUCE WITH: ./gradlew :x-pack:qa:smoke-test-watcher-with-security:integTestRunner \
  -Dtests.seed=9861B17A9382A49D \
  -Dtests.class=org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityIT \
  -Dtests.method="testSearchTransformInsufficientPermissions" \
  -Dtests.security.manager=true \
  -Dtests.locale=sr-Latn-ME \
  -Dtests.timezone=America/Cambridge_Bay \
  -Dcompiler.java=10 \
  -Druntime.java=8

stacktrace

3:45:09 FAILURE 12.0s | SmokeTestWatcherWithSecurityIT.testSearchTransformInsufficientPermissions <<< FAILURES!
23:45:09    > Throwable #1: java.lang.AssertionError: 
23:45:09    > Expected: is a value equal to or greater than <1>
23:45:09    >      but: <0> was less than <1>
23:45:09    > 	at __randomizedtesting.SeedInfo.seed([9861B17A9382A49D:514B584B289A370]:0)
23:45:09    > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
23:45:09    > 	at org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityIT.lambda$getWatchHistoryEntry$3(SmokeTestWatcherWithSecurityIT.java:326)
23:45:09    > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:847)
23:45:09    > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:821)
...
23:45:30 org.elasticsearch.ElasticsearchSecurityException: action [indices:data/write/bulk[s]] is unauthorized for user [watcher_manager]
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@andrershov
Copy link
Contributor

Occurred again https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+multijob-unix-compatibility/os=centos/58/console

REPRODUCE WITH: ./gradlew :x-pack:qa:smoke-test-watcher-with-security:integTestRunner \
  -Dtests.seed=484163AFA937AD5E \
  -Dtests.class=org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityIT \
  -Dtests.method="testSearchTransformInsufficientPermissions" \
  -Dtests.security.manager=true \
  -Dtests.locale=it \
  -Dtests.timezone=Antarctica/Mawson \
  -Dcompiler.java=11 \
  -Druntime.java=8

@jakelandis jakelandis self-assigned this Oct 25, 2018
@colings86 colings86 added v6.6.0 and removed v6.5.0 labels Oct 25, 2018
@jakelandis
Copy link
Contributor

I haven't reproduced this exact issue... but stepping through the code, there appears to be a bug in the start/stop logic in SmokeTestWatcherWithSecurityIT such that it does not busy wait as (I believe) is the intent of the code.

Starting Watcher here is wrapped in an assertBusy() which appears to attempt to wait for 10 seconds max until the Watcher is indeed started. However, the break here allows not only to break the switch, but also effectively stops the busy wait logic since it exits the lamda normally. assertBusy only busy waits while an assertionError is getting thrown in the (lamda) code block.

Stop Watcher has the same bug here (note the assertion in both is asserting the success/failure of the http call, not the state of Watcher)... and it looks like SmokeTestWatcherWithSecurityClientYamlTestSuiteIT may have the same issue (but haven't stepped trough that yet).

This means that Watcher is not guaranteed to be started or stopped between tests since it is not properly blocking till started or stopped, and is likely the root cause of many of these Watcher failures.

I believe the fix is to simply change the break to a AssertionError for the state change switches.

Also, this specific test is peculiar since it can pass without Watcher ever started.

jakelandis added a commit to jakelandis/elasticsearch that referenced this issue Nov 5, 2018
Ensure that Watcher is correctly started and stopped between tests for
SmokeTestWatcherWithSecurityIT and
SmokeTestWatcherWithSecurityClientYamlTestSuiteIT.

The change here is to throw an `AssertionError` instead of `break;` to
allow the `assertBusy()` to continue to busy wait until the desired
state is reached.

closes elastic#33291
closes elastic#29877
jakelandis added a commit that referenced this issue Nov 7, 2018
#35271)

Ensure that Watcher is correctly started and stopped between tests for
SmokeTestWatcherWithSecurityIT,
SmokeTestWatcherWithSecurityClientYamlTestSuiteIT,
SmokeTestWatcherTestSuiteIT, WatcherRestIT,
XDocsClientYamlTestSuiteIT, and XPackRestIT

The change here is to throw an `AssertionError` instead of `break;` to
allow the `assertBusy()` to continue to busy wait until the desired
state is reached.

closes #33291, closes #29877, closes #34462, closes #30705, closes #34448
jakelandis added a commit that referenced this issue Nov 8, 2018
#35271)

Ensure that Watcher is correctly started and stopped between tests for
SmokeTestWatcherWithSecurityIT,
SmokeTestWatcherWithSecurityClientYamlTestSuiteIT,
SmokeTestWatcherTestSuiteIT, WatcherRestIT,
XDocsClientYamlTestSuiteIT, and XPackRestIT

The change here is to throw an `AssertionError` instead of `break;` to
allow the `assertBusy()` to continue to busy wait until the desired
state is reached.

closes #33291, closes #29877, closes #34462, closes #30705, closes #34448
jakelandis added a commit that referenced this issue Nov 8, 2018
#35271)

Ensure that Watcher is correctly started and stopped between tests for
SmokeTestWatcherWithSecurityIT,
SmokeTestWatcherWithSecurityClientYamlTestSuiteIT,
SmokeTestWatcherTestSuiteIT, WatcherRestIT,
XDocsClientYamlTestSuiteIT, and XPackRestIT

The change here is to throw an `AssertionError` instead of `break;` to
allow the `assertBusy()` to continue to busy wait until the desired
state is reached.

closes #33291, closes #29877, closes #34462, closes #30705, closes #34448
pgomulka pushed a commit to pgomulka/elasticsearch that referenced this issue Nov 13, 2018
elastic#35271)

Ensure that Watcher is correctly started and stopped between tests for
SmokeTestWatcherWithSecurityIT,
SmokeTestWatcherWithSecurityClientYamlTestSuiteIT,
SmokeTestWatcherTestSuiteIT, WatcherRestIT,
XDocsClientYamlTestSuiteIT, and XPackRestIT

The change here is to throw an `AssertionError` instead of `break;` to
allow the `assertBusy()` to continue to busy wait until the desired
state is reached.

closes elastic#33291, closes elastic#29877, closes elastic#34462, closes elastic#30705, closes elastic#34448
@cbuescher
Copy link
Member

@jakelandis I got something that looks quite similar to this today on 6.5, not sure if this should have been fixed you your PR

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.5+multijob-unix-compatibility/os=centos/70/console

./gradlew :x-pack:qa:smoke-test-watcher-with-security:integTestRunner \
  -Dtests.seed=2EA37470CE126E3E \
  -Dtests.class=org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityIT \
  -Dtests.method="testSearchTransformInsufficientPermissions" \
  -Dtests.security.manager=true \
  -Dtests.locale=es-UY \
  -Dtests.timezone=ART \
  -Dcompiler.java=11 \
  -Druntime.java=8
java.lang.AssertionError: 
Expected: is a value equal to or greater than <1>
     but: <0> was less than <1>
	at __randomizedtesting.SeedInfo.seed([2EA37470CE126E3E:B3D6708EEF1969D3]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.junit.Assert.assertThat(Assert.java:923)
	at org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityIT.lambda$getWatchHistoryEntry$3(SmokeTestWatcherWithSecurityIT.java:328)
	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:848)
	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:822)
	at org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityIT.getWatchHistoryEntry(SmokeTestWatcherWithSecurityIT.java:306)
	at org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityIT.getWatchHistoryEntry(SmokeTestWatcherWithSecurityIT.java:301)
	at org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityIT.testSearchTransformInsufficientPermissions(SmokeTestWatcherWithSecurityIT.java:237)

@cbuescher cbuescher reopened this Nov 29, 2018
@pcsanwald pcsanwald added v6.7.0 and removed v6.6.0 labels Jan 17, 2019
@jasontedor jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019
jakelandis added a commit to jakelandis/elasticsearch that referenced this issue May 2, 2019
This commit unmutes the org.elasticsearch.smoketest.SmokeTestWatcherWithSecurityIT
test suite, fixes a bug [1] that was introduced while the test was muted, and
adds some additional debug logging, and enables debug for the ES instance used
in this Watcher test.

The bug fixed here is minor and unlikely to happen. It requires ES to be started
with ILM disabled, Watcher enabled, and Watcher explicitly stopped and restarted.
Due to validation Watcher does not fully start and can result in a partially started
state. This is an unlikely scenerio outside of the testing framework.

Optimistically closing the following
Fixes elastic#35361
Fixes elastic#30777
Fixes elastic#35361
Fixes elastic#33291
Fixes elastic#29893

If this does not fully fix the issue, there will now be better debug logging.
jakelandis added a commit to jakelandis/elasticsearch that referenced this issue May 10, 2019
There are likely multiple root causes to the seemingly random failures
generated by SmokeTestWatcherWithSecurityIT. This commit un-mutes this
this test, address one known cause and adds debug logging for this test.

The known root cause for one failure is that we can have a Watch running
that is reading data from an index. Before we stop Watcher we delete that
index. If Watcher happens to execute after deletion of the index but before
the stop of Watcher the test can fail. The fix here is to simply move the
index deletion after the stop of Watcher.

Related elastic#35361
Related elastic#30777
Related elastic#35361
Related elastic#33291
Related elastic#29893
jakelandis added a commit that referenced this issue May 24, 2019
* Address test failures for SmokeTestWatcherWithSecurityIT

There are likely multiple root causes to the seemingly random failures
generated by SmokeTestWatcherWithSecurityIT. This commit un-mutes this
this test, address one known cause and adds debug logging for this test.

The known root cause for one failure is that we can have a Watch running
that is reading data from an index. Before we stop Watcher we delete that
index. If Watcher happens to execute after deletion of the index but before
the stop of Watcher the test can fail. The fix here is to simply move the
index deletion after the stop of Watcher.

Related #35361
Related #30777
Related #33291
Related #29893
@jakelandis
Copy link
Contributor

Un-muted this test on PR #42409 to obtain additional logs.

If (when?) this test fails again please obtain the following information before muting the test:

  • Copy of the relevant failure
  • Copy of the reproduce line
  • The Jenkins build link
  • The Gradle scan link
  • The relevant cluster logs from "Google Cloud Storage Upload Report" (link found in Jenkins build)

gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
* Address test failures for SmokeTestWatcherWithSecurityIT

There are likely multiple root causes to the seemingly random failures
generated by SmokeTestWatcherWithSecurityIT. This commit un-mutes this
this test, address one known cause and adds debug logging for this test.

The known root cause for one failure is that we can have a Watch running
that is reading data from an index. Before we stop Watcher we delete that
index. If Watcher happens to execute after deletion of the index but before
the stop of Watcher the test can fail. The fix here is to simply move the
index deletion after the stop of Watcher.

Related elastic#35361
Related elastic#30777
Related elastic#33291
Related elastic#29893
@jakelandis jakelandis added v7.3.0 and removed v7.2.0 labels Jun 17, 2019
@martijnvg
Copy link
Member

Closing in favour of #30777 as this test fails in a similar way as the mentioned issue.

PR #50931 will add more logging in case this fails again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment