Skip to content

CI: EvilThreadPoolTests.testExecutionExceptionOnSinglePrioritizingThreadPoolExecutor fails #37708

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alpar-t opened this issue Jan 22, 2019 · 5 comments · Fixed by #38014
Closed
Assignees
Labels
:Core/Infra/Core Core issues without another label >test-failure Triaged test failures from CI

Comments

@alpar-t
Copy link
Contributor

alpar-t commented Jan 22, 2019

Example build failure

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+corretto-periodic/ES_BUILD_JAVA=openjdk12,label=amazon/68/

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+intake/1095/

Reproduction line

does not reproduce locally

./gradlew :qa:evil-tests:unitTest -Dtests.seed=6245D3965911131D -Dtests.class=org.elasticsearch.threadpool.EvilThreadPoolTests -Dtests.method="testExecutionExceptionOnSinglePrioritizingThreadPoolExecutor" -Dtests.security.manager=false -Dtests.locale=es-BO -Dtests.timezone=Africa/Asmara -Dcompiler.java=12 -Druntime.java=8

Example relevant log:

14:59:21 ERROR   1199s J7 | EvilThreadPoolTests.testExecutionExceptionOnSinglePrioritizingThreadPoolExecutor <<< FAILURES!
14:59:21    > Throwable #1: java.lang.Exception: Test abandoned because suite timeout was reached.
14:59:21    > 	at __randomizedtesting.SeedInfo.seed([6245D3965911131D]:0)
14:59:21   1> [2019-01-22T15:59:20,961][INFO ][o.e.t.EvilThreadPoolTests] [testExecutionExceptionOnSinglePrioritizingThreadPoolExecutor] after test
14:59:21 ERROR   0.00s J7 | EvilThreadPoolTests (suite) <<< FAILURES!
14:59:21    > Throwable #1: java.lang.Exception: Suite timeout exceeded (>= 1200000 msec).
14:59:21    > 	at __randomizedtesting.SeedInfo.seed([6245D3965911131D]:0)
14:59:21 Completed [15/16] on J7 in 1207.09s, 10 tests, 2 errors <<< FAILURES!
14:59:21 
14:59:21 Suite: org.elasticsearch.cli.EvilEnvironmentAwareCommandTests
14:59:21 Completed [16/16] on J7 in 0.04s, 1 test
14:59:21 
14:59:21   2> "Finalizer" ID=3 WAITING on java.lang.ref.ReferenceQueue$Lock@231ef7e
14:59:21   2> 	at java.lang.Object.wait(Native Method)
14:59:21   2> 	- waiting on java.lang.ref.ReferenceQueue$Lock@231ef7e
14:59:21   2> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
14:59:21   2> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
14:59:21   2> 	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)
14:59:21   2> "Reference Handler" ID=2 WAITING on java.lang.ref.Reference$Lock@16f311b9
14:59:21   2> 	at java.lang.Object.wait(Native Method)
14:59:21   2> 	- waiting on java.lang.ref.Reference$Lock@16f311b9
14:59:21   2> 	at java.lang.Object.wait(Object.java:502)
14:59:21   2> 	at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
14:59:21   2> 	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
14:59:21   2> "main" ID=1 WAITING on com.carrotsearch.randomizedtesting.RandomizedRunner$2@41ddc935
14:59:21   2> 	at java.lang.Object.wait(Native Method)
14:59:21   2> 	- waiting on com.carrotsearch.randomizedtesting.RandomizedRunner$2@41ddc935
14:59:21   2> 	at java.lang.Thread.join(Thread.java:1252)
14:59:21   2> 	at java.lang.Thread.join(Thread.java:1326)
14:59:21   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:639)
14:59:21   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.run(RandomizedRunner.java:496)
14:59:21   2> 	at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.execute(SlaveMain.java:269)
14:59:21   2> 	at com.carrotsearch.ant.tasks.junit4.slave.SlaveMain.main(SlaveMain.java:394)
14:59:21   2> 	at com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe.main(SlaveMainSafe.java:13)
14:59:21   2> ^^==============================================
14:59:21   2> REPRODUCE WITH: ./gradlew :qa:evil-tests:unitTest -Dtests.seed=6245D3965911131D -Dtests.class=org.elasticsearch.threadpool.EvilThreadPoolTests -Dtests.method="testExecutionExceptionOnSinglePrioritizingThreadPoolExecutor" -Dtests.security.manager=false -Dtests.locale=es-BO -Dtests.timezone=Africa/Asmara -Dcompiler.java=12 -Druntime.java=8
14:59:21   2> ene 22, 2019 3:59:15 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
14:59:21   2> ADVERTENCIA: Will linger awaiting termination of 4 leaked thread(s).
14:59:21   2> ene 22, 2019 3:59:20 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
14:59:21   2> GRAVE: 4 threads leaked from SUITE scope at org.elasticsearch.threadpool.EvilThreadPoolTests: 
14:59:21   2>    1) Thread[id=23, name=TEST-EvilThreadPoolTests.testExecutionExceptionOnSinglePrioritizingThreadPoolExecutor-seed#[6245D3965911131D], state=WAITING, group=TGRP-EvilThreadPoolTests]
14:59:21   2>         at sun.misc.Unsafe.park(Native Method)
14:59:21   2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
14:59:21   2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
14:59:21   2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
14:59:21   2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
14:59:21   2>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
14:59:21   2>         at org.elasticsearch.threadpool.EvilThreadPoolTests.runExecutionTest(EvilThreadPoolTests.java:359)
14:59:21   2>         at org.elasticsearch.threadpool.EvilThreadPoolTests.checkExecutionException(EvilThreadPoolTests.java:270)
14:59:21   2>         at org.elasticsearch.threadpool.EvilThreadPoolTests.testExecutionExceptionOnSinglePrioritizingThreadPoolExecutor(EvilThreadPoolTests.java:228)
14:59:21   2>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
14:59:21   2>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
14:59:21   2>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
14:59:21   2>         at java.lang.reflect.Method.invoke(Method.java:498)
14:59:21   2>         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
14:59:21   2>         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
14:59:21   2>         at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
14:59:21   2>         at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
14:59:21   2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
14:59:21   2>         at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
14:59:21   2>         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
14:59:21   2>         at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
14:59:21   2>         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
14:59:21   2>         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
14:59:21   2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
14:59:21   2>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
14:59:21   2>         at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
14:59:21   2>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
14:59:21   2>         at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
14:59:21   2>         at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
14:59:21   2>         at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
14:59:21   2>         at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
14:59:21   2>         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
14:59:21   2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
14:59:21   2>         at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
14:59:21   2>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
14:59:21   2>         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
14:59:21   2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
14:59:21   2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
14:59:21   2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
14:59:21   2>         at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
14:59:21   2>         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
14:59:21   2>         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
14:59:21   2>         at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
14:59:21   2>         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
14:59:21   2>         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
14:59:21   2>         at java.lang.Thread.run(Thread.java:748)
14:59:21   2>    2) Thread[id=160, name=test[T#2], state=WAITING, group=TGRP-EvilThreadPoolTests]
14:59:21   2>         at sun.misc.Unsafe.park(Native Method)
14:59:21   2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
14:59:21   2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
14:59:21   2>         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:550)
14:59:21   2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
14:59:21   2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
14:59:21   2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
14:59:21   2>         at java.lang.Thread.run(Thread.java:748)
14:59:21   2>    3) Thread[id=158, name=elasticsearch[org.elasticsearch.threadpool.EvilThreadPoolTests][[timer]], state=TIMED_WAITING, group=TGRP-EvilThreadPoolTests]
14:59:21   2>         at java.lang.Thread.sleep(Native Method)
14:59:21   2>         at org.elasticsearch.threadpool.ThreadPool$CachedTimeThread.run(ThreadPool.java:569)
14:59:21   2>    4) Thread[id=161, name=elasticsearch[org.elasticsearch.threadpool.EvilThreadPoolTests][scheduler][T#1], state=WAITING, group=TGRP-EvilThreadPoolTests]
14:59:21   2>         at sun.misc.Unsafe.park(Native Method)
14:59:21   2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
14:59:21   2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
14:59:21   2>         at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
14:59:21   2>         at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
14:59:21   2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
14:59:21   2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
14:59:21   2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
14:59:21   2>         at java.lang.Thread.run(Thread.java:748)
14:59:21   2> ene 22, 2019 3:59:20 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
14:59:21   2> INFORMACIÓN: Starting to interrupt leaked threads:
14:59:21   2>    1) Thread[id=23, name=TEST-EvilThreadPoolTests.testExecutionExceptionOnSinglePrioritizingThreadPoolExecutor-seed#[6245D3965911131D], state=WAITING, group=TGRP-EvilThreadPoolTests]
14:59:21   2>    2) Thread[id=160, name=test[T#2], state=WAITING, group=TGRP-EvilThreadPoolTests]
14:59:21   2>    3) Thread[id=158, name=elasticsearch[org.elasticsearch.threadpool.EvilThreadPoolTests][[timer]], state=TIMED_WAITING, group=TGRP-EvilThreadPoolTests]
14:59:21   2>    4) Thread[id=161, name=elasticsearch[org.elasticsearch.threadpool.EvilThreadPoolTests][scheduler][T#1], state=WAITING, group=TGRP-EvilThreadPoolTests]
14:59:21   2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/workspace/elastic+elasticsearch+master+corretto-periodic/ES_BUILD_JAVA/openjdk12/label/amazon/qa/evil-tests/build/testrun/unitTest/J7/temp/org.elasticsearch.threadpool.EvilThreadPoolTests_6245D3965911131D-001
14:59:21   2> Jan 22, 2019 12:59:20 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
14:59:21   2> INFO: All leaked threads terminated.
14:59:21   2> REPRODUCE WITH: ./gradlew :qa:evil-tests:unitTest -Dtests.seed=6245D3965911131D -Dtests.class=org.elasticsearch.threadpool.EvilThreadPoolTests -Dtests.security.manager=false -Dtests.locale=en-US -Dtests.timezone=Etc/UTC -Dcompiler.java=12 -Druntime.java=8
14:59:21   2> NOTE: test params are: codec=Asserting(Lucene80): {}, docValues:{}, maxPointsInLeafNode=918, maxMBSortInHeap=6.374011839271727, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@3b7b8914), locale=es-BO, timezone=Africa/Asmara
14:59:21   2> NOTE: Linux 4.14.72-73.55.amzn2.x86_64 amd64/Oracle Corporation 1.8.0_192 (64-bit)/cpus=16,threads=1,free=489307448,total=514850816
14:59:21   2> NOTE: All tests run in this JVM: [EvilThreadPoolTests]
14:59:21 
14:59:21 [ant:junit4] JVM J7: stderr was not empty, see: /var/lib/jenkins/workspace/elastic+elasticsearch+master+corretto-periodic/ES_BUILD_JAVA/openjdk12/label/amazon/qa/evil-tests/build/testrun/unitTest/temp/junit4-J7-20190122_123912_92917956035713705242833.syserr

Frequency

10 in the last 24 hours, including PR builds

@alpar-t alpar-t added :Core/Infra/Core Core issues without another label >test-failure Triaged test failures from CI labels Jan 22, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

alpar-t added a commit that referenced this issue Jan 22, 2019
alpar-t added a commit that referenced this issue Jan 22, 2019
@henningandersen
Copy link
Contributor

henningandersen commented Jan 24, 2019

Seems to be a genuine problem with the PrioritizedEsThreadPoolExecutor, given this is the one failing in both builds. This code (test and thread poll) seems to have been touched 17th Jan 2019. As far as I can see, slowness in build system etc. should not be able to cause this, since the timeout is many minutes.

Both:

testExecutionErrorOnSinglePrioritizingThreadPoolExecutor
and
testExecutionExceptionOnSinglePrioritizingThreadPoolExecutor

have been failing recently.

I did not manage to reproduce this locally yet.

@henningandersen
Copy link
Contributor

855801a might be the cause though I have not spotted the error.

@ywelsch ywelsch self-assigned this Jan 24, 2019
@henningandersen
Copy link
Contributor

This is caused by Scheduler not yet logging warnings on exceptions. There is an associated todo in EvilThreadPoolTests.

Working on a solution to both parts, which should stabilize this test.

@henningandersen henningandersen self-assigned this Jan 29, 2019
henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Jan 30, 2019
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.

This is a continuation of elastic#28667, elastic#36317 and also fixes elastic#37708.
@henningandersen
Copy link
Contributor

Opened PR: #38014

henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Jan 30, 2019
Fixed review comments: removed todo, use FutureUtils.cancel and removed
scheduler task decoration since this adds more complexity than it
benefits.

This is a continuation of elastic#28667, elastic#36137 and also fixes elastic#37708.
henningandersen added a commit that referenced this issue Jan 31, 2019
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.

This is a continuation of #28667, #36137 and also fixes #37708.
henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Feb 1, 2019
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.

In this backport to 6.x, source backwards compatibility is maintained
and some of the changes has therefore not been carried out (notably
the signature change on Processor.Parameters.scheduler).

This is a continuation of elastic#28667, elastic#36137 and also fixes elastic#37708.
henningandersen added a commit that referenced this issue Feb 4, 2019
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.

In this backport to 6.x, source backwards compatibility is maintained
and some of the changes has therefore not been carried out (notably
the signature change on Processor.Parameters.scheduler).

This is a continuation of #28667, #36137 and also fixes #37708.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants