[Tests] Make testEngineGCDeletesSetting deterministic #38942

matriv · 2019-02-15T11:18:44Z

InternalEngine.resolveDocVersion() uses relativeTimeInMillis() from
ThreadPool, so it needs the cached time to advance. Add a check
to ensure that, and decrease thread_pool.estimated_time_interval
to 1ms to keep the test from running for a long time.

Fixes: #38874
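
For readers following along, here is a rough model of why the cached time matters. It assumes the engine treats a delete as forgotten only once strictly more than the gc_deletes interval has elapsed on the cached clock. The class and method names are hypothetical and the real InternalEngine code may differ in detail.

```java
/**
 * Hypothetical model of the "is this delete still remembered?" decision.
 * Not the actual InternalEngine code; names are made up for illustration.
 */
final class GcDeletesSketch {
    /**
     * A delete tombstone counts as expired only when strictly more than
     * gcDeletesMillis has passed between the delete and the cached "now".
     */
    static boolean tombstoneExpired(long cachedNowMillis, long deleteTimeMillis, long gcDeletesMillis) {
        return cachedNowMillis - deleteTimeMillis > gcDeletesMillis;
    }
}
```

Under this model, with gc_deletes set to 0 the tombstone stays visible until the cached clock has ticked at least once after the delete, which is why the test has to wait for the cached time to move rather than for wall-clock time alone.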

`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool`, so it needs the cached time to advance. Add a check to ensure that, and decrease `thread_pool.estimated_time_interval` to 1ms to keep the test from running for a long time. Fixes: elastic#38874 Co-authored-by: Boaz Leskes <[email protected]>

elasticmachine · 2019-02-15T11:18:45Z

Pinging @elastic/es-distributed

jasontedor

Thanks for diving into this. I left some questions.

jasontedor · 2019-02-15T11:42:02Z

server/src/test/java/org/elasticsearch/indices/settings/UpdateSettingsIT.java

+    @Override
+    protected Settings nodeSettings(int nodeOrdinal) {
+        return Settings.builder().put(super.nodeSettings(nodeOrdinal))
+            .put("thread_pool.estimated_time_interval", TimeValue.timeValueMillis(1))


This effectively makes this thread busy spin for this entire test suite. On a machine with a low core count or that is otherwise busy (think how we fork tests across JVMs) it might be too much?

We can increase it a bit, I guess. The reason for changing the setting here is to avoid adding a public test-only method that force-updates the cached time, while also avoiding too many iterations in the while loop.

Maybe we can change the logic of the cache so that setting this to 0 disables caching.

@bleskes Something like this: 7fed99a ?

jasontedor · 2019-02-15T11:43:06Z

server/src/test/java/org/elasticsearch/indices/settings/UpdateSettingsIT.java

+                time2 = tPool.relativeTimeInMillis();
+            }
+        }
+
        // delete is should not be in cache
        assertThrows(client().prepareIndex("test", "type", "1").setSource("f", 3).setIfSeqNo(seqNo).setIfPrimaryTerm(primaryTerm),


Wouldn’t it be enough to assert busy and remove the sleep?

I guess so; this solution just makes what's happening more visible and is less "brute force".

The goal of the test is to make sure that once time moves, the delete is forgotten. If we busy spin on the indexing request (instead of on time, which is what I think Jason refers to with sleep), we will have different semantics, as some indexing ops may succeed, changing the dynamics of the test (it will now check that a CASed index operation fails if its base is an index op, rather than a delete op).

ah, yeah. Didn't even think that the assertBusy would potentially execute the prepareIndex request multiple times.

@bleskes Sorry to be unclear. I meant busy spin waiting for the cached time to advance. So instead of sleeping for it to happen, assert that it has happened, busily since it happens in the background.

@jasontedor You mean this: 9e8c531#diff-2f68a8c77a0935a6ec75d0cc4878e86aR466
as also proposed by @DaveCTurner ?

DaveCTurner

I left some suggestions, but this looks generally good to me.

DaveCTurner · 2019-02-18T09:33:42Z

server/src/main/java/org/elasticsearch/threadpool/ThreadPool.java

         */
        long relativeTimeInMillis() {
-            return relativeMillis;
+            if (running) {


I'd slightly prefer

Suggested change

-            if (running) {
+            if (0 < interval) {

since this is what the Javadoc says.

I chose to use the boolean running (properly set in the constructor) to avoid the comparison on every call, and it's used in the same way to avoid having a "running" thread that updates those values.

I don't clearly understand why avoiding the comparison is a good thing. Is it a performance question? Are you sure that a comparison between a constant and a final field is worse than a second volatile read?

My thinking was performance, yes, but I didn't think about the volatile read.
I also wanted the logic that interval == 0 implies running = false, so that we avoid having a running thread and have a common check in all 3 places (the two methods and run()).

If you think that the interval check is better, I'll happily change. And if we do that, should the code in the run() change like: while(running && 0 < interval) ? So that the running flag is just an external means of controlling the thread and not mixed up with the interval value?

Yep, sounds like a good idea, thanks.
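
To make the outcome of this thread concrete, here is a minimal sketch of a cached clock in which the final interval decides whether caching is on, and the running flag is only an external stop switch. CachedClockSketch and stopClock are hypothetical names; this is a simplified stand-in, not the actual ThreadPool code.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified cached clock. An interval of 0 disables caching,
 * so reads go straight to System.nanoTime().
 */
final class CachedClockSketch extends Thread {
    private final long intervalMillis;        // final: comparing it is cheaper than a volatile read
    private volatile boolean running = true;  // used only to stop the thread from outside
    private volatile long relativeMillis;     // last value published by the updater thread

    CachedClockSketch(long intervalMillis) {
        super("cached-clock-sketch");
        if (intervalMillis < 0) {
            throw new IllegalArgumentException("interval must be >= 0");
        }
        this.intervalMillis = intervalMillis;
        this.relativeMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
        setDaemon(true);
    }

    /** Returns the cached value when caching is enabled, otherwise a fresh reading. */
    long relativeTimeInMillis() {
        if (0 < intervalMillis) {
            return relativeMillis;
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
    }

    @Override
    public void run() {
        // With interval == 0 the loop body never runs, so no thread keeps spinning.
        while (running && 0 < intervalMillis) {
            relativeMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                running = false;
                return;
            }
        }
    }

    /** Asks the updater thread to stop and wakes it if it is sleeping. */
    void stopClock() {
        running = false;
        interrupt();
    }
}
```

The point of checking the interval in the accessor is that the disabled path never touches the volatile fields at all, and the enabled path does one volatile read instead of two.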

DaveCTurner · 2019-02-18T09:36:41Z

server/src/main/java/org/elasticsearch/threadpool/ThreadPool.java

         */
        long absoluteTimeInMillis() {
-            return absoluteMillis;
+            if (running) {


Similarly, I'd prefer

Suggested change

-            if (running) {
+            if (0 < interval) {

DaveCTurner · 2019-02-18T09:52:11Z

server/src/test/java/org/elasticsearch/indices/settings/UpdateSettingsIT.java

+        for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) {
+            long time1 = tPool.relativeTimeInMillis();
+            long time2 = tPool.relativeTimeInMillis();
+            while (time1 == time2) {


I think I'd have written this loop as the following, since there's no need for it to be a tight loop:

assertBusy(() -> assertThat(tPool.relativeTimeInMillis(), greaterThan(time1)));

It is, admittedly, somewhat of a question of taste.
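
For comparison with the tight loop, here is a self-contained sketch of the polling idea: check the clock, and if it has not moved, sleep with a growing pause before checking again. WaitForClockSketch and awaitAdvance are hypothetical names, and this is not how assertBusy itself is implemented; it only illustrates the same wait-with-backoff pattern.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: wait for a clock to move past a baseline without spinning. */
final class WaitForClockSketch {
    /**
     * Polls the clock until it reports a value greater than baseline.
     * Returns false if the timeout elapses or the thread is interrupted first.
     */
    static boolean awaitAdvance(LongSupplier clock, long baseline, long timeoutMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long pauseMillis = 1;
        while (clock.getAsLong() <= baseline) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the clock never advanced in time
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            pauseMillis = Math.min(pauseMillis * 2, 100); // back off, capped at 100ms
        }
        return true;
    }
}
```

In the test this would be called once per ThreadPool instance, with the baseline read before the wait, so the indexing request itself is still executed only once.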

matriv · 2019-02-18T10:13:25Z

@jasontedor @bleskes @DaveCTurner Thank you for the feedback! Please check again.

DaveCTurner

LGTM, thanks @matriv

jasontedor

I left some comments.

jasontedor · 2019-02-18T17:50:25Z

server/src/test/java/org/elasticsearch/indices/settings/UpdateSettingsIT.java

+                time2 = tPool.relativeTimeInMillis();
+            }
+        }
+
        // delete is should not be in cache
        assertThrows(client().prepareIndex("test", "type", "1").setSource("f", 3).setIfSeqNo(seqNo).setIfPrimaryTerm(primaryTerm),


@bleskes Sorry to be unclear. I meant busy spin waiting for the cached time to advance. So instead of sleeping for it to happen, assert that it has happened, busily since it happens in the background.

jasontedor · 2019-02-18T17:50:45Z

server/src/test/java/org/elasticsearch/indices/settings/UpdateSettingsIT.java

-        Thread.sleep(300); // wait for cache time to change TODO: this needs to be solved better. To be discussed.
+
+        // Make sure the time has advanced for InternalEngine#resolveDocVersion()
+        for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) {


Can we please use names consistent with the style in the codebase? For example, this would be threadPool.

jasontedor · 2019-02-18T17:50:57Z

server/src/test/java/org/elasticsearch/indices/settings/UpdateSettingsIT.java

+
+        // Make sure the time has advanced for InternalEngine#resolveDocVersion()
+        for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) {
+            long time1 = tPool.relativeTimeInMillis();


Can we use a clearer name than time1?

matriv · 2019-02-19T13:18:08Z

@elasticmachine run elasticsearch-ci/2

matriv · 2019-02-20T16:45:26Z

@jasontedor please check again

jasontedor · 2019-02-20T23:30:57Z

server/src/main/java/org/elasticsearch/threadpool/ThreadPool.java

@@ -555,22 +555,36 @@ public String toString() {
        /**
         * Return the current time used for relative calculations. This is
         * {@link System#nanoTime()} truncated to milliseconds.
+         * <p>
+         * If {@link ThreadPool#ESTIMATED_TIME_INTERVAL_SETTING} is set to 0


Should we make ThreadPool#ESTIMATED_TIME_INTERVAL_SETTING a setting that has 0 as an inclusive lower bound? I am fine with that in a follow up.
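
As a sketch of what an inclusive lower bound of 0 means here: 0 is accepted (and, per the Javadoc above, disables the cache), while negative values are rejected at parse time. The helper below is hypothetical plain Java, not the Setting API, and the error message is made up.

```java
/** Hypothetical validation for an interval whose lower bound is 0, inclusive. */
final class IntervalBoundSketch {
    static long requireNonNegativeMillis(String key, long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException(
                "value [" + millis + "] for setting [" + key + "] must be >= 0");
        }
        return millis; // 0 is valid and means "do not cache the time"
    }
}
```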

jasontedor

LGTM.

* elastic/master:
  Ensure index commit released when testing timeouts (elastic#39273)
  Avoid using TimeWarp in TransformIntegrationTests. (elastic#39277)
  Fixed missed stopping of SchedulerEngine (elastic#39193)
  [CI] Mute CcrRetentionLeaseIT.testRetentionLeaseIsRenewedDuringRecovery (elastic#39269)
  Muting AutoFollowIT.testAutoFollowManyIndices (elastic#39264)
  Clarify the use of sleep in CCR test
  Fix testCannotShrinkLeaderIndex (elastic#38529)
  Fix CCR tests that manipulate transport requests
  Align generated release notes with doc standards (elastic#39234)
  Mute test (elastic#39248)
  ReadOnlyEngine should update translog recovery state information (elastic#39238)
  Wrap accounting breaker check in assertBusy (elastic#39211)
  Simplify and Fix Synchronization in InternalTestCluster (elastic#39168)
  [Tests] Make testEngineGCDeletesSetting deterministic (elastic#38942)
  Extend nextDoc to delegate to the wrapped doc-value iterator for date_nanos (elastic#39176)
  Change ShardFollowTask to reuse common serialization logic (elastic#39094)
  Replace superfluous usage of Counter with Supplier (elastic#39048)
  Disable bwc tests for elastic#39094

matriv added the :Distributed Indexing/CRUD, >test-failure, v7.0.0, v8.0.0, and v7.2.0 labels Feb 15, 2019

matriv requested a review from bleskes February 15, 2019 11:18

matriv requested a review from DaveCTurner February 15, 2019 11:26

matriv mentioned this pull request Feb 15, 2019

UpdateSettingsIT#testEngineGCDeletesSetting failure #38874

Closed

jasontedor reviewed Feb 15, 2019

matriv added 2 commits February 18, 2019 10:44

Disable time cache on interval == 0

7fed99a

remove unused import

2541491

DaveCTurner reviewed Feb 18, 2019

Address comment

9e8c531

Address comment

1fe3866

DaveCTurner approved these changes Feb 18, 2019

jasontedor requested changes Feb 18, 2019

Rename variables

1d88ff0

matriv requested a review from jasontedor February 19, 2019 13:16

jasontedor reviewed Feb 20, 2019

jasontedor approved these changes Feb 20, 2019

Make interval setting accept values >= 0

c414476

matriv merged commit b5ac16c into elastic:master Feb 21, 2019

matriv deleted the mt/fix-38874 branch February 21, 2019 10:53

matriv added the backport pending label Feb 21, 2019

matriv removed the backport pending label Feb 21, 2019

jakelandis added v7.0.0-rc2 and removed v7.0.0 labels Apr 3, 2019

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
