Skip to content

[Tests] Make testEngineGCDeletesSetting deterministic #38942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 21, 2019

Conversation

matriv
Copy link
Contributor

@matriv matriv commented Feb 15, 2019

InternalEngine.resolveDocVersion() uses relativeTimeInMillis() from
ThreadPool so it needs, the cached time to be advanced. Add a check
to ensure that and decrease the thread_pool.estimated_time_interval
to 1msec to prevent long running times for the test.

Fixes: #38874

`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from
`ThreadPool` so it needs, the cached time to be advanced. Add a check
to ensure that and decrease the `thread_pool.estimated_time_interval`
to 1msec to prevent long running times for the test.

Fixes: elastic#38874

Co-authored-by: Boaz Leskes <[email protected]>
@matriv matriv added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >test-failure Triaged test failures from CI v7.0.0 v8.0.0 v7.2.0 labels Feb 15, 2019
@matriv matriv requested a review from bleskes February 15, 2019 11:18
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for diving into this. I left some questions.

@Override
protected Settings nodeSettings(int nodeOrdinal) {
return Settings.builder().put(super.nodeSettings(nodeOrdinal))
.put("thread_pool.estimated_time_interval", TimeValue.timeValueMillis(1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This effectively makes this thread busy spin for this entire test suite. On a machine with a low core count or that is otherwise busy (think how we fork tests across JVMs) it might be too much?

Copy link
Contributor Author

@matriv matriv Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can increase it a bit I guess. The reason of changing the setting here, is to avoid adding a public test-only method to force update the cached time -> to avoid too many iterations in the while loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can change the logic of the cache - if this is 0, it disables caching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bleskes Something like this: 7fed99a ?

time2 = tPool.relativeTimeInMillis();
}
}

// delete is should not be in cache
assertThrows(client().prepareIndex("test", "type", "1").setSource("f", 3).setIfSeqNo(seqNo).setIfPrimaryTerm(primaryTerm),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn’t it be enough to assert busy and remove the sleep?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so, this solution just makes more visible of what's happening and less "brute force".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of the test is to make sure that once time moves the delete is forgotten. If we busy spin on the indexing request (instead of on time - which is what I think Jason refers to with sleep), we will have different semantics as some indexing ops may succeed, changing the dynamics of the test (it now will check that a CASed index operation fails if it's base is an index op, rather than a delete op).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yeah. Didn't even think that the assertBusy would potentially execute the prepareIndex request multiple times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bleskes Sorry to be unclear. I meant busy spin waiting for the cached time to advance. So instead of sleeping for it to happen, assert that it has happened, busily since it happens in the background.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some suggestions, but this looks generally good to me.

*/
long relativeTimeInMillis() {
return relativeMillis;
if (running) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd slightly prefer

Suggested change
if (running) {
if (0 < interval) {

since this is what the Javadoc says.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to use the boolean running (properly set in the constructor) to avoid the comparison on every call, and it's used in the same way to avoid having a "running" thread that updates those values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't clearly understand why avoiding the comparison is a good thing. Is it a performance question? Are you sure that a comparison between a constant and a final field is worse than a second volatile read?

Copy link
Contributor Author

@matriv matriv Feb 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was yes the performance, but didn't think about the volatile.
But also to have the logic that when interval == 0 -> running = false so that we avoid having a running thread and we have a common check in all 3 places (the methods and the run()).

If you think that the interval check is better, I'll happily change. And if we do that, should the code in the run() change like: while(running && 0 < interval) ? So that the running flag is just an external means of controlling the thread and not mixed up with the interval value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sounds like a good idea, thanks.

*/
long absoluteTimeInMillis() {
return absoluteMillis;
if (running) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I'd prefer

Suggested change
if (running) {
if (0 < interval) {

for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) {
long time1 = tPool.relativeTimeInMillis();
long time2 = tPool.relativeTimeInMillis();
while (time1 == time2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd have written this loop as the following, since there's no need for it to be a tight loop:

assertBusy(() -> assertThat(tPool.relativeTimeInMillis(), greaterThan(time1)));

It is, admittedly, somewhat of a question of taste.

@matriv
Copy link
Contributor Author

matriv commented Feb 18, 2019

@jasontedor @bleskes @DaveCTurner Thank you for the feedback! Please check again.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @matriv

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments.

time2 = tPool.relativeTimeInMillis();
}
}

// delete is should not be in cache
assertThrows(client().prepareIndex("test", "type", "1").setSource("f", 3).setIfSeqNo(seqNo).setIfPrimaryTerm(primaryTerm),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bleskes Sorry to be unclear. I meant busy spin waiting for the cached time to advance. So instead of sleeping for it to happen, assert that it has happened, busily since it happens in the background.

Thread.sleep(300); // wait for cache time to change TODO: this needs to be solved better. To be discussed.

// Make sure the time has advanced for InternalEngine#resolveDocVersion()
for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please use names consistent with the style in the codebase? For example, this would be threadPool.


// Make sure the time has advanced for InternalEngine#resolveDocVersion()
for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) {
long time1 = tPool.relativeTimeInMillis();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a clearer name than time1?

@matriv matriv requested a review from jasontedor February 19, 2019 13:16
@matriv
Copy link
Contributor Author

matriv commented Feb 19, 2019

@elasticmachine run elasticsearch-ci/2

@matriv
Copy link
Contributor Author

matriv commented Feb 20, 2019

@jasontedor please check again

@@ -555,22 +555,36 @@ public String toString() {
/**
* Return the current time used for relative calculations. This is
* {@link System#nanoTime()} truncated to milliseconds.
* <p>
* If {@link ThreadPool#ESTIMATED_TIME_INTERVAL_SETTING} is set to 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make ThreadPool#ESTIMATED_TIME_INTERVAL_SETTING a setting that has 0 as an inclusive lower bound? I am fine with that in a follow up.

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@matriv matriv merged commit b5ac16c into elastic:master Feb 21, 2019
@matriv matriv deleted the mt/fix-38874 branch February 21, 2019 10:53
matriv added a commit to matriv/elasticsearch that referenced this pull request Feb 21, 2019
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from
`ThreadPool` so it needs, the cached time to be advanced. Add a check
to ensure that and decrease the `thread_pool.estimated_time_interval`
to 1msec to prevent long running times for the test.

Fixes: elastic#38874

Co-authored-by: Boaz Leskes <[email protected]>
matriv added a commit to matriv/elasticsearch that referenced this pull request Feb 21, 2019
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from
`ThreadPool` so it needs, the cached time to be advanced. Add a check
to ensure that and decrease the `thread_pool.estimated_time_interval`
to 1msec to prevent long running times for the test.

Fixes: elastic#38874

Co-authored-by: Boaz Leskes <[email protected]>
matriv added a commit that referenced this pull request Feb 21, 2019
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from
`ThreadPool` so it needs, the cached time to be advanced. Add a check
to ensure that and decrease the `thread_pool.estimated_time_interval`
to 1msec to prevent long running times for the test.

Fixes: #38874

Co-authored-by: Boaz Leskes <[email protected]>
matriv added a commit that referenced this pull request Feb 21, 2019
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from
`ThreadPool` so it needs, the cached time to be advanced. Add a check
to ensure that and decrease the `thread_pool.estimated_time_interval`
to 1msec to prevent long running times for the test.

Fixes: #38874

Co-authored-by: Boaz Leskes <[email protected]>
jasontedor added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 22, 2019
* elastic/master:
  Ensure index commit released when testing timeouts (elastic#39273)
  Avoid using TimeWarp in TransformIntegrationTests. (elastic#39277)
  Fixed missed stopping of SchedulerEngine (elastic#39193)
  [CI] Mute CcrRetentionLeaseIT.testRetentionLeaseIsRenewedDuringRecovery (elastic#39269)
  Muting AutoFollowIT.testAutoFollowManyIndices (elastic#39264)
  Clarify the use of sleep in CCR test
  Fix testCannotShrinkLeaderIndex (elastic#38529)
  Fix CCR tests that manipulate transport requests
  Align generated release notes with doc standards (elastic#39234)
  Mute test (elastic#39248)
  ReadOnlyEngine should update translog recovery state information (elastic#39238)
  Wrap accounting breaker check in assertBusy (elastic#39211)
  Simplify and Fix Synchronization in InternalTestCluster (elastic#39168)
  [Tests] Make testEngineGCDeletesSetting deterministic (elastic#38942)
  Extend nextDoc to delegate to the wrapped doc-value iterator for date_nanos (elastic#39176)
  Change ShardFollowTask to reuse common serialization logic (elastic#39094)
  Replace superfluous usage of Counter with Supplier (elastic#39048)
  Disable bwc tests for elastic#39094
weizijun pushed a commit to weizijun/elasticsearch that referenced this pull request Feb 22, 2019
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from
`ThreadPool` so it needs, the cached time to be advanced. Add a check
to ensure that and decrease the `thread_pool.estimated_time_interval`
to 1msec to prevent long running times for the test.

Fixes: elastic#38874

Co-authored-by: Boaz Leskes <[email protected]>
weizijun pushed a commit to weizijun/elasticsearch that referenced this pull request Feb 22, 2019
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from
`ThreadPool` so it needs, the cached time to be advanced. Add a check
to ensure that and decrease the `thread_pool.estimated_time_interval`
to 1msec to prevent long running times for the test.

Fixes: elastic#38874

Co-authored-by: Boaz Leskes <[email protected]>
dnhatn pushed a commit that referenced this pull request Mar 18, 2020
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from
`ThreadPool` so it needs, the cached time to be advanced. Add a check
to ensure that and decrease the `thread_pool.estimated_time_interval`
to 1msec to prevent long running times for the test.

Fixes: #38874

Co-authored-by: Boaz Leskes <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >test-failure Triaged test failures from CI v7.0.0-rc2 v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UpdateSettingsIT#testEngineGCDeletesSetting failure
6 participants