-
Notifications
You must be signed in to change notification settings - Fork 25.2k
[Tests] Make testEngineGCDeletesSetting deterministic #38942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: elastic#38874 Co-authored-by: Boaz Leskes <[email protected]>
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for diving into this. I left some questions.
@Override | ||
protected Settings nodeSettings(int nodeOrdinal) { | ||
return Settings.builder().put(super.nodeSettings(nodeOrdinal)) | ||
.put("thread_pool.estimated_time_interval", TimeValue.timeValueMillis(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This effectively makes this thread busy spin for this entire test suite. On a machine with a low core count or that is otherwise busy (think how we fork tests across JVMs) it might be too much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can increase it a bit I guess. The reason of changing the setting here, is to avoid adding a public test-only method to force update the cached time -> to avoid too many iterations in the while loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can change the logic of the cache - if this is 0, it disables caching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time2 = tPool.relativeTimeInMillis(); | ||
} | ||
} | ||
|
||
// delete is should not be in cache | ||
assertThrows(client().prepareIndex("test", "type", "1").setSource("f", 3).setIfSeqNo(seqNo).setIfPrimaryTerm(primaryTerm), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn’t it be enough to assert busy and remove the sleep?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so, this solution just makes more visible of what's happening and less "brute force".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of the test is to make sure that once time moves the delete is forgotten. If we busy spin on the indexing request (instead of on time - which is what I think Jason refers to with sleep), we will have different semantics as some indexing ops may succeed, changing the dynamics of the test (it now will check that a CASed index operation fails if it's base is an index op, rather than a delete op).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, yeah. Didn't even think that the assertBusy would potentially execute the prepareIndex request multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bleskes Sorry to be unclear. I meant busy spin waiting for the cached time to advance. So instead of sleeping for it to happen, assert that it has happened, busily since it happens in the background.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jasontedor You mean this: 9e8c531#diff-2f68a8c77a0935a6ec75d0cc4878e86aR466
as also proposed by @DaveCTurner ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some suggestions, but this looks generally good to me.
*/ | ||
long relativeTimeInMillis() { | ||
return relativeMillis; | ||
if (running) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd slightly prefer
if (running) { | |
if (0 < interval) { |
since this is what the Javadoc says.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose to use the boolean running
(properly set in the constructor) to avoid the comparison on every call, and it's used in the same way to avoid having a "running" thread that updates those values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't clearly understand why avoiding the comparison is a good thing. Is it a performance question? Are you sure that a comparison between a constant and a final
field is worse than a second volatile
read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was yes the performance, but didn't think about the volatile.
But also to have the logic that when interval == 0
-> running = false
so that we avoid having a running thread and we have a common check in all 3 places (the methods and the run()
).
If you think that the interval check is better, I'll happily change. And if we do that, should the code in the run() change like: while(running && 0 < interval)
? So that the running flag is just an external means of controlling the thread and not mixed up with the interval value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, sounds like a good idea, thanks.
*/ | ||
long absoluteTimeInMillis() { | ||
return absoluteMillis; | ||
if (running) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, I'd prefer
if (running) { | |
if (0 < interval) { |
for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) { | ||
long time1 = tPool.relativeTimeInMillis(); | ||
long time2 = tPool.relativeTimeInMillis(); | ||
while (time1 == time2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd have written this loop as the following, since there's no need for it to be a tight loop:
assertBusy(() -> assertThat(tPool.relativeTimeInMillis(), greaterThan(time1)));
It is, admittedly, somewhat of a question of taste.
@jasontedor @bleskes @DaveCTurner Thank you for the feedback! Please check again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @matriv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments.
time2 = tPool.relativeTimeInMillis(); | ||
} | ||
} | ||
|
||
// delete is should not be in cache | ||
assertThrows(client().prepareIndex("test", "type", "1").setSource("f", 3).setIfSeqNo(seqNo).setIfPrimaryTerm(primaryTerm), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bleskes Sorry to be unclear. I meant busy spin waiting for the cached time to advance. So instead of sleeping for it to happen, assert that it has happened, busily since it happens in the background.
Thread.sleep(300); // wait for cache time to change TODO: this needs to be solved better. To be discussed. | ||
|
||
// Make sure the time has advanced for InternalEngine#resolveDocVersion() | ||
for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please use names consistent with the style in the codebase? For example, this would be threadPool
.
|
||
// Make sure the time has advanced for InternalEngine#resolveDocVersion() | ||
for (ThreadPool tPool : internalCluster().getInstances(ThreadPool.class)) { | ||
long time1 = tPool.relativeTimeInMillis(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a clearer name than time1
?
@elasticmachine run elasticsearch-ci/2 |
@jasontedor please check again |
@@ -555,22 +555,36 @@ public String toString() { | |||
/** | |||
* Return the current time used for relative calculations. This is | |||
* {@link System#nanoTime()} truncated to milliseconds. | |||
* <p> | |||
* If {@link ThreadPool#ESTIMATED_TIME_INTERVAL_SETTING} is set to 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make ThreadPool#ESTIMATED_TIME_INTERVAL_SETTING
a setting that has 0
as an inclusive lower bound? I am fine with that in a follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: elastic#38874 Co-authored-by: Boaz Leskes <[email protected]>
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: elastic#38874 Co-authored-by: Boaz Leskes <[email protected]>
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: #38874 Co-authored-by: Boaz Leskes <[email protected]>
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: #38874 Co-authored-by: Boaz Leskes <[email protected]>
* elastic/master: Ensure index commit released when testing timeouts (elastic#39273) Avoid using TimeWarp in TransformIntegrationTests. (elastic#39277) Fixed missed stopping of SchedulerEngine (elastic#39193) [CI] Mute CcrRetentionLeaseIT.testRetentionLeaseIsRenewedDuringRecovery (elastic#39269) Muting AutoFollowIT.testAutoFollowManyIndices (elastic#39264) Clarify the use of sleep in CCR test Fix testCannotShrinkLeaderIndex (elastic#38529) Fix CCR tests that manipulate transport requests Align generated release notes with doc standards (elastic#39234) Mute test (elastic#39248) ReadOnlyEngine should update translog recovery state information (elastic#39238) Wrap accounting breaker check in assertBusy (elastic#39211) Simplify and Fix Synchronization in InternalTestCluster (elastic#39168) [Tests] Make testEngineGCDeletesSetting deterministic (elastic#38942) Extend nextDoc to delegate to the wrapped doc-value iterator for date_nanos (elastic#39176) Change ShardFollowTask to reuse common serialization logic (elastic#39094) Replace superfluous usage of Counter with Supplier (elastic#39048) Disable bwc tests for elastic#39094
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: elastic#38874 Co-authored-by: Boaz Leskes <[email protected]>
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: elastic#38874 Co-authored-by: Boaz Leskes <[email protected]>
`InternalEngine.resolveDocVersion()` uses `relativeTimeInMillis()` from `ThreadPool` so it needs, the cached time to be advanced. Add a check to ensure that and decrease the `thread_pool.estimated_time_interval` to 1msec to prevent long running times for the test. Fixes: #38874 Co-authored-by: Boaz Leskes <[email protected]>
InternalEngine.resolveDocVersion()
usesrelativeTimeInMillis()
fromThreadPool
so it needs, the cached time to be advanced. Add a checkto ensure that and decrease the
thread_pool.estimated_time_interval
to 1msec to prevent long running times for the test.
Fixes: #38874