
[CI] AssertionError in ShardSearchStats.onFetchPhase #70968


Closed
droberts195 opened this issue Mar 29, 2021 · 1 comment · Fixed by #71446
Labels
:Search/Search (Search-related issues that do not fall into other categories), Team:Search (Meta label for search team), >test-failure (Triaged test failures from CI)

Comments

@droberts195
Contributor

The following assertion tripped in the core search code while doing a search in an ML test on the 7.x branch:

Build scan:

https://gradle-enterprise.elastic.co/s/innofaaydb3fc

Repro line:

./gradlew ':x-pack:plugin:ml:internalClusterTest' --tests "org.elasticsearch.xpack.ml.integration.MlDistributedFailureIT.testJobRelocationIsMemoryAware" -Dtests.seed=1CE7934034A0817E -Dtests.security.manager=true -Dtests.locale=be-BY -Dtests.timezone=America/La_Paz -Druntime.java=8

Reproduces locally?:

No

Applicable branches:

master, 7.x

Failure history:

https://build-stats.elastic.co/app/kibana#/discover?_g=(refreshInterval:(pause:!t,value:0),time:(from:now-90d,mode:quick,to:now))&_a=(columns:!(_source),index:e58bf320-7efd-11e8-bf69-63c8ef516157,interval:auto,query:(language:lucene,query:'%22Uncaught%20exception%20in%20thread%22%20AND%20ShardSearchStats.java%20AND%20112'),sort:!(time,desc))

The same thing happened in a PR build on 24th February, but in BasicDistributedJobsIT.testMaxConcurrentJobAllocations instead of MlDistributedFailureIT.testJobRelocationIsMemoryAware.

Both tests that have triggered this are internal cluster tests. Maybe there's something special about internal cluster tests that violates the expected invariants of search stats?

Failure excerpt:

Mar 28, 2021 7:37:22 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNING: Uncaught exception in thread: Thread[elasticsearch[node_t0][search][T#1],5,TGRP-MlDistributedFailureIT]
java.lang.AssertionError
	at __randomizedtesting.SeedInfo.seed([1CE7934034A0817E]:0)
	at org.elasticsearch.index.search.stats.ShardSearchStats.lambda$onFetchPhase$5(ShardSearchStats.java:112)
	at org.elasticsearch.index.search.stats.ShardSearchStats.computeStats(ShardSearchStats.java:117)
	at org.elasticsearch.index.search.stats.ShardSearchStats.onFetchPhase(ShardSearchStats.java:109)
	at org.elasticsearch.index.shard.SearchOperationListener$CompositeListener.onFetchPhase(SearchOperationListener.java:178)
	at org.elasticsearch.search.SearchService$SearchOperationListenerExecutor.close(SearchService.java:1415)
	at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:464)
	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:435)
	at org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:398)
	at org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)
	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
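The top frames show that the failing assertion is evaluated in a lambda inside ShardSearchStats.onFetchPhase, which is passed to computeStats. The sketch below is a hypothetical, simplified model of that kind of bookkeeping: a "current" gauge that is incremented when a fetch phase starts and decremented when it ends, followed by a non-negativity assertion. The class, field, and method names are illustrative assumptions, not the Elasticsearch source. It uses a LongAdder because the fix discussed below says the underlying CounterMetric is backed by one.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, simplified model of the bookkeeping the stack trace points at.
// Names are illustrative only; this is not the Elasticsearch ShardSearchStats source.
final class FetchCurrentSketch {
    // Number of fetch phases currently in flight (hypothetical field).
    private final LongAdder fetchCurrent = new LongAdder();

    // Called when a fetch phase starts.
    void onPreFetchPhase() {
        fetchCurrent.increment();
    }

    // Called when a fetch phase completes.
    void onFetchPhase() {
        fetchCurrent.decrement();
        // Every decrement follows a matching increment, so the count "should" never be
        // negative. LongAdder.sum() is not an atomic snapshot, however, so while other
        // threads are updating the same adder this check can observe a negative value.
        assert fetchCurrent.sum() >= 0 : "fetchCurrent is negative";
    }

    long current() {
        return fetchCurrent.sum();
    }

    public static void main(String[] args) {
        FetchCurrentSketch stats = new FetchCurrentSketch();
        stats.onPreFetchPhase();
        stats.onFetchPhase();
        // With no concurrent updates, sum() is accurate.
        System.out.println(stats.current()); // prints 0
    }
}
```

A single-threaded run like this never trips the assertion, which is consistent with the failure not reproducing locally.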
droberts195 added the :Search/Search (Search-related issues that do not fall into other categories) and >test-failure (Triaged test failures from CI) labels Mar 29, 2021
elasticmachine added the Team:Search (Meta label for search team) label Mar 29, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

dnhatn self-assigned this Apr 7, 2021
dnhatn added a commit that referenced this issue Apr 13, 2021
A CounterMetric is used to track the number of completed and outstanding
items, for example, the number of executed refreshes, the memory currently
used by indexing, or the number of pending search requests. In all cases,
the current count of a CounterMetric is expected to be non-negative.

However, as this metric is implemented using a LongAdder, the returned
count is NOT an atomic snapshot; invocation in the absence of concurrent
updates returns an accurate result, but concurrent updates that occur
while the sum is being calculated might not be incorporated. A decrement
can therefore be counted while its matching increment is missed, making
the sum transiently negative.

We could replace LongAdder with AtomicLong, but this commit instead keeps
LongAdder and returns 0 when the sum is negative.

Relates #52411
Closes #70968
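A minimal sketch of the approach the commit message describes, assuming a counter with inc/dec/count operations: keep the LongAdder for cheap updates under contention, but clamp a negative sum to zero when reading. ClampedCounter is a hypothetical name, and this is not the actual CounterMetric source.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of the fix described above; ClampedCounter is an illustrative
// name and this is not the actual Elasticsearch CounterMetric implementation.
final class ClampedCounter {
    private final LongAdder counter = new LongAdder();

    void inc() {
        counter.increment();
    }

    void dec() {
        counter.decrement();
    }

    // LongAdder.sum() is not an atomic snapshot: a concurrent decrement may be included
    // while its matching increment is missed, so the raw sum can be transiently
    // negative. Clamp it to zero so callers always see a non-negative count.
    long count() {
        return Math.max(0L, counter.sum());
    }

    public static void main(String[] args) {
        ClampedCounter c = new ClampedCounter();
        // Deterministic stand-in for a reader that sees the decrement but not the increment.
        c.dec();
        System.out.println(c.count()); // prints 0 (raw sum is -1)
        c.inc();
        c.inc();
        System.out.println(c.count()); // prints 1
    }
}
```

The alternative mentioned in the commit message, AtomicLong, would give an atomic read at the cost of having every thread update a single shared variable. Clamping keeps LongAdder's low-contention updates, but it can also mask a genuine accounting bug that drives the count below zero.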
dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Apr 14, 2021
dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Apr 14, 2021
dnhatn added a commit that referenced this issue Apr 14, 2021
dnhatn added a commit that referenced this issue Apr 14, 2021
dnhatn added a commit that referenced this issue Apr 14, 2021