
[CI] InferenceIngestIT.testPipelineIngest and testPathologicalPipelineCreationAndDeletion failures #61564


Closed
ywangd opened this issue Aug 26, 2020 · 3 comments · Fixed by #65774
Assignees
Labels
:ml Machine learning >test-failure Triaged test failures from CI

Comments

@ywangd
Member

ywangd commented Aug 26, 2020

Build scan:
https://gradle-enterprise.elastic.co/s/2kdpwkkrjc6r4

Repro line:

./gradlew ':x-pack:plugin:ml:qa:native-multi-node-tests:integTest' --tests "org.elasticsearch.xpack.ml.integration.InferenceIngestIT.testPipelineIngest" -Dtests.seed=F132D0E49ECDF4B3 -Dtests.security.manager=true -Dtests.locale=sr-ME -Dtests.timezone=Pacific/Midway -Druntime.java=8

./gradlew ':x-pack:plugin:ml:qa:native-multi-node-tests:integTest' --tests "org.elasticsearch.xpack.ml.integration.InferenceIngestIT.testPathologicalPipelineCreationAndDeletion" -Dtests.seed=F132D0E49ECDF4B3 -Dtests.security.manager=true -Dtests.locale=sr-ME -Dtests.timezone=Pacific/Midway -Druntime.java=8

Reproduces locally?:
No

Applicable branches:

  • 7.x
  • 7.9

Failure history:

These two tests seem to fail only on the 7.x branches. According to build-stats, they have failed 8 times within the last 60 days.

When they fail, the same build scan always contains `ClassificationIT` failures with a "ClusterHealthResponse has timed out" error, so the two may be related, with one being the cause of the other.

I also noticed there is a previous issue (#54786) for testPipelineIngest, but the failure message is different, so I am opening a new issue.

Failure excerpt:


java.lang.AssertionError:
Expected: a string containing "\"cache_miss_count\":3"
     but: was "{"count":1,"trained_model_stats":[{"model_id":"test_classification","pipeline_count":0,"inference_stats":{"failure_count":0,"inference_count":10,"cache_miss_count":2,"missing_all_fields_count":0,"timestamp":1598400435615}}]}"

at __randomizedtesting.SeedInfo.seed([F132D0E49ECDF4B3:536E46D26C7D13F0]:0)
  ...
  at org.elasticsearch.xpack.ml.integration.InferenceIngestIT.lambda$testPipelineIngest$1(InferenceIngestIT.java:178)
  ...
  at org.elasticsearch.xpack.ml.integration.InferenceIngestIT.testPipelineIngest(InferenceIngestIT.java:172)
  ...
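
For context, the check that trips here is an exact substring match against the serialized stats response. The following is a minimal sketch of that style of assertion, a reconstruction for illustration only and not the actual InferenceIngestIT source; statsJson stands in for the trained model stats response body shown above.

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsString;

public class ExactCacheMissCheckSketch {
    // Illustrative only: an exact substring match on the serialized stats, the
    // style of check that makes an off-by-one cache_miss_count (2 vs. 3 above)
    // fail the assertion even though inference clearly ran.
    static void assertExactCacheMissCount(String statsJson, int expectedCacheMissCount) {
        assertThat(statsJson, containsString("\"cache_miss_count\":" + expectedCacheMissCount));
    }
}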


@ywangd ywangd added >test-failure Triaged test failures from CI :ml Machine learning labels Aug 26, 2020
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@droberts195
Contributor

This is still failing in the same way:

Expected: a string containing "\"cache_miss_count\":30"	
	     but: was "{"count":1,"trained_model_stats":[{"model_id":"test_pathological_classification","pipeline_count":0,"inference_stats":{"failure_count":0,"inference_count":10,"cache_miss_count":29,"missing_all_fields_count":0,"timestamp":1601791172638}}]}"

The build scan for that is: https://gradle-enterprise.elastic.co/s/pk6orftvu2us2

@pgomulka
Contributor

pgomulka commented Nov 3, 2020

another one https://gradle-enterprise.elastic.co/s/h5omv5t3fhc36
failed on master

@benwtrent benwtrent self-assigned this Dec 2, 2020
benwtrent added a commit that referenced this issue Dec 3, 2020
…nts (#65774)

Looking over the failure history, it is always the cache miss count that is off. This is mostly OK, as all the failures indicated that there were indeed cache misses, and every one of them was a fence-post error.

Opting to make the cache miss count check lenient, as the other stats that are checked verify consistency.

closes #61564
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Dec 3, 2020
…nts (elastic#65774)

Looking over the failure history, it is always the cache miss count that is off. This is mostly OK, as all the failures indicated that there were indeed cache misses, and every one of them was a fence-post error.

Opting to make the cache miss count check lenient, as the other stats that are checked verify consistency.

closes elastic#61564
benwtrent added a commit that referenced this issue Dec 3, 2020
…nts (#65774) (#65815)

Looking over the failure history, it is always the cache miss count that is off. This is mostly OK, as all the failures indicated that there were indeed cache misses, and every one of them was a fence-post error.

Opting to make the cache miss count check lenient, as the other stats that are checked verify consistency.

closes #61564
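
For readers skimming the fix, here is a hedged sketch of what a more lenient check could look like. It is not the actual diff in #65774; statsJson again stands in for the trained model stats response body. The exact checks on the deterministic stats are kept, while cache_miss_count only gets a lower bound, since every observed failure was off by one.

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsString;
import static org.hamcrest.Matchers.greaterThanOrEqualTo;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientCacheMissCheckSketch {

    private static final Pattern CACHE_MISS_COUNT = Pattern.compile("\"cache_miss_count\":(\\d+)");

    // Illustrative only: keep strict substring checks on the stats that are
    // deterministic, but only require a minimum cache_miss_count instead of an
    // exact value.
    static void assertStats(String statsJson, int expectedInferenceCount, int minCacheMissCount) {
        assertThat(statsJson, containsString("\"failure_count\":0"));
        assertThat(statsJson, containsString("\"inference_count\":" + expectedInferenceCount));

        Matcher matcher = CACHE_MISS_COUNT.matcher(statsJson);
        assertThat("expected a cache_miss_count in: " + statsJson, matcher.find());
        assertThat(Integer.parseInt(matcher.group(1)), greaterThanOrEqualTo(minCacheMissCount));
    }
}

A lower bound keeps the assertion meaningful (cache misses definitely happened) without being sensitive to the fence-post behaviour called out in the commit message above.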