Investigate the build issues, focusing on tests #1471
We do have a set of builds that should in principle always succeed, kicked off after every merge into `master`. You can see them with the above filters (not sure how to provide a link to the query). ALL of these runs "should" have passed, but didn't for one reason or another. These runs weren't against a PR that had failures; they were against the then-current `master`. There is also an "Analytics" tab where you can see which tests fail the most, and you can slice the results by the same filters. I assume anything less than a 90% pass rate is not acceptable (looking at you, LightGBM tests).
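As a concrete, hypothetical way to pull that list programmatically, here is a minimal sketch that queries the Azure DevOps Build REST API for completed builds of `master` that nevertheless failed. The organization and project values are placeholders, not the project's actual CI settings.

```csharp
// Minimal sketch: list completed-but-failed builds of master via the
// Azure DevOps Build REST API. Organization/project below are placeholders.
using System;
using System.Net.Http;
using System.Threading.Tasks;

class FailedMasterBuilds
{
    static async Task Main()
    {
        const string org = "<organization>";   // placeholder
        const string project = "<project>";    // placeholder

        // Completed builds of master whose result was "failed".
        string url = $"https://dev.azure.com/{org}/{project}/_apis/build/builds" +
                     "?branchName=refs/heads/master" +
                     "&statusFilter=completed" +
                     "&resultFilter=failed" +
                     "&api-version=5.1";

        using (var client = new HttpClient())
        {
            // Public projects can be read anonymously; otherwise attach a PAT header here.
            string json = await client.GetStringAsync(url);
            Console.WriteLine(json); // inspect build ids, definitions, finish times, etc.
        }
    }
}
```

Dumping the raw JSON is enough to start cataloging build ids, definitions, and finish times for the runs that "should" have passed.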
Great, thanks @eerhardt, I wasn't quite sure where to see this information. I am opening specific issues to track each of these investigations, but I will link back to your comment for the "cataloging" issue. In fact, I may as well write that issue right now...
Starting to take note of tests that hang up:
Here are instructions for reproducing the Hosted Mac environment for debugging test failures locally: https://dev.azure.com/aifx/public/_settings/agentqueues?queueId=5&_a=agents
This might be helpful with #1506.
We did improve the tests last year to improve the pass rate. Closing this.
At the time of writing, our build system is plagued by a large number of failing tests and other build issues. This impacts our agility, since an otherwise valid PR can fail the test checks for spurious reasons that have nothing to do with the change, and it in turn leads to a significant waste of resources. The goal is to reduce this spurious failure rate.
However, we are somewhat vexed by a lack of information on why these test failures occur. In particular, trying to reproduce test failures locally has, at least in my experience, very limited success. For example, in my own investigation into the random failures of `MulticlassTreeFeaturizedLRTest` on macOS debug, I was only able to achieve a test failure twice out of some hundreds of runs on a MacBook, and what information I was able to gather was limited. In the seeming absence of the ability to reliably reproduce test failures outside of the build machines, we need more information.
- Publish the test logs as an artifact of the build so that we can gather more information. Random build failures: Publish the test logs #1473.
- Make the error messages from tests, when failures do occur, contain some actually useful information (a sketch of what this could look like follows below). Random build failures: Make test failure output on numerical comparisons semi-useful #1477.
- Create a catalog of failures that occur in builds that in principle should have succeeded (e.g., builds of `master`). This is partially to validate the assumption that tests are the primary problem, as well as to get a sense of which tests are problematic. Random build failures: Catalog the failures #1474.

The preceding is purely information gathering, but at the same time there are some positive steps that can be taken, pending the above.
We already know of some troublesome tests. These should be investigated for the "usual suspects," e.g., failure to set random seeds to a fixed value, a variable number of threads in training processes, and so on, which are known but innocent sources of run-to-run variance. A sketch of pinning these down follows below.
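As an illustration of pinning down those usual suspects, here is a minimal sketch assuming the ML.NET 1.x API surface; the data, column names, and exact option names are illustrative, and the repository's own tests may use different trainers and option spellings.

```csharp
// Sketch: remove the two common sources of run-to-run variance in a training
// test by fixing the MLContext seed and forcing single-threaded training.
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

class Row
{
    public bool Label { get; set; }
    [VectorType(2)]
    public float[] Features { get; set; }
}

class DeterministicTrainingSketch
{
    static void Main()
    {
        // Fixed seed so components that draw random numbers are reproducible.
        var ml = new MLContext(seed: 1);

        var data = ml.Data.LoadFromEnumerable(new[]
        {
            new Row { Label = true,  Features = new float[] { 1f, 0f } },
            new Row { Label = false, Features = new float[] { 0f, 1f } },
            new Row { Label = true,  Features = new float[] { 1f, 1f } },
            new Row { Label = false, Features = new float[] { 0f, 0f } },
        });

        // Single-threaded training removes nondeterminism from parallel updates.
        var trainer = ml.BinaryClassification.Trainers.SdcaLogisticRegression(
            new SdcaLogisticRegressionBinaryTrainer.Options
            {
                LabelColumnName = "Label",
                FeatureColumnName = "Features",
                NumberOfThreads = 1,
            });

        var model = trainer.Fit(data);
        Console.WriteLine("Trained deterministically (seed = 1, threads = 1).");
    }
}
```

Fixing the `MLContext` seed and forcing single-threaded training are the two knobs most likely to turn an intermittently failing numerical comparison into a deterministic one.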
That the tests seem to fail so readily on the build machines, yet are vexingly difficult to make fail locally, suggests that something about the build environment is different: perhaps a different architecture or different performance characteristics surface timing issues or race conditions that are simply not observed on our more performant developer machines. It may therefore be worthwhile to reproduce the test environment machines exactly (down to the environment, processor, memory, everything) to see whether that yields any clues.
Most vague, but still useful: the nature of the failures, while mysterious, has not been entirely devoid of clues as to potential causes. I may write more about them in a comment later.
/cc @Zruty0 @eerhardt