Investigate the build issues, focusing on tests #1471

Closed
TomFinley opened this issue Oct 31, 2018 · 6 comments

TomFinley (Contributor) commented Oct 31, 2018

At the time of writing our build system is plagued by a large number of failing tests and other build issues. This hurts our agility, since an otherwise valid PR can fail the test checks for spurious reasons that have nothing to do with the change, which in turn wastes significant resources. The goal is to reduce the test failure rate.

However, we are vexed somewhat by a lack of information on why these test failures are occurring. In particular, attempts to reproduce test failures locally have, at least in my experience, had very limited success. For example, in my own investigation into the random failures of MulticlassTreeFeaturizedLRTest on macOS debug, I was only able to achieve a test failure twice out of some hundreds of runs on a MacBook, and what information I was able to gather was limited.

In the seeming absence of any ability to reliably reproduce test failures outside of the build machines, we need more information.

  1. Publish the test logs as an artifact of the build so that we can gather more information. Random build failures: Publish the test logs #1473.

  2. Make the error messages from tests, when failures do occur, contain some actually useful information (a sketch of what that might look like follows this list). Random build failures: Make test failure output on numerical comparisons semi-useful #1477.

  3. Create a catalog of failures that occur in builds that in principle should have succeeded. (E.g., builds of master.) This is partially to validate the assumption that tests are the primary problem, as well as to get a sense of what tests are problematic. Random build failures: Catalog the failures #1474.
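
As a hypothetical illustration of item 2, here is the kind of comparison helper that would make a numerical test failure self-describing. This is a Python sketch, not ML.NET's actual test code; the helper name, tolerances, and example values are invented:

```python
import math

def assert_close(actual, expected, rel_tol=1e-6, abs_tol=1e-8, label=""):
    """Fail with a message that says where and by how much two float
    sequences diverge, instead of a bare pass/fail."""
    assert len(actual) == len(expected), (
        f"{label}: length mismatch, {len(actual)} vs {len(expected)}")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
            raise AssertionError(
                f"{label}: first mismatch at index {i}: actual={a!r}, "
                f"expected={e!r}, delta={a - e:.3e} "
                f"(rel_tol={rel_tol}, abs_tol={abs_tol})")

# e.g. assert_close([0.5, 0.2500013], [0.5, 0.25], label="AUC per fold")
# fails with: "AUC per fold: first mismatch at index 1: actual=0.2500013,
# expected=0.25, delta=1.300e-06 (rel_tol=1e-06, abs_tol=1e-08)"
```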

The preceding is purely information gathering, but at the same time there are some positive steps that can be taken while the above is pending.

  1. We already know of some troublesome tests. These should be investigated for the "usual suspects," e.g., failure to set random seeds to a fixed value, having a variable number of threads in training processes, etc. (These are known, but innocent, sources of run-to-run variance; see the sketch after this list.)

  2. That the tests fail so readily on the build machines yet are vexingly difficult to make fail locally suggests that something about the build environment is different -- perhaps a different architecture or different performance characteristics surface issues or race conditions that are simply not observed on our more performant developer machines. It may therefore be worthwhile to reproduce the test environment machines exactly (down to the environment, processor, memory, everything) to see if that yields any clues.

  3. Most vague, but still useful: the nature of the failures, while mysterious, has not been entirely devoid of clues as to potential causes. I may write more about them in a comment later.
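
To make the "usual suspects" in the first item concrete, here is a minimal Python sketch (ML.NET's tests are C#; the toy "trainer" below is a stand-in, not real code) showing how an unseeded RNG turns floating-point summation order into run-to-run variance, and how pinning the seed removes it:

```python
import random

def train(data, rng=None):
    """Stand-in for a training run whose result depends on data order,
    the way shuffled or multi-threaded floating-point reductions do."""
    order = data[:]
    (rng or random).shuffle(order)  # unseeded global RNG if rng is None
    total = 0.0
    for x in order:                 # summation order changes the result
        total += x
    return total

data = [0.1] * 10 + [1e8, -1e8]

# Flaky: time-seeded global RNG -> a slightly different sum every run,
# so an exact-equality test fails only occasionally.
print(train(data))

# Deterministic: a fixed seed (and, in real trainers, a fixed thread
# count) makes the result identical run to run.
print(train(data, rng=random.Random(42)))
```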

/cc @Zruty0 @eerhardt

TomFinley added the bug, Build, and test labels Oct 31, 2018
eerhardt (Member) commented Oct 31, 2018

A note on item 3:

> Create a catalog of failures that occur in builds that in principle should have succeeded. (E.g., builds of master.) This is partially to validate the assumption that tests are the primary problem, as well as to get a sense of what tests are problematic.

We do have a set of builds that should in principle always succeed. After every merge into the master branch, we run a "CI" build that just takes what is now in master and runs the validation build against it. These builds are called "CI Builds" in Azure DevOps:

https://dnceng.visualstudio.com/public/_build?definitionId=104&_a=completed&view=buildsHistory&showFilters=true

[screenshot: Azure DevOps build history for "CI Build", filtered to completed builds]

You can see the filters above (I'm not sure how to provide a link to the query). ALL of these runs "should" have passed but didn't, for one reason or another. These runs weren't against a PR that had failures; they were against the current master branch.

There is also an "Analytics" tab where you can see which tests fail the most. You can slice this by refs/heads/master, and see only the test failures for these CI builds:

[screenshot: Analytics tab showing per-test pass rates]

I assume anything less than a 90% pass rate is not acceptable (looking at you, LightGBM tests).

TomFinley (Contributor, Author) commented

Great, thanks @eerhardt, I wasn't quite sure where to see this information. I am opening specific issues to track each of these investigative items, but I will link back to your comment for the "cataloging" issue. In fact I may as well write that issue right now...

Ivanidzo4ka (Contributor) commented

Is this a thread about timeouts on Mac? If not, I'm hijacking it right now.

I'm a simple man. I wrote code which parses the test outputs, looking for `Starting test` to add the test to a hashset and `Finished test` to remove it from the hashset; whatever is left never finished. So far:
[screenshot: tests remaining in the set at timeout]
or this:
[screenshot: tests remaining in the set on another timed-out run]

I've checked about 10 different test logs that resulted in a timeout on Mac, and in all cases the triplet of ImageSmoke, CopyColumns, and OnlineGradient is present.
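
A minimal sketch of what this approach might look like, rendered in Python since the original code isn't shown; the exact log-line shapes beyond the `Starting test` / `Finished test` markers are assumptions:

```python
import re
import sys

def unfinished_tests(log_path):
    """Scan a test console log: add a test to a set when it starts,
    remove it when it finishes.  Whatever is left at EOF started but
    never finished -- the hang candidates in a timed-out build."""
    start = re.compile(r"Starting test (\S+)")    # assumed line format
    finish = re.compile(r"Finished test (\S+)")   # assumed line format
    running = set()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            if (m := start.search(line)):
                running.add(m.group(1))
            elif (m := finish.search(line)):
                running.discard(m.group(1))
    return running

if __name__ == "__main__":
    for test in sorted(unfinished_tests(sys.argv[1])):
        print(test)
```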

sfilipi (Member) commented Nov 1, 2018

Starting to take note of tests that hang:
I am seeing a hang after: Microsoft.ML.Scenarios.ScenariosTests.TensorFlowTransformCifar

montebhoover (Contributor) commented

Here are instructions for reproducing the Hosted Mac environment for debugging test failures locally: https://dev.azure.com/aifx/public/_settings/agentqueues?queueId=5&_a=agents

[screenshot: Hosted Mac agent pool details in Azure DevOps]

Might be helpful with #1506

codemzs (Member) commented Jun 30, 2019

We did improve the tests last year, which improved the pass rate. Closing this.

codemzs closed this as completed Jun 30, 2019
ghost locked as resolved and limited conversation to collaborators Mar 27, 2022