Companion issue to #1471, specific to making the test failure messages provide more helpful information than they do right now.
Many of our tests are baseline tests -- that is, they produce a file, which we then compare against a checked-in baseline file. A while ago we changed the baseline test infrastructure a bit, so that rather than insisting on a perfect match, the files only had to match numbers within a certain tolerance. This is in principle a positive change, but unfortunately the current code to do so is flawed in ways that compromise its usefulness.
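For concreteness, the idea is something like the following. This is a minimal sketch, not the actual implementation: numeric tokens in corresponding lines may differ within a tolerance, while everything else must match exactly.

```csharp
// Minimal sketch of tolerance-based baseline matching: split each line into
// tokens, compare numeric tokens within a tolerance, and require all other
// tokens to match exactly.
static bool LinesMatch(string baseline, string output, double tolerance)
{
    string[] a = baseline.Split(' ');
    string[] b = output.Split(' ');
    if (a.Length != b.Length)
        return false;

    for (int i = 0; i < a.Length; i++)
    {
        bool aIsNum = double.TryParse(a[i], out double x);
        bool bIsNum = double.TryParse(b[i], out double y);
        if (aIsNum && bIsNum)
        {
            // Numbers need only agree within the tolerance.
            if (Math.Abs(x - y) > tolerance)
                return false;
        }
        else if (a[i] != b[i])
        {
            // Non-numeric tokens must match exactly.
            return false;
        }
    }
    return true;
}
```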
Consider this code here, which is the place where many of the test failures actually take place:

machinelearning/test/Microsoft.ML.TestFramework/BaseTestBaseline.cs, lines 549 to 550 in a26eca7
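(The embedded snippet doesn't render here; roughly paraphrased, the check looks like the following -- the variable names are mine, not the ones in BaseTestBaseline.cs.)

```csharp
// Paraphrase of the check at the cited lines: the difference between the
// baseline number and the produced number is asserted, via xunit, to lie
// within a small range centered on zero.
double delta = expected - actual;
Assert.InRange(delta, -tolerance, tolerance);
```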
This code is bad for two reasons. Probably the most egregious issue is that the code, instead of using the Fail method as we do everywhere else, uses xunit's Assert. This means that when a difference is detected, the test fails immediately and stops altogether. This is extremely bad: in a given test, many files to be baselined may be produced, and by failing on the first one we lose crucial side information that could give us a more complete picture of what is going on.
This would be especially bad in the situation where we expect that the baselines will be changed, and are running the tests just to produce the new files -- instead of having to run the tests once, we must run them many times, just to get the files that are baselined.
The second reason is that the message produced by this check is totally uninformative. Even if we wanted to use this InRange method (we don't), it's checking against a range centered around 0. So, for example, when I was working on MulticlassTreeFeaturizedLRTest, this line was failing, reporting that -6 was out of some small range centered around 0. But I couldn't just re-run the test with a breakpoint or something to figure out what was going wrong: even if Visual Studio on the Mac were detecting our tests (for some reason it doesn't; not sure why), the entire problem was that the test was highly non-deterministic. It had taken me over 100 tries to even get that one repro, and a subsequent run would almost certainly succeed. And at the same time I had absolutely no idea which files were even being compared.
So first I dug in and found what files would be produced in a successful run. Then I wrote a new test that would actually just compare these files (crucially, without trying to generate new ones!), while printing out which file failed. It turned out to be this file: MulticlassLogisticRegression-TrainTest-iris-tree-featurized-out.txt. It had this line:
L1 regularization selected 78 of 78 weights.
and it should have had this line:
L1 regularization selected 72 of 72 weights.
This whole process took many frustrating minutes.
So now I had found the reason for this -6: because, I guess, 72-78=-6, and obviously letting me know that -6 was not very close to 0 was far, far more important than letting me know what file differed where. 😄
And, to top it all of, because it is an xunit assert, all the other files that would be generated by this (e.g., not just train-test but also the CV comparison, were not generated, because the test was stopped right there before they could be generated. So on the whole a very
So:
- This comparison should not use an xunit assert; rather, it should use the Fail mechanism, to allow the test to continue running once a difference is detected, since obviously many differences may be observed and in any event we do not want to stop files from being generated. (It can perhaps stop checking that one file, but it shouldn't stop generating files altogether.)
- The test message should be enhanced to show additional information (a sketch follows this list), specifically:
  - which file had the difference,
  - where in the file (e.g., line and offset),
  - if a numerical comparison, the two numbers that were compared (not just their difference, but the actual numbers),
  - possibly even some context from the two files (surrounding lines, characters?) to make things more obvious.
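A minimal sketch of what this could look like, assuming a Fail(string) helper along the lines of the existing Fail mechanism (the signature and the surrounding parsing code here are assumptions for illustration, not the actual BaseTestBaseline.cs code):

```csharp
// Sketch only: assumes Fail(string) records a non-fatal failure so the test
// keeps running, and that the caller has already parsed the two numbers at
// a given line of the two files.
private bool CheckNumberWithinTolerance(
    string path, int lineNumber,
    string baselineLine, string outputLine,
    double expected, double actual, double tolerance)
{
    if (Math.Abs(expected - actual) <= tolerance)
        return true;

    // Report the file, the location, both values (not just their difference),
    // and the surrounding text, then let the test continue.
    Fail($"Baseline mismatch in '{path}', line {lineNumber}: " +
         $"expected {expected} but got {actual} " +
         $"(difference {actual - expected}, tolerance {tolerance}).\n" +
         $"  baseline: {baselineLine}\n" +
         $"  output:   {outputLine}");
    return false;
}
```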