Skip to content

Suggestion for test framework improvements. #410

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Ivanidzo4ka opened this issue Jun 26, 2018 · 4 comments
Closed

Suggestion for test framework improvements. #410

Ivanidzo4ka opened this issue Jun 26, 2018 · 4 comments
Labels
test related to tests
Milestone

Comments

@Ivanidzo4ka
Copy link
Contributor

No reason to have same baselines for Debug and Release if they are same.

Currently we have /test/BaselineOutput folder which contains folders Common, SingleDebug and SingleRelease and they filled with pretty much same files. We need to modify our test framework so it will look first in Common folder and if file is not found then it should look in respectful folders.
Also word Single is legacy artifact and can be omitted.

This would reduce size of repo, and simplify peoples life. Also it give more room to have platform specific tests since many of our tests disabled because we verify everything with windows output, which sometimes differ because of native code implementation.

@Ivanidzo4ka Ivanidzo4ka mentioned this issue Jun 26, 2018
@codemzs
Copy link
Member

codemzs commented Jun 26, 2018

CC: @TomFinley

@TomFinley
Copy link
Contributor

TomFinley commented Jun 26, 2018

Hi @Ivanidzo4ka , thanks for the proposal.

One thing about the existing system that is quite convenient, that I do not see how to reproduce in your scheme very well, is that if the baseline files could exist in common, or in some target specific directory, then I do not see how one could use a tool like Beyond Compare to actually do a directory diff, and detect the differences in baselines. This is quite useful -- I have and I know you have had situations where due to a seemingly innocuous change, baselines have changed significantly. (E.g., slightly different wording in a log message leads to hundreds of files changing.) That work is simplified enormously if the test output folder mirrors the baseline folder, which as far as I can tell they would not in your proposal.

If you modified your proposal so that tests had to explicitly identify themselves as being either "common" or "platform specific" baselines (probably the latter), then this would retain the "diffability" of the TestOutput and BaselineOutput directories.

@Ivanidzo4ka
Copy link
Contributor Author

Ivanidzo4ka commented Jun 27, 2018

So currently we have this structure of test\BaselineOutput and and bin{configuration}{project}{framework}\TestOutput folders. We pull files from same respective locations in both folders and compare them.

My proposal is to touch only test\BaselineOutput folder and in framework if we looking for file, first search in Common folder, then in Configuration specific folder, then in OS specific and then in combination of Configuration and OS (we can get rid of OS specific for now). Everything we do with test output will continue to go to TestOutput folder, and you can run it for different Configurations and they would be inside respective Debug/Release folder. So you can compare them. It just during PR creation owner of PR can decide where he want's to put this files, if baselines are similar, he can put them into Common, if they different he can put them in respective folders.

We also can have test similar AAACompareBaselines which would notify person what files are same, and should be moved to Common folder.

Reason why I propose this solution is fact what currently and most of the time we have similar outputs (probably around 90% in our main solution and 100% in this repository so far) and in most cases all user need to do is to copy files into common folder.

@sfilipi
Copy link
Member

sfilipi commented Jul 31, 2018

Commenting here, because I run in such a case again, trying to enable OLS.
Only one prediction different, out of 4896 generated differing at the 5th decimal point just on osx debug build.

In additional to the per OS baselines, i don;t think it makese sense to compare numbers on the 10th decimal point, which we are doing sometimes.

we could:
1- Have a minimum threshold: don't compare beyond the 6th decimal (is 6th good enough?)
2- Maybe count the failures, and ignore, let's say 1% or 0.5% of the failures per test (what is a good %? @justinormont?)

In addition, if the above cases are not enough, we can put in place OS specific baseline.

Have the current /Baseline folder follow the following structure:

I think a folder structure like such:
/BaselineOutput

/Release
    /Common
           /FastTree
                 /cv_wine.out
                 /traintest_wine.out
    .........
    /OXS
           /FastTree
                 /cv_adult.out
    /Linux
    /Win
/Debug
    /Common
    /OXS
    /Linux
    /Win

@sfilipi sfilipi added the test related to tests label Jul 31, 2018
eerhardt added a commit to eerhardt/machinelearning that referenced this issue Oct 9, 2018
A lot of baseline tests have duplicated baselines between debug and release. Allow BaseTestBaseline to check the Common baseline directory for a baseline file first.

Fix dotnet#410
@shauheen shauheen added this to the 1018 milestone Oct 9, 2018
eerhardt added a commit that referenced this issue Oct 10, 2018
)

* Enable a QuantileRegression test

Enable CommandTrainScoreEvaluateQuantileRegression, add the dataset and the necessary baselines.

The only baseline changes were of the form that were caused by https://github.com/dotnet/corefx/issues/31847.

* Allow BaseTestBaseline to check Common first.

A lot of baseline tests have duplicated baselines between debug and release. Allow BaseTestBaseline to check the Common baseline directory for a baseline file first.

Fix #410

* PR feedback

Add more info to the README.
Add column headers to the housing.txt data file.
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
test related to tests
Projects
None yet
5 participants