
Updates ml.net reference of LightGBM to version 2.2 #2448


Merged — 7 commits merged Feb 13, 2019

Conversation

Member

@singlis singlis commented Feb 6, 2019

  • Updates the ml.net reference of LightGBM to version 2.2.3 (fixes #2446).
  • Updated the LightGBM parsing code to handle inf and -inf (it now
    checks for contains rather than equals).
  • Additional updates for handling NaN.
  • Moved all LightGBM baseline tests from SingleDebug/SingleRelease to
    Common.
  • Added Seed parameter to LightGBM arguments to support setting
    LightGBM's random seed.
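The contains-based inf/-inf parsing described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual ML.NET code; the class and method names are hypothetical.

```csharp
using System.Globalization;

// Illustrative sketch: a contains check tolerates tokens such as "-inf"
// or "inf" embedded in LightGBM's model text, where an exact
// Equals("inf") comparison would fail.
internal static class LightGbmTextParser
{
    public static double ParseDouble(string token)
    {
        // Check "-inf" before "inf": "-inf".Contains("inf") is also true.
        if (token.Contains("-inf"))
            return double.NegativeInfinity;
        if (token.Contains("inf"))
            return double.PositiveInfinity;
        if (token.Contains("nan"))
            return double.NaN;
        return double.Parse(token, CultureInfo.InvariantCulture);
    }
}
```

Note the ordering: the negative-infinity check must come first, since the substring "inf" also appears in "-inf".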

@singlis singlis self-assigned this Feb 6, 2019
FeatureColumn = "Features",
Seed = 1,
NThread = 1,
NumBoostRound = 1 }));
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 7, 2019


Ctrl+K, Ctrl+D, aka formatting. #Resolved

Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment


:shipit:

@codecov

codecov bot commented Feb 7, 2019

Codecov Report

Merging #2448 into master will increase coverage by <.01%.
The diff coverage is 82.75%.

@@            Coverage Diff             @@
##           master    #2448      +/-   ##
==========================================
+ Coverage   71.25%   71.26%   +<.01%     
==========================================
  Files         797      797              
  Lines      141280   141293      +13     
  Branches    16115    16118       +3     
==========================================
+ Hits       100675   100688      +13     
+ Misses      36147    36144       -3     
- Partials     4458     4461       +3
Flag         Coverage Δ
#Debug       71.26% <82.75%> (ø) ⬆️
#production  67.58% <50%> (ø) ⬆️
#test        85.36% <95.23%> (ø) ⬆️


var trainedModel = pipe.Fit(preprocessedTrainData);
var predicted = trainedModel.Transform(preprocessedTestData);
var metrics = mlContext.MulticlassClassification.Evaluate(predicted);

// First group of checks. They check if the overall prediction quality is ok using a test set.
Assert.InRange(metrics.AccuracyMicro, expectedMicroAccuracy - .01, expectedMicroAccuracy + .01);
Assert.InRange(metrics.AccuracyMacro, expectedMacroAccuracy - .01, expectedMacroAccuracy + .01);
bool inRange = expectedValues.Any(exp =>
Member

@wschin wschin Feb 7, 2019


Why not use Assert.Equal with a proper tolerance (e.g., 2) or Assert.InRange? They automatically print out error information. #Pending

Member Author


I wanted to use Assert.InRange; however, this test is given a list of expected ranges, any one of which may match (i.e., micro and macro should match one of those pairs). Assert.InRange will not work here because it fails on the first range that doesn't match.



var expectedValues = new List<(double micro, double macro)>()
{
(0.71304347826086956, 0.53197278911564627),
(0.73304347826086956, 0.677551020408163)
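Given a list of expected pairs like the one above, the any-of-several-ranges check being discussed can be sketched like this. The tolerance value and assertion message are illustrative assumptions, not the PR's exact code:

```csharp
// Illustrative sketch: pass if the metrics fall within tolerance of ANY
// one expected (micro, macro) pair. Assert.InRange cannot express this,
// because it fails hard on the first pair that does not match.
const double Tolerance = 0.01;
bool inRange = expectedValues.Any(exp =>
    Math.Abs(metrics.AccuracyMicro - exp.micro) < Tolerance &&
    Math.Abs(metrics.AccuracyMacro - exp.macro) < Tolerance);
Assert.True(inRange,
    $"No expected (micro, macro) pair matched " +
    $"({metrics.AccuracyMicro}, {metrics.AccuracyMacro}).");
```

Wrapping the `Any` result in `Assert.True` with a message recovers some of the diagnostic output that `Assert.InRange` would have printed automatically.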
Contributor


so debug and release versions still differ?

Contributor

@zeahmed zeahmed left a comment


:shipit:

@singlis
Member Author

singlis commented Feb 7, 2019

Here are the benchmark differences that I am seeing between master and this change:
Master

                                                       Method |    Mean |   Error |  StdDev | Extra Metric |
------------------------------------------------------------- |--------:|--------:|--------:|-------------:|
 CV_Multiclass_WikiDetox_BigramsAndTrichar_LightGBMMulticlass | 327.9 s | 60.83 s | 3.334 s |            - |

// * Legends *
  Mean         : Arithmetic mean of all measurements
  Error        : Half of 99.9% confidence interval
  StdDev       : Standard deviation of all measurements
  Extra Metric : Value of the provided extra metric
  1 s          : 1 Second (1 sec)

// ***** BenchmarkRunner: End *****
Run time: 00:16:27 (987.07 sec), executed benchmarks: 1


                                                          Method |    Mean |   Error |  StdDev | Extra Metric |
---------------------------------------------------------------- |--------:|--------:|--------:|-------------:|
 TrainTest_Ranking_MSLRWeb10K_RawNumericFeatures_LightGBMRanking | 50.51 s | 76.32 s | 4.183 s |            - |

// * Legends *
  Mean         : Arithmetic mean of all measurements
  Error        : Half of 99.9% confidence interval
  StdDev       : Standard deviation of all measurements
  Extra Metric : Value of the provided extra metric
  1 s          : 1 Second (1 sec)

// ***** BenchmarkRunner: End *****
Run time: 00:02:33 (153.47 sec), executed benchmarks: 1

LightGBM Update

                                                       Method |    Mean |   Error |   StdDev | Extra Metric |
------------------------------------------------------------- |--------:|--------:|---------:|-------------:|
 CV_Multiclass_WikiDetox_BigramsAndTrichar_LightGBMMulticlass | 320.9 s | 8.921 s | 0.4890 s |            - |

// * Legends *
  Mean         : Arithmetic mean of all measurements
  Error        : Half of 99.9% confidence interval
  StdDev       : Standard deviation of all measurements
  Extra Metric : Value of the provided extra metric
  1 s          : 1 Second (1 sec)

// ***** BenchmarkRunner: End *****
Run time: 00:16:05 (965.68 sec), executed benchmarks: 1

                                                          Method |    Mean |   Error |  StdDev | Extra Metric |
---------------------------------------------------------------- |--------:|--------:|--------:|-------------:|
 TrainTest_Ranking_MSLRWeb10K_RawNumericFeatures_LightGBMRanking | 44.73 s | 41.35 s | 2.266 s |            - |

// * Legends *
  Mean         : Arithmetic mean of all measurements
  Error        : Half of 99.9% confidence interval
  StdDev       : Standard deviation of all measurements
  Extra Metric : Value of the provided extra metric
  1 s          : 1 Second (1 sec)

// ***** BenchmarkRunner: End *****
Run time: 00:02:16 (136.03 sec), executed benchmarks: 1

// * Artifacts cleanup *
Global total time: 00:02:24 (144.49 sec), executed benchmarks: 1

@singlis
Member Author

singlis commented Feb 13, 2019

Here are the differences between master and this PR:

Flight Data
LightGBM update
OVERALL RESULTS

L1(avg): 0.299062 (0.0000)
L2(avg): 0.144591 (0.0000)
RMS(avg): 0.380251 (0.0000)
Loss-fn(avg): 0.144591 (0.0000)
R Squared: 0.069896 (0.0000)


Physical memory usage(MB): 113
Virtual memory usage(MB): 2101875
2/12/2019 1:11:53 AM Time elapsed(s): 7.502

Master
OVERALL RESULTS

L1(avg): 0.299102 (0.0000)
L2(avg): 0.144667 (0.0000)
RMS(avg): 0.380351 (0.0000)
Loss-fn(avg): 0.144667 (0.0000)
R Squared: 0.069406 (0.0000)


Physical memory usage(MB): 112
Virtual memory usage(MB): 2101875
2/12/2019 12:29:38 AM Time elapsed(s): 7.429

MNIST
LightGBM Update
OVERALL RESULTS

Accuracy(micro-avg): 0.976700 (0.0000)
Accuracy(macro-avg): 0.976538 (0.0000)
Log-loss: 0.078717 (0.0000)
Log-loss reduction: 96.578795 (0.0000)


Physical memory usage(MB): 207
Virtual memory usage(MB): 2102375
2/12/2019 1:35:23 AM Time elapsed(s): 63.784

Master
OVERALL RESULTS

Accuracy(micro-avg): 0.975400 (0.0000)
Accuracy(macro-avg): 0.975237 (0.0000)
Log-loss: 0.078511 (0.0000)
Log-loss reduction: 96.587753 (0.0000)


Physical memory usage(MB): 199
Virtual memory usage(MB): 2102378
2/12/2019 1:09:02 AM Time elapsed(s): 63.62

@singlis
Member Author

singlis commented Feb 13, 2019

Further benchmarks:
[Internal ranking dataset with 500k rows]
LightGBM Update
OVERALL RESULTS

AUC: 0.793242 (0.0000)
Accuracy: 0.881250 (0.0000)
Positive precision: 0.748541 (0.0000)
Positive recall: 0.273974 (0.0000)
Negative precision: 0.888696 (0.0000)
Negative recall: 0.984371 (0.0000)
Log-loss: 0.467847 (0.0000)
Log-loss reduction: 21.711629 (0.0000)
F1 Score: 0.401130 (0.0000)
AUPRC: 0.515647 (0.0000)


Physical memory usage(MB): 2472
Virtual memory usage(MB): 2104804
2/12/2019 1:32:10 AM Time elapsed(s): 252

Master
OVERALL RESULTS

AUC: 0.793242 (0.0000)
Accuracy: 0.881250 (0.0000)
Positive precision: 0.748541 (0.0000)
Positive recall: 0.273974 (0.0000)
Negative precision: 0.888696 (0.0000)
Negative recall: 0.984371 (0.0000)
Log-loss: 0.467847 (0.0000)
Log-loss reduction: 21.711629 (0.0000)
F1 Score: 0.401130 (0.0000)
AUPRC: 0.515647 (0.0000)


Physical memory usage(MB): 2471
Virtual memory usage(MB): 2104846
2/12/2019 1:03:58 AM Time elapsed(s): 224

Wiki Detox
LightGBM Update
iter = 100
OVERALL RESULTS

Accuracy(micro-avg): 0.958941 (0.0011)
Accuracy(macro-avg): 0.823435 (0.0049)
Log-loss: 0.114225 (0.0028)
Log-loss reduction: 63.930055 (0.5704)


Physical memory usage(MB): 8486
Virtual memory usage(MB): 2111485
2/11/2019 6:43:24 PM Time elapsed(s): 956

Master
OVERALL RESULTS

Accuracy(micro-avg): 0.958194 (0.0009)
Accuracy(macro-avg): 0.817749 (0.0041)
Log-loss: 0.116529 (0.0028)
Log-loss reduction: 63.202381 (0.5569)


Physical memory usage(MB): 8561
Virtual memory usage(MB): 2111419
2/11/2019 11:50:51 PM Time elapsed(s): 849

NOTE: These are informal benchmarks to spot-check whether the LightGBM update introduces any regressions. The WikiDetox and internal ranking runs show slightly longer times with the update; this is likely an artifact of running a single pass on my work machine and is not concerning, as the other benchmarks fall more in line.

These tests were run on Windows 10.

@singlis singlis merged commit 656a42f into dotnet:master Feb 13, 2019
@singlis singlis deleted the singlis/lightgbm-update branch March 7, 2019 00:55
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022