
Commit 71d58fa

Remove adult.train and adult.test and modify all unit tests (#1687)
* Remove adult.train and adult.test and modify all unit tests
* Remove adult.train from Microsoft.ML.Benchmarks
* Move baselines to Common folder
1 parent 78cad14

175 files changed: +11830 −408822 lines changed


test/BaselineOutput/Common/Command/CommandTrainScoreEvaluateQuantileRegression-out.txt

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ Making per-feature arrays
 Changing data from row-wise to column-wise on disk
 Processed 506 instances
 Binning and forming Feature objects
-Reserved memory for tree learner: 290472 bytes
+Reserved memory for tree learner: %Number% bytes
 Starting to train ...
 Not training a calibrator because it is not needed.
 Physical memory usage(MB): %Number%
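
The point of this commit shows in the hunk above: a machine-dependent byte count is replaced with the %Number% placeholder that the baseline comparison treats as a wildcard, so the file no longer churns between runs. As a rough illustration of the masking idea in Python (the function and regex patterns below are hypothetical, not the repo's actual matcher):

    import re

    def mask_nondeterministic(line):
        # Hypothetical masking pass: values that vary across runs/machines
        # are replaced by placeholders so the baseline file stays stable.
        line = re.sub(r"tree learner: \d+ bytes", "tree learner: %Number% bytes", line)
        line = re.sub(r"usage\(MB\): \d+(\.\d+)?", "usage(MB): %Number%", line)
        return line

    print(mask_nondeterministic("Reserved memory for tree learner: 290472 bytes"))
    # Reserved memory for tree learner: %Number% bytes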

test/BaselineOutput/Common/Command/CommandTrainingLrWithStats-out.txt

Lines changed: 5 additions & 5 deletions
@@ -1,13 +1,13 @@
-maml.exe Train feat=Num lab=Lab tr=lr{t=- stat=+} loader=text{header+ sep=comma col=Lab:14 col=Num:0,2,4,10-12} data=%Data% out=%Output%
+maml.exe Train feat=Num lab=Lab tr=lr{t=- stat=+} loader=text{header+ col=Lab:0 col=Num:9-14} data=%Data% out=%Output%
 Automatically adding a MinMax normalization transform, use 'norm=Warn' or 'norm=No' to turn this behavior off.
 Beginning optimization
 num vars: 7
 improvement criterion: Mean Improvement
 L1 regularization selected 7 of 7 weights.
-Model trained with 32561 training examples.
-Residual Deviance: 26705.74 (on 32554 degrees of freedom)
-Null Deviance: 35948.08 (on 32560 degrees of freedom)
-AIC: 26719.74
+Model trained with 500 training examples.
+Residual Deviance: 458.9709 (on 493 degrees of freedom)
+Null Deviance: 539.2764 (on 499 degrees of freedom)
+AIC: 472.9709
 Not training a calibrator because it is not needed.
 Physical memory usage(MB): %Number%
 Virtual memory usage(MB): %Number%
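
The regenerated statistics are internally consistent with the smaller dataset: 7 parameters (6 features plus the bias) fitted on 500 examples leave 500 − 7 = 493 residual degrees of freedom, and AIC = residual deviance + 2k by the standard definition. A quick sanity check (standard formulas, not ML.NET source):

    n, k = 500, 7                    # training examples; parameters (6 weights + bias)
    residual_deviance = 458.9709

    print(n - k)                     # 493 residual degrees of freedom, as reported
    print(residual_deviance + 2 * k) # 472.9709, matching the AIC line
    print(n - 1)                     # 499 degrees of freedom for the null deviance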
Lines changed: 18 additions & 18 deletions
@@ -1,27 +1,27 @@
 Linear Binary Classification Predictor non-zero weights
 
-(Bias)          -8.228298
-capital-gain    18.58347
-education-num   5.066041
-hours-per-week  3.946534
-age             3.86064
-capital-loss    2.81616
-fnlwgt          0.7489593
+(Bias)          -4.426744
+education-num   2.102877
+age             1.920366
+hours-per-week  1.882183
+capital-gain    1.671043
+capital-loss    0.9767318
+fnlwgt          0.3191842
 
 *** MODEL STATISTICS SUMMARY ***
-Count of training examples:  32561
-Residual Deviance:           26705.74
-Null Deviance:               35948.08
-AIC:                         26719.74
+Count of training examples:  500
+Residual Deviance:           458.9709
+Null Deviance:               539.2764
+AIC:                         472.9709
 
 Coefficients statistics:
 Coefficient     Estimate    Std. Error  z value     Pr(>|z|)
-(Bias)          -8.228298   0.1161297   -70.85435   0             ***
-education-num   5.066041    0.1048074   48.33666    0             ***
-capital-gain    18.58347    0.4694776   39.5833     0             ***
-age             3.86064     0.1061118   36.38277    0             ***
-hours-per-week  3.946534    0.1258723   31.35349    0             ***
-capital-loss    2.81616     0.13793     20.41732    0             ***
-fnlwgt          0.7489593   0.2048056   3.656927    0.0002553463  ***
+(Bias)          -4.426744   0.5968595   -7.416728   0             ***
+education-num   2.102877    0.4148865   5.068559    4.172325E-07  ***
+age             1.920366    0.4261497   4.506317    6.616116E-06  ***
+hours-per-week  1.882183    0.4595276   4.095909    4.208088E-05  ***
+capital-gain    1.671043    0.4953039   3.373774    0.0007415414  ***
+capital-loss    0.9767318   0.4553182   2.145163    0.03193969    *
+fnlwgt          0.3191842   0.4430353   0.7204487   0.4712486
 ---
 Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
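
In this table each z value is Estimate / Std. Error, and Pr(>|z|) is the two-sided normal p-value 2(1 − Φ(|z|)), starred with the R-style significance codes on the last line. The new numbers check out against the standard formulas; a verification sketch (not ML.NET code):

    import math

    def z_and_p(estimate, std_error):
        z = estimate / std_error
        # Two-sided normal p-value, 2 * (1 - Phi(|z|)), via the complementary error function.
        return z, math.erfc(abs(z) / math.sqrt(2))

    print(z_and_p(2.102877, 0.4148865))   # z ~ 5.068559, p ~ 4.17e-07 -> '***'
    print(z_and_p(0.3191842, 0.4430353))  # z ~ 0.720449, p ~ 0.471    -> no star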
Lines changed: 22 additions & 22 deletions
@@ -1,40 +1,40 @@
-maml.exe TrainTest test=%Data% tr=FastTreeBinaryClassification{nl=5 mil=5 lr=0.25 iter=20 mb=255} dout=%Output% loader=Text{sep=, header+ col=Label:14 col=Cat:TX:1,3,5-9,13} data=%Data% out=%Output% seed=1 xf=Cat{col=Cat} xf=Concat{col=Features:Cat}
+maml.exe TrainTest test=%Data% tr=FastTreeBinaryClassification{nl=5 mil=5 lr=0.25 iter=20 mb=255} dout=%Output% loader=Text{header+ col=Label:0 col=Cat:TX:1-8} data=%Data% out=%Output% seed=1 xf=Cat{col=Cat} xf=Concat{col=Features:Cat}
 Not adding a normalizer.
 Making per-feature arrays
 Changing data from row-wise to column-wise
-Processed 32561 instances
+Processed 500 instances
 Binning and forming Feature objects
 Changing data from row-wise to column-wise
 Reserved memory for tree learner: %Number% bytes
 Starting to train ...
 Not training a calibrator because it is not needed.
-TEST POSITIVE RATIO:  0.2362 (3846.0/(3846.0+12435.0))
+TEST POSITIVE RATIO:  0.2300 (115.0/(115.0+385.0))
 Confusion table
           ||======================
 PREDICTED || positive | negative | Recall
 TRUTH     ||======================
-positive  ||    1,982 |    1,864 | 0.5153
-negative  ||      895 |   11,540 | 0.9280
+positive  ||       55 |       60 | 0.4783
+negative  ||       17 |      368 | 0.9558
           ||======================
-Precision ||   0.6889 |   0.8609 |
-OVERALL 0/1 ACCURACY:  0.830539
-LOG LOSS/instance:     0.537244
-Test-set entropy (prior Log-Loss/instance):  0.788708
-LOG-LOSS REDUCTION (RIG):  31.883066
-AUC:  0.871960
+Precision ||   0.7639 |   0.8598 |
+OVERALL 0/1 ACCURACY:  0.846000
+LOG LOSS/instance:     0.481805
+Test-set entropy (prior Log-Loss/instance):  0.778011
+LOG-LOSS REDUCTION (RIG):  38.072215
+AUC:  0.893281
 
 OVERALL RESULTS
 ---------------------------------------
-AUC: 0.871960 (0.0000)
-Accuracy: 0.830539 (0.0000)
-Positive precision: 0.688912 (0.0000)
-Positive recall: 0.515341 (0.0000)
-Negative precision: 0.860937 (0.0000)
-Negative recall: 0.928026 (0.0000)
-Log-loss: 0.537244 (0.0000)
-Log-loss reduction: 31.883066 (0.0000)
-F1 Score: 0.589618 (0.0000)
-AUPRC: 0.670582 (0.0000)
+AUC: 0.893281 (0.0000)
+Accuracy: 0.846000 (0.0000)
+Positive precision: 0.763889 (0.0000)
+Positive recall: 0.478261 (0.0000)
+Negative precision: 0.859813 (0.0000)
+Negative recall: 0.955844 (0.0000)
+Log-loss: 0.481805 (0.0000)
+Log-loss reduction: 38.072215 (0.0000)
+F1 Score: 0.588235 (0.0000)
+AUPRC: 0.738040 (0.0000)
 
 ---------------------------------------
 Physical memory usage(MB): %Number%
@@ -43,7 +43,7 @@ Virtual memory usage(MB): %Number%
 
 --- Progress log ---
 [1] 'Building term dictionary' started.
-[1] (%Time%) 32561 examples Total Terms: 100
+[1] (%Time%) 500 examples Total Terms: 76
 [1] 'Building term dictionary' finished in %Time%.
 [2] 'FastTree data preparation' started.
 [2] 'FastTree data preparation' finished in %Time%.
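
All the headline metrics in this baseline follow from the new 2×2 confusion table (TP=55, FN=60, FP=17, TN=368), and LOG-LOSS REDUCTION (RIG) is the percentage improvement of the per-instance log-loss over the test-set prior entropy. A verification sketch using the standard definitions:

    tp, fn, fp, tn = 55, 60, 17, 368              # from the confusion table above

    precision = tp / (tp + fp)                    # 0.763889
    recall = tp / (tp + fn)                       # 0.478261
    accuracy = (tp + tn) / (tp + fn + fp + tn)    # 0.846000
    f1 = 2 * precision * recall / (precision + recall)  # 0.588235

    log_loss, prior_entropy = 0.481805, 0.778011
    rig = 100 * (prior_entropy - log_loss) / prior_entropy  # 38.072215 (percent)

    print(precision, recall, accuracy, f1, rig)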
@@ -0,0 +1,4 @@
+FastTreeBinaryClassification
+AUC	Accuracy	Positive precision	Positive recall	Negative precision	Negative recall	Log-loss	Log-loss reduction	F1 Score	AUPRC	/lr	/nl	/mil	/iter	Learner Name	Train Dataset	Test Dataset	Results File	Run Time	Physical Memory	Virtual Memory	Command Line	Settings
+0.893281	0.846	0.763889	0.478261	0.859813	0.955844	0.481805	38.07222	0.588235	0.73804	0.25	5	5	20	FastTreeBinaryClassification	%Data%	%Data%	%Output%	99	0	0	maml.exe TrainTest test=%Data% tr=FastTreeBinaryClassification{nl=5 mil=5 lr=0.25 iter=20 mb=255} dout=%Output% loader=Text{header+ col=Label:0 col=Cat:TX:1-8} data=%Data% out=%Output% seed=1 xf=Cat{col=Cat} xf=Concat{col=Features:Cat}	/lr:0.25;/nl:5;/mil:5;/iter:20
+
@@ -0,0 +1,30 @@
+
+Per-feature gain summary for the boosted tree ensemble:
+	marital-status.Married-civ-spouse	1
+	occupation.Prof-specialty	0.499978104303624
+	occupation.Exec-managerial	0.434935442345068
+	marital-status.Never-married	0.267286281351726
+	education.Doctorate	0.255205743594169
+	Workclass.Self-emp-inc	0.218913886097949
+	ethnicity.Asian-Pac-Islander	0.198007032992433
+	relationship.Husband	0.187623229061434
+	Workclass.Local-gov	0.186017717229329
+	native-country-region.Mexico	0.177614308223737
+	education.Bachelors	0.170936054067049
+	education.Masters	0.169804646788063
+	occupation.Farming-fishing	0.145811380173079
+	ethnicity.Black	0.137140760481446
+	occupation.Transport-moving	0.13022579853358
+	occupation.Other-service	0.12589916923037
+	education.Some-college	0.116092176092999
+	Workclass.Private	0.109401738173027
+	Workclass.?	0.109103172155946
+	education.Assoc-acdm	0.100323599151049
+	Workclass.Self-emp-not-inc	0.0973694292542764
+	relationship.Own-child	0.0957862810994456
+	education.11th	0.0905382862951382
+	occupation.Tech-support	0.0778673683865196
+	education.HS-grad	0.0727543074440302
+	education.7th-8th	0.0727320100839907
+	marital-status.Widowed	0.0634429881388435
+	Workclass.State-gov	0.0535351814497221
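
The gains appear to be reported relative to the strongest feature, which is why marital-status.Married-civ-spouse sits at exactly 1 and everything else is scaled against it. A minimal sketch of that normalization, assuming it divides raw gain totals by the maximum (the raw values below are made up for illustration):

    # Raw gain totals here are hypothetical; only the normalization
    # (divide by the maximum, so the top feature reads exactly 1) is the point.
    raw_gains = {
        "marital-status.Married-civ-spouse": 12.7,
        "occupation.Prof-specialty": 6.3497,
        "occupation.Exec-managerial": 5.5236,
    }
    top = max(raw_gains.values())
    for name, gain in sorted(raw_gains.items(), key=lambda kv: -kv[1]):
        print(name, gain / top)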
