
Change default # of iterations in Averaged Perceptron to 10 #2305


Closed
daholste opened this issue Jan 29, 2019 · 6 comments
Labels: P2 (Priority of the issue for triage purpose: needs to be fixed at some point), perf (Performance and Benchmarking related), usability (Smoothing user interaction or experience)

Comments


daholste commented Jan 29, 2019

@justinormont figured out that setting the default # of iterations to 10 in the Averaged Perceptron learner would lead to better results.
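For context, here's a minimal sketch of what opting into 10 iterations looks like today (assuming the current ML.NET options surface, where the setting is `AveragedPerceptronTrainer.Options.NumberOfIterations`); this issue proposes making it the default:

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext();

// Today, 10 iterations must be requested explicitly; the proposal is to
// make NumberOfIterations = 10 the default instead of 1.
var trainer = mlContext.BinaryClassification.Trainers.AveragedPerceptron(
    new AveragedPerceptronTrainer.Options
    {
        NumberOfIterations = 10
    });
```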

From: Justin Ormont
Sent: Monday, April 3, 2017 2:52:13 PM
Subject: Re: Move AveragedPerceptron defaults to iter=10

Greetings folks,

I had a chance to run larger datasets, and I think my conclusion holds.

I did a sweep of the 15GB dataset, and the 2.7TB dataset.

Sweep: 1 to 20 iterations. While the sweep is still running, it has finished most of the experiments and the pattern is pretty clear.

15GB text (note: x-axis is number of iterations, not time; y-axis is AUC)
[chart omitted]
Also run (not shown) was FastTreeBinary; its AUC, at 89.1%, is below this graph's range, and it is much, much slower.

2.7TB numeric (note: x-axis is number of iterations, not time; y-axis is AUC)
[chart omitted]

It doesn't appear that I've hit overfitting thus far in either dataset. AUC continues to increase from a low at iter=1 (far left) to a high on the right (iter=15).

How does the AP iteration count affect time?

Time was a bit odd (not a smooth graph), but generally increased with the number of iterations.

15GB text (note: x-axis is iteration count; y-axis is time)
[chart omitted]

Time was almost constant with added iterations (the noise is due to zooming). There's a ~5% runtime difference between the fastest and slowest points on this graph, with 15 iterations being fastest (likely noise).

For 1 iteration: 14,478 sec (4.0 hours)
For 10 iterations: 14,623 sec (4.1 hours)
That's a very sub-linear 1.01x growth from 1 to 10 iterations.

2.7TB numeric (note: x-axis is iteration count; y-axis is time)
[chart omitted]

Sorry, the GUI cuts off the time labels on the left; times are given below.
For 1 iteration: 111,367 sec (1.3 days);
For 10 iterations: 317,203 sec (3.7 days).
That's a sub-linear 2.8x growth from 1 to 10 iterations.

I think the 15GB text dataset fits fully in memory, which gives it a near-constant runtime vs. iterations; runtime is likely dominated by another factor, like text featurization [wild guess].
The 2.7TB dataset had to have caching turned off, so each iteration had to fetch the data from CT01; data fetch time may have dominated [wild guess].

AUC is presented because the datasets are binary. Accuracy graphs look similar, though noisier, indicating that perhaps we should look at how we're setting the binary threshold.

Memory usage
In both datasets, memory usage appears flat (plus noise) as iteration count increases.

Methodology
Both datasets are binary classification tasks, larger than those in previous experiments with AveragedPerceptron's iteration count. All experiments were run on HPC, with each experiment taking a full node until finished. Data was stored on CT01.

For the 2.7TB numeric dataset, caching, normalization, and shuffling were turned off; caching was disabled due to the dataset's size (2.7TB).

Conclusion
For AveragedPerceptron, iterations=10 seems to be an OK default for these two larger datasets; it appears the "best" (in terms of AUC/accuracy) hasn't been reached and lies above iter=15 for these.

For 10 iterations, the added duration on the 15GB dataset was negligible, and the runtime on the 2.7TB dataset grew by an additional 1.8x.

The 2.7TB dataset gains ~0.2% AUC w/ 10 iterations (a ~7% decrease in relative AUC-loss [aka, 1-AUC]). The 15GB dataset gains ~0.4% AUC w/ 10 iterations (a ~4% decrease in relative AUC-loss).
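To make the relative AUC-loss arithmetic concrete, here's the 2.7TB case worked through with a hypothetical baseline AUC of ~97.0% (the actual baseline values aren't stated above):

```csharp
// Hypothetical baseline: with AUC ≈ 97.0%, a 0.2% absolute gain works out
// to roughly the ~7% relative decrease in AUC-loss quoted above.
double aucIter1  = 0.970;                    // AUC at iter=1 (hypothetical)
double aucIter10 = 0.972;                    // AUC at iter=10: +0.2% absolute
double lossBefore = 1 - aucIter1;            // AUC-loss = 1 - AUC = 0.030
double lossAfter  = 1 - aucIter10;           // 0.028
double relativeDecrease = (lossBefore - lossAfter) / lossBefore;
Console.WriteLine($"{relativeDecrease:P1}"); // ≈ 6.7%, i.e. the ~7% above
```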

@daholste daholste changed the title Change # of default iterations in Average Perceptron to 10 Change # of default iterations in Averaged Perceptron to 10 Jan 29, 2019
@daholste daholste changed the title Change # of default iterations in Averaged Perceptron to 10 Change default # of iterations in Averaged Perceptron to 10 Jan 29, 2019

justinormont commented Jan 29, 2019

The general purpose is to make default runs better for the user.

Our current docs & benchmarks use AveragedPerceptron{iter=10}. This will simplify our user docs, and also simplify code in our AutoML sweepers.


The main body of this issue above discusses the impact on larger datasets; below is the impact on text datasets of various sizes.

We also evaluated across ~28 text datasets of various sizes:
[chart omitted]

Each line/color represents a particular ngram+chargram length, with the Pareto frontier highlighted; each connected line is a sweep across iter=N. The fastest results are to the right and the best accuracy is at the top, so points toward the top right are best.

The current default is iter=1, which does very poorly in comparison. Iter=10 sits at a nice bend in the accuracy-vs.-time curve.

You'll notice that for each featurization technique (each line), iter=10 is in a good place. For the unlabeled points, iter=10 is the 3rd point from the right on each line. The only technique with substantial gains beyond iter=10 is Trigram+Trichar.
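For reference, a sweep like this could look roughly like the sketch below in ML.NET. This is an illustrative assumption, not the actual experiment code: the data file, column names, and `ModelInput` schema are hypothetical, and it assumes the ML.NET 1.0-era API.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

var mlContext = new MLContext(seed: 0);

// Hypothetical dataset: a tab-separated file with a boolean label and a text column.
var data = mlContext.Data.LoadFromTextFile<ModelInput>("data.tsv", hasHeader: true);
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
var featurize = mlContext.Transforms.Text.FeaturizeText("Features", "Text");

// Sweep iter = 1..20 and record AUC at each setting, as in the experiments above.
for (int iters = 1; iters <= 20; iters++)
{
    var pipeline = featurize.Append(
        mlContext.BinaryClassification.Trainers.AveragedPerceptron(
            new AveragedPerceptronTrainer.Options { NumberOfIterations = iters }));

    var model = pipeline.Fit(split.TrainSet);

    // AveragedPerceptron is uncalibrated, so evaluate without a probability column.
    var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(
        model.Transform(split.TestSet));
    Console.WriteLine($"iter={iters}: AUC={metrics.AreaUnderRocCurve:F4}");
}

public class ModelInput
{
    [LoadColumn(0)] public bool Label { get; set; }
    [LoadColumn(1)] public string Text { get; set; }
}
```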


wschin commented Jul 2, 2019

This could be a breaking change, so I think we should keep the current setting. AutoML will eventually solve this problem for us. Please feel free to reopen it if you have other concerns. Thanks.

@wschin wschin closed this as completed Jul 2, 2019
justinormont commented Jul 3, 2019

@eerhardt, @terrajobst: Would changing a hyperparameter default be considered a breaking change?

Assuming I'm reading the breaking change conversation correctly, I think this issue is specifically being called out as an example of a non-breaking change:

So I'm not sure I'm too concerned about that, if someone references something with default number of iterations 1, and we change that default to 10 per #2305 hypothetically...

I would anticipate we can refine our default hyperparameters to gravitate users more quickly toward good models. The API signature is not affected, and behavior is generally the same, though with better resulting metrics.

Some examples of us changing hyperparameters:

re: AutoML -- correct, it solves the problem -if- the user is running AutoML. Good defaults start users off on the right footing. We should make their first model great.

@justinormont justinormont reopened this Jul 3, 2019

terrajobst commented Jul 3, 2019

Presumably a hyperparameter is a property value that can be set, and that has a default if not set?

Changing default values can break people, but we generally consider this in the realm of acceptable breaking changes, unless the change specifically makes fewer scenarios work (e.g. if it disallows more inputs).


eerhardt commented Jul 8, 2019

I think we've decided that changing default values is an acceptable change in #3689. (Note, on the .NET team, we consider that any change could be a breaking change 😉.)

Note that when you change a parameter's default value, the change doesn't take effect until consuming code is recompiled. Suppose a user has code calling Foo(), where in version 1 Foo has a default parameter bool bar = false, and we change the default of bar from false to true in version 2. If that code was compiled against version 1 and executed against version 2, the value of bar will still be false. This is because in C# default values are compiled into the calling assembly.
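A quick illustration of that call-site baking, with a hypothetical Foo matching the example above:

```csharp
using System;

// Consumer code, compiled against version 1 of the library below:
Lib.Foo(); // the C# compiler emits Lib.Foo(false) into the *consumer's* assembly

// If version 2 of the library changes the default to `bool bar = true`,
// the already-compiled call above still passes false until the consumer
// recompiles, because the default value was baked in at the call site.
public static class Lib
{
    // Version 1 of the "library" (hypothetical).
    public static void Foo(bool bar = false) => Console.WriteLine(bar);
}
```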

@harishsk harishsk added the P2 Priority of the issue for triage purpose: Needs to be fixed at some point. label Jan 12, 2020
najeeb-kazmi commented

Tracking in #4749
