Change default # of iterations in Averaged Perceptron to 10 #2305
This could be a breaking change, so I think we should keep the current setting. AutoML will eventually solve this problem for us. Please feel free to reopen it if you have other concerns. Thanks.
@eerhardt, @terrajobst: Would changing a hyperparameter default be considered a breaking change? Assuming I'm reading the breaking change conversation correctly, I think this issue is specifically being called out as an example of a non-breaking change:
I would anticipate we can refine our default hyperparameters to gravitate users more quickly to good models. API signature is not affected and behavior is generally the same though resulting in better metrics. Some examples of us changing hyperparameters:
re: AutoML -- correct, it solves the problem *if* the user is running AutoML. Good defaults get users on the right footing. We should make their first model great.
Presumably a hyperparameter is a property value that can be set and, if not, has a default? Changing default values can break people, but we generally consider this to be in the realm of acceptable breaking changes, unless the change specifically makes fewer scenarios work (e.g., if it disallows more inputs).
I think we've decided that changing default values is an acceptable change in #3689. (Note, on the .NET team, we consider that any change could be a breaking change 😉.) Note that when you change a parameter's default value, the change doesn't take effect until consuming code is re-compiled. If a user had code that was calling …
Tracking in #4749
@justinormont figured out that setting the default # of iterations to 10 in the Averaged Perceptron learner would lead to better results.
From: Justin Ormont
Sent: Monday, April 3, 2017 2:52:13 PM
Subject: Re: Move AveragedPerceptron defaults to iter=10
Greetings folks,
I had a chance to run larger datasets, and I think my conclusion holds.
I did a sweep of the 15GB dataset, and the 2.7TB dataset.
Sweep: 1 to 20 iterations. While it's still running, it has finished most of the experiments and the pattern is pretty clear.
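For readers unfamiliar with the learner, here is a minimal numpy sketch of an averaged perceptron and the kind of iteration sweep described above. This is a hypothetical illustration, not ML.NET's actual (C#) implementation; the toy data, function names, and the use of training accuracy instead of AUC are all assumptions for the example.

```python
import numpy as np

def train_averaged_perceptron(X, y, n_iters):
    """Averaged perceptron sketch: the model returned is the running
    average of every weight vector seen during training.
    Labels y are expected in {-1, +1}."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)      # current weights
    b = 0.0                       # current bias
    w_sum = np.zeros(n_features)  # accumulators for the average
    b_sum = 0.0
    count = 0
    for _ in range(n_iters):
        for i in range(n_samples):
            if y[i] * (X[i] @ w + b) <= 0:  # mistake-driven update
                w = w + y[i] * X[i]
                b = b + y[i]
            w_sum += w
            b_sum += b
            count += 1
    return w_sum / count, b_sum / count

# Sweep over iteration counts and record a score per setting, as in
# the experiments above (here: training accuracy on separable toy data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = np.where(X @ true_w > 0, 1, -1)

scores = {}
for iters in (1, 5, 10):
    w, b = train_averaged_perceptron(X, y, iters)
    scores[iters] = float(np.mean(np.where(X @ w + b > 0, 1, -1) == y))
```

Each extra iteration is one more pass over the data, which is why runtime should grow at most linearly with the iteration count.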
15GB text (note x-axis is number of iterations, not time; y-axis AUC)

Also run (not shown) was FastTreeBinary; its AUC is below this graph at 89.1%, and it is much, much slower.
2.7TB numeric (note x-axis is number iterations, not time; y-axis AUC)

It doesn't appear that I've hit overfitting thus far in either dataset. AUC continues to increase from a low at iter=1 (far left) to a high on the right (iter=15).
How does AP iterations affect time?
Time was a bit odd (not a smooth graph) but generally increasing as the number of iterations increases.
15GB text (note x-axis is iteration count, y-axis is time)
Time was almost constant with added iterations (noise is due to zooming). There's ~5% runtime difference between the fastest and slowest on this graph, with 15 iterations being fastest (likely noise).
For 1 iteration: 14,478 sec (4.0 hours)
For 10 iterations: 14,623 sec (4.1 hours)
That's a very sub-linear 1.01x growth from 1 to 10 iterations.
2.7TB numeric (note x-axis is iteration count, y-axis is time)
Sorry, the GUI cuts off the time labels on the left. Time given on next line.
For 1 iteration: 111,367 sec (1.3 days);
For 10 iterations: 317,203 sec (3.7 days).
That's a sub-linear 2.8x growth from 1 to 10 iterations.
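The growth factors quoted above fall straight out of the raw timings; a quick check (using the seconds reported in this thread):

```python
# Runtime growth from 1 to 10 iterations, per dataset (seconds).
timings = {
    "15GB text":     {1: 14_478, 10: 14_623},
    "2.7TB numeric": {1: 111_367, 10: 317_203},
}
for name, t in timings.items():
    growth = t[10] / t[1]
    print(f"{name}: {growth:.2f}x runtime for 10x the iterations")
# Both are well under the naive 10x a strictly per-iteration cost would give.
```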
I think the 15GB text dataset fitting fully in memory causes it to have a near-constant runtime vs. iterations; runtime is dominated by another factor, like text featurization [wild guess].
Because the dataset is 2.7TB, caching had to be turned off, and each iteration had to fetch the data from CT01; data fetch time may have dominated [wild guess].
AUC is presented since the datasets are binary. Accuracy graphs look similar though noisier, indicating perhaps we could look at how we're setting the binary threshold.
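On the binary-threshold point: AUC is threshold-free, but accuracy depends on where the score cutoff is placed, which could explain the extra noise. A minimal sketch of tuning the cutoff (function name and toy data are hypothetical, not part of ML.NET):

```python
import numpy as np

def best_accuracy_threshold(scores, labels):
    """Pick the decision threshold that maximizes accuracy.
    Candidate thresholds are the observed scores themselves;
    labels are in {0, 1}."""
    best_t, best_acc = 0.0, 0.0
    for t in np.unique(scores):
        acc = np.mean((scores >= t).astype(int) == labels)
        if acc > best_acc:
            best_t, best_acc = float(t), float(acc)
    return best_t, best_acc

# Toy scores/labels to exercise the search.
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7])
labels = np.array([0, 0, 1, 1, 1])
t, acc = best_accuracy_threshold(scores, labels)
```

A fixed default cutoff (e.g., score 0) is one reason accuracy can wobble across runs even while AUC moves smoothly.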
Memory usage
In both datasets, memory usage appears flat (plus noise) as iteration count increases.
Methodology
Both datasets are binary classification of larger size than previous experiments w/ AveragedPerceptron's iteration count. All experiments were run on HPC with each experiment taking a full node until finished. Data was stored on CT01.
For the 2.7TB numeric dataset, caching, normalization, and shuffling were turned off. Caching was disabled due to the dataset's size (2.7TB).
Conclusion
For AveragedPerceptron, iterations=10 seems to be an OK default for these two larger datasets; it appears the "best" (in terms of AUC/Acc) hasn't been hit and is above 15 for these.
For 10 iterations, the added duration in the 15GB dataset was negligible and the added runtime for the 2.7TB was an additional 1.8x.
The 2.7TB dataset gains ~0.2% AUC w/ 10 iterations (~7% decrease in relative AUC-loss [aka, 1-AUC]). The 15GB dataset gains ~0.4% AUC w/ 10 iterations (~4% decrease in relative AUC-loss).
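The "relative AUC-loss" framing can be made concrete. The thread doesn't give the raw AUC values, so the numbers below are illustrative assumptions chosen only to show how a small absolute AUC gain becomes a larger relative loss reduction at a high baseline:

```python
def relative_auc_loss_reduction(auc_before, auc_after):
    """Relative decrease in AUC-loss (1 - AUC) when AUC moves
    from auc_before to auc_after."""
    loss_before = 1.0 - auc_before
    loss_after = 1.0 - auc_after
    return (loss_before - loss_after) / loss_before

# Hypothetical baseline: at AUC 0.970, a 0.2-point gain to 0.972
# cuts the AUC-loss from 3.0% to 2.8%, i.e. ~7% relative reduction.
reduction = relative_auc_loss_reduction(0.970, 0.972)
```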