AutoML 2 is way worse than 1.7.1 (for me) #6552
Looks like you're using GridSearch for HPO optimization, and you disable LightGbm as well? Can you try using the default tuner (by removing SetGridSearchTuner) instead? In the meantime, you can still use the AutoML v1.0 API in AutoML v2.0, which basically inherits the featurizer and trainer configuration of AutoML 1.7.1. Can you also give that a try and see if performance improves?
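For reference, the v1.0-style regression API the maintainer mentions looks roughly like this. This is a sketch, not a confirmed repro of the reporter's setup: it assumes an already-loaded `IDataView trainData` with a `Label` column, and the exact settings/property names may differ slightly between releases.

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;

var mlContext = new MLContext(seed: 0);

// v1.0-style experiment settings; 1.7.1-era configuration of
// featurizer and trainers is used under the hood.
var settings = new RegressionExperimentSettings
{
    MaxExperimentTimeInSeconds = 300,
    OptimizingMetric = RegressionMetric.RSquared,
};

// If the native lib_lightgbm DLL can't be loaded on this machine,
// excluding the trainer avoids the repeated load failures in the log.
settings.Trainers.Remove(RegressionTrainer.LightGbm);

var experiment = mlContext.Auto().CreateRegressionExperiment(settings);
var result = experiment.Execute(trainData, labelColumnName: "Label");

Console.WriteLine($"Best trainer: {result.BestRun.TrainerName}");
```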
Hello Jake and thank you for your answer.
GOOD EYE! Yes, that additional call does break everything; I was too quick in pasting the sample code. I had already removed the call to SetGridSearchTuner() because with it nothing works, so I'm still without an answer. NOTE: leaving LightGbm in causes lots of errors in my log: "failed with exception Unable to load DLL 'lib_lightgbm': The specified module could not be found. (Exception from HRESULT: 0x8007007E)". I read the old post "Null reference exception when training #6470" about the DLL but was not able to resolve it. Maybe it's a sign that something isn't set up correctly? However, I still get a group that's not as tight, and something that binds / limits the predicted range.
@TT-Dev1 Thanks for the reply. I'm definitely willing to help you figure out what's not working here, especially why it's not better than the old AutoML.
Looks like a dup of #6446. This issue has been fixed but hasn't been released to NuGet yet. You can try the nightly build, though.
Are you running on a Linux/macOS arm64 device? If so, LightGbm won't be available on those platforms. A few more questions:
Thanks VERY MUCH!!!! I will try and report back. I can verify (like you said) that this bug has not been fixed in the Dec. 22, 2022 release.
No, Win10, Intel x64. EDIT: is there a way to force the install, or is there a place I can look to find the .dll?
Yes, all on the same box.
OK -- seems like I'm getting somewhere now. THANK YOU. Trying the AutoML v1.0 API in AutoML v2.0 causes a new (or more specific) error with the current (3.0.0-dev.23110.1 / 0.21.0-dev.23110.1) build. AutoMLExperiment.cs, line 246, is the source of the null reference exception ("tuner can't be null"):

```csharp
public async Task<TrialResult> RunAsync(CancellationToken ct = default)
    var tuner = serviceProvider.GetService<ITuner>();
    Contracts.Assert(tuner != null, "tuner can't be null");
    var parameter = tuner.Propose(trialSettings); // <<< line 246
```

Now that I have the libraries, I can be much more efficient at debugging this. I should have done that from the beginning. ;) EDIT: I can also test the v2.0 methods to see if the results have improved. EDIT2: The v2.0 API still fails / skips LightGbm...
I haven't yet found where lib_lightgbm comes from -- I build Microsoft.ML.LightGbm just fine as far as I can tell. Can there be something strange in my environment causing BOTH of my issues?
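Since the assertion fires when no `ITuner` has been registered, one possible workaround when driving the v2.0 `AutoMLExperiment` directly is to configure a tuner explicitly before calling `RunAsync`. This is only a sketch, not a confirmed fix for the v1.0-in-v2.0 path: the method names below follow the v2.0 samples, but the exact overloads may differ in a given nightly build, and `pipeline` is assumed to be a `SweepablePipeline` built elsewhere.

```csharp
var experiment = mlContext.Auto().CreateExperiment();

experiment
    .SetDataset(trainData, fold: 5)      // cross-validation split; overload may vary
    .SetPipeline(pipeline)               // a SweepablePipeline built elsewhere
    .SetRegressionMetric(RegressionMetric.RSquared, labelColumn: "Label")
    .SetTrainingTimeInSeconds(300)
    .SetGridSearchTuner();               // explicitly registers an ITuner,
                                         // so the "tuner can't be null" assert
                                         // has a tuner to resolve

var result = await experiment.RunAsync();
```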
Hmmm, are you sure you are on the latest nightly build? The most recent version should be from this feed.
Yes, I was one build ahead because I built the current source on that date -- but there were no code changes for a few days, so we were on the same thing. But I still have the problem. So, back to the project of determining...
ISSUE#1: Why can't I configure AutoML 2.0 to work as well as 1.7.1?
ISSUE#2: Why can't I run the 1.0 API with 2.0?
Some observations... AutoML 1.7.1 -- 1 error in the log:
7 OnlineGradientDescentRegression -12.0358 19.26 1213.01 22.22 0.6
But it works and comes up with a tight model.
AutoML 3.0.0-dev.23124.1 (current Git @ 2023-02-24 / 8am). The GOOD NEWS is that my ML1 code now runs to completion, but still with worse results, fewer trainings, and some exceptions logged.
NOTE: if I add OneHotEncoding to my preFeaturizer, then it takes a very long time.
I believe that other tests were running but they were cancelled because of time.
So, I removed this column from the training and removed it from my preFeaturizer. REMOVED: Still had the exception when trying to run the
Hopefully, something that I posted here is helpful to point me in the right direction.
@TT-Dev1
This might be because we use a larger search space in AutoML 2.0, which brings both pros and cons. A larger search space can give better results if the budget is enough, but it also increases the risk of getting stuck on time-consuming configurations (for example, numberOfTrees=32468 for FastForest will cost a lot of time to train but doesn't necessarily bring a better result). We are hoping to mitigate that effect with #6577. You can also provide a smaller search space using the AutoML 2.0 API to overcome the problem.
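A sketch of narrowing the search space for one trainer under the v2.0 API follows. The option class, its property name, and the attribute arguments here are assumptions for illustration (the real `Range` attribute signature in Microsoft.ML.SearchSpace may differ), but the `CreateSweepableEstimator` + `SearchSpace<T>` pattern matches the v2.0 samples.

```csharp
using Microsoft.ML.SearchSpace;

// Hypothetical option class that caps FastForest's tree count,
// avoiding expensive configurations like numberOfTrees=32468.
public class FastForestOption
{
    [Range(16, 256, init: 32)]
    public int NumberOfTrees { get; set; }
}

// A sweepable estimator that only explores the narrowed range;
// the tuner proposes an option instance per trial.
var ff = mlContext.Auto().CreateSweepableEstimator(
    (ctx, option) => ctx.Regression.Trainers.FastForest(
        numberOfTrees: option.NumberOfTrees),
    new SearchSpace<FastForestOption>());
```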
What is
The error indicates that it fails to find a trainer (one of fasttree|sdca|lbfgs|lgbm) in your model, which is strange. Can you share around 100 rows of your dataset so I can try to reproduce the error?
BTW, if you are also on Discord, feel free to ping me (BigMiao#1789); I'm happy to see what I can do to help you improve training performance.
Hey, I've experienced the same issue, though I stopped maintaining my ML code back in the days when it used to attempt to predict tails instead of this -- basically, ML.NET regression gives up on being ML as soon as it hits a training boundary. But this isn't practical, as any time-based, geometric, biological, or compounding model necessarily lives on a boundary. Please don't take the conversation to Discord; I've been following it.
Win10 / ML.NET 1.7.1 vs. 2.0.0 / .NET Framework 4.8
AutoML 2.0 is way worse for me than the previous 1.7.1 release. I tried using the Featurizer, and even removing it completely and doing it all by hand -- in two days of fiddling I could not create a model that is anywhere close to the one created with the old CreateRegressionExperiment() version of the previous release.

To Reproduce
Steps to reproduce the behavior:
For 2.0 (where the problem is) I used the same code as this sample (but with my objects): https://github.com/dotnet/machinelearning-samples/tree/main/samples/csharp/getting-started/MLNET2/AutoMLAdvanced
I also unwound the featurizer and did all the same steps by hand and they worked with 1.7.1.
Expected behavior
To be able to train a model that works as well as one from the last version.
Additional context
NOTE: I had all kinds of different versions on my machine and completely uninstalled Visual Studio, deleted the directory, etc.
Maybe relevant?
Now, after re-installing VS and adding ML.net, I no longer have the ability to edit notebooks (.ipynb).
Sometimes, when playing with the ML.NET Model Builder 2022 (16.13.9.2235601) and the same data, I don't get a Next button with my data. [maybe there's something with my data that causes a problem with the 2.0 code?]

ANY IDEAS WHERE I CAN DEBUG MORE? OR TELL ME WHAT YOU WOULD LIKE ME TO SHARE SO THAT I CAN BE MORE HELPFUL.