AutoML 2 is way worse than 1.7.1 (for me) #6552

Open
TT-Dev1 opened this issue Jan 28, 2023 · 13 comments
Assignees
Labels
AutoML.NET Automating various steps of the machine learning process


@TT-Dev1

TT-Dev1 commented Jan 28, 2023

Win10 / ML.NET 1.7.1 vs. 2.0.0 / .NET Framework 4.8

AutoML 2.0 is way worse for me than the previous 1.7.1 release. I tried using the Featurizer, and even removing it completely and doing everything by hand -- in two days of fiddling I cannot create a model that comes anywhere close to the one created with the old CreateRegressionExperiment() version of the previous release.
[screenshot]

To Reproduce
Steps to reproduce the behavior:
For 2.0 (where the problem is) I used the same code as this sample (but with my objects): https://github.com/dotnet/machinelearning-samples/tree/main/samples/csharp/getting-started/MLNET2/AutoMLAdvanced

//Define pipeline
SweepablePipeline pipeline =
    ctx.Auto().Featurizer(data, columnInformation: columnInference.ColumnInformation)
        .Append(ctx.Auto().Regression(labelColumnName: columnInference.ColumnInformation.LabelColumnName, useLgbm: false));

// Create AutoML experiment
AutoMLExperiment experiment = ctx.Auto().CreateExperiment();

// Configure experiment
experiment
    .SetPipeline(pipeline)
    .SetRegressionMetric(RegressionMetric.RSquared, labelColumn: columnInference.ColumnInformation.LabelColumnName)
    .SetTrainingTimeInSeconds(60)
    .SetGridSearchTuner()
    .SetDataset(trainValidationData);

// Run experiment
var cts = new CancellationTokenSource();
TrialResult experimentResults = await experiment.RunAsync(cts.Token);

I also unwound the featurizer and did all the same steps by hand and they worked with 1.7.1.

Expected behavior
To be able to train a model that works as well as the last version.

Additional context
NOTE: I had all kinds of different versions on my machine and completely uninstalled Visual Studio, deleted the directory, etc.

Maybe relevant?

  • Now, after re-installing VS and adding ML.NET, I no longer have the ability to edit notebooks (.ipynb).

  • Sometimes, when playing with the ML.NET Model Builder 2022 (16.13.9.2235601) and the same data, I don't get a Next button with my data. [maybe there's something with my data that causes a problem with the 2.0 code?]
    [screenshot]

ANY IDEAS WHERE I CAN DEBUG MORE? OR TELL ME WHAT YOU WOULD LIKE ME TO SHARE SO THAT I CAN BE MORE HELPFUL.

@ghost ghost added the untriaged New issue has not been triaged label Jan 28, 2023
@TT-Dev1

TT-Dev1 commented Jan 29, 2023

More information...

I was able to get rid of the tails that I circled in red by adding a OneHotEncoding with Binary output to my pipeline:

mlContext.Transforms.Categorical.OneHotEncoding(@"blah", @"blah", outputKind: OneHotEncodingEstimator.OutputKind.Binary)
[screenshot]

NOTE: the tail is still there with Indicator

NOTE2: the tail goes away when I drop the categorical field altogether, but I must also add
.Append(mlContext.Transforms.NormalizeMinMax("Features", fixZero: false))

NOTE3: having Transforms.ReplaceMissingValues in my pipeline also causes the tail to appear

But even with this the R^2 and training results are still much worse than what I got with the 1.7.1 version.
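For reference, the two workarounds noted above could be combined in the manual pipeline roughly like this (a sketch only: "Category" and featureColumns are placeholder names, not from the original project):

```csharp
// Hedged sketch combining the two workarounds noted above.
// "Category" and featureColumns are placeholders, not from the original code.
var manualPipeline =
    mlContext.Transforms.Categorical.OneHotEncoding(
        @"Category", @"Category",
        outputKind: OneHotEncodingEstimator.OutputKind.Binary)   // Binary output removed the tail
    .Append(mlContext.Transforms.Concatenate("Features", featureColumns))
    .Append(mlContext.Transforms.NormalizeMinMax("Features", fixZero: false));
```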

IS THERE A WAY TO SEE WHAT PIPELINE WAS CREATED IN 1.7.1 WITH mlContext.Auto().CreateRegressionExperiment()?

@LittleLittleCloud

LittleLittleCloud commented Feb 1, 2023

Looks like you're using GridSearch for HPO, and you've disabled LightGbm as well. Can you try the default tuner (by removing SetGridSearchTuner) instead?
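Applied to the snippet from the original post, the suggestion amounts to dropping one line (a sketch, assuming the same pipeline and trainValidationData variables from above):

```csharp
// Same configuration as the original snippet, minus SetGridSearchTuner(),
// so the experiment falls back to AutoML's default tuner.
experiment
    .SetPipeline(pipeline)
    .SetRegressionMetric(RegressionMetric.RSquared, labelColumn: columnInference.ColumnInformation.LabelColumnName)
    .SetTrainingTimeInSeconds(60)
    .SetDataset(trainValidationData);
```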

In the meantime, you can still use the AutoML v1.0 API in AutoML v2.0, which basically inherits the AutoML 1.7.1 configuration for the featurizer and trainers. Can you also give that a try and see if performance improves?

Now, after re-installing VS and adding ML.net, I no longer have the ability to edit notebooks (.ipynb).
@JakeRadMSFT will know better.

@TT-Dev1

TT-Dev1 commented Feb 10, 2023

Hello Jake and thank you for your answer.

.SetGridSearchTuner()

GOOD EYE! Yes, that additional call does break everything and I was too quick in pasting the sample code.

I had already removed the call to SetGridSearchTuner() because with it then nothing works. So I'm still without an answer.

NOTE: leaving in lightgbm causes lots of errors in my log...

"failed with exception Unable to load DLL 'lib_lightgbm': The specified module could not be found. (Exception from HRESULT: 0x8007007E)"

I read this old post "Null reference exception when training #6470 " about the dll but was not able to resolve that. Maybe it's a sign that something isn't set up correctly?

However, I still get a looser grouping, and something that binds / limits the predicted range.
[screenshot]

@TT-Dev1

TT-Dev1 commented Feb 10, 2023

In the meantime, you can still use the AutoML v1.0 API in AutoML v2.0, which basically inherits the AutoML 1.7.1 configuration for the featurizer and trainers. Can you also give that a try and see if performance improves?

When I just tried to use my 1.7.1 code with 2.0.1 I get an exception:

Exception thrown: 'System.AggregateException' in mscorlib.dll
System.AggregateException: One or more errors occurred. ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.ML.AutoML.AutoMLExperiment.<RunAsync>d__26.MoveNext()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.ML.AutoML.AutoMLExperiment.Run()
   at Microsoft.ML.AutoML.RegressionExperiment.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
   at TestML.BuildTrainEvaluateAndSaveModel(MLContext mlContext, String trainField, String dataInFname, String modelOutFname, String htmlChartFname, String logFname, UInt32 trainingSeconds, Boolean openResults, Boolean testPfi) in ..\TestML.cs:line 269
---> (Inner Exception #0) System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.ML.AutoML.AutoMLExperiment.<RunAsync>d__26.MoveNext()<---

My code is pretty straightforward but I do create the column information structure by hand. And it works in 1.7.1.

But I see no difference in the structure that is created automatically from a csv.
[screenshot]

Here is the code / pseudo-code.

ColumnInformation columnInformation = new ColumnInformation();
columnInformation.TextColumnNames.Clear();
columnInformation.CategoricalColumnNames.Clear();

columnInformation.LabelColumnName = trainField;
columnInformation.ItemIdColumnName = "UID";

RemoveList(columnInformation, UNUSED);
// The function is essentially this...
// foreach (string colName in removeList)
// {
//     columnInformation.IgnoredColumnNames.Add(colName);
//     columnInformation.NumericColumnNames.Remove(colName);
// }

AddNumericalList(columnInformation, USED);
// The function is essentially this...
// foreach (string colName in addList)
//     columnInformation.NumericColumnNames.Add(colName);

var experimentSettings = new RegressionExperimentSettings();
experimentSettings.MaxExperimentTimeInSeconds = trainingSeconds;
experimentSettings.CacheDirectoryName = null;   // keep models in memory
experimentSettings.OptimizingMetric = RegressionMetric.RSquared;

// Create an experiment
RegressionExperiment experiment = mlContext.Auto().CreateRegressionExperiment(experimentSettings);

// Run the experiment -- THIS IS WHERE IT FAILS
ExperimentResult<RegressionMetrics> experimentResult = experiment.Execute(trainingDataView, columnInformation);

@LittleLittleCloud

@TT-Dev1 Thanks for the reply; I'm definitely willing to help you figure out what's not working here, especially why it's not better than the old AutoML.

copy code from automl 1.7.1 not working

Looks like a dup of #6446. That issue has been fixed, but the fix hasn't been released to NuGet yet. You can try the nightly build, though.

NOTE: leaving in lightgbm causes lots of errors in my log... "failed with exception Unable to load DLL 'lib_lightgbm': The specified module could not be found. (Exception from HRESULT: 0x8007007E)"

Are you running on a Linux/macOS arm64 device? If so, LightGbm won't be available on those platforms.

A few more questions:

  • Is your experiment on AutoML 1.7.1 running on the same platform as the experiment on AutoML 0.20.1?

@LittleLittleCloud LittleLittleCloud self-assigned this Feb 10, 2023
@TT-Dev1

TT-Dev1 commented Feb 10, 2023

Looks like a dup of #6446. That issue has been fixed, but the fix hasn't been released to NuGet yet. You can try the nightly build, though.

Thanks VERY MUCH!!!! I will try and report back.

I can verify (like you said) that this bug has not been fixed in the Dec. 22, 2022 release.
[screenshot]

Are you running on a linux/osx arm64 device? If so LightGbm won't be available on those platforms.

No, Win10, Intel x64.

EDIT: Is there a way to force the install, or is there a place that I can look to find the .dll?
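One quick check (a hypothetical diagnostic, not from the thread) is to search the application's output directory for the native LightGBM binary:

```csharp
using System;
using System.IO;

// Hypothetical diagnostic: list any LightGBM native binaries copied next to
// the application. On Windows x64 the native library normally lands somewhere
// under the output directory (e.g. in a runtimes\win-x64\native subfolder).
var baseDir = AppDomain.CurrentDomain.BaseDirectory;
foreach (var file in Directory.EnumerateFiles(baseDir, "*lightgbm*", SearchOption.AllDirectories))
    Console.WriteLine(file);
```

If nothing is printed, the native package likely wasn't restored or copied, which would explain the DllNotFoundException.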

Is your experiment on AutoML 1.7.1 running on the same platform as the experiment on AutoML 0.20.1?

Yes, all on the same box.

@TT-Dev1

TT-Dev1 commented Feb 10, 2023

You can try nightly build though.

OK -- seems like I'm getting somewhere now. THANK YOU.

Trying the AutoML v1.0 API in AutoML v2.0 causes a new (or more specific) error with the current (3.0.0-dev.23110.1 / 0.21.0-dev.23110.1) build.

// AutoMLExperiment.cs, line 246 is the source of the null reference exception -- "tuner can't be null"
public async Task<TrialResult> RunAsync(CancellationToken ct = default)
{
    // ...
    var tuner = serviceProvider.GetService<ITuner>();
    Contracts.Assert(tuner != null, "tuner can't be null");

    var parameter = tuner.Propose(trialSettings);   // <<< line 246
    // ...
}

Now that I have the libraries, I can be much more efficient at debugging this. I should have done that from the beginning. ;)

EDIT: I can also test the v2.0 methods to see if the results have improved.

EDIT2: The v2.0 api still fails / skips LightGbm...

Exception thrown: 'System.DllNotFoundException' in Microsoft.ML.LightGbm.dll
An exception of type 'System.DllNotFoundException' occurred in Microsoft.ML.LightGbm.dll but was not handled in user code
Unable to load DLL 'lib_lightgbm': The specified module could not be found. (Exception from HRESULT: 0x8007007E)

I haven't yet found where lib_lightgbm comes from -- I built Microsoft.ML.LightGbm just fine as far as I can tell.

Can there be something strange in my environment causing BOTH of my issues?

@LittleLittleCloud

Hmmm are you sure you are on the latest nightly build? The most recent version should be
3.0.0-preview.23109.1

from this feed
https://dev.azure.com/dnceng/public/_artifacts/feed/dotnet-libraries/NuGet/Microsoft.ML/overview/3.0.0-preview.23109.1

@LittleLittleCloud LittleLittleCloud added the AutoML.NET Automating various steps of the machine learning process label Feb 16, 2023
@TT-Dev1

TT-Dev1 commented Feb 24, 2023

Hmmm are you sure you are on the latest nightly build? The most recent version should be
3.0.0-preview.23109.1

Yes, I was one build ahead because I built the current source on that date -- but there were no code changes for a few days, so we were on the same code.

But I still have the problem.

So back to the project of determining....

ISSUE#1: Why can't I configure AutoML 2.0 to work as well as 1.7.1?

ISSUE#2: Why can't I run the 1.0 API with 2.0?

Some observations...

AutoML 1.7.1 -- 1 error in the log

|7 OnlineGradientDescentRegression -12.0358 19.26 1213.01 22.22 0.6
'IBVConnector.exe' (CLR v4.0.30319: IBVConnector.exe): Loaded 'C:\mltest\bin\Debug\Microsoft.ML.Mkl.Components.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
Exception during AutoML iteration: System.ArgumentOutOfRangeException: Input matrix was not positive-definite. Try using a larger L2 regularization weight.

But it works and comes up with a tight model.

@TT-Dev1

TT-Dev1 commented Feb 24, 2023

AutoML 3.0.0-dev.23124.1

(current Git @ 2023-02-24 / 8am)

The GOOD NEWS is that my ML1 code now runs to completion, but still with worse results, fewer training runs, and some exceptions logged.

  • Only runs 3 tests in 30 seconds (vs. 20+).

|     Trainer                                                                           RSquared  Absolute-loss  Squared-loss  RMS-loss  Duration
|1    Unknown=>ReplaceMissingValues=>OneHotEncoding=>Concatenate=>FastForestRegression      0.6575         3.83         24.61      4.96       1.0
|2    Unknown=>ReplaceMissingValues=>OneHotEncoding=>Concatenate=>FastForestRegression      0.8989         2.95         15.12      3.89       0.9
|3    Unknown=>ReplaceMissingValues=>OneHotHashEncoding=>Concatenate=>FastTreeRegression -135.2384       117.29      13818.06    117.55       1.2

NOTE: if I add OneHotEncoding to my preFeaturizer then it takes a very long time.

//mlContext.Transforms.Categorical.OneHotEncoding(@"PrMorph", @"PrMorph", outputKind: OneHotEncodingEstimator.OutputKind.Binary) takes a very long time to complete!

I believe that other tests were running but they were cancelled because of time.

An exception of type 'System.OperationCanceledException' occurred in Microsoft.ML.Core.dll but was not handled in user code
Operation was canceled.
  • Still have less accurate results with tails.
    [screenshot]

  • PermutationFeatureImportance now fails with code that worked w/ 1.7.1

System.ArgumentNullException: The model provided does not have a compatible predictor
Parameter name: lastTransformer
   at Microsoft.ML.Runtime.Contracts.CheckValue[T](IExceptionContext ctx, T val, String paramName, String msg)
   at Microsoft.ML.PermutationFeatureImportanceExtensions.PermutationFeatureImportance[TMetric,TResult](IHostEnvironment env, ITransformer model, IDataView data, Func`1 resultInitializer, Func`2 evaluationFunc, Func`3 deltaFunc, Int32 permutationCount, Boolean useFeatureWeightFilter, Nullable`1 numberOfExamplesToUse)
   at Microsoft.ML.PermutationFeatureImportanceExtensions.PermutationFeatureImportance(RegressionCatalog catalog, ITransformer model, IDataView data, String labelColumnName, Boolean useFeatureWeightFilter, Nullable`1 numberOfExamplesToUse, Int32 permutationCount)
  • Simplified PFI call still fails
    Simplified code:
ImmutableDictionary<string, RegressionMetricsStatistics> permutationFeatureImportance =
    mlContext.Regression
        .PermutationFeatureImportance(
            model,
            data,
            labelColumnName: trainField,
            useFeatureWeightFilter: false,
            numberOfExamplesToUse: null,
            permutationCount: 1);
An exception of type 'System.ArgumentOutOfRangeException' occurred in Microsoft.ML.Core.dll but was not handled in user code
__Features__ column 'Feature' not found
The thread 0x7864 has exited with code 0 (0x0).
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.ArgumentOutOfRangeException: __Features__ column 'Feature' not found
Parameter name: schema
   at Microsoft.ML.Data.RoleMappedSchema.MapFromNames(DataViewSchema schema, IEnumerable`1 roles, Boolean opt)
System.InvalidOperationException: Can't bind the IDataView column 'PrMorph' of type 'Vector<Single, 4>' to field or property 'PrMorph' of type 'System.String'.
   at Microsoft.ML.Data.TypedCursorable`1..ctor(IHostEnvironment env, IDataView data, Boolean ignoreMissingColumns, InternalSchemaDefinition schemaDefn)

So, I removed this column from the training and removed it from my preFeaturizer.

REMOVED:
.Append(mlContext.Transforms.Categorical.OneHotEncoding(@"PrMorphINT", @"PrMorph", outputKind: OneHotEncodingEstimator.OutputKind.Bag)); // .Bag = BEST; .Indicator = clipped range; .Binary = loose

Still had the exception when trying to run the PFI:

Parameter name: schema
   at Microsoft.ML.Data.RoleMappedSchema.MapFromNames(DataViewSchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Data.RoleMappedSchema..ctor(DataViewSchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Data.GenericScorer.Bindings.Create(IHostEnvironment env, ISchemaBindableMapper bindable, DataViewSchema input, IEnumerable`1 roles, String suffix, Boolean user)
   at Microsoft.ML.Data.GenericScorer.Bindings.ApplyToSchema(IHostEnvironment env, DataViewSchema input)
   at Microsoft.ML.Data.GenericScorer..ctor(IHostEnvironment env, GenericScorer transform, IDataView data)
   at Microsoft.ML.Data.GenericScorer.ApplyToDataCore(IHostEnvironment env, IDataView newSource)
   at Microsoft.ML.Data.RowToRowScorerBase.ApplyToData(IHostEnvironment env, IDataView newSource)
   at Microsoft.ML.Data.PredictionTransformerBase`1.Transform(IDataView input)
   at Microsoft.ML.Transforms.PermutationFeatureImportance`3.GetImportanceMetricsMatrix(IHostEnvironment env, IPredictionTransformer`1 model, IDataView data, Func`1 resultInitializer, Func`2 evaluationFunc, Func`3 deltaFunc, String features, Int32 permutationCount, Boolean useFeatureWeightFilter, Nullable`1 topExamples)
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
   at Microsoft.ML.PermutationFeatureImportanceExtensions.PermutationFeatureImportance[TMetric,TResult](IHostEnvironment env, ITransformer model, IDataView data, Func`1 resultInitializer, Func`2 evaluationFunc, Func`3 deltaFunc, Int32 permutationCount, Boolean useFeatureWeightFilter, Nullable`1 numberOfExamplesToUse)
   at Microsoft.ML.PermutationFeatureImportanceExtensions.PermutationFeatureImportance(RegressionCatalog catalog, ITransformer model, IDataView data, String labelColumnName, Boolean useFeatureWeightFilter, Nullable`1 numberOfExamplesToUse, Int32 permutationCount)
  • Testing the model right after training also fails in 3.0, where it didn't fail in 1.0
PredictionEngine<ModelInput, ModelOutput> pe = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(trainedModel, trainingDataView.Schema);
ModelInput rec2Test = mlContext.Data.CreateEnumerable<ModelInput>(trainingDataView, reuseRowObject: false).First<ModelInput>();
ModelOutput mo = pe.Predict(rec2Test);
Debug.WriteLine($"==== results: {(float)rec2Test[trainField]} ||| {mo.Score}");   // assumes ModelInput exposes a string indexer

Hopefully, something that I posted here is helpful to point me in the right direction.

@LittleLittleCloud LittleLittleCloud removed the untriaged New issue has not been triaged label Feb 27, 2023
@LittleLittleCloud

LittleLittleCloud commented Feb 27, 2023

@TT-Dev1
Good to see that you can now use the AutoML v1.x API in AutoML 2.*. There are a lot of possible reasons why it gives a worse result than AutoML 1.7, though. It might be because of:

  • different search space
  • different HPO algorithm
  • different trainers

The GOOD NEWS is that my ML1 code now runs to completion, but still with worse results, fewer training runs, and some exceptions logged.

This might be because we use a larger search space in AutoML 2.0, which brings both pros and cons. A larger search space can give a better result if the budget is enough, but it also increases the risk of getting stuck in time-consuming configurations (for example, numberOfTree=32468 for fast forest will cost a lot of time to train but doesn't necessarily bring a better result). We are hoping to mitigate that effect with #6577. You can also provide a smaller search space using the AutoML 2.0 API to work around the problem.
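One way to shrink the search space is to restrict which trainers are swept, building on the snippet from the original post. Note that only useLgbm is confirmed in this thread; the other use* flags below are assumed to follow the same pattern, so treat this as an illustrative sketch:

```csharp
// Hedged sketch: restrict the sweep to tree-based trainers only.
// useLgbm appears in the original post; useSdca and useLbfgs are assumed
// analogous flags, shown here for illustration.
SweepablePipeline smallerPipeline =
    ctx.Auto().Featurizer(data, columnInformation: columnInference.ColumnInformation)
        .Append(ctx.Auto().Regression(
            labelColumnName: columnInference.ColumnInformation.LabelColumnName,
            useLgbm: false,     // confirmed flag from the thread
            useSdca: false,     // assumed flag
            useLbfgs: false));  // assumed flag
```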

if I add OneHotEncoding to my preFeaturizer then it takes a very long time.
//mlContext.Transforms.Categorical.OneHotEncoding(@"PrMorph", @"PrMorph", outputKind: OneHotEncodingEstimator.OutputKind.Binary) takes a very long time to complete!

What is PrMorph -- is that a text column?
One thing to note is that the AutoML 1.* API also applies a featurizer to your dataset. In most cases OneHotEncoding is not time-consuming, but TextFeaturizer is. So if PrMorph is a text column and it's inferred as text instead of categorical, it's very likely to add a lot of training time.
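One way to check how the column was inferred, and to override it, is via the column inference results (a sketch: dataPath and trainField are placeholders for the user's dataset path and label column):

```csharp
// Sketch: inspect how AutoML inferred the columns, then move "PrMorph" from
// text to categorical if it was mis-inferred. dataPath and trainField are
// placeholders, not from the original code.
ColumnInferenceResults inference = ctx.Auto().InferColumns(dataPath, labelColumnName: trainField);
ColumnInformation info = inference.ColumnInformation;

Console.WriteLine("Text columns: " + string.Join(", ", info.TextColumnNames));

if (info.TextColumnNames.Remove("PrMorph"))
    info.CategoricalColumnNames.Add("PrMorph");
```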

PermutationFeatureImportance now fails with code that worked w/ 1.7.1

The error indicates that it failed to find a trainer (one of FastTree / SDCA / LBFGS / LightGBM) in your model, which is strange. Can you share around 100 rows of your dataset with me so I can try to reproduce the error?

@LittleLittleCloud

BTW, if you are also on Discord, feel free to ping me (BigMiao#1789) -- I'm happy to see what I can do to help you improve training performance.

@random-namespace

Hey, I've experienced the same issue, though I stopped maintaining my ML code back when it used to attempt to predict the tails instead of this -- basically, ML.NET regression gives up on being ML as soon as it hits a training boundary. But that isn't practical, as any time-based, geometric, biological, or compounding model necessarily lives on a boundary.

Please don't take the conversation to Discord; I've been following it.
