Passing CollectionDataSource w/ SQL dependency throws "System.ArgumentOutOfRangeException: Source column not found" #399

Closed
sar opened this issue Jun 23, 2018 · 3 comments
Labels: API (Issues pertaining the friendly API), enhancement (New feature or request)

@sar

sar commented Jun 23, 2018

Issue

Passing the result of CollectionDataSource.Create<T>(data: var) to pipeline.Add(var) yields null values when the data source is a SQL database accessed via EF Core. Note that the latest version of ML.NET (0.2.0) is currently referenced as a dependency in the .csproj.

Inspecting the object shows that valid data is returned from the table but is loaded as null into the pipeline. However, I am not clear whether the bug is attributable to the data loader or to my use of it. Any guidance or help on the correct way to load data from SQL Server is much appreciated.

Implementation

genericClass.cs

public partial class GenericClass {
    public int CustomerId { get; set; }
    public float PurchaseAmt { get; set; }
    public string OrderType { get; set; }
}

public partial class GenericPredictionClass {
    [ColumnName("Score")]
    public float PredictedAmt;
}

sampleController.cs

public async Task<IActionResult> ShowOutput() {
    PredictionModel<GenericClass, GenericPredictionClass> model = await TrainAsync();
    
    return View();
}

private Task<List<GenericClass>> LoadDataFromSql() {
    var sqlTblOutput = _context.Table1
        .Take(1000);

    return Task.Run(function: () => sqlTblOutput.ToList());
}

private async Task<PredictionModel<GenericClass, GenericPredictionClass>> TrainAsync() {
    var pipeline = new LearningPipeline();
    List<GenericClass> parseData = await LoadDataFromSql();
    var dataReference = CollectionDataSource.Create<GenericClass>(data: parseData);

    pipeline.Add(dataReference);
    pipeline.Add(new CategoricalOneHotVectorizer("OrderType"));
   
    ...

    return model;
}

Locals

pipeline: {Microsoft.ML.LearningPipeline}
    Rows [PipelineItemDebugRow[]]: {Microsoft.ML.PipelineItemDebugRow[10]}
        [0] [PipelineItemDebugRow]: ""
            Values [string]: ""


parseData [List]: Count = 1000
    [0]: {Namespace.Models.GenericClass}
        CustomerId: 1234
        PurchaseAmt: 40.99
        OrderType [string]: "Test Order"


dataReference [ILearningPipelineLoader]: {Microsoft.ML.Data.CollectionDataSource.ListDataSource<Namespace.Models.GenericClass>}
    Non-Public members
        _dataView [IDataView]: null
        _dataViewEntryPoint [DataViewReference]: null
        _listCollection [IList]: Count = 1000
            [0]: {Namespace.Models.GenericClass}
                CustomerId: 1234
                PurchaseAmt: 40.99
                OrderType [string]: "Test Order"

Log

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.ArgumentOutOfRangeException: Source column OrderType not found
Parameter name: Source
   at Microsoft.ML.Runtime.Data.OneToOneTransformBase.Bindings.Create(OneToOneTransformBase parent, OneToOneColumn[] column, ISchema input, ITransposeSchema transInput, Func`2 testType)
   at Microsoft.ML.Runtime.Data.OneToOneTransformBase..ctor(IHostEnvironment env, String name, OneToOneColumn[] column, IDataView input, Func`2 testType)
   at Microsoft.ML.Runtime.Data.TermTransform..ctor(ArgumentsBase args, ColumnBase[] column, IHostEnvironment env, IDataView input)
   at Microsoft.ML.Runtime.Data.CategoricalTransform.Create(IHostEnvironment env, Arguments args, IDataView input)
   at Microsoft.ML.Runtime.Data.Categorical.CatTransformDict(IHostEnvironment env, Arguments input)
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.ML.Runtime.EntryPoints.EntryPointNode.Run()
   at Microsoft.ML.Runtime.EntryPoints.EntryPointGraph.RunNode(EntryPointNode node)
   at Microsoft.ML.Runtime.EntryPoints.JsonUtils.GraphRunner.RunAllNonMacros()
   at Microsoft.ML.Runtime.EntryPoints.JsonUtils.GraphRunner.RunAll()
   at Microsoft.ML.LearningPipeline.Execute(IHostEnvironment environment)
   at Microsoft.ML.LearningPipelineDebugProxy.ExecutePipeline()

System information

  • Distro: Fedora 28 (latest)
  • .NET Version: 2.1.301
  • ML.NET Package version: 0.2.0
@Ivanidzo4ka
Contributor

We currently don't support properties in "DTO classes", only fields. For the same reason, we have problems with F# and anywhere else that tries to use classes with properties. And I'm not sure why.
Maybe @TomFinley knows the answer. Also @glebuk @shauheen for visibility.
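
For versions without property support, one possible workaround is to copy each property-based entity into a field-only class before handing the list to the pipeline. The sketch below shows only that mapping step; `GenericClassRow`, `RowMapper`, and `ToRows` are hypothetical names introduced for illustration and are not part of ML.NET.

```csharp
using System.Collections.Generic;
using System.Linq;

// Property-based entity, as posted in the issue (e.g. an EF Core model).
public class GenericClass
{
    public int CustomerId { get; set; }
    public float PurchaseAmt { get; set; }
    public string OrderType { get; set; }
}

// Hypothetical field-only twin: same members, declared as public fields.
public class GenericClassRow
{
    public int CustomerId;
    public float PurchaseAmt;
    public string OrderType;
}

public static class RowMapper
{
    // Copies each entity into a field-only row; the source list is left untouched.
    public static List<GenericClassRow> ToRows(IEnumerable<GenericClass> source) =>
        source.Select(e => new GenericClassRow
        {
            CustomerId = e.CustomerId,
            PurchaseAmt = e.PurchaseAmt,
            OrderType = e.OrderType
        }).ToList();
}
```

The resulting list would then presumably be passed to CollectionDataSource.Create in place of the EF entities, with the pipeline's input type changed to match; that part is untested here.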

@TomFinley
Contributor

Sorry @Sarthakmalik and @Ivanidzo4ka, I somehow missed that I'd been tagged on a question till just now. Properties are a little more awkward because they cannot be passed via ref, which is how buffer sharing works with pipelines (e.g., here). That's not an insurmountable problem, of course: you could read the property into a local variable, pass that local by ref to get the value, then set the property from it.

There's no fundamental limitation other than limited engineering resources, I suppose.
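
To make the ref restriction concrete, here is a minimal sketch in plain C#. `Row`, `RefDemo`, `Fill`, and `Populate` are hypothetical names; `Fill` is only a stand-in for a getter that writes into a caller-supplied location and is not the actual ML.NET code.

```csharp
public class Row
{
    public float Field;                 // a field can be passed by ref
    public float Property { get; set; } // a property cannot (compiler error CS0206)
}

public static class RefDemo
{
    // Stand-in for a getter that writes its result into a caller-owned location.
    public static void Fill(ref float destination, float value) => destination = value;

    public static Row Populate(float value)
    {
        var row = new Row();

        // Fields work directly.
        Fill(ref row.Field, value);

        // Fill(ref row.Property, value); // does not compile: CS0206

        // Workaround: go through a local, then assign it back to the property.
        float local = row.Property;
        Fill(ref local, value);
        row.Property = local;

        return row;
    }
}
```

The extra copy through the local is what makes properties slightly more awkward than fields in this scheme.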

@Ivanidzo4ka
Contributor

We now support this via #616, so I'm closing this issue.

@Ivanidzo4ka added the enhancement (New feature or request) and API (Issues pertaining the friendly API) labels on Oct 18, 2018
@ghost locked as resolved and limited conversation to collaborators on Mar 30, 2022