API to create a TextLoader from class metadata #561

eerhardt · 2018-07-19T22:37:32Z

In the LearningPipeline API, we have the ability to create a TextLoader object using metadata applied to a regular C# class:

pipeline.Add(new TextLoader(dataPath).CreateFrom<HousePriceData>(useHeader: true, separator: ','));

public class HousePriceData
{
    [Column(ordinal: "0")]
    public string Id;
    [Column(ordinal: "1")]
    public string Date;
    [Column(ordinal: "2", name: "Label")]
    public float Price;
    [Column(ordinal: "3")]
    public float Bedrooms;
    [Column(ordinal: "4")]
    public float Bathrooms;
    [Column(ordinal: "5")]
    public float SqftLiving;
    [Column(ordinal: "6")]
    public float SqftLot;
    ...

I find it more intuitive to load data through a class decorated with metadata than to build up a schema imperatively in code, like the following:

private static TextLoader.Column ScalarCol(string name, int ordinal, DataKind kind = DataKind.Num)
    => new TextLoader.Column() { Name = name, Type = kind, Source = new[] { new TextLoader.Range() { Min = ordinal, Max = ordinal } } };

var loader = new TextLoader(env, new TextLoader.Arguments()
{
    HasHeader = true,
    SeparatorChars = new char[] { ',' },
    // These column declarations are meant to mirror those that appear in HousePriceData.
    Column = new[] {
        ScalarCol("Id", 0, DataKind.Text),
        ScalarCol("Date", 1, DataKind.Text),
        ScalarCol("Label", 2),
        ScalarCol("Bedrooms", 3),
        ScalarCol("Bathrooms", 4),
        ScalarCol("SqftLiving", 5),
        ScalarCol("SqftLot", 6),
    }
}, new MultiFileSource(dataPath));

This issue is being opened to ensure we preserve this behavior with the new direct access API design proposed in #371 (possibly using a different API design, but preserving the functionality).

/cc @ericstj @TomFinley @Zruty0 @terrajobst


Zruty0 · 2018-07-19T22:52:11Z

I definitely think we should be able to do both.

A declarative schema is shorter, but an imperative schema allows reading the data without knowing its schema at compile time.

eerhardt · 2018-07-19T23:13:11Z

Agreed. I wasn't proposing to remove the imperative API for building the columns up in code. That is definitely necessary. I was proposing a "convenience" API on top of it, which would build up the same objects from metadata.
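For illustration only, here is a rough sketch of how such a convenience layer could derive column definitions from attributes via reflection. Every type here (`ColumnSketchAttribute`, `ColumnSpec`, `SchemaSketch`, `HouseSketch`) is a hypothetical stand-in invented for this example, not the ML.NET `Column` attribute or `TextLoader.Column`, and it is not necessarily how the eventual API was implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for a [Column]-style attribute; NOT the ML.NET type.
[AttributeUsage(AttributeTargets.Field)]
public sealed class ColumnSketchAttribute : Attribute
{
    public ColumnSketchAttribute(int ordinal) { Ordinal = ordinal; }
    public int Ordinal { get; }
    // Optional override for the column name; defaults to the field name.
    public string Name { get; set; }
}

// Hypothetical stand-in for an imperative column definition.
public sealed class ColumnSpec
{
    public string Name;
    public int Ordinal;
    public Type FieldType;
}

public static class SchemaSketch
{
    // Builds the imperative column list from the attributes on T's public fields.
    public static List<ColumnSpec> ColumnsFrom<T>()
    {
        return typeof(T)
            .GetFields(BindingFlags.Public | BindingFlags.Instance)
            .Select(f => new { Field = f, Attr = f.GetCustomAttribute<ColumnSketchAttribute>() })
            .Where(x => x.Attr != null)          // skip fields without the attribute
            .OrderBy(x => x.Attr.Ordinal)        // order columns by their ordinal
            .Select(x => new ColumnSpec
            {
                Name = x.Attr.Name ?? x.Field.Name,
                Ordinal = x.Attr.Ordinal,
                FieldType = x.Field.FieldType
            })
            .ToList();
    }
}

// Hypothetical data class, loosely modeled on HousePriceData above.
public class HouseSketch
{
    [ColumnSketch(0)] public string Id;
    [ColumnSketch(2, Name = "Label")] public float Price;
    [ColumnSketch(1)] public string Date;
}
```

Calling `SchemaSketch.ColumnsFrom<HouseSketch>()` yields three specs ordered by ordinal (`Id`, `Date`, `Label`). The imperative path stays available for schemas that are only known at run time, as discussed above.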

glebuk · 2019-01-07T23:19:27Z

Done per Senja

shauheen added the API Issues pertaining the friendly API label Jul 23, 2018

eerhardt added the enhancement New feature or request label Nov 30, 2018

eerhardt mentioned this issue Nov 30, 2018

Be able to create a reader by just specifying an schema class instead of all the columns #1515

Closed

sfilipi self-assigned this Dec 6, 2018

sfilipi added this to the 1218 milestone Dec 6, 2018

sfilipi mentioned this issue Dec 14, 2018

Schema based text loader #1878

Merged

glebuk closed this as completed Jan 7, 2019

eerhardt mentioned this issue Sep 27, 2019

[DatabaseLoader] Error when using attributes (i.e ColumnName) #4195

Closed

ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
