
API to create a TextLoader from class metadata #561


Closed
eerhardt opened this issue Jul 19, 2018 · 3 comments
Labels: API (Issues pertaining to the friendly API), enhancement (New feature or request)

@eerhardt (Member)

In the LearningPipeline API, we have the ability to create a TextLoader object using metadata applied to a regular C# class:

pipeline.Add(new TextLoader(dataPath).CreateFrom<HousePriceData>(useHeader: true, separator: ','));

public class HousePriceData
{
    [Column(ordinal: "0")]
    public string Id;
    [Column(ordinal: "1")]
    public string Date;
    [Column(ordinal: "2", name: "Label")]
    public float Price;
    [Column(ordinal: "3")]
    public float Bedrooms;
    [Column(ordinal: "4")]
    public float Bathrooms;
    [Column(ordinal: "5")]
    public float SqftLiving;
    [Column(ordinal: "6")]
    public float SqftLot;
    ...

I find it more intuitive to load data through a class decorated with metadata than to build up a schema imperatively in code, like the following:

private static TextLoader.Column ScalarCol(string name, int ordinal, DataKind kind = DataKind.Num)
    => new TextLoader.Column() { Name = name, Type = kind, Source = new[] { new TextLoader.Range() { Min = ordinal, Max = ordinal } } };

var loader = new TextLoader(env, new TextLoader.Arguments()
{
    HasHeader = true,
    SeparatorChars = new char[] { ',' },
    // These column declarations are meant to mirror those that appear in HousePriceData.
    Column = new[] {
        ScalarCol("Id", 0, DataKind.Text),
        ScalarCol("Date", 1, DataKind.Text),
        ScalarCol("Label", 2),
        ScalarCol("Bedrooms", 3),
        ScalarCol("Bathrooms", 4),
        ScalarCol("SqftLiving", 5),
        ScalarCol("SqftLot", 6),
    }
}, new MultiFileSource(dataPath));

This issue is being opened to ensure that the new direct-access API design proposed in #371 preserves this behavior (possibly through a different API shape, but with the same functionality).

/cc @ericstj @TomFinley @Zruty0 @terrajobst

@Zruty0 (Contributor) commented Jul 19, 2018

I definitely think we should be able to do both.

Declarative schema is shorter, but imperative schema allows reading the data without knowing its schema at compile time.
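To illustrate the imperative side of that trade-off, here is a minimal sketch that derives the columns from a header line read at run time, so no class has to exist at compile time. `Kind`, `ColumnSpec`, and `RuntimeSchema.FromHeader` are hypothetical names defined only for this sketch, not ML.NET APIs, and every column is typed as text because nothing about the types is known up front.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for DataKind and TextLoader.Column (not ML.NET types).
public enum Kind { Text, Num }
public sealed record ColumnSpec(string Name, Kind Type, int Min, int Max);

public static class RuntimeSchema
{
    // Builds one scalar column per header field. The schema is discovered
    // from the data itself, so it cannot be written as a C# class ahead of time.
    // All columns are typed as Text because no type information is available.
    public static ColumnSpec[] FromHeader(string headerLine, char separator)
    {
        return headerLine
            .Split(separator)
            .Select((name, index) => new ColumnSpec(name.Trim(), Kind.Text, index, index))
            .ToArray();
    }
}
```

A fuller version might sample a few rows to choose numeric kinds; that inference step is exactly the information a metadata-decorated class spells out explicitly.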

@eerhardt (Member, Author)

Agreed. I wasn't proposing to remove the imperative API for building up the columns in code; that is definitely necessary. I was proposing a "convenience" API on top of it that builds those column objects from the class metadata.
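A minimal sketch of such a convenience layer, assuming it simply reflects over the class and emits the same per-column descriptors that `ScalarCol(...)` builds by hand. `Kind`, `ColumnAttribute`, `ColumnSpec`, and `SchemaFromClass.Build<T>()` are hypothetical stand-ins defined here, not ML.NET types; a real implementation would produce `TextLoader.Column` objects instead.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for DataKind: only the two kinds used in HousePriceData.
public enum Kind { Text, Num }

// Hypothetical attribute mirroring the [Column(ordinal, name)] shape used above.
[AttributeUsage(AttributeTargets.Field)]
public sealed class ColumnAttribute : Attribute
{
    public ColumnAttribute(string ordinal, string name = null)
    {
        Ordinal = ordinal;
        Name = name;
    }

    public string Ordinal { get; }
    public string Name { get; }
}

// Hypothetical stand-in for TextLoader.Column: a name, a type, and a source range.
public sealed record ColumnSpec(string Name, Kind Type, int Min, int Max);

public static class SchemaFromClass
{
    // Builds one scalar column per public field that carries a [Column] attribute.
    public static ColumnSpec[] Build<T>()
    {
        var columns = new List<ColumnSpec>();
        foreach (FieldInfo field in typeof(T).GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            ColumnAttribute attr = field.GetCustomAttribute<ColumnAttribute>();
            if (attr == null)
                continue; // fields without metadata are not loaded

            int ordinal = int.Parse(attr.Ordinal, CultureInfo.InvariantCulture);
            Kind kind = field.FieldType == typeof(string) ? Kind.Text
                      : field.FieldType == typeof(float) ? Kind.Num
                      : throw new NotSupportedException($"Field '{field.Name}' has unsupported type {field.FieldType}.");

            // The attribute's name overrides the field name (e.g. Price -> "Label").
            columns.Add(new ColumnSpec(attr.Name ?? field.Name, kind, ordinal, ordinal));
        }

        // Reflection does not guarantee field order, so sort by source column.
        return columns.OrderBy(c => c.Min).ToArray();
    }
}

// A trimmed copy of HousePriceData from the issue, using the hypothetical attribute.
public class HousePriceData
{
    [Column(ordinal: "0")] public string Id;
    [Column(ordinal: "1")] public string Date;
    [Column(ordinal: "2", name: "Label")] public float Price;
    [Column(ordinal: "3")] public float Bedrooms;
}
```

Calling `SchemaFromClass.Build<HousePriceData>()` yields Id (Text, 0), Date (Text, 1), Label (Num, 2), and Bedrooms (Num, 3), matching the first entries of the hand-written `Column` array. Keeping the reflection step as a thin translator into the imperative descriptors is what lets both styles coexist.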

@shauheen added the API label Jul 23, 2018
@eerhardt added the enhancement label Nov 30, 2018
@sfilipi self-assigned this Dec 6, 2018
@sfilipi added this to the 1218 milestone Dec 6, 2018
@glebuk (Contributor) commented Jan 7, 2019

Done per Senja

@glebuk closed this as completed Jan 7, 2019
@ghost locked this as resolved and limited conversation to collaborators Mar 29, 2022
Development

No branches or pull requests

5 participants