how to load 25k features using TextLoader? #420

Closed
vivekpradhan opened this issue Jun 26, 2018 · 2 comments

Comments

@vivekpradhan

System information

.NET Core SDK:
Version: 2.1.301
Commit: 59524873d6

Runtime Environment:
OS Name: ubuntu
OS Version: 16.04

Issue

  • What did you do?
    I loaded a subset of the columns of my data by adapting one of the examples (the Iris data). The CSV file has 25k features in total, and I don't want to declare all 25k of them as fields in public class IrisData. Is there a better way to load this dataset?
public class IrisData
{
    [Column("0")]
    [ColumnName("Feature0")]
    public float Feature0;
    [Column("1")]
    [ColumnName("Feature1")]
    public float Feature1;
    [Column("2")]
    [ColumnName("Feature2")]
    public float Feature2;
    [Column("3")]
    [ColumnName("Feature3")]
    public float Feature3;
    [Column("4")]
    [ColumnName("Label")]
    public bool Label;
}

string dataPath = "../../../data/train.csv";
string testPath = "../../../data/test.csv";
pipeline.Add(new TextLoader(dataPath).CreateFrom<IrisData>(separator: ','));
pipeline.Add(new ColumnConcatenator("Features", "Feature0", "Feature1", "Feature2", "Feature3"));

@glebuk
Contributor

glebuk commented Jun 26, 2018

Great question, Vivek.
You can use vector-valued columns and the ordinal parameter to read all your features into a relatively simple data structure. Imagine that your data has one label and many categorical and numerical features.
You can define a data structure like this to initialize your text reader:

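        public class MyData
        {
            // Column 0 holds the label.
            [Column(ordinal: "0")]
            public string Label;

            // Columns 1, 2 and 10-1000: 2 + 991 = 993 values,
            // assuming ranges include both endpoints.
            [Column(ordinal: "1,2,10-1000")]
            [VectorType(993)]
            public float[] NumericFeatures;

            // Columns 3-9 and 2000-3000: 7 + 1001 = 1008 values,
            // under the same assumption.
            [Column(ordinal: "3-9,2000-3000")]
            [VectorType(1008)]
            public string[] CategoricalFeatures;

            // You can add accessors to features of interest.
            public float? NumericFeature0 => NumericFeatures?[0];
        }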
        }

        pipeline.Add(new TextLoader(dataPath).CreateFrom<MyData>(useHeader: false, allowQuotedStrings: true, supportSparse: false));
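
With ordinal strings that mix single columns and ranges, it is easy to miscount how many values end up in the vector, so it may be worth checking the number against the size passed to VectorType. The helper below is only a sketch for that check: OrdinalSpec and CountColumns are hypothetical names, not part of ML.NET, and the code assumes that each "a-b" range includes both endpoints.

```csharp
using System;

// Hypothetical helper, not part of ML.NET.
public static class OrdinalSpec
{
    // Counts the columns covered by an ordinal string such as "1,2,10-1000".
    // Assumption: a range "a-b" includes both a and b.
    public static int CountColumns(string spec)
    {
        int total = 0;
        foreach (string part in spec.Split(','))
        {
            // A single column like "2" yields one bound; a range like "10-1000" yields two.
            string[] bounds = part.Trim().Split('-');
            int first = int.Parse(bounds[0]);
            int last = int.Parse(bounds[bounds.Length - 1]);
            if (last < first)
            {
                throw new ArgumentException("Range end is before range start: " + part);
            }
            total += last - first + 1;
        }
        return total;
    }
}
```

Under that assumption, OrdinalSpec.CountColumns("1,2,10-1000") gives 993 and OrdinalSpec.CountColumns("3-9,2000-3000") gives 1008.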

@vivekpradhan
Author

This works perfectly. Thanks.
