
Be able to create a reader by just specifying a schema class instead of all the columns #1515


Closed
CESARDELATORRE opened this issue Nov 2, 2018 · 4 comments


@CESARDELATORRE
Contributor

In the current versions (v0.6 and v0.7), when creating a TextLoader we need to provide all the columns explicitly, as in the following:

_loader = mlContext.Data.TextReader(new TextLoader.Arguments()
{
    Separator = ",",
    HasHeader = true,
    Column = new[]
    {
        new TextLoader.Column("Season", DataKind.R4, 2),
        new TextLoader.Column("Year", DataKind.R4, 3),
        new TextLoader.Column("Month", DataKind.R4, 4),
        new TextLoader.Column("Hour", DataKind.R4, 5),
        new TextLoader.Column("Holiday", DataKind.R4, 6),
        new TextLoader.Column("Weekday", DataKind.R4, 7),
        new TextLoader.Column("WorkingDay", DataKind.R4, 8),
        new TextLoader.Column("Weather", DataKind.R4, 9),
        new TextLoader.Column("Temperature", DataKind.R4, 10),
        new TextLoader.Column("NormalizedTemperature", DataKind.R4, 11),
        new TextLoader.Column("Humidity", DataKind.R4, 12),
        new TextLoader.Column("Windspeed", DataKind.R4, 13),
        new TextLoader.Column("Count", DataKind.R4, 16)
    }
});

However, you usually also have a schema class for the observations, like the following:

public class DemandObservation
{
    public float Season;
    public float Year;
    public float Month;
    public float Hour;
    public float Holiday;
    public float Weekday;
    public float WorkingDay;
    public float Weather;
    public float Temperature;
    public float NormalizedTemperature;
    public float Humidity;
    public float Windspeed;
    public float Count;   // This is the observed count, to be used as the "label" to predict
}

It would be very convenient to be able to create a TextReader by just providing the class, like this:

mlContext.Data.TextReader(typeof(DemandObservation));

@Zruty0
Contributor

Zruty0 commented Nov 12, 2018

Unfortunately, such a simple interface will not work, because the order of fields in a class is not defined (reflection does not guarantee declaration order), so we cannot assume that Season is column 0 and Month is column 2. We would still need some attribute to tell TextLoader where to read each column from:

public class DemandObservation
{
    [LoadColumn(0)]   
    public float Season;
    [LoadColumn(1)]   
    public float Year;
    [LoadColumn(2)]   
    public float Month;
    [LoadColumn(3)]   
    public float Hour;
    [LoadColumn(4)]   
    public float Holiday;
    [LoadColumn(5)]   
    public float Weekday;
    [LoadColumn(6)]   
    public float WorkingDay;
    [LoadColumn(7)]   
    public float Weather;
    [LoadColumn(8)]   
    public float Temperature;
    [LoadColumn(9)]   
    public float NormalizedTemperature;
    [LoadColumn(10)]   
    public float Humidity;
    [LoadColumn(11)]   
    public float Windspeed;
    [LoadColumn(12), ColumnName("Label")]   
    public float Count;   // This is the observed count, to be used as the "label" to predict
}
// ...
var reader = mlContext.Data.TextReader<DemandObservation>();
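A minimal, self-contained sketch of how a reader could derive its column list from such attributes via reflection. `LoadColumnAttribute`, `ColumnNameAttribute`, `ColumnSpec`, `SchemaReader.FromType<T>()` and `DemandSample` are hypothetical local stand-ins, not the ML.NET types; the sketch only illustrates that each source index comes from the attribute rather than from field declaration order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for the proposed attributes, defined locally so the
// sketch compiles on its own. These are NOT the ML.NET attribute types.
[AttributeUsage(AttributeTargets.Field)]
public sealed class LoadColumnAttribute : Attribute
{
    public int Index { get; }
    public LoadColumnAttribute(int index) => Index = index;
}

[AttributeUsage(AttributeTargets.Field)]
public sealed class ColumnNameAttribute : Attribute
{
    public string Name { get; }
    public ColumnNameAttribute(string name) => Name = name;
}

// One column for the reader to load: output name and zero-based source field index.
public sealed record ColumnSpec(string Name, int SourceIndex);

public static class SchemaReader
{
    // Builds the column list from [LoadColumn] attributes instead of relying on
    // declaration order, which Type.GetFields does not guarantee.
    public static IReadOnlyList<ColumnSpec> FromType<T>()
    {
        var specs = new List<ColumnSpec>();
        foreach (FieldInfo field in typeof(T).GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            LoadColumnAttribute load = field.GetCustomAttribute<LoadColumnAttribute>()
                ?? throw new InvalidOperationException($"Field '{field.Name}' has no [LoadColumn] attribute.");

            // [ColumnName] only renames the output column; it does not change where it is read from.
            string name = field.GetCustomAttribute<ColumnNameAttribute>()?.Name ?? field.Name;
            specs.Add(new ColumnSpec(name, load.Index));
        }
        // Sort by source index so the result is deterministic regardless of reflection order.
        return specs.OrderBy(s => s.SourceIndex).ToList();
    }
}

// Illustrative class: declaration order deliberately differs from file order.
public class DemandSample
{
    [LoadColumn(12), ColumnName("Label")] public float Count;
    [LoadColumn(0)] public float Season;
    [LoadColumn(3)] public float Hour;
}
```

For example, `SchemaReader.FromType<DemandSample>()` yields `Season` (0), `Hour` (3) and `Label` (12), ordered by source index even though `Count` is declared first.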

I don't want to conflate the 'reading' attributes (like LoadColumn) with the 'schema comprehension' attributes (ColumnName, VectorType etc.): reading attributes are really specific to the reader, while schema attributes are universal.

@sfilipi
Member

sfilipi commented Nov 21, 2018

@Zruty0 I'm a bit unclear on the conclusion here: are we going to use attributes for position, role, and other column-related information?

@Zruty0
Contributor

Zruty0 commented Nov 21, 2018

If this class is going to be used for schema comprehension (i.e. inside MakePredictionFunction), we will use VectorType, KeyType and ColumnName attributes for schema refinement, as we always do.

If this class is going to be used for specifying the schema of a text file, we will use LoadColumn to specify the index (or indices, for vector columns). We will also respect the above attributes for schema refinement.
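The vector-column case (LoadColumn carrying several indices while a schema attribute such as VectorType still applies) can be sketched with reflection. `LoadColumnAttribute`, `VectorTypeAttribute`, `TextSchema.SourceIndices<T>()` and `DemandVectorObservation` are hypothetical local definitions rather than ML.NET's, and the `(start, end)` range form is assumed to be inclusive. The class reuses the file layout from the original TextLoader example (fields 2 to 13, plus field 16).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical reading attribute (NOT the ML.NET type): either a single source
// index, or an assumed-inclusive range of indices for a vector column.
[AttributeUsage(AttributeTargets.Field)]
public sealed class LoadColumnAttribute : Attribute
{
    public int[] Indices { get; }
    public LoadColumnAttribute(int index) => Indices = new[] { index };
    public LoadColumnAttribute(int start, int end) =>
        Indices = Enumerable.Range(start, end - start + 1).ToArray();
}

// Hypothetical schema-refinement attribute: the expected vector length.
[AttributeUsage(AttributeTargets.Field)]
public sealed class VectorTypeAttribute : Attribute
{
    public int Size { get; }
    public VectorTypeAttribute(int size) => Size = size;
}

public static class TextSchema
{
    // Maps each public field to the source indices it reads from, and checks
    // that the reading attribute agrees with the schema attribute.
    public static Dictionary<string, int[]> SourceIndices<T>()
    {
        var result = new Dictionary<string, int[]>();
        foreach (FieldInfo field in typeof(T).GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            LoadColumnAttribute load = field.GetCustomAttribute<LoadColumnAttribute>()
                ?? throw new InvalidOperationException($"Field '{field.Name}' has no [LoadColumn] attribute.");

            if (field.FieldType.IsArray)
            {
                VectorTypeAttribute vector = field.GetCustomAttribute<VectorTypeAttribute>();
                if (vector != null && vector.Size != load.Indices.Length)
                    throw new InvalidOperationException(
                        $"Field '{field.Name}': [VectorType({vector.Size})] does not match {load.Indices.Length} loaded columns.");
            }
            else if (load.Indices.Length != 1)
            {
                throw new InvalidOperationException($"Scalar field '{field.Name}' must read exactly one column.");
            }

            result[field.Name] = load.Indices;
        }
        return result;
    }
}

// Same file layout as the original TextLoader example: fields 2..13 hold the
// twelve features and field 16 holds the observed count.
public class DemandVectorObservation
{
    [LoadColumn(2, 13), VectorType(12)] public float[] Features;
    [LoadColumn(16)] public float Count;
}
```

Here `Features` reads twelve source fields (2 through 13) and `Count` reads field 16; a `VectorType` size that disagreed with the loaded range would be rejected rather than silently accepted.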

@eerhardt
Copy link
Member

This is a duplicate of #561. Closing as dupe.

@ghost locked this as resolved and limited conversation to collaborators on Mar 27, 2022