Be able to create a reader by just specifying a schema class instead of all the columns #1515

CESARDELATORRE · 2018-11-02T18:34:50Z

In the current versions (v0.6 and v0.7), when creating a TextLoader we need to provide all the columns explicitly, as in the following:

```csharp
_loader = mlContext.Data.TextReader(new TextLoader.Arguments()
{
    Separator = ",",
    HasHeader = true,
    Column = new[]
    {
        new TextLoader.Column("Season", DataKind.R4, 2),
        new TextLoader.Column("Year", DataKind.R4, 3),
        new TextLoader.Column("Month", DataKind.R4, 4),
        new TextLoader.Column("Hour", DataKind.R4, 5),
        new TextLoader.Column("Holiday", DataKind.R4, 6),
        new TextLoader.Column("Weekday", DataKind.R4, 7),
        new TextLoader.Column("WorkingDay", DataKind.R4, 8),
        new TextLoader.Column("Weather", DataKind.R4, 9),
        new TextLoader.Column("Temperature", DataKind.R4, 10),
        new TextLoader.Column("NormalizedTemperature", DataKind.R4, 11),
        new TextLoader.Column("Humidity", DataKind.R4, 12),
        new TextLoader.Column("Windspeed", DataKind.R4, 13),
        new TextLoader.Column("Count", DataKind.R4, 16)
    }
});
```

Since you usually also have a schema class for the observations, like the following:

```csharp
public class DemandObservation
{
    public float Season;
    public float Year;
    public float Month;
    public float Hour;
    public float Holiday;
    public float Weekday;
    public float WorkingDay;
    public float Weather;
    public float Temperature;
    public float NormalizedTemperature;
    public float Humidity;
    public float Windspeed;
    public float Count;   // The observed count, to be used as the label to predict
}
```

It would be very convenient to be able to create a TextReader by just providing the class, like this:

```csharp
mlContext.Data.TextReader(typeof(DemandObservation));
```


Zruty0 · 2018-11-12T23:47:54Z

Unfortunately, such a simple interface will not work: the order of fields in a class is not defined, so we cannot expect Season to be column 0 and Month to be column 2. We would still need some attribute to tell TextLoader which column to read each field from:

```csharp
public class DemandObservation
{
    [LoadColumn(0)]
    public float Season;
    [LoadColumn(1)]
    public float Year;
    [LoadColumn(2)]
    public float Month;
    [LoadColumn(3)]
    public float Hour;
    [LoadColumn(4)]
    public float Holiday;
    [LoadColumn(5)]
    public float Weekday;
    [LoadColumn(6)]
    public float WorkingDay;
    [LoadColumn(7)]
    public float Weather;
    [LoadColumn(8)]
    public float Temperature;
    [LoadColumn(9)]
    public float NormalizedTemperature;
    [LoadColumn(10)]
    public float Humidity;
    [LoadColumn(11)]
    public float Windspeed;
    [LoadColumn(12), ColumnName("Label")]
    public float Count;   // The observed count, to be used as the label to predict
}
// ...
var reader = mlContext.Data.TextReader<DemandObservation>();
```

I don't want to conflate the 'reading' attributes (like LoadColumn) with the 'schema comprehension' attributes (ColumnName, VectorType etc.): reading attributes are really specific to the reader, while schema attributes are universal.
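The field-order point can be checked against plain .NET reflection: the documentation for `Type.GetFields` states that fields are not returned in any particular order, such as declaration order. Below is a minimal sketch, assuming a hypothetical `LoadColumnSketchAttribute` that stands in for the proposed `LoadColumn` (it is not the ML.NET type). It derives each field's source column only from the attribute, rejects fields without one, and sorts by index, so the result never depends on reflection order. The class and helper names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for the proposed LoadColumn attribute (not the ML.NET type):
// it records the zero-based column of the text file that a field is read from.
[AttributeUsage(AttributeTargets.Field)]
public sealed class LoadColumnSketchAttribute : Attribute
{
    public int Index { get; }
    public LoadColumnSketchAttribute(int index) => Index = index;
}

// Hypothetical, trimmed-down schema class. Count is declared first on purpose:
// declaration order plays no role, only the attribute indices do.
public class DemandObservationSketch
{
    [LoadColumnSketch(16)] public float Count;
    [LoadColumnSketch(2)] public float Season;
    [LoadColumnSketch(3)] public float Year;
}

public static class ColumnSpecBuilder
{
    // Returns (field name, source index) pairs ordered by source index.
    // GetFields order is not guaranteed, so every field must carry an attribute.
    public static List<(string Name, int Index)> FromType(Type schemaType)
    {
        var columns = new List<(string Name, int Index)>();
        foreach (FieldInfo field in schemaType.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            var attr = field.GetCustomAttribute<LoadColumnSketchAttribute>();
            if (attr == null)
                throw new InvalidOperationException($"Field '{field.Name}' has no source column.");
            columns.Add((field.Name, attr.Index));
        }
        columns.Sort((a, b) => a.Index.CompareTo(b.Index));
        return columns;
    }
}
```

A real loader would still need to turn each pair into a column definition with a data type; the sketch only covers the position lookup.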

sfilipi · 2018-11-21T21:59:31Z

@Zruty0, I'm a bit unclear on the conclusion here: are we going to use attributes for position, role, and other column-related information?

Zruty0 · 2018-11-21T22:32:42Z

If this class is going to be used for schema comprehension (i.e. inside MakePredictionFunction), we will use VectorType, KeyType and ColumnName attributes for schema refinement, as we always do.

If this class is going to be used for specifying the schema of a text file, we will use LoadColumn to specify the index (or indices, for vector columns). We will also respect the above attributes for schema refinement.
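The two roles described here can be sketched the same way: a reading attribute that carries one index, or an inclusive range of indices for a vector column, and a universal renaming attribute that plays the part of `ColumnName`. All names below (`ReadFromAttribute`, `RenameAttribute`, `BikeRow`, `TextSchemaSketch`) are hypothetical and are not ML.NET APIs. In this sketch the text reader consults both attributes, whereas schema comprehension would only look at the rename. The range 2..13 mirrors the twelve feature columns of the original TextLoader definition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical reading attribute: a single source index, or an inclusive range
// of indices for a vector-valued field. Only a text reader would consult it.
[AttributeUsage(AttributeTargets.Field)]
public sealed class ReadFromAttribute : Attribute
{
    public int Start { get; }
    public int End { get; }
    public ReadFromAttribute(int index) : this(index, index) { }
    public ReadFromAttribute(int start, int end) { Start = start; End = end; }
}

// Hypothetical schema attribute that renames the column; it is meaningful to
// any consumer of the class, not just the text reader.
[AttributeUsage(AttributeTargets.Field)]
public sealed class RenameAttribute : Attribute
{
    public string Name { get; }
    public RenameAttribute(string name) => Name = name;
}

// Hypothetical row type: columns 2..13 form one vector, column 16 is the label.
public class BikeRow
{
    [ReadFrom(2, 13)] public float[] Features;
    [ReadFrom(16), Rename("Label")] public float Count;
}

public static class TextSchemaSketch
{
    // Maps each output column name to the source indices it is read from,
    // combining the reader-specific attribute with the universal rename.
    public static Dictionary<string, int[]> Describe(Type rowType)
    {
        var result = new Dictionary<string, int[]>();
        foreach (FieldInfo field in rowType.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            var read = field.GetCustomAttribute<ReadFromAttribute>();
            if (read == null)
                continue; // Not loaded from the file in this sketch.
            string name = field.GetCustomAttribute<RenameAttribute>()?.Name ?? field.Name;
            result[name] = Enumerable.Range(read.Start, read.End - read.Start + 1).ToArray();
        }
        return result;
    }
}
```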

eerhardt · 2018-11-30T21:16:06Z

This is a duplicate of #561. Closing as dupe.

Zruty0 mentioned this issue Nov 12, 2018: Confusion between ColumnAttribute and ColumnNameAttribute should be prevented #1603 (Closed)

Zruty0 mentioned this issue Nov 14, 2018: TextLoader should have only one constructor, that doesn't take Arguments as a parameter #1611 (Closed)

eerhardt closed this as completed Nov 30, 2018

sfilipi mentioned this issue Dec 14, 2018: Schema based text loader #1878 (Merged)

eerhardt mentioned this issue Sep 27, 2019: [DatabaseLoader] Error when using attributes (i.e ColumnName) #4195 (Closed)

ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
