[DatabaseLoader] Error when using attributes (i.e ColumnName) #4195
Comments
Agree. This needs to be fixed. https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.data.columnnameattribute?view=ml-dotnet In the current public preview, it works kind of the opposite way. Such as here:
If using the ColumnName attribute, what has to match the database column name is [ColumnName("fare_amount")]; then, if you want, you can change the C# property to a different name. That approach is not crazy and sounds logical, too. In fact, you can also do that when using a file because "the source of truth is the index of the LoadColumnAttribute". But when using a database, we need to choose which one has to match the database column names, and in order to be consistent with how the ColumnNameAttribute works when using files, I think it should be:
I think it is more complicated than described above. From what I can tell, in the file loader you have In this case, the same logic is being applied between both and I believe you would be expected to set both |
@CESARDELATORRE, could you comment on the above? I want to make sure we're on the same page |
The thing is that Then, in addition to that we want the user to be able to change the internal column's name in the schema. @eerhardt What are your thoughts about this topic? |
I think that argues for having a |
That could be interesting and more flexible, for sure. But the current implementation of LoadColumnAttribute only allows an index: https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.data.loadcolumnattribute?view=ml-dotnet In any case, that could be added in the future, since if the user doesn't provide the LoadColumnAttribute then, by default, the property/field name in the class is the one matched against the database column name. Right?
Right, which ends up matching I think. Basically right now:

No Attributes Specified: Property/Field name is used for loading the data and in the data view.
ColumnName Specified: Property/Field name is used for loading the data and ColumnName attribute for the DataView name.

So far, this matches the TextLoader behavior (as far as I can tell). There is additionally:

LoadColumn Specified: LoadColumn attribute is used for loading the data and Property/Field name is used in the DataView.
LoadColumn and ColumnName Specified: LoadColumn attribute is used for loading the data and ColumnName attribute is used in the DataView. Property/Field name doesn't matter.

This (also as far as I can tell) also matches up with the TextLoader behavior. So extending LoadColumn to support using a string, rather than just an index, would keep the consistency and provide something that "works better" for the database loader.
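The four attribute combinations summarized in the comment above can be sketched as schema classes. This is only an illustration of that summary, not a statement of what any particular ML.NET release does: the class names are hypothetical, `fare_amount` is borrowed from the earlier comment, and the per-class comments restate the commenter's "as far as I can tell" description. `ColumnName` and `LoadColumn` are the existing attributes in `Microsoft.ML.Data`.

```csharp
using Microsoft.ML.Data;

// Case 1: no attributes.
// Per the summary: the field name "fare_amount" is used both to load the
// data and as the column name in the IDataView.
public class TripNoAttributes
{
    public float fare_amount;
}

// Case 2: ColumnName only.
// Per the summary: the field name "fare_amount" is used to load the data,
// and "Label" is the column name in the IDataView.
public class TripColumnNameOnly
{
    [ColumnName("Label")]
    public float fare_amount;
}

// Case 3: LoadColumn only.
// Per the summary: source column index 0 is used to load the data,
// and the field name "FareAmount" is the column name in the IDataView.
public class TripLoadColumnOnly
{
    [LoadColumn(0)]
    public float FareAmount;
}

// Case 4: LoadColumn and ColumnName.
// Per the summary: source column index 0 is used to load the data, and
// "Label" is the column name in the IDataView; the field name is irrelevant.
public class TripLoadColumnAndColumnName
{
    [LoadColumn(0), ColumnName("Label")]
    public float AnyFieldName;
}
```

In this sketch `LoadColumn` takes only an index, which is the limitation the thread discusses; a string-based overload is the proposal, not an existing API.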
The
I don't think this can always be the case. For example, a SQL Table column name can have an select Id, hi@ as hi#, [my column] from [dbo].[Table] that is totally valid. You can't always have a C# property/field name match a name coming from a database.
I agree that we will need some way to map from the DbDataReader's name of the column, to the C# property, to the However, I don't think we should overload the |
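One way to picture the naming problem raised above is to alias awkward column names in the query itself, so that the names exposed by the DbDataReader are valid C# identifiers. The following is a minimal sketch under several assumptions: the loader matches reader column names against field names by default (as described earlier in the thread), the `System.Data.SqlClient` package is referenced, and `TableRow`, `AliasExample`, and the `MyColumn` alias are hypothetical names chosen for illustration. It is a workaround sketch, not the fix proposed in this issue.

```csharp
using System.Data.SqlClient;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical schema class; field names are valid C# identifiers.
public class TableRow
{
    public int Id;
    public string MyColumn;
}

public static class AliasExample
{
    public static IDataView Load(string connectionString)
    {
        var mlContext = new MLContext();

        // Build a loader whose schema is inferred from TableRow.
        DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<TableRow>();

        // "[my column]" cannot be a C# field name, so the query renames it
        // to "MyColumn" before the loader ever sees it.
        string commandText =
            "SELECT Id, [my column] AS MyColumn FROM [dbo].[Table]";

        var source = new DatabaseSource(
            SqlClientFactory.Instance, connectionString, commandText);

        // Loading is lazy; no query runs until the IDataView is consumed.
        return loader.Load(source);
    }
}
```

This only helps when the caller controls the query text; it does not remove the need for an attribute-based mapping when the query is fixed.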
Thanks @eerhardt. I think that makes sense. The remaining question on my end (before making changes and submitting a PR) is:
This is the only way to support reading multiple columns into a single vector, right? And if a user does wish to indicate columns by index, this would be the way they do it. So I'm thinking we still need to support it. |
Agree, we still need it, especially to support reading multiple columns into a single vector.
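For reference, reading several source columns into one vector by index looks roughly like this with the existing attributes, as used with the TextLoader. The class and field names are hypothetical, and whether the database loader honors the same range form is exactly what the thread is discussing.

```csharp
using Microsoft.ML.Data;

// Hypothetical schema class for a source with five columns.
public class SensorRow
{
    // Source column 0 becomes a scalar column.
    [LoadColumn(0)]
    public float Label;

    // Source columns 1 through 4 (inclusive) are read into a single
    // vector column of known size 4.
    [LoadColumn(1, 4)]
    [VectorType(4)]
    public float[] Features;
}
```

A range keeps the attribute short even when the vector spans hundreds of columns, which is the advantage over listing every column name.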
Should we not be supporting doing this via the names? I would think either the index approach is sufficient (as it is for TextLoader, in which case the new attribute is just "nice to have") or it isn't and we should support using the names for all these things. |
I guess I'm also not seeing why the |
We could, but putting hundreds of names in the attribute seems less ideal than saying
I agree. My thinking is that the index approach is sufficient and mapping by names borders on "nice to have". But it sounds like @CESARDELATORRE thinks it is more than just "nice to have".
I've asked for this in the past, and it was never implemented. I found some context here - #1515 (comment). You can probably find more by looking through the issues and PRs - #561 and #1878. |
System information
Issue
Tried to use ColumnName attribute in class that defines IDataView schema. When operating on the IDataView, I received the following error
No error.
Source code / logs
Given the following data stored in a SQL Server DB
The data schema is defined as such
The following code works:
However, when attributes are added to the schema class, it produces an error.