Skip to content

Any reason for inconsistency in params for TextLoader? #2472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
srsaggam opened this issue Feb 7, 2019 · 5 comments · Fixed by #2710
Closed

Any reason for inconsistency in params for TextLoader? #2472

srsaggam opened this issue Feb 7, 2019 · 5 comments · Fixed by #2710
Assignees
Labels
API Issues pertaining the friendly API
Milestone

Comments

@srsaggam
Copy link
Member

srsaggam commented Feb 7, 2019

  1. This Method :

    public static TextLoader CreateTextLoader<TInput>(this DataOperationsCatalog catalog,

  2. and this one :

    public static TextLoader CreateTextLoader(this DataOperationsCatalog catalog,

Looks like the first one has params like :

char separatorChar = '\t', bool allowQuotedStrings = true, bool supportSparse = true, bool trimWhitespace = false

Where as the second one doesn't have those. In order to use those params there is another method overload :

CreateTextLoader(this DataOperationsCatalog catalog, TextLoader.Arguments args, IMultiStreamSource dataSample = null);

This takes Arguments class and particularly one param is interesting in this Arguments:

char [] Separators

Any reason this is char[] and not char like in other overloads? and why inconsistency in params for the methods 1) and 2).

@srsaggam
Copy link
Member Author

srsaggam commented Feb 7, 2019

@daholste @justinormont

@TomFinley
Copy link
Contributor

@rogancarr and @sfilipi, should be made consistent.

@abgoswam
Copy link
Member

abgoswam commented Feb 23, 2019

@TomFinley . Could you kindly clarify what you mean by should be made consistent ?

To me it seems reasonable that the "simple" APIs (1 and 2 above) take a char to indicate a single separator character.

If a user wants to specify multiple separator characters they can use the "advanced" API which takes in the Options

public static TextLoader CreateTextLoader(this DataOperationsCatalog catalog,
TextLoader.Options options,
IMultiStreamSource dataSample = null)

cc @wschin @artidoro

@TomFinley
Copy link
Contributor

Yes @abgoswam I agree, the convenience methods should continue to take char. The advanced options should still take char[] I think, since it is sometimes important.

Of particular interest to me was the lack of these:

bool allowQuotedStrings = TextLoader.DefaultArguments.AllowQuoting,
bool supportSparse = TextLoader.DefaultArguments.AllowSparse,
bool trimWhitespace = TextLoader.DefaultArguments.TrimWhitespace)

It is not clear to me that the overloads should differ in this respect. Also why does one have the possibility of handling a file handle (which enables it to read feature names into the schema) and the other does not?

So these two overloads are inconsistent leading to loss of capability in one or the other, and I think this should be made consistent.

@rogancarr
Copy link
Contributor

Also, see #2512: We only support a limited set of characters for saving a text file, so perhaps we should move away from an explicit char and use an enum of allowed characters instead.

@shauheen shauheen added this to the 0219 milestone Feb 27, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants