-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Added convenience constructors for ScoreTransform and TrainAndScoreTransform. #614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
return Create(env, args, args.Trainer.CreateInstance(env), input); | ||
} | ||
|
||
private static IDataTransform Create(IHostEnvironment env, Arguments args, ITrainer trainer, IDataView input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the common code in private create method. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, a more common pattern is to use AssertValue in private methods, and CheckValue in public methods.
The purpose of the Check* is twofold: first, we want to enforce the calling contract, and second, we want to actually EXPLICITLY SPECIFY the calling contract.
So that you can look at any argument and immediately, within 3 lines of code, tell whether it can be null or not etc.
In reply to: 206362456 [](ancestors = 206362456)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved all the checks in the public methods instead.
In reply to: 206367797 [](ancestors = 206367797,206362456)
it should move to upper call. no reason to check trainer here if you already pass it. #Closed Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:190 in 76bab69. [](commit_id = 76bab69, deletion_comment = False) |
/// <param name="featureColumn">Role name for features.</param> | ||
/// <param name="labelColumn">Role name for label.</param> | ||
/// <returns></returns> | ||
public static IDataTransform Create(IHostEnvironment env, IDataView input, ITrainer trainer, string featureColumn = DefaultColumnNames.Features, string labelColumn = DefaultColumnNames.Label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
labelColumn [](start = 160, length = 11)
don't you need custom columns here?
also if you can split this long line into two, it would be nice. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be but I see CustomColumns as advanced argument (not frequently used). Let see if other reviewers have same argument.
In reply to: 206365422 [](ancestors = 206365422)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, it's not only CustomColumns, but also Weight, Group, Name, considering what they all have default values and user don't have to specify them, I don't see why not to add them, but I would also hear other people opinion. Just in case, same applies for Group column and CustomColumns in Scorer
In reply to: 206366117 [](ancestors = 206366117,206365422)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let see if @TomFinley has any suggestion in this regard? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the question @zeahmed. My tendency would be to view group as something possibly useful. Weight and name, I view as less fundamentally useful. Custom columns, I am somewhat reluctant to support because I have some difficulty imagining how to explain that one to people. :P
However, I cannot quite claim to care that much. not because it isn't an important consideration but because the proposal of #581 renders it moot. We're going to be moving towards a world where, essentially, all trainers become TrainAndScore transforms, with more local decisions about what column roles are and aren't important.
In reply to: 206660387 [](ancestors = 206660387)
This is a public method though, so it can be called directly. Let's keep the check. In reply to: 409056026 [](ancestors = 409056026) Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:190 in 76bab69. [](commit_id = 76bab69, deletion_comment = False) |
Actually, Ivan's point is that checking the trainer can happen inside the other create, not validation of the input In reply to: 409059591 [](ancestors = 409059591,409056026) Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:190 in 76bab69. [](commit_id = 76bab69, deletion_comment = False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
|
||
return Create(env, args, trainer, input); | ||
} | ||
|
||
public static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input [](start = 92, length = 5)
can you check input in this function? #Resolved
LabelColumn = labelColumn | ||
}; | ||
|
||
return Create(env, args, trainer, input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trainer, input [](start = 37, length = 14)
Can you check trainer and input? #Resolved
Dont get you. You mean check here for FeatureColumn and GroupColumn? Both have default values in Argument class so no need to check...Let me know if you mean something else? In reply to: 409060224 [](ancestors = 409060224) Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:86 in a3a0302. [](commit_id = a3a0302, deletion_comment = False) |
Moved all the checks to public method instead. Let me know if private one do need checking as well? In reply to: 409059697 [](ancestors = 409059697,409059591,409056026) Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:190 in 76bab69. [](commit_id = 76bab69, deletion_comment = False) |
var weight = TrainUtils.MatchNameOrDefaultOrNull(ectx, schema, nameof(args.WeightColumn), args.WeightColumn, In reply to: 409063977 [](ancestors = 409063977,409060224) Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:86 in a3a0302. [](commit_id = a3a0302, deletion_comment = False) |
So why the "featureColumn" starts with lower case "f" here? if I use nameof(args.FeatureColumn) then it will be "FeatureColumn". Will it be correct? In reply to: 409065068 [](ancestors = 409065068,409063977,409060224) Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:86 in a3a0302. [](commit_id = a3a0302, deletion_comment = False) |
Because someone make mistake, or initially we had internal variable featureColumn, but we switch do direct call to argument field. Who knows? In reply to: 409065784 [](ancestors = 409065784,409065068,409063977,409060224) Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:86 in a3a0302. [](commit_id = a3a0302, deletion_comment = False) |
hmm...just changed then...:) In reply to: 409067891 [](ancestors = 409067891,409065784,409065068,409063977,409060224) Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:86 in a3a0302. [](commit_id = a3a0302, deletion_comment = False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
Contracts.CheckValue(env, nameof(env)); | ||
env.CheckValue(input, nameof(input)); | ||
env.CheckValue(trainer, nameof(trainer)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
env.CheckValue(trainer, nameof(trainer)); [](start = 12, length = 41)
in the interest of self-documenting functions, add CheckValue (or CheckValueOrNull) on featureColumn and labelColumn #Resolved
|
||
private static IDataTransform Create(IHostEnvironment env, Arguments args, ITrainer trainer, IDataView input) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, AssertValue of all the things you CheckValue'd in the public side.
This is again in the interest of self-documenting methods #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just got to know that these transform will go away in new API...:) #Closed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR fixes #606.