Skip to content

Added convenience constructors for ScoreTransform and TrainAndScoreTransform. #614

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 1, 2018

Conversation

zeahmed
Copy link
Contributor

@zeahmed zeahmed commented Jul 31, 2018

This PR fixes #606.

return Create(env, args, args.Trainer.CreateInstance(env), input);
}

private static IDataTransform Create(IHostEnvironment env, Arguments args, ITrainer trainer, IDataView input)
Copy link
Contributor Author

@zeahmed zeahmed Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the common code in private create method. #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, a more common pattern is to use AssertValue in private methods, and CheckValue in public methods.

The purpose of the Check* is twofold: first, we want to enforce the calling contract, and second, we want to actually EXPLICITLY SPECIFY the calling contract.
So that you can look at any argument and immediately, within 3 lines of code, tell whether it can be null or not etc.


In reply to: 206362456 [](ancestors = 206362456)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all the checks in the public methods instead.


In reply to: 206367797 [](ancestors = 206367797,206362456)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Jul 31, 2018

            "Trainer cannot be null. If your model is already trained, please use ScoreTransform instead.");

it should move to upper call. no reason to check trainer here if you already pass it. #Closed


Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:190 in 76bab69. [](commit_id = 76bab69, deletion_comment = False)

/// <param name="featureColumn">Role name for features.</param>
/// <param name="labelColumn">Role name for label.</param>
/// <returns></returns>
public static IDataTransform Create(IHostEnvironment env, IDataView input, ITrainer trainer, string featureColumn = DefaultColumnNames.Features, string labelColumn = DefaultColumnNames.Label)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

labelColumn [](start = 160, length = 11)

don't you need custom columns here?
also if you can split this long line into two, it would be nice. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be but I see CustomColumns as advanced argument (not frequently used). Let see if other reviewers have same argument.


In reply to: 206365422 [](ancestors = 206365422)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, it's not only CustomColumns, but also Weight, Group, Name, considering what they all have default values and user don't have to specify them, I don't see why not to add them, but I would also hear other people opinion. Just in case, same applies for Group column and CustomColumns in Scorer


In reply to: 206366117 [](ancestors = 206366117,206365422)

Copy link
Contributor Author

@zeahmed zeahmed Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let see if @TomFinley has any suggestion in this regard? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the question @zeahmed. My tendency would be to view group as something possibly useful. Weight and name, I view as less fundamentally useful. Custom columns, I am somewhat reluctant to support because I have some difficulty imagining how to explain that one to people. :P

However, I cannot quite claim to care that much. not because it isn't an important consideration but because the proposal of #581 renders it moot. We're going to be moving towards a world where, essentially, all trainers become TrainAndScore transforms, with more local decisions about what column roles are and aren't important.


In reply to: 206660387 [](ancestors = 206660387)

@Zruty0
Copy link
Contributor

Zruty0 commented Jul 31, 2018

            "Trainer cannot be null. If your model is already trained, please use ScoreTransform instead.");

This is a public method though, so it can be called directly. Let's keep the check.


In reply to: 409056026 [](ancestors = 409056026)


Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:190 in 76bab69. [](commit_id = 76bab69, deletion_comment = False)

@Zruty0
Copy link
Contributor

Zruty0 commented Jul 31, 2018

            "Trainer cannot be null. If your model is already trained, please use ScoreTransform instead.");

Actually, Ivan's point is that checking the trainer can happen inside the other create, not validation of the input


In reply to: 409059591 [](ancestors = 409059591,409056026)


Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:190 in 76bab69. [](commit_id = 76bab69, deletion_comment = False)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Jul 31, 2018

                "featureColumn", args.FeatureColumn, DefaultColumnNames.Features);

can we use nameof() here and for groupColumn? #Resolved


Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:86 in a3a0302. [](commit_id = a3a0302, deletion_comment = False)


return Create(env, args, trainer, input);
}

public static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input [](start = 92, length = 5)

can you check input in this function? #Resolved

LabelColumn = labelColumn
};

return Create(env, args, trainer, input);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trainer, input [](start = 37, length = 14)

Can you check trainer and input? #Resolved

@zeahmed
Copy link
Contributor Author

zeahmed commented Jul 31, 2018

                "featureColumn", args.FeatureColumn, DefaultColumnNames.Features);

Dont get you. You mean check here for FeatureColumn and GroupColumn? Both have default values in Argument class so no need to check...Let me know if you mean something else?


In reply to: 409060224 [](ancestors = 409060224)


Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:86 in a3a0302. [](commit_id = a3a0302, deletion_comment = False)

@zeahmed
Copy link
Contributor Author

zeahmed commented Jul 31, 2018

            "Trainer cannot be null. If your model is already trained, please use ScoreTransform instead.");

Moved all the checks to public method instead. Let me know if private one do need checking as well?


In reply to: 409059697 [](ancestors = 409059697,409059591,409056026)


Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:190 in 76bab69. [](commit_id = 76bab69, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

                "featureColumn", args.FeatureColumn, DefaultColumnNames.Features);

var weight = TrainUtils.MatchNameOrDefaultOrNull(ectx, schema, nameof(args.WeightColumn), args.WeightColumn,
DefaultColumnNames.Weight);
see 3rd parameter, it's not a string literal, but actual name of parameter.


In reply to: 409063977 [](ancestors = 409063977,409060224)


Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:86 in a3a0302. [](commit_id = a3a0302, deletion_comment = False)

@zeahmed
Copy link
Contributor Author

zeahmed commented Jul 31, 2018

                "featureColumn", args.FeatureColumn, DefaultColumnNames.Features);

So why the "featureColumn" starts with lower case "f" here? if I use nameof(args.FeatureColumn) then it will be "FeatureColumn". Will it be correct?


In reply to: 409065068 [](ancestors = 409065068,409063977,409060224)


Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:86 in a3a0302. [](commit_id = a3a0302, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

                "featureColumn", args.FeatureColumn, DefaultColumnNames.Features);

Because someone make mistake, or initially we had internal variable featureColumn, but we switch do direct call to argument field. Who knows?
Actually, went to internal repo and use blame.
Tom Capitalize our arguments at some point, but string literals remain same.
By using nameof() you remove this problem of keeping them in sync ;)


In reply to: 409065784 [](ancestors = 409065784,409065068,409063977,409060224)


Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:86 in a3a0302. [](commit_id = a3a0302, deletion_comment = False)

@zeahmed
Copy link
Contributor Author

zeahmed commented Jul 31, 2018

                "featureColumn", args.FeatureColumn, DefaultColumnNames.Features);

hmm...just changed then...:)


In reply to: 409067891 [](ancestors = 409067891,409065784,409065068,409063977,409060224)


Refers to: src/Microsoft.ML.Data/Transforms/TrainAndScoreTransform.cs:86 in a3a0302. [](commit_id = a3a0302, deletion_comment = False)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

{
Contracts.CheckValue(env, nameof(env));
env.CheckValue(input, nameof(input));
env.CheckValue(trainer, nameof(trainer));
Copy link
Contributor

@Zruty0 Zruty0 Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

env.CheckValue(trainer, nameof(trainer)); [](start = 12, length = 41)

in the interest of self-documenting functions, add CheckValue (or CheckValueOrNull) on featureColumn and labelColumn #Resolved


private static IDataTransform Create(IHostEnvironment env, Arguments args, ITrainer trainer, IDataView input)
{
Copy link
Contributor

@Zruty0 Zruty0 Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, AssertValue of all the things you CheckValue'd in the public side.

This is again in the interest of self-documenting methods #Resolved

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@zeahmed
Copy link
Contributor Author

zeahmed commented Jul 31, 2018

Just got to know that these transform will go away in new API...:) #Closed

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@shauheen shauheen merged commit 4bd866e into dotnet:master Aug 1, 2018
@zeahmed zeahmed deleted the CC_Train_and_score branch January 30, 2019 21:30
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Part 4] Create convenience constructors for the Score and TrainAndScore transforms.
5 participants