-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Remove ISchema in TreeEnsembleFeaturizer #2132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f7a2fbe
to
e3fad19
Compare
|
||
// Metadata of tree values. | ||
var treeIdMetadataBuilder = new MetadataBuilder(); | ||
ValueGetter<VBuffer<ReadOnlyMemory<char>>> treeIdMetadataGetter = (ref VBuffer<ReadOnlyMemory<char>> value) => owner.GetTreeSlotNames(ref value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(ref VBuffer<ReadOnlyMemory> value) => owner.GetTreeSlotNames(ref value); [](start = 82, length = 79)
Hi @wshin, what is the purpose of this wrapping delegate? Why not just assign owner.GetTreeSlotNames
directly? It appears, from what I see, to be suitable?
So this:
ValueGetter<VBuffer<ReadOnlyMemory<char>>> treeIdMetadataGetter = (ref VBuffer<ReadOnlyMemory<char>> value) => owner.GetTreeSlotNames(ref value);
treeIdMetadataBuilder.Add(MetadataUtils.Kinds.SlotNames, MetadataUtils.GetNamesType(treeValueType.VectorSize), treeIdMetadataGetter);
becomes something like this.
treeIdMetadataBuilder.Add(MetadataUtils.Kinds.SlotNames, MetadataUtils.GetNamesType(treeValueType.VectorSize), owner.GetTreeSlotNames);
Does anything bad happen? Here and the other three places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks. Btw, without type casting, complier said "Cannot convert method group to Delegate."
In reply to: 247679705 [](ancestors = 247679705)
|
||
// leaf IDs must be the second output column. | ||
Contracts.Assert(LeafIdsColumnId == 1); | ||
schemaBuilder.AddColumn(OutputColumnNames.Leaves, leafIdType, leafIdMetadataBuilder.GetMetadata()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that in their present form these asserts are terribly useful, since the purpose of the check was to see that the schema's indices lined up with our expectations, and this check no longer does that. What would be useful perhaps is that once the schema is created at the end of this constructor where we've assigned output schema, we can do things like this:
So three asserts along these lines might be useful:
Contracts.Assert(OutputSchema[OutputColumnNames.Leaves].Index == LeafIdsColumnId)
But the straight transliteration of the old asserts is no longer serving the purpose it once did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. We have
// Tree values must be the first output column.
Contracts.Assert(OutputSchema[OutputColumnNames.Trees].Index == TreeValuesColumnId);
// leaf IDs must be the second output column.
Contracts.Assert(OutputSchema[OutputColumnNames.Leaves].Index == LeafIdsColumnId);
// Path IDs must be the third output column.
Contracts.Assert(OutputSchema[OutputColumnNames.Paths].Index == PathIdsColumnId);
now. Many thanks!
In reply to: 247681048 [](ancestors = 247681048)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thank you @wschin !
@@ -488,7 +422,7 @@ private void GetTreeSlotNames(int col, ref VBuffer<ReadOnlyMemory<char>> dst) | |||
dst = editor.Commit(); | |||
} | |||
|
|||
private void GetLeafSlotNames(int col, ref VBuffer<ReadOnlyMemory<char>> dst) | |||
private void GetLeafSlotNames(ref VBuffer<ReadOnlyMemory<char>> dst) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(off topic for the current PR)
How deep are normal trees? Could we name the slots for paths/leaves as "Tree033Leaf021-MyFeatABC_MyFeatXYZ_MyFeatIOU..." to better help users understand the output of the feature importance? The downsize is the name needs to be unique, and will be long as it notes all the input features the leaf node uses.
The current slot naming is a bit useless (though much better than nothing).
Currently the slot names look like:
LeavesMainLabel.Tree033Leaf021 65.73584
LeavesMainLabel.Tree033Leaf023 -42.43543
LeavesMainLabel.Tree033Leaf019 -40.72021
LeavesMainLabel.Tree057Leaf020 -37.54552
LeavesMainLabel.Tree079Leaf007 -36.29255
LeavesMainLabel.Tree055Leaf019 -34.78884
LeavesMainLabel.Tree075Leaf009 34.58635
LeavesMainLabel.Tree020Leaf020 33.72996
LeavesMainLabel.Tree047Leaf022 31.86535
LeavesMainLabel.Tree074Leaf008 31.86535
LeavesMainLabel.Tree066Leaf008 31.74181
LeavesMainLabel.Tree040Leaf019 30.9242
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the new unit test!
New member of #1501.