Skip to content

Remove ISchema in TreeEnsembleFeaturizer #2132

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jan 15, 2019

Conversation

wschin
Copy link
Member

@wschin wschin commented Jan 11, 2019

New member of #1501.

@wschin wschin self-assigned this Jan 11, 2019
@wschin wschin requested review from codemzs and yaeldekel January 11, 2019 22:59
@wschin wschin force-pushed the remove-tree-feat-ischema branch from f7a2fbe to e3fad19 Compare January 11, 2019 23:01

// Metadata of tree values.
var treeIdMetadataBuilder = new MetadataBuilder();
ValueGetter<VBuffer<ReadOnlyMemory<char>>> treeIdMetadataGetter = (ref VBuffer<ReadOnlyMemory<char>> value) => owner.GetTreeSlotNames(ref value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ref VBuffer<ReadOnlyMemory> value) => owner.GetTreeSlotNames(ref value); [](start = 82, length = 79)

Hi @wshin, what is the purpose of this wrapping delegate? Why not just assign owner.GetTreeSlotNames directly? It appears, from what I see, to be suitable?

So this:

ValueGetter<VBuffer<ReadOnlyMemory<char>>> treeIdMetadataGetter = (ref VBuffer<ReadOnlyMemory<char>> value) => owner.GetTreeSlotNames(ref value);
treeIdMetadataBuilder.Add(MetadataUtils.Kinds.SlotNames, MetadataUtils.GetNamesType(treeValueType.VectorSize), treeIdMetadataGetter);

becomes something like this.

treeIdMetadataBuilder.Add(MetadataUtils.Kinds.SlotNames, MetadataUtils.GetNamesType(treeValueType.VectorSize), owner.GetTreeSlotNames);

Does anything bad happen? Here and the other three places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks. Btw, without type casting, complier said "Cannot convert method group to Delegate."


In reply to: 247679705 [](ancestors = 247679705)


// leaf IDs must be the second output column.
Contracts.Assert(LeafIdsColumnId == 1);
schemaBuilder.AddColumn(OutputColumnNames.Leaves, leafIdType, leafIdMetadataBuilder.GetMetadata());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see that in their present form these asserts are terribly useful, since the purpose of the check was to see that the schema's indices lined up with our expectations, and this check no longer does that. What would be useful perhaps is that once the schema is created at the end of this constructor where we've assigned output schema, we can do things like this:

So three asserts along these lines might be useful:

Contracts.Assert(OutputSchema[OutputColumnNames.Leaves].Index == LeafIdsColumnId)

But the straight transliteration of the old asserts is no longer serving the purpose it once did.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. We have

                // Tree values must be the first output column.
                Contracts.Assert(OutputSchema[OutputColumnNames.Trees].Index == TreeValuesColumnId);
                // leaf IDs must be the second output column.
                Contracts.Assert(OutputSchema[OutputColumnNames.Leaves].Index == LeafIdsColumnId);
                // Path IDs must be the third output column.
                Contracts.Assert(OutputSchema[OutputColumnNames.Paths].Index == PathIdsColumnId);

now. Many thanks!


In reply to: 247681048 [](ancestors = 247681048)

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you @wschin !

@@ -488,7 +422,7 @@ private void GetTreeSlotNames(int col, ref VBuffer<ReadOnlyMemory<char>> dst)
dst = editor.Commit();
}

private void GetLeafSlotNames(int col, ref VBuffer<ReadOnlyMemory<char>> dst)
private void GetLeafSlotNames(ref VBuffer<ReadOnlyMemory<char>> dst)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(off topic for the current PR)

How deep are normal trees? Could we name the slots for paths/leaves as "Tree033Leaf021-MyFeatABC_MyFeatXYZ_MyFeatIOU..." to better help users understand the output of the feature importance? The downsize is the name needs to be unique, and will be long as it notes all the input features the leaf node uses.

The current slot naming is a bit useless (though much better than nothing).
Currently the slot names look like:

LeavesMainLabel.Tree033Leaf021	65.73584
LeavesMainLabel.Tree033Leaf023	-42.43543
LeavesMainLabel.Tree033Leaf019	-40.72021
LeavesMainLabel.Tree057Leaf020	-37.54552
LeavesMainLabel.Tree079Leaf007	-36.29255
LeavesMainLabel.Tree055Leaf019	-34.78884
LeavesMainLabel.Tree075Leaf009	34.58635
LeavesMainLabel.Tree020Leaf020	33.72996
LeavesMainLabel.Tree047Leaf022	31.86535
LeavesMainLabel.Tree074Leaf008	31.86535
LeavesMainLabel.Tree066Leaf008	31.74181
LeavesMainLabel.Tree040Leaf019	30.9242

Copy link
Contributor

@justinormont justinormont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the new unit test!

@wschin wschin merged commit c0af761 into dotnet:master Jan 15, 2019
@wschin wschin deleted the remove-tree-feat-ischema branch January 15, 2019 22:49
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants