Remove ISchema in TreeEnsembleFeaturizer #2132

wschin · 2019-01-11T22:59:12Z

New member of #1501.

TomFinley · 2019-01-14T22:13:51Z

src/Microsoft.ML.FastTree/TreeEnsembleFeaturizer.cs

+
+                // Metadata of tree values.
+                var treeIdMetadataBuilder = new MetadataBuilder();
+                ValueGetter<VBuffer<ReadOnlyMemory<char>>> treeIdMetadataGetter = (ref VBuffer<ReadOnlyMemory<char>> value) => owner.GetTreeSlotNames(ref value);


(ref VBuffer<ReadOnlyMemory<char>> value) => owner.GetTreeSlotNames(ref value);

Hi @wschin, what is the purpose of this wrapping delegate? Why not just assign owner.GetTreeSlotNames directly? From what I can see, it appears to be suitable.

So this:

ValueGetter<VBuffer<ReadOnlyMemory<char>>> treeIdMetadataGetter = (ref VBuffer<ReadOnlyMemory<char>> value) => owner.GetTreeSlotNames(ref value);
treeIdMetadataBuilder.Add(MetadataUtils.Kinds.SlotNames, MetadataUtils.GetNamesType(treeValueType.VectorSize), treeIdMetadataGetter);

becomes something like this:

treeIdMetadataBuilder.Add(MetadataUtils.Kinds.SlotNames, MetadataUtils.GetNamesType(treeValueType.VectorSize), owner.GetTreeSlotNames);

Does anything bad happen if you do that, here and in the other three places?

Fixed. Thanks. Btw, without the type cast, the compiler said "Cannot convert method group to Delegate."

In reply to: 247679705
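The compiler message quoted in the reply above is what C# compilers before C# 10 report when a bare method group is passed where only the base `Delegate` type is expected, because there is no target delegate type to convert it to. The following is a minimal sketch of that situation, assuming (this thread does not confirm it) that the `Add` overload being called takes a plain `Delegate`. `ValueGetter<T>` below is a local stand-in for the ML.NET delegate, and `Register` and `GetName` are hypothetical names used only for illustration.

```csharp
using System;

public static class MethodGroupSketch
{
    // Local stand-in for ML.NET's ValueGetter<T>; declared here so the sketch is self-contained.
    public delegate void ValueGetter<T>(ref T value);

    // Hypothetical registration method whose parameter is the base Delegate type.
    public static Delegate Register(Delegate getter) => getter;

    // Hypothetical getter with a ref parameter, similar in shape to GetTreeSlotNames.
    public static void GetName(ref string value) => value = "Tree000";

    public static void Main()
    {
        // Register(GetName);
        // Before C# 10 the line above is rejected: a method group has no type of its own,
        // so it cannot be converted to Delegate without naming a concrete delegate type.

        // An explicit cast names the delegate type, which makes the conversion legal.
        Delegate stored = Register((ValueGetter<string>)GetName);

        string name = null;
        ((ValueGetter<string>)stored)(ref name);
        Console.WriteLine(name);
    }
}
```

The cast avoids allocating a wrapping lambda while still giving the compiler the delegate type it needs.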

TomFinley · 2019-01-14T22:18:24Z

src/Microsoft.ML.FastTree/TreeEnsembleFeaturizer.cs

+
+                // leaf IDs must be the second output column.
+                Contracts.Assert(LeafIdsColumnId == 1);
+                schemaBuilder.AddColumn(OutputColumnNames.Leaves, leafIdType, leafIdMetadataBuilder.GetMetadata());


I don't see that these asserts are terribly useful in their present form, since the purpose of the check was to see that the schema's indices lined up with our expectations, and this check no longer does that. What might be useful instead is to check the indices once the schema has been created, at the end of this constructor where we've assigned the output schema. So three asserts along these lines might be useful:

Contracts.Assert(OutputSchema[OutputColumnNames.Leaves].Index == LeafIdsColumnId);

But the straight transliteration of the old asserts is no longer serving the purpose it once did.

I see. We have

// Tree values must be the first output column.
Contracts.Assert(OutputSchema[OutputColumnNames.Trees].Index == TreeValuesColumnId);
// leaf IDs must be the second output column.
Contracts.Assert(OutputSchema[OutputColumnNames.Leaves].Index == LeafIdsColumnId);
// Path IDs must be the third output column.
Contracts.Assert(OutputSchema[OutputColumnNames.Paths].Index == PathIdsColumnId);

now. Many thanks!

In reply to: 247681048
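The point of this exchange is that an index assert only protects anything if it inspects the schema that was actually built, rather than restating a constant. A rough sketch of that idea follows, using a plain list of column names as a stand-in for the real schema. The column names, the `BuildColumns` helper, and the constant values are assumptions made for illustration and are not taken from the ML.NET source.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SchemaIndexSketch
{
    // Expected positions, named after the constants mentioned in the thread (values assumed).
    public const int TreeValuesColumnId = 0;
    public const int LeafIdsColumnId = 1;
    public const int PathIdsColumnId = 2;

    // Hypothetical stand-in for schema construction: column names in output order.
    public static List<string> BuildColumns() =>
        new List<string> { "Trees", "Leaves", "Paths" };

    public static void Main()
    {
        List<string> columns = BuildColumns();

        // Look each column up by name in the built result and compare against the expected index.
        // Debug.Assert is compiled out of release builds, so these checks cost nothing there.
        Debug.Assert(columns.IndexOf("Trees") == TreeValuesColumnId);
        Debug.Assert(columns.IndexOf("Leaves") == LeafIdsColumnId);
        Debug.Assert(columns.IndexOf("Paths") == PathIdsColumnId);

        Console.WriteLine("column order verified");
    }
}
```

If a later change reorders the calls that add columns, the lookups by name fail the asserts, which is the situation the original checks were meant to catch.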

src/Microsoft.ML.FastTree/TreeEnsembleFeaturizer.cs

TomFinley

Great, thank you @wschin!

justinormont · 2019-01-15T22:44:10Z

src/Microsoft.ML.FastTree/TreeEnsembleFeaturizer.cs

@@ -488,7 +422,7 @@ private void GetTreeSlotNames(int col, ref VBuffer<ReadOnlyMemory<char>> dst)
            dst = editor.Commit();
        }

-        private void GetLeafSlotNames(int col, ref VBuffer<ReadOnlyMemory<char>> dst)
+        private void GetLeafSlotNames(ref VBuffer<ReadOnlyMemory<char>> dst)


(off topic for the current PR)

How deep are normal trees? Could we name the slots for paths/leaves as "Tree033Leaf021-MyFeatABC_MyFeatXYZ_MyFeatIOU..." to better help users understand the feature importance output? The downside is that the name needs to be unique, and it will be long because it lists all the input features the leaf node uses.

The current slot naming is not very informative (though much better than nothing). Currently the slot names look like:

LeavesMainLabel.Tree033Leaf021 65.73584
LeavesMainLabel.Tree033Leaf023 -42.43543
LeavesMainLabel.Tree033Leaf019 -40.72021
LeavesMainLabel.Tree057Leaf020 -37.54552
LeavesMainLabel.Tree079Leaf007 -36.29255
LeavesMainLabel.Tree055Leaf019 -34.78884
LeavesMainLabel.Tree075Leaf009 34.58635
LeavesMainLabel.Tree020Leaf020 33.72996
LeavesMainLabel.Tree047Leaf022 31.86535
LeavesMainLabel.Tree074Leaf008 31.86535
LeavesMainLabel.Tree066Leaf008 31.74181
LeavesMainLabel.Tree040Leaf019 30.9242

justinormont

LGTM. Thanks for the new unit test!

wschin self-assigned this Jan 11, 2019

wschin requested review from codemzs and yaeldekel January 11, 2019 22:59

Remove another ISchema

e3fad19

wschin force-pushed the remove-tree-feat-ischema branch from f7a2fbe to e3fad19 Compare January 11, 2019 23:01

TomFinley reviewed Jan 14, 2019

Address comments

d2a62d7

justinormont reviewed Jan 15, 2019

Address a comment

10c49d7

TomFinley approved these changes Jan 15, 2019

wschin added 5 commits January 15, 2019 11:24

Add a test to tree ferturization's output schema

b6fdbd5

Merge branch 'remove-tree-feat-ischema' of github.com:wschin/machinel…

1dee5e2

…earning into remove-tree-feat-ischema

Add a test to check output schema

f5093e3

Format

f6c91c1

Fix name

49c59ba

justinormont reviewed Jan 15, 2019

justinormont approved these changes Jan 15, 2019

wschin merged commit c0af761 into dotnet:master Jan 15, 2019

wschin deleted the remove-tree-feat-ischema branch January 15, 2019 22:49

ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
