Should TensorFlowModelInfo be public? #2552

yaeldekel · 2019-02-14T19:09:54Z

Related to #2280.
The TensorFlowModelInfo class is still public, and so are two methods for creating a TensorFlowEstimator from a TensorFlowModelInfo. I was wondering whether we should have these, since the only way to get a TensorFlowModelInfo is by calling TensorFlowUtils.LoadTensorFlowModel(), and that uses a file name - and users that have a file name, can instantiate a TensorFlowEstimator from that directly.
There is one scenario that I can think of where a user would benefit from using this API: The TensorFlowModelInfo class has a method GetInputSchema(), that can be used to get the name of the input layer(s), and then the TensorFlowEstimator can be instantiated from the TensorFlowModelInfo instead of instantiating it from the file name and having to load it from disk again. Is this reason enough to keep these APIs?
If it is, then TensorFlowUtils.LoadTensorFlowModel should be moved to TensorFlowCatalog, and perhaps also the GetInputSchema() method of TensorFlowModelInfo.
@TomFinley , @zeahmed , @abgoswam , @ganik - thoughts?

Another unrelated issue: Microsoft.ML.TensorFlow has two AssemblyInfo.cs files, one in src\Microsoft.ML.TensorFlow\ and one in src\Microsoft.ML.TensorFlow\Properties. These should be unified.

abgoswam · 2019-02-14T22:41:23Z

I think you are right in that TensorFlowModelInfo was introduced so users could get the name / shape of the input layer(s), and could then pass the TensorFlowModelInfo object directly to the public API which would skip the model loading step.

In general I noticed people using the TFTransform found the DnnAnalyzer helpful, which suggests we should expose TensorFlowUtils.LoadTensorFlowModel() and the GetInputSchema() in the public API

Regarding whether we should have the 2 separate APIs for the TensorFlowTransform itself (one which accepts the model path and the other accepting TensorFlowModelInfo) I am ambivalent about it.

The usage I have seen from users is

Use the DnnAnalyzer or some other tool to know layer name / shape of input layers
Pass the model path directly to the public API

machinelearning/src/Microsoft.ML.TensorFlow/TensorflowCatalog.cs

Lines 48 to 51 in f8e58dd

    
           public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog catalog, 
        
               string modelLocation, 
        
               string[] outputColumnNames, 
        
               string[] inputColumnNames)

TomFinley · 2019-02-15T17:18:39Z

Hi @abgoswam could you clarify something about your response I find confusing?

During the first part you seemed to be arguing making it part of the public API was a good idea, but then you said:

The usage I have seen from users is

Use the DnnAnalyzer or some other tool to know layer name / shape of input layers

Pass the model path directly to the public API

This suggests exposing the information is not required, since people prefer to get that same information via another channel anyway, rendering another pathway redudant at best. (And it makes sense. If I had a tool I could just invoke to get some information, I wouldn't write a C# program just to perform the exact same task.) But it's possible I misunderstood your point since I am less familiar with this transform and its usage than some others.

abgoswam · 2019-02-15T18:50:31Z

Hi @TomFinley it seems Yael is asking two questions @yaeldekel please do correct me if i am wrong.

The TensorFlowModelInfo class is still public, and so are two methods for creating a TensorFlowEstimator from a TensorFlowModelInfo

My response was trying to address them as follows

Regarding keeping TensorFlowModelInfo class public along with TensorFlowUtils.LoadTensorFlowModel and GetInputSchema

I saw one customer (from MSFT News team) who said the DnnAnalyzer would have been helpful. Currenty do not ship the DnnAnalyzer as part of ML.NET nuget. So they ended up using the TensorFlowUtils.LoadTensorFlowModel and the GetInputSchema APIs to get information about the input layer name (which is needed to create the TensorFlowEstimator) .
Hence my suggestion to have these in our public API. Otherwise we have to point users to external tools to get this information.

Regarding two methods for creating a TensorFlowEstimator

We have two kinds APIs for creating a TensorFlowEstimator -
(a) Using the model path

machinelearning/src/Microsoft.ML.TensorFlow/TensorflowCatalog.cs

Lines 48 to 51 in f8e58dd

    
           public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog catalog, 
        
               string modelLocation, 
        
               string[] outputColumnNames, 
        
               string[] inputColumnNames)

(b) Using the TensorFlowModelInfo object

machinelearning/src/Microsoft.ML.TensorFlow/TensorflowCatalog.cs

Lines 81 to 84 in f8e58dd

    
           public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog catalog, 
        
               TensorFlowModelInfo tensorFlowModel, 
        
               string[] outputColumnNames, 
        
               string[] inputColumnNames)

having (b) in our public API surface is not a requirement for creating a TensorFlowEstimator .
the only benefit of having it in the public API surface is that if users have used 1. above to load a TF model and created a TensorFlowModelInfo object, they can create a TensorFlowEstimator by using (b) without having to load the model again. So this API only serves as an optimization

zeahmed · 2019-02-15T18:58:17Z

I think TensorFlowModelInfo class and GetModelSchema method both should be in the public surface API.

TensorFlowModelInfo

Yes, it is correct that this class was designed to expose the TensorFlowSharp model object so that user does not need to load the gigabyte models twice once for querying schema and once for loading the transform.

GetModelSchema (also GetInputSchema)

When we decided to have schema visualization of TF models in ML.NET, we decided that there should be a standalone tool for doing that as well as an API way to do the same. The API bring in sort of automation in the naming and reshaping of inputs e.g. resizing images to models expected size without hard coding the sizes etc.

TomFinley · 2019-02-15T20:19:23Z

Then is someone going to work to clean up the API? Right now I do not consider it terribly "great." The class itself is not sealed, whoops. It's unclear to me why these Get methods aren't (possibly lazily initialized) properties. I also see lots of things like TensorFlowUtils that are definitely not in acceptable state.

zeahmed · 2019-02-15T23:26:57Z

I can take it if no one started yet?

Ivanidzo4ka mentioned this issue Feb 15, 2019

TensorFlowUtils doesn't really fit with the 1.0 API #2572

Closed

zeahmed self-assigned this Feb 15, 2019

shauheen added this to the 0219 milestone Feb 19, 2019

zeahmed closed this as completed Feb 20, 2019

zeahmed reopened this Feb 20, 2019

zeahmed mentioned this issue Feb 20, 2019

Internalization of TensorFlowUtils.cs and refactored TensorFlowCatalog. #2672

Merged

zeahmed closed this as completed in #2672 Mar 1, 2019

ghost locked as resolved and limited conversation to collaborators Mar 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should TensorFlowModelInfo be public? #2552

Should TensorFlowModelInfo be public? #2552

yaeldekel commented Feb 14, 2019

abgoswam commented Feb 14, 2019 •

edited

Loading

TomFinley commented Feb 15, 2019

abgoswam commented Feb 15, 2019

zeahmed commented Feb 15, 2019 •

edited

Loading

TomFinley commented Feb 15, 2019

zeahmed commented Feb 15, 2019

Should TensorFlowModelInfo be public? #2552

Should TensorFlowModelInfo be public? #2552

Comments

yaeldekel commented Feb 14, 2019

abgoswam commented Feb 14, 2019 • edited Loading

TomFinley commented Feb 15, 2019

abgoswam commented Feb 15, 2019

zeahmed commented Feb 15, 2019 • edited Loading

TensorFlowModelInfo

GetModelSchema (also GetInputSchema)

TomFinley commented Feb 15, 2019

zeahmed commented Feb 15, 2019

abgoswam commented Feb 14, 2019 •

edited

Loading

zeahmed commented Feb 15, 2019 •

edited

Loading