Handle inputs with unknown shapes in TensorFlow #857

Merged
merged 8 commits into from
Sep 11, 2018

yaeldekel

This PR adds support for unknown shapes in the inputs and outputs of the TensorFlow transform.
Closes #848.

@Ivanidzo4ka
Contributor

Ivanidzo4ka commented Sep 8, 2018

            _host.CheckNonEmpty(output, nameof(outputs));

This should probably be the same as for the inputs; can you change it to CheckNonWhiteSpace? #Resolved


Refers to: src/Microsoft.ML.TensorFlow/TensorflowTransform.cs:210 in 34f91f8. [](commit_id = 34f91f8, deletion_comment = False)

@@ -191,6 +191,8 @@ private TensorFlowTransform(IHostEnvironment env, byte[] modelBytes, string[] in
Contracts.CheckValue(env, nameof(env));
_host = env.Register(nameof(RegistrationName));
_host.CheckValue(modelBytes, nameof(modelBytes));
_host.CheckNonEmpty(inputs, nameof(inputs));
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 8, 2018

inputs [](start = 47, length = 6)

outputs as well? #Resolved

resultDic[Transformer.Outputs[i]] = new SchemaShape.Column(Transformer.Outputs[i], SchemaShape.Column.VectorKind.Vector, Transformer.OutputTypes[i].ItemType, false);
{
resultDic[Transformer.Outputs[i]] = new SchemaShape.Column(Transformer.Outputs[i],
Transformer.OutputTypes[i].VectorSize > 0 ? SchemaShape.Column.VectorKind.Vector
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 8, 2018

.VectorSize > 0 [](start = 46, length = 15)

can we use IsKnownSizeVector? #Resolved

@@ -311,26 +319,56 @@ public Mapper(IHostEnvironment env, TensorFlowTransform parent, ISchema inputSch
_schema = inputSchema;
_inputColIndices = new int[_parent.Inputs.Length];
_isInputVector = new bool[_parent.Inputs.Length];
_fullySpecifiedShapes = new TFShape[_parent.Inputs.Length];
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 8, 2018

_fullySpecifiedShapes [](start = 16, length = 21)

I feel like this one should be part of the Transformer rather than the Mapper.
You call estimator.Fit(somedata) and get a transformer, which resolves its variable-length shape [?,?,3] as [3,3,3].
I'm not sure it would be right for the transformer to accept data with a shape other than [3,3,3] after that.

(The same probably goes for _isInputVector; not sure why I didn't put it in the Transformer.)
@Zruty0 to make sure I correctly understand the estimator/transformer business.
#Closed

Author

The problem is that when you instantiate the estimator you only have the TF model and not the IDV, so we don't necessarily know all the dimensions in the shape. However, in order to convert a VBuffer to a Tensor we need to know the fully specified shape.

Instead of having this field, I could instantiate a new shape object on every getter call, using _inputColIndices and _schema to figure out the input size. Do you think this is a good solution?


In reply to: 216115594 [](ancestors = 216115594)
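
A minimal sketch of what "filling in" a partially known shape at getter time could look like, assuming the simplest case where exactly one dimension is still -1 and the number of values in the input vector is known. `ShapeResolveSketch` and `Resolve` are hypothetical names used only for illustration; they are not part of ML.NET, and the merged code may resolve shapes differently (for example from the column's dimensions in the schema).

```csharp
using System;

static class ShapeResolveSketch
{
    // Hypothetical helper: infers the single unknown dimension (-1) of a
    // partially specified shape from the number of values actually observed.
    public static long[] Resolve(long[] partial, long valueCount)
    {
        long known = 1;
        int unknownIndex = -1;
        for (int i = 0; i < partial.Length; i++)
        {
            if (partial[i] == -1)
            {
                if (unknownIndex != -1)
                    throw new ArgumentException("More than one unknown dimension.");
                unknownIndex = i;
            }
            else
            {
                known *= partial[i];
            }
        }

        var result = (long[])partial.Clone();
        if (unknownIndex == -1)
        {
            // Shape was already fully specified: just validate the size.
            if (known != valueCount)
                throw new ArgumentException("Value count does not match shape.");
            return result;
        }

        if (known == 0 || valueCount % known != 0)
            throw new ArgumentException("Value count is not divisible by the known dimensions.");
        result[unknownIndex] = valueCount / known;
        return result;
    }
}
```

With a partial shape of [1, -1, 3] and 12 input values, the sketch resolves the middle dimension to 4. Shapes with several unknown dimensions would need more information than the value count alone, which is why the thread talks about consulting the input schema.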

Contributor

I think I just have an incorrect understanding of estimator/transformer, and we don't have to have the same schema for fitting and for transforming.


In reply to: 216382814 [](ancestors = 216382814,216115594)

@@ -233,7 +240,7 @@ private TensorFlowTransform(IHostEnvironment env, byte[] modelBytes, string[] in
{
var tfOutput = new TFOutput(Graph[Outputs[i]]);
var shape = Graph.GetTensorShape(tfOutput);
-int[] dims = shape.ToIntArray().Skip(shape[0] == -1 ? BatchSize : 0).ToArray();
+int[] dims = shape.NumDimensions > 0 ? shape.ToIntArray().Skip(shape[0] == -1 ? BatchSize : 0).ToArray() : new[] { 0 };
Contributor

@zeahmed zeahmed Sep 10, 2018

0 [](start = 131, length = 1)

Does this zero mean variable length? #Closed

Author

Yes.


In reply to: 216404737 [](ancestors = 216404737)

Contributor

@zeahmed zeahmed left a comment

:shipit:

getNum(ref buffer);
getClasses(ref buffer);
}
}
Member

@abgoswam abgoswam Sep 10, 2018

do we want to validate anything here ? #Closed

Author

The images I am using here don't have any detections, so all of the outputs will be 0. Once I get the models uploaded, I will also upload some images that have detections, and then I can add validation of the outputs.


In reply to: 216457130 [](ancestors = 216457130)

{
ModelFile = model_location,
OutputColumns = new[] { "Softmax", "dense/Relu" },
InputColumns = new[] { "Placeholder", "reshape_input" }
Member

reshape_input [](start = 55, length = 13)

this is specified as an input, but I do not see it when passing in the data. Is this required ?

Author

It is required; this column is created on line 273 by the CopyColumns transform.
This input is required for computing the "dense/Relu" output. Actually, I am not sure why it is needed, since, if I understand correctly, reshape_input is computed from the input layer Placeholder by simply reshaping from 28x28 to 784. @zeahmed, do you know why "Placeholder" is not enough to compute "dense/Relu"?

Contributor

@zeahmed zeahmed Sep 10, 2018

This is a problem with the model: https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py

If you only want to access the features (named dense/Relu), then reshape_input is required; if you want to access the Softmax, then Placeholder is required. If you look closely at the attached model graph, you will see two graphs working in parallel. Having said that, I think it's a good model for testing two inputs and two outputs.

If you think this is going to be an issue, I can update the model.

frozen_saved_model

In reply to: 216485826 [](ancestors = 216485826)

Contributor

Also, I think we should rename the nodes properly to avoid strange names like dense/Relu, etc. :)


In reply to: 216502475 [](ancestors = 216502475,216485826)

for (int j = 0; j < TFInputShapes[i].NumDimensions; j++)
newShape[j] = TFInputShapes[i][j] == -1 ? BatchSize : TFInputShapes[i][j];
TFInputShapes[i] = new TFShape(newShape);
if (TFInputShapes[i].NumDimensions != -1)
Member

@abgoswam abgoswam Sep 10, 2018

if (TFInputShapes[i].NumDimensions != -1) [](start = 16, length = 41)

am curious - why did we need this check ?

Did some of the pre-trained models have TFInputShapes[i].NumDimensions == -1 (that we were not handling before)
#Resolved

Author

If the shape is completely unknown, then its NumDimensions property is -1.


In reply to: 216460739 [](ancestors = 216460739)

Member

makes sense. in another comment i was asking if TF has this documented somewhere -- or we are using this as a heuristic based on models we have played with so far ?


In reply to: 216475056 [](ancestors = 216475056,216460739)

Author

I haven't seen it documented, I just saw it by debugging different models.


In reply to: 216476239 [](ancestors = 216476239,216475056,216460739)

if (TFInputShapes[i].NumDimensions != -1)
{
var newShape = new long[TFInputShapes[i].NumDimensions];
newShape[0] = TFInputShapes[i][0] == -1 ? BatchSize : TFInputShapes[i][0];
Member

@abgoswam abgoswam Sep 10, 2018

newShape[0] = TFInputShapes[i][0] == -1 ? BatchSize : TFInputShapes[i][0]; [](start = 20, length = 74)

so we will have special handling only for the 1st dimension, and not for the other dimensions -- is that the intent ?

(looks like we should have been doing this previously too, instead of doing special handling for all the columns) #Resolved

Author

Yes. This is because when the first dimension is -1, it indicates that the first dimension is the batch size. For any other dimension, -1 just means the dimension can be anything. For example, if the shape is [?,?,?,3], then the first ? is the batch size, and the other two are the width and height of the image. In this case we don't want to change those two to 1; we want to keep them as -1, so that we still know to fill in these values once we see the actual example and know its size.


In reply to: 216461730 [](ancestors = 216461730)
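
The rule described in the reply above can be sketched on plain arrays: only a leading -1 is treated as the batch dimension, and every other -1 is kept until the actual example is seen. `ShapeSketch` and `FixBatchDimension` are hypothetical names for illustration and do not appear in the PR.

```csharp
using System;

static class ShapeSketch
{
    // Hypothetical helper: replaces only a leading -1 (the batch dimension)
    // with batchSize and leaves every other unknown dimension as -1.
    public static long[] FixBatchDimension(long[] dims, long batchSize)
    {
        if (dims == null || dims.Length == 0)
            return dims; // completely unknown or scalar shape: nothing to fix

        var result = (long[])dims.Clone();
        if (result[0] == -1)
            result[0] = batchSize;
        return result;
    }
}
```

For a shape of [-1, -1, -1, 3] and a batch size of 1, this yields [1, -1, -1, 3]: the image width and height stay unknown.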

Member

yeap. good catch!


In reply to: 216477499 [](ancestors = 216477499,216461730)

@@ -233,7 +241,7 @@ private TensorFlowTransform(IHostEnvironment env, byte[] modelBytes, string[] in
{
var tfOutput = new TFOutput(Graph[Outputs[i]]);
var shape = Graph.GetTensorShape(tfOutput);
-int[] dims = shape.ToIntArray().Skip(shape[0] == -1 ? BatchSize : 0).ToArray();
+int[] dims = shape.NumDimensions > 0 ? shape.ToIntArray().Skip(shape[0] == -1 ? BatchSize : 0).ToArray() : new[] { 0 };
Member

@abgoswam abgoswam Sep 10, 2018

shape.NumDimensions > 0 [](start = 29, length = 23)

I am presuming we are using this here as a check for models producing variable length outputs...Am i right ?

Does TF document such behaviour somewhere ? #Resolved

Author

If the shape is unknown, it has shape.NumDimensions == -1, and shape.ToIntArray() == null, which would cause a null reference exception when we try to access shape[0].


In reply to: 216462762 [](ancestors = 216462762)
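
A minimal sketch of the guard discussed above, written against plain arrays rather than the real TFShape type. Here a null array stands in for a completely unknown shape (the case where the thread reports NumDimensions == -1 and ToIntArray() == null), and the `batchSize` default of 1 is an assumption. `OutputDimsSketch` and `GetOutputDims` are hypothetical names.

```csharp
using System;
using System.Linq;

static class OutputDimsSketch
{
    // dims == null models a completely unknown shape.
    public static int[] GetOutputDims(int[] dims, int batchSize = 1)
    {
        if (dims == null || dims.Length == 0)
            return new[] { 0 }; // 0 stands for "variable length"

        // Mirrors the diff: drop the leading batch dimension when it is -1.
        return dims.Skip(dims[0] == -1 ? batchSize : 0).ToArray();
    }
}
```

Checking for the unknown shape first avoids indexing into a null array, which is the null reference exception the author describes.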

@@ -37,13 +37,14 @@ namespace Microsoft.ML.Transforms.TensorFlow
/// </summary>
/// <typeparam name="T[]">.NET type of tensor to create</typeparam>
/// <param name="data">value of tensor</param>
/// <param name="count">The number of elements in the tensor</param>
Member

@abgoswam abgoswam Sep 10, 2018

did we have to re-generate the TensorGeneric.cs file after making these changes ? #Resolved

Author

Yes. Apparently TensorGeneric.tt regenerates TensorGeneric.cs automatically whenever it is saved.


In reply to: 216466572 [](ancestors = 216466572)

Member

awesome. thanks for the info.


In reply to: 216479096 [](ancestors = 216479096,216466572)

for (int i = 0; i < _parent.Outputs.Length; i++)
{
if (activeOutput(i))
{
var type = TFTensor.TypeFromTensorType(_parent.TFOutputTypes[i]);
_host.Assert(type == _parent.OutputTypes[i].ItemType.RawType);
var srcTensorGetters = GetTensorValueGetters(input);
-valueGetters.Add(Utils.MarshalInvoke(MakeGetter<int>, type, input, i, srcTensorGetters, activeOutputColNames, outputCache));
+valueGetters[i] = Utils.MarshalInvoke(MakeGetter<int>, type, input, i, srcTensorGetters, activeOutputColNames, outputCache);
Member

@abgoswam abgoswam Sep 10, 2018

valueGetters[i] [](start = 24, length = 15)

I'm curious about this bug in the getter. Could you kindly elaborate a bit on it? Is this some artifact of the activeOutput() call above? #Resolved

Author

This bug surfaced when I added the new unit test. It happens in the following situation:

  • Create a TensorFlowTransform that computes two outputs, say A and B.
  • Create another transform/learner that only uses B as input.
  • When we try to cursor over the data, the activeOutput predicate will say that output 0 is not active and output 1 is active, thus returning an array of length 1.
  • When we try to get the getter of column B, we do so using its index, which is 1 (the index of column A is 0 and the index of column B is 1). So we try to access the getters array which is of length 1, at index 1 which is out of bounds...

The fix was to always create an array with length equal to the number of output columns in the transform, but populate just the indices where the active columns are.


In reply to: 216469314 [](ancestors = 216469314)
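
The fix described above can be illustrated with a small standalone sketch: the getter array always has one slot per output column, and only the slots of active columns are populated, so a column's index is always in bounds. `GetterSketch` and `MakeGetters` are hypothetical, and the string-returning delegates stand in for the real typed getters.

```csharp
using System;

static class GetterSketch
{
    // Hypothetical stand-in for the real getter creation: returns one slot
    // per output column and fills only the slots whose column is active.
    public static Func<string>[] MakeGetters(string[] outputs, Func<int, bool> isActive)
    {
        var getters = new Func<string>[outputs.Length];
        for (int i = 0; i < outputs.Length; i++)
        {
            if (isActive(i))
            {
                string name = outputs[i];
                getters[i] = () => "value of " + name;
            }
        }
        return getters;
    }
}
```

With outputs A and B where only B is active, the array has length 2, slot 0 is null, and slot 1 holds B's getter. Appending to a list instead would put B's getter at index 0, so a lookup at index 1 would be out of bounds.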

Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

:shipit:

…o we don't need a separate nuget for it to work.
Member

@abgoswam abgoswam left a comment

:shipit:

{
-return SetupTensor(dt, dims, data, start: 0, count: data.Length, size: size);
+return SetupTensor(dt, dims, data, start: 0, count: count, size: size);
Contributor

start: 0, count: count, size: size [](start = 47, length = 34)

nit: do you need to specify the param names here? Aren't you already invoking the function with all of its params?

@yaeldekel yaeldekel merged commit 5666dd1 into dotnet:master Sep 11, 2018
@yaeldekel yaeldekel deleted the unknownshape branch September 11, 2018 17:03
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
5 participants