
Internalization of TensorFlowUtils.cs and refactored TensorFlowCatalog. #2672


Merged: 14 commits, Mar 1, 2019
Changes from 7 commits
@@ -45,7 +45,7 @@ public static void Example()
// Load the TensorFlow model once.
// - Use it for querying the schema for input and output in the model
// - Use it for prediction in the pipeline.
- var modelInfo = TensorFlowUtils.LoadTensorFlowModel(mlContext, modelLocation);
+ var modelInfo = mlContext.Model.LoadTensorFlowModel(modelLocation);
var schema = modelInfo.GetModelSchema();
var featuresType = (VectorType)schema["Features"].Type;
@yaeldekel commented Feb 22, 2019, on `Features`:

Can we add a sample that uses modelInfo.GetInputSchema() to find out what the name of the input node is?

#Resolved

@zeahmed (Contributor, Author) commented Feb 23, 2019:

I see it's being used at a couple of places in the tests, e.g.

#Resolved

Console.WriteLine("Name: {0}, Type: {1}, Shape: (-1, {2})", "Features", featuresType.ItemType.RawType, featuresType.Dimensions[0]);
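To @yaeldekel's question above, a hedged sketch of how the loaded model's schema could be probed for input node names with the new API. It reuses the `mlContext` and `modelLocation` variables from the sample, assumes the `"TensorflowOperatorType"` metadata key from this PR, and is not part of the diff itself:

```csharp
// Sketch only: `mlContext` and `modelLocation` come from the sample above,
// and the model file is assumed to exist on disk.
var modelInfo = mlContext.Model.LoadTensorFlowModel(modelLocation);
var schema = modelInfo.GetModelSchema();

// Each schema column corresponds to a graph node; "Placeholder" nodes are
// the model's inputs, so filtering on the operator-type metadata reveals
// the input node names.
for (int i = 0; i < schema.Count; i++)
{
    ReadOnlyMemory<char> opType = default;
    schema[i].Metadata.GetValue("TensorflowOperatorType", ref opType);
    if (opType.ToString() == "Placeholder")
        Console.WriteLine($"Input node: '{schema[i].Name}' of type {schema[i].Type}");
}
```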
@@ -3,7 +3,10 @@
// See the LICENSE file in the project root for more information.

using System;
- using Microsoft.ML.Transforms.TensorFlow;
+ using System.Collections.Generic;
+ using System.Linq;
+ using Microsoft.Data.DataView;
+ using Microsoft.ML.Data;

namespace Microsoft.ML.DnnAnalyzer
{
@@ -17,11 +20,40 @@ public static void Main(string[] args)
return;
}

- foreach (var (name, opType, type, inputs) in TensorFlowUtils.GetModelNodes(new MLContext(), args[0]))
+ foreach (var (name, opType, type, inputs) in GetModelNodes(args[0]))
{
var inputsString = inputs.Length == 0 ? "" : $", input nodes: {string.Join(", ", inputs)}";
Console.WriteLine($"Graph node: '{name}', operation type: '{opType}', output type: '{type}'{inputsString}");
}
}

private static IEnumerable<(string, string, DataViewType, string[])> GetModelNodes(string modelPath)
{
var mlContext = new MLContext();
var tensorFlowModel = mlContext.Model.LoadTensorFlowModel(modelPath);
var schema = tensorFlowModel.GetModelSchema();

for (int i = 0; i < schema.Count; i++)
{
var name = schema[i].Name;
var type = schema[i].Type;

var metadataType = schema[i].Metadata.Schema.GetColumnOrNull("TensorflowOperatorType")?.Type;
ReadOnlyMemory<char> opType = default;
schema[i].Metadata.GetValue("TensorflowOperatorType", ref opType);
metadataType = schema[i].Metadata.Schema.GetColumnOrNull("TensorflowUpstreamOperators")?.Type;
VBuffer<ReadOnlyMemory<char>> inputOps = default;
if (metadataType != null)
{
schema[i].Metadata.GetValue("TensorflowUpstreamOperators", ref inputOps);
}

string[] inputOpsResult = inputOps.DenseValues()
.Select(input => input.ToString())
.ToArray();

yield return (name, opType.ToString(), type, inputOpsResult);
}
}
}
}
@@ -20,7 +20,7 @@ public OutColumn(Vector<float> input, string modelFile)
Input = input;
}

- public OutColumn(Vector<float> input, TensorFlowModelInfo tensorFlowModel)
+ public OutColumn(Vector<float> input, TensorFlowModel tensorFlowModel)
: base(new Reconciler(tensorFlowModel), input)
{
Input = input;
@@ -30,7 +30,7 @@ public OutColumn(Vector<float> input, TensorFlowModelInfo tensorFlowModel)
private sealed class Reconciler : EstimatorReconciler
{
private readonly string _modelFile;
- private readonly TensorFlowModelInfo _tensorFlowModel;
+ private readonly TensorFlowModel _tensorFlowModel;

public Reconciler(string modelFile)
{
@@ -39,7 +39,7 @@ public Reconciler(string modelFile)
_tensorFlowModel = null;
}

- public Reconciler(TensorFlowModelInfo tensorFlowModel)
+ public Reconciler(TensorFlowModel tensorFlowModel)
{
Contracts.CheckValue(tensorFlowModel, nameof(tensorFlowModel));

@@ -81,7 +81,7 @@ public static Vector<float> ApplyTensorFlowGraph(this Vector<float> input, strin
/// Run a TensorFlow model provided through <paramref name="tensorFlowModel"/> on the input column and extract one output column.
/// The inputs and outputs are matched to TensorFlow graph nodes by name.
/// </summary>
- public static Vector<float> ApplyTensorFlowGraph(this Vector<float> input, TensorFlowModelInfo tensorFlowModel)
+ public static Vector<float> ApplyTensorFlowGraph(this Vector<float> input, TensorFlowModel tensorFlowModel)
{
Contracts.CheckValue(input, nameof(input));
Contracts.CheckValue(tensorFlowModel, nameof(tensorFlowModel));
48 changes: 5 additions & 43 deletions src/Microsoft.ML.TensorFlow/TensorFlow/TensorflowUtils.cs
@@ -21,12 +21,12 @@ public static class TensorFlowUtils
/// Key to access operator's type (a string) in <see cref="DataViewSchema.Column.Metadata"/>.
/// Its value describes the Tensorflow operator that produces this <see cref="DataViewSchema.Column"/>.
/// </summary>
- public const string TensorflowOperatorTypeKind = "TensorflowOperatorType";
+ internal const string TensorflowOperatorTypeKind = "TensorflowOperatorType";
/// <summary>
/// Key to access upstream operators' names (a string array) in <see cref="DataViewSchema.Column.Metadata"/>.
/// Its value states operators that the associated <see cref="DataViewSchema.Column"/>'s generator depends on.
/// </summary>
- public const string TensorflowUpstreamOperatorsKind = "TensorflowUpstreamOperators";
+ internal const string TensorflowUpstreamOperatorsKind = "TensorflowUpstreamOperators";

internal static DataViewSchema GetModelSchema(IExceptionContext ectx, TFGraph graph, string opType = null)
{
@@ -94,50 +94,12 @@ internal static DataViewSchema GetModelSchema(IExceptionContext ectx, TFGraph gr
/// </summary>
/// <param name="env">The environment to use.</param>
/// <param name="modelPath">Model to load.</param>
- public static DataViewSchema GetModelSchema(IHostEnvironment env, string modelPath)
+ internal static DataViewSchema GetModelSchema(IHostEnvironment env, string modelPath)
{
var model = LoadTensorFlowModel(env, modelPath);
return GetModelSchema(env, model.Session.Graph);
}

- /// <summary>
- /// This is a convenience method for iterating over the nodes of a TensorFlow model graph. It
- /// iterates over the columns of the <see cref="DataViewSchema"/> returned by <see cref="GetModelSchema(IHostEnvironment, string)"/>,
- /// and for each one it returns a tuple containing the name, operation type, column type and an array of input node names.
- /// This method is convenient for filtering nodes based on certain criteria, for example, by the operation type.
- /// </summary>
- /// <param name="env">The environment to use.</param>
- /// <param name="modelPath">Model to load.</param>
- /// <returns></returns>
- public static IEnumerable<(string, string, DataViewType, string[])> GetModelNodes(IHostEnvironment env, string modelPath)
- {
-     var schema = GetModelSchema(env, modelPath);
-
-     for (int i = 0; i < schema.Count; i++)
-     {
-         var name = schema[i].Name;
-         var type = schema[i].Type;
-
-         var metadataType = schema[i].Metadata.Schema.GetColumnOrNull(TensorflowOperatorTypeKind)?.Type;
-         Contracts.Assert(metadataType != null && metadataType is TextDataViewType);
-         ReadOnlyMemory<char> opType = default;
-         schema[i].Metadata.GetValue(TensorflowOperatorTypeKind, ref opType);
-         metadataType = schema[i].Metadata.Schema.GetColumnOrNull(TensorflowUpstreamOperatorsKind)?.Type;
-         VBuffer<ReadOnlyMemory<char>> inputOps = default;
-         if (metadataType != null)
-         {
-             Contracts.Assert(metadataType.IsKnownSizeVector() && metadataType.GetItemType() is TextDataViewType);
-             schema[i].Metadata.GetValue(TensorflowUpstreamOperatorsKind, ref inputOps);
-         }
-
-         string[] inputOpsResult = inputOps.DenseValues()
-             .Select(input => input.ToString())
-             .ToArray();
-
-         yield return (name, opType.ToString(), type, inputOpsResult);
-     }
- }

internal static PrimitiveDataViewType Tf2MlNetType(TFDataType type)
{
var mlNetType = Tf2MlNetTypeOrNull(type);
@@ -338,10 +300,10 @@ private static void CreateTempDirectoryWithAcl(string folder, string identity)
/// <param name="env">The environment to use.</param>
/// <param name="modelPath">The model to load.</param>
/// <returns></returns>
- public static TensorFlowModelInfo LoadTensorFlowModel(IHostEnvironment env, string modelPath)
+ internal static TensorFlowModel LoadTensorFlowModel(IHostEnvironment env, string modelPath)
{
var session = GetSession(env, modelPath);
- return new TensorFlowModelInfo(env, session, modelPath);
+ return new TensorFlowModel(env, session, modelPath);
}

internal static TFSession GetSession(IHostEnvironment env, string modelPath)
@@ -3,7 +3,6 @@
// See the LICENSE file in the project root for more information.

using Microsoft.Data.DataView;
- using Microsoft.ML.Data;
using Microsoft.ML.Transforms.TensorFlow;

namespace Microsoft.ML.Transforms
@@ -20,20 +19,20 @@ namespace Microsoft.ML.Transforms
/// </item>
/// </list>
/// </summary>
- public class TensorFlowModelInfo
+ public sealed class TensorFlowModel
@TomFinley (Contributor) commented Feb 25, 2019:

I think possibly maybe there's some misunderstanding. Just to be explicit, I expect to see methods used to query the model to be on the model, that is, they will be just methods and properties of the model. Creating transformers, querying the schema, or whatever, will be here. The implication is that nearly everything in TensorflowCatalog will be moved out of there, and into here. (Except for loading the model, which of course must be on the model operations catalog.) #Resolved

{
internal TFSession Session { get; }
- public string ModelPath { get; }
+ internal string ModelPath { get; }

private readonly IHostEnvironment _env;

/// <summary>
- /// Instantiates <see cref="TensorFlowModelInfo"/>.
+ /// Instantiates <see cref="TensorFlowModel"/>.
/// </summary>
/// <param name="env">An <see cref="IHostEnvironment"/> object.</param>
/// <param name="session">TensorFlow session object.</param>
/// <param name="modelLocation">Location of the model from where <paramref name="session"/> was loaded.</param>
- internal TensorFlowModelInfo(IHostEnvironment env, TFSession session, string modelLocation)
+ internal TensorFlowModel(IHostEnvironment env, TFSession session, string modelLocation)
{
Session = session;
ModelPath = modelLocation;
16 changes: 13 additions & 3 deletions src/Microsoft.ML.TensorFlow/TensorflowCatalog.cs
@@ -5,6 +5,7 @@
using Microsoft.Data.DataView;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;
using Microsoft.ML.Transforms.TensorFlow;

namespace Microsoft.ML
{
@@ -59,7 +60,7 @@ public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog ca
/// <param name="inputColumnName"> The name of the model input.</param>
/// <param name="outputColumnName">The name of the requested model output.</param>
public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog catalog,
@TomFinley (Contributor) commented Feb 25, 2019, on `ScoreTensorFlowModel`:

This should be an operation on the model. #Resolved

- TensorFlowModelInfo tensorFlowModel,
+ TensorFlowModel tensorFlowModel,
string outputColumnName,
string inputColumnName)
=> new TensorFlowEstimator(CatalogUtils.GetEnvironment(catalog), new[] { outputColumnName }, new[] { inputColumnName }, tensorFlowModel);
@@ -79,7 +80,7 @@ public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog ca
/// </format>
/// </example>
public static TensorFlowEstimator ScoreTensorFlowModel(this TransformsCatalog catalog,
@TomFinley (Contributor) commented Feb 25, 2019, on `ScoreTensorFlowModel`:

Likewise on the model. #Resolved

- TensorFlowModelInfo tensorFlowModel,
+ TensorFlowModel tensorFlowModel,
string[] outputColumnNames,
string[] inputColumnNames)
=> new TensorFlowEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnNames, inputColumnNames, tensorFlowModel);
@@ -102,7 +103,16 @@ public static TensorFlowEstimator TensorFlow(this TransformsCatalog catalog,
/// <param name="tensorFlowModel">The pre-loaded TensorFlow model.</param>
public static TensorFlowEstimator TensorFlow(this TransformsCatalog catalog,
@TomFinley (Contributor) commented Feb 25, 2019, on `TensorFlow`:

So this I find pretty confusing. Do we create estimators via this method, or do we work through the model object? #Resolved

TensorFlowEstimator.Options options,
- TensorFlowModelInfo tensorFlowModel)
+ TensorFlowModel tensorFlowModel)
=> new TensorFlowEstimator(CatalogUtils.GetEnvironment(catalog), options, tensorFlowModel);

/// <summary>
/// Load TensorFlow model into memory. This is the convenience method that allows the model to be loaded once and subsequently use it for querying schema and creation of
/// <see cref="TensorFlowEstimator"/> using <see cref="TensorFlow(TransformsCatalog, TensorFlowEstimator.Options, TensorFlowModel)"/>.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="modelLocation">Location of the TensorFlow model.</param>
public static TensorFlowModel LoadTensorFlowModel(this ModelOperationsCatalog catalog, string modelLocation)
=> TensorFlowUtils.LoadTensorFlowModel(CatalogUtils.GetEnvironment(catalog), modelLocation);
}
}
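Taken together, the refactored surface at this commit could be exercised roughly as follows. This is a sketch, not code from the PR; the model path and the node names `"Features"` and `"Prediction/Softmax"` are hypothetical placeholders:

```csharp
var mlContext = new MLContext();

// Load the model once through the new ModelOperationsCatalog extension...
TensorFlowModel tensorFlowModel =
    mlContext.Model.LoadTensorFlowModel("model/frozen_saved_model.pb"); // hypothetical path

// ...query its schema...
var schema = tensorFlowModel.GetModelSchema();

// ...and reuse the same object to create a scoring estimator.
// (Per the review discussion, later commits may move this onto the model itself.)
var estimator = mlContext.Transforms.ScoreTensorFlowModel(
    tensorFlowModel,
    outputColumnName: "Prediction/Softmax", // hypothetical output node
    inputColumnName: "Features");           // hypothetical input node
```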