OnnxTransform V0 #1


Closed · wants to merge 2 commits

Conversation

shmoradims (Owner):
The purpose of this PR is to review the overall structure of OnnxTransform as a checkpoint, to make sure all stakeholders agree on the direction.

OnnxTransform V0 works end-to-end for a sample fully connected ONNX model with one input and one output. The tester code scores one data point with the model.

The project currently lives outside the ml.net src directory because of .NET Core vs .NET Framework complications. The next step is to integrate this project into ml.net.

@shmoradims shmoradims self-assigned this Sep 5, 2018
{
"description": "",
"internalName": "Input3",
"name": "input0",
shmoradims (Owner, Author):

@jignesh - manifest.json file changes the "internalName" of nodes to "name". That's something to take care of when removing the dependency on manifest file.

/// <summary>
/// OnnxNodeInfo contains all the information for a given node (e.g. inputs/outputs)
/// of an Onnx model.
/// </summary>
public class OnnxNodeInfo
shmoradims (Owner, Author):

@jignesh - this is the data structure I was thinking that would make sense for NodeInfo, instead of having multiple dictionaries, one for shape, one for type, etc.
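A sketch of that idea (member names here are illustrative, not the final API): one class bundles everything the transform needs per node, instead of parallel dictionaries keyed by node name.

```csharp
// Hypothetical consolidation of the per-node dictionaries into one type.
// Name/Shape/Type are assumed members; the real Sonoma/Lotus# API may differ.
public sealed class OnnxNodeInfo
{
    public string Name { get; }       // node name in the ONNX graph
    public int[] Shape { get; }       // tensor dimensions
    public System.Type Type { get; }  // element type, e.g. typeof(float)

    public OnnxNodeInfo(string name, int[] shape, System.Type type)
    {
        Name = name;
        Shape = shape;
        Type = type;
    }
}
```

A `Dictionary<string, OnnxNodeInfo>` (or two, for inputs and outputs) would then replace the separate shape/type maps.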


@shmoradims, this is (roughly) equivalent to the low-level API Lotus uses. Since we're (most likely) developing a new Lotus# package, it may be better to expose a NodeInfo-style API there.

For now, you should be able to use the existing Sonoma API to construct a NodeInfo.

var listDst = new List<System.Single>();
var typedDst = (System.Single[])(object)dst;
tensor.CopyTo(listDst);
listDst.CopyTo(typedDst);
shmoradims (Owner, Author):

@jignesh - Tensor.CopyTo(List dst) requires a list input, whereas ML.NET provides array buffers to copy values to. This mismatch causes an extra copy. Do you know how to prevent this?


@shmoradims, I'll add a CopyTo(T[] dst) method to Tensor, which should skip this extra copy.
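The difference can be illustrated with plain arrays (the tensor is simulated here by a float[]; the CopyTo names mirror the discussion above):

```csharp
using System;
using System.Collections.Generic;

// Simulates the two code paths: today's tensor -> List<float> -> float[]
// round-trip versus the single copy a CopyTo(T[]) overload would allow.
class CopyToDemo
{
    static void Main()
    {
        float[] tensorData = { 1f, 2f, 3f };   // stand-in for the tensor contents
        var dst = new float[tensorData.Length];

        // Current path (two copies): tensor.CopyTo(listDst); listDst.CopyTo(dst);
        var listDst = new List<float>(tensorData);
        listDst.CopyTo(dst);

        // Proposed path (one copy): tensor.CopyTo(dst);
        Array.Copy(tensorData, dst, tensorData.Length);

        Console.WriteLine(string.Join(",", dst)); // prints 1,2,3
    }
}
```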


public OnnxModel(byte[] modelBytes)
{
throw new NotImplementedException("Need an API to serialize/deserialize onnx models to byte arrays!");
shmoradims (Owner, Author):

@jignesh - we need an API to serialize/deserialize ONNX models to byte arrays for saving and loading ml.net models. Does it exist in Sonoma?


@shmoradims, this method needs to be added. We can add it as follows:

byte[]^ GetModelAsBytes(System::String^ modelName, int32_t version);

jignparm commented Sep 7, 2018:

BTW, you're using ModelManager(..., flatDirectory=True), in which case you cannot specify a version number (i.e. version must always be equal to "IgnoredVersion"). Since versioning is turned off, and you have the full path to the model file, you can simply use the snippet below to get the byte representation. Versioning in Sonoma is used mostly by the Deep-Learning-Inferencing-Service in Bing, where they need to be able to load new versions of the same model without any downtime on production servers. We don't need to worry about multiple model versions in this transform.


var modelbytes = File.ReadAllBytes(fileName);

shmoradims (Owner, Author):

That would work for model->bytes. For bytes->model, I'd need to save the bytes to a temp file, then use the ctor(string filepath), and clean up afterwards. I can do that, but it would make sense for the Sonoma API to handle this directly with a ctor(byte[]).


Can we do the following?

1. To unblock the demo, use the approach mentioned above (temp file), similar to the Python Trainer transform. This should be a relatively quick stopgap fix.
2. In Sonoma, after we remove the dependency on the manifest file + model_export.exe (will take ~1 week or longer), we can add the API to serialize the model to/from byte[], and upgrade the ml.net ONNX transform accordingly.

The cleaner API (in Sonoma) requires quite a bit of refactoring -- the steps above unblock the demo milestone while we get the APIs in place for the Sonoma/ML.net release milestone.
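The temp-file stopgap in step 1 could look roughly like this (OnnxModel's file-path ctor is from this PR; CreateFromBytes is a hypothetical helper name, and deleting in the finally block assumes Sonoma reads the file fully during construction):

```csharp
// Hypothetical bytes -> model helper using a temp file, until Sonoma
// exposes a byte[] API. Assumes the ctor fully loads the file into memory.
public static OnnxModel CreateFromBytes(byte[] modelBytes)
{
    var tempPath = System.IO.Path.GetTempFileName();
    try
    {
        System.IO.File.WriteAllBytes(tempPath, modelBytes);
        return new OnnxModel(tempPath);  // existing file-path constructor
    }
    finally
    {
        System.IO.File.Delete(tempPath); // clean up the temp copy
    }
}
```

If Sonoma keeps a handle on the file instead, the deletion would have to be deferred to the model's disposal.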

throw new NotImplementedException($"Not implemented type {typeof(T)}");
}

public static PrimitiveType RawToMlNetType(Type type)
shmoradims (Owner, Author):

@Yael - do we have an existing utility to replace RawToMlNetType()?

private static VersionInfo GetVersionInfo()
{
return new VersionInfo(
modelSignature: "ONNX",


Signatures need to be 8 characters long.

shmoradims (Owner, Author):

What's your suggestion?
ONNXTRFM
ONNXTSFM
ONNXTRAN


Perhaps "ONNXSCOR" ?



[Argument(ArgumentType.Multiple | ArgumentType.Required, HelpText = "TBD", SortOrder = 2)]
public string OutputColumn;

public OnnxModelInfo ModelInfo;


Re: ModelInfo

I'm not sure I understand. Where does this get set?
Wouldn't it make more sense to have a static method that creates one of these given an Arguments object?

shmoradims (Owner, Author):

OnnxModelInfo holds all the args that should be inferred from the model but are currently provided by the user. This way it's easy to take it out once the Sonoma API is ready. The creator of OnnxTransform should create it and pass it in, but it becomes unnecessary later on.

},
};

var transformArgs = new OnnxTransform.Arguments() { ModelFile = modelFile, InputColumn = "pixels", OutputColumn = "pixelsOut", ModelInfo = modelMetadata };


Re: pixels

Doesn't this name need to match the name in modelMetadata?

shmoradims (Owner, Author):

The assumption is that there's only one InputIdvColumn and the model has one TensorInputNode, and likewise for output. So the names don't have to match, because there's a 1-1 mapping.
For supporting multiple inputs/outputs, we need to do something similar to TF: either require the names to match, or allow users to create a mapping.
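The TF-style multi-column case could be sketched as parallel arrays in the Arguments (field names here are hypothetical, not part of this PR):

```csharp
// Hypothetical multi-column arguments, mirroring the TensorFlow transform:
// each IDV column is paired with the ONNX graph node it feeds or reads.
public sealed class Arguments
{
    public string[] InputColumns;   // IDV column names
    public string[] InputNodes;     // matching ONNX input node names
    public string[] OutputColumns;  // IDV columns to produce
    public string[] OutputNodes;    // matching ONNX output node names
}
```

Requiring names to match would collapse each pair into a single array, at the cost of forcing users to rename their IDV columns.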

/// </summary>
/// <param name="env">Host Environment.</param>
/// <param name="input">Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param>
/// <param name="modelFile">This is the frozen Onnx model file. https://www.tensorflow.org/mobile/prepare_models </param>


Re: https://www.tensorflow.org/mobile/prepare_models

This needs to be updated. Also, the term "frozen" applies only to TF models, doesn't it?


ValueGetter<VBuffer<T>> valuegetter = (ref VBuffer<T> dst) =>
{
var outputTensors = _model.Run(new List<Tensor> { _idvToTensorAdapter.GetTensor() });


Re: _model

Does Microsoft.ML.Scoring take care of disposing this object? Otherwise, we should add it to the disposer in CreateGetters.

public void InitializeValueGetters(IRow idvRow)
{
var type = _idvColumnType.ItemType.RawType;
_tensorValueGetter = Utils.MarshalInvoke(


Re: _tensorValueGetter

I think it would be better to define a delegate that returns a Tensor; that way, instead of having a mutable field, this method could return the delegate, and it could be used in line 184 in place of _idvToTensorAdapter.GetTensor().
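A rough shape of that suggestion (Tensor and IRow are the PR's types; CreateTensorGetter and ToTensor are illustrative names, not existing APIs):

```csharp
// Sketch: return a Tensor-producing delegate instead of storing a mutable
// _tensorValueGetter field; the caller invokes it where GetTensor() is used.
private delegate Tensor TensorGetter();

private static TensorGetter CreateTensorGetter<T>(IRow idvRow, int colIndex)
{
    var srcGetter = idvRow.GetGetter<T>(colIndex);
    return () =>
    {
        T value = default(T);
        srcGetter(ref value);              // pull the current row value
        return ToTensor(value);            // hypothetical value -> Tensor conversion
    };
}
```

The delegate captures the column getter, so each invocation reads the current row without any shared mutable state.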

internal sealed class OnnxModel
{
private static readonly int IgnoredVersion = int.MaxValue;
private ModelManager _modelManager;


Re: ModelManager

readonly?

shmoradims pushed a commit that referenced this pull request Oct 22, 2018
@shmoradims shmoradims closed this Oct 23, 2018
@shmoradims shmoradims deleted the onnx_transform4 branch October 23, 2018 19:20
shmoradims pushed a commit that referenced this pull request Nov 5, 2018
* Added placeholder

* Cleaned up Infos (replaced with ColumnPairs)

* Added ColumnInfo

* Added all the Create() methods.

* Added Mapper

* Commented out the EntryPoint

* Added PcaEstimator2

* PcaWorkout test passes

* Added pigsty api

* Fixed EntryPoint

* Fixed the arguments

* Fixed tests and added pigsty test

* Deleted Wrapped PCA transform

* Float -> float

* Cleaned docstrings

* Removed some unnecessary checks

* Simplified unnecessary code

* Moved some fields to ColumnInfo for simplifications

* Simplified weight columns

* Address PR comments #1

* Addressed PR comments dotnet#2

* Moved the static test

* PR comments dotnet#3

* Moved schema related information out of ColumnInfo and into Mapper.ColumnSchemaInfo.

* PR comments

* PR comments

* Updated manifest for entrypoint PcaCalculator

* Fixed schema exceptions
shmoradims pushed a commit that referenced this pull request Nov 16, 2018
* Implement VBuffer master plan WIP #1

* Getting everything to build and tests passing

* Keep moving to the master plan of VBuffer.

* Remove the rest of the VBuffer.Count usages in ML.Data

* Remove the rest of the VBuffer.Count usages and make VBuffer.Count private.

* Fix two failing tests.

* Fix FastTreeBinaryClassificationCategoricalSplitTest by remembering the underlying arrays in the column buffer in Transposer.

Also enable a Transposer test, since it passes.