OnnxTransform V0 #1


Closed · wants to merge 2 commits

Conversation

shmoradims (Owner):
The purpose of this PR is to review the overall structure of OnnxTransform as a checkpoint, to make sure all stakeholders agree on the direction.

OnnxTransform V0 works end-to-end for a sample fully connected ONNX model with one input and one output. The tester code scores one data point with the model.

The project currently lives outside the ml.net src directory because of .NET Core vs .NET Framework complications. The next step is to integrate this project into ml.net.

@shmoradims shmoradims self-assigned this Sep 5, 2018
{
"description": "",
"internalName": "Input3",
"name": "input0",
shmoradims (Owner, Author):

@jignesh - manifest.json file changes the "internalName" of nodes to "name". That's something to take care of when removing the dependency on manifest file.

/// <summary>
/// OnnxNodeInfo contains all the information for a given node (e.g. inputs/outputs)
/// of an Onnx model.
/// </summary>
public class OnnxNodeInfo
shmoradims (Owner, Author):

@jignesh - this is the data structure I was thinking that would make sense for NodeInfo, instead of having multiple dictionaries, one for shape, one for type, etc.
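A sketch of that idea (member names here are illustrative, not the final API): one class bundles everything the transform needs per node, instead of parallel dictionaries keyed by node name.

```csharp
// Hypothetical consolidation of the per-node dictionaries into one type.
// Name/Shape/Type are assumed members; the real Sonoma/Lotus# API may differ.
public sealed class OnnxNodeInfo
{
    public string Name { get; }       // node name in the ONNX graph
    public int[] Shape { get; }       // tensor dimensions
    public System.Type Type { get; }  // element type, e.g. typeof(float)

    public OnnxNodeInfo(string name, int[] shape, System.Type type)
    {
        Name = name;
        Shape = shape;
        Type = type;
    }
}
```

A `Dictionary<string, OnnxNodeInfo>` (or two, for inputs and outputs) would then replace the separate shape/type maps.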


@shmoradims, this is (roughly) equivalent to the low-level API Lotus uses. Since we're (most likely) developing a new Lotus# package, it may be better to expose a NodeInfo-style API there.

For now, you should be able to use the existing Sonoma API to construct a NodeInfo.

var listDst = new List<System.Single>();
var typedDst = (System.Single[])(object)dst;
tensor.CopyTo(listDst);
listDst.CopyTo(typedDst);
shmoradims (Owner, Author):

@jignesh - Tensor.CopyTo(List dst) requires a list input, whereas ML.NET provides array buffers to copy values to. This mismatch causes an extra copy. Do you know how to prevent this?


@shmoradims, I'll add a CopyTo(T[] dst) method to Tensor, which should skip this extra copy.
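The difference can be illustrated with plain arrays (the tensor is simulated here by a float[]; the CopyTo names mirror the discussion above):

```csharp
using System;
using System.Collections.Generic;

// Simulates the two code paths: today's tensor -> List<float> -> float[]
// round-trip versus the single copy a CopyTo(T[]) overload would allow.
class CopyToDemo
{
    static void Main()
    {
        float[] tensorData = { 1f, 2f, 3f };   // stand-in for the tensor contents
        var dst = new float[tensorData.Length];

        // Current path (two copies): tensor.CopyTo(listDst); listDst.CopyTo(dst);
        var listDst = new List<float>(tensorData);
        listDst.CopyTo(dst);

        // Proposed path (one copy): tensor.CopyTo(dst);
        Array.Copy(tensorData, dst, tensorData.Length);

        Console.WriteLine(string.Join(",", dst)); // prints 1,2,3
    }
}
```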


public OnnxModel(byte[] modelBytes)
{
throw new NotImplementedException("Need an API to serialize/deserialize onnx models to byte arrays!");
shmoradims (Owner, Author):

@jignesh - we need an API to serialize/deserialize ONNX models to byte arrays for saving and loading ml.net models. Does it exist in Sonoma?


@shmoradims, this method needs to be added. We can add it as follows:

byte[]^ GetModelAsBytes(System::String^ modelName, int32_t version);

jignparm commented Sep 7, 2018:

BTW, you're using ModelManager(..., flatDirectory=True), in which case you cannot specify a version number (i.e. version must always be equal to "IgnoredVersion"). Since versioning is turned off, and you have the full path to the model file, you can simply use the snippet below to get the byte representation. Versioning in Sonoma is used mostly by the Deep-Learning-Inferencing-Service in Bing, where they need to be able to load new versions of the same model without any downtime on production servers. We don't need to worry about multiple model versions in this transform.


var modelbytes = File.ReadAllBytes(fileName);

shmoradims (Owner, Author):

That would work for model->bytes. For bytes->model, I'd need to save the bytes to a temp file, then use the ctor(string filepath), and clean up afterwards. I can do that, but it would make sense for the Sonoma API to handle this directly with a ctor(byte[]).


Can we do the following?

1. To unblock the demo, use the approach mentioned above (temp file), similar to the Python Trainer transform. This should be a relatively quick stopgap fix.
2. In Sonoma, after we remove the dependency on the manifest file + model_export.exe (will take ~1 week or longer), we can add the API to serialize the model to/from byte[], and upgrade the ml.net ONNX transform accordingly.

The cleaner API (in Sonoma) requires quite a bit of refactoring -- the steps above unblock the demo milestone while we get the APIs in place for the Sonoma/ML.net release milestone.
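The temp-file stopgap in step 1 could look roughly like this (OnnxModel's file-path ctor is from this PR; CreateFromBytes is a hypothetical helper name, and deleting in the finally block assumes Sonoma reads the file fully during construction):

```csharp
// Hypothetical bytes -> model helper using a temp file, until Sonoma
// exposes a byte[] API. Assumes the ctor fully loads the file into memory.
public static OnnxModel CreateFromBytes(byte[] modelBytes)
{
    var tempPath = System.IO.Path.GetTempFileName();
    try
    {
        System.IO.File.WriteAllBytes(tempPath, modelBytes);
        return new OnnxModel(tempPath);  // existing file-path constructor
    }
    finally
    {
        System.IO.File.Delete(tempPath); // clean up the temp copy
    }
}
```

If Sonoma keeps a handle on the file instead, the deletion would have to be deferred to the model's disposal.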

throw new NotImplementedException($"Not implemented type {typeof(T)}");
}

public static PrimitiveType RawToMlNetType(Type type)
shmoradims (Owner, Author):

@Yael - do we have an existing utility to replace RawToMlNetType()?

private static VersionInfo GetVersionInfo()
{
return new VersionInfo(
modelSignature: "ONNX",


Signatures need to be 8 characters long.

shmoradims (Owner, Author):

What's your suggestion?
ONNXTRFM
ONNXTSFM
ONNXTRAN


Perhaps "ONNXSCOR" ?



[Argument(ArgumentType.Multiple | ArgumentType.Required, HelpText = "TBD", SortOrder = 2)]
public string OutputColumn;

public OnnxModelInfo ModelInfo;


Re: ModelInfo

I'm not sure I understand. Where does this get set?
Wouldn't it make more sense to have a static method that creates one of these given an Arguments object?

shmoradims (Owner, Author):

OnnxModelInfo holds all the args that should be inferred from the model but are currently provided by the user. This way it's easy to take it out once the Sonoma API is ready. The creator of OnnxTransform should create it and pass it in, but it becomes unnecessary later on.

},
};

var transformArgs = new OnnxTransform.Arguments() { ModelFile = modelFile, InputColumn = "pixels", OutputColumn = "pixelsOut", ModelInfo = modelMetadata };


Re: pixels

Doesn't this name need to match the name in modelMetadata?

shmoradims (Owner, Author):

The assumption is that there's only one InputIdvColumn and the model has one TensorInputNode, and likewise for output. So the names don't have to match, because there's a 1-1 mapping.
For supporting multiple inputs/outputs, we need to do something similar to TF: either require the names to match, or allow users to create a mapping.
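The TF-style multi-column case could be sketched as parallel arrays in the Arguments (field names here are hypothetical, not part of this PR):

```csharp
// Hypothetical multi-column arguments, mirroring the TensorFlow transform:
// each IDV column is paired with the ONNX graph node it feeds or reads.
public sealed class Arguments
{
    public string[] InputColumns;   // IDV column names
    public string[] InputNodes;     // matching ONNX input node names
    public string[] OutputColumns;  // IDV columns to produce
    public string[] OutputNodes;    // matching ONNX output node names
}
```

Requiring names to match would collapse each pair into a single array, at the cost of forcing users to rename their IDV columns.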

/// </summary>
/// <param name="env">Host Environment.</param>
/// <param name="input">Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param>
/// <param name="modelFile">This is the frozen Onnx model file. https://www.tensorflow.org/mobile/prepare_models </param>


Re: https://www.tensorflow.org/mobile/prepare_models

This needs to be updated. Also, the term "frozen" applies only to TF models, doesn't it?


ValueGetter<VBuffer<T>> valuegetter = (ref VBuffer<T> dst) =>
{
var outputTensors = _model.Run(new List<Tensor> { _idvToTensorAdapter.GetTensor() });


Re: _model

Does Microsoft.ML.Scoring take care of disposing this object? Otherwise, we should add it to the disposer in CreateGetters.

public void InitializeValueGetters(IRow idvRow)
{
var type = _idvColumnType.ItemType.RawType;
_tensorValueGetter = Utils.MarshalInvoke(


Re: _tensorValueGetter

I think it would be better to define a delegate that returns a Tensor; that way, instead of having a mutable field, this method could return the delegate, and it could be used in line 184 in place of _idvToTensorAdapter.GetTensor().
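A rough shape of that suggestion (Tensor and IRow are the PR's types; CreateTensorGetter and ToTensor are illustrative names, not existing APIs):

```csharp
// Sketch: return a Tensor-producing delegate instead of storing a mutable
// _tensorValueGetter field; the caller invokes it where GetTensor() is used.
private delegate Tensor TensorGetter();

private static TensorGetter CreateTensorGetter<T>(IRow idvRow, int colIndex)
{
    var srcGetter = idvRow.GetGetter<T>(colIndex);
    return () =>
    {
        T value = default(T);
        srcGetter(ref value);              // pull the current row value
        return ToTensor(value);            // hypothetical value -> Tensor conversion
    };
}
```

The delegate captures the column getter, so each invocation reads the current row without any shared mutable state.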

internal sealed class OnnxModel
{
private static readonly int IgnoredVersion = int.MaxValue;
private ModelManager _modelManager;


Re: ModelManager

readonly?

shmoradims pushed a commit that referenced this pull request Oct 22, 2018
@shmoradims shmoradims closed this Oct 23, 2018
@shmoradims shmoradims deleted the onnx_transform4 branch October 23, 2018 19:20
shmoradims pushed a commit that referenced this pull request Nov 5, 2018
* Added placeholder

* Cleaned up Infos (replaced with ColumnPairs)

* Added ColumnInfo

* Added all the Create() methods.

* Added Mapper

* Commented out the EntryPoint

* Added PcaEstimator2

* PcaWorkout test passes

* Added pigsty api

* Fixed EntryPoint

* Fixed the arguments

* Fixed tests and added pigsty test

* Deleted Wrapped PCA transform

* Float -> float

* Cleaned docstrings

* Removed some unnecessary checks

* Simplified unnecessary code

* Moved some fields to ColumnInfo for simplifications

* Simplified weight columns

* Address PR comments #1

* Addressed PR comments dotnet#2

* Moved the static test

* PR comments dotnet#3

* Moved schema related information out of ColumnInfo and into Mapper.ColumnSchemaInfo.

* PR comments

* PR comments

* Updated manifest for entrypoint PcaCalculator

* Fixed schema exceptions
shmoradims pushed a commit that referenced this pull request Nov 16, 2018
* Implement VBuffer master plan WIP #1

* Getting everything to build and tests passing

* Keep moving to the master plan of VBuffer.

* Remove the rest of the VBuffer.Count usages in ML.Data

* Remove the rest of the VBuffer.Count usages and make VBuffer.Count private.

* Fix two failing tests.

* Fix FastTreeBinaryClassificationCategoricalSplitTest by remembering the underlying arrays in the column buffer in Transposer.

Also enable a Transposer test, since it passes.