OnnxTransform V0 #1
Conversation
{
    "description": "",
    "internalName": "Input3",
    "name": "input0",
@jignesh - manifest.json file changes the "internalName" of nodes to "name". That's something to take care of when removing the dependency on manifest file.
/// OnnxNodeInfo contains all the information for a given node (e.g. inputs/outputs)
/// of an Onnx model.
/// </summary>
public class OnnxNodeInfo
@jignesh - this is the data structure I was thinking that would make sense for NodeInfo, instead of having multiple dictionaries, one for shape, one for type, etc.
@shmoradims , this is (roughly) equivalent to the low-level API Lotus uses. Since we're (most likely) developing a new Lotus# package, it may be better to expose a NodeInfo-style API there.
For now, you should be able to use the existing Sonoma API to construct a NodeInfo.
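A minimal sketch of the single-data-structure idea, instead of parallel dictionaries for shape and type. All member names and types below are illustrative assumptions, not code from this PR:

```csharp
using System;

// Hypothetical shape for OnnxNodeInfo: one object per model node, carrying
// everything the transform needs (name, shape, element type) in one place.
public sealed class OnnxNodeInfo
{
    public string Name { get; }        // node name as exposed by the model
    public int[] Shape { get; }        // tensor dimensions, e.g. { 1, 1, 28, 28 }
    public Type ElementType { get; }   // element type, e.g. typeof(float)

    public OnnxNodeInfo(string name, int[] shape, Type elementType)
    {
        Name = name;
        Shape = shape;
        ElementType = elementType;
    }
}
```

The model's input and output node lists would then each be a simple `OnnxNodeInfo[]`.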
var listDst = new List<System.Single>();
var typedDst = (System.Single[])(object)dst;
tensor.CopyTo(listDst);
listDst.CopyTo(typedDst);
@jignesh - Tensor.CopyTo(List dst) requires a list input, whereas ML.NET provides array buffers to copy values to. This mismatch causes an extra copy. Do you know how to prevent this?
@shmoradims . I'll add a CopyTo(T[] dst) method to Tensor, which should skip this extra copy.
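The difference between the two overloads can be illustrated with a toy stand-in for Sonoma's Tensor (the real Tensor wraps a native buffer; here a managed array plays that role, and all names are illustrative):

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for Sonoma's Tensor, only to show the overload being proposed.
public sealed class ToyTensor<T>
{
    private readonly T[] _buffer;
    public ToyTensor(T[] buffer) { _buffer = buffer; }

    // Existing API: callers with array destinations must route through a List<T>,
    // which costs an extra copy.
    public void CopyTo(List<T> dst)
    {
        dst.Clear();
        dst.AddRange(_buffer);
    }

    // Proposed overload: copy straight into the caller's array, no extra List<T>.
    public void CopyTo(T[] dst)
    {
        Array.Copy(_buffer, dst, _buffer.Length);
    }
}
```

With the array overload, the ML.NET buffer in the snippet above could be filled with a single `tensor.CopyTo(typedDst)` call.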
public OnnxModel(byte[] modelBytes)
{
    throw new NotImplementedException("Need an API to serialize/deserialize onnx models to byte arrays!");
@jignesh - we need an API to serialize/deserialize onnx models to byte arrays for saving and loading ml.net models. Does it exist in Sonoma?
@shmoradims , this method needs to be added. We can add it as follows:
byte[]^ GetModelAsBytes(System::String^ modelName, int32_t version);
BTW, you're using ModelManager(..., flatDirectory=True), in which case you cannot specify a version number (i.e. version must always be equal to "IgnoredVersion"). Since versioning is turned off, and you have the full path to the model file, you can simply use the snippet below to get the byte representation. Versioning in Sonoma is used mostly by the Deep-Learning-Inferencing-Service in Bing, where they need to be able to load new versions of the same model without any downtime on production servers. We don't need to worry about multiple model versions in this transform.
var modelbytes = File.ReadAllBytes(fileName);
That would work for model->bytes. For bytes->model, I'd need to save the bytes to a temp file, then use the ctor(string filepath), and clean up. I can do that, but it would make sense for the Sonoma API to handle that directly with a ctor(byte[]).
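The temp-file stopgap described here could look roughly like the sketch below. `OnnxModel`'s file-path constructor is from this PR; the helper name is an assumption, and the sketch assumes the constructor reads the file eagerly (otherwise the delete would have to be deferred):

```csharp
using System.IO;

// Stopgap bytes -> model: persist the bytes to a temp file, load via the
// existing ctor(string filePath), then remove the file.
public static OnnxModel LoadFromBytes(byte[] modelBytes)
{
    var tempPath = Path.GetTempFileName();
    try
    {
        File.WriteAllBytes(tempPath, modelBytes);
        return new OnnxModel(tempPath);   // existing file-path constructor
    }
    finally
    {
        File.Delete(tempPath);            // clean up even if loading throws
    }
}
```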
Can we do the following?
1. To unblock the demo, use the approach mentioned above (a temp file), similar to the Python Trainer transform. This should be a relatively quick stopgap fix.
2. In Sonoma, after we remove the dependency on the manifest file + model_export.exe (will take ~1 week or longer), we can add the API to serialize the model to/from byte[], and upgrade the ml.net onnx transform accordingly.
The cleaner API (in Sonoma) requires quite a bit of refactoring -- the steps above would unblock the demo milestone, while we get the APIs in place for the Sonoma/ML.net release milestone.
    throw new NotImplementedException($"Not implemented type {typeof(T)}");
}

public static PrimitiveType RawToMlNetType(Type type)
@Yael - do we have an existing utility to replace RawToMlNetType()?
private static VersionInfo GetVersionInfo()
{
    return new VersionInfo(
        modelSignature: "ONNX",
Signatures need to be 8 characters long.
What's your suggestion?
ONNXTRFM
ONNXTSFM
ONNXTRAN
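With any of the suggested 8-character signatures, the fix is a one-token change. The sketch below picks "ONNXTRFM" purely for illustration (the thread hadn't settled on one), and the version-number fields are illustrative placeholders:

```csharp
// Sketch of the corrected GetVersionInfo(): modelSignature must be exactly
// 8 characters. "ONNXTRFM" is one of the candidates suggested above.
private static VersionInfo GetVersionInfo()
{
    return new VersionInfo(
        modelSignature: "ONNXTRFM",        // 8 characters, as the loader requires
        verWrittenCur: 0x00010001,         // illustrative version numbers
        verReadableCur: 0x00010001,
        verWeCanReadBack: 0x00010001,
        loaderSignature: LoaderSignature);
}
```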
[Argument(ArgumentType.Multiple | ArgumentType.Required, HelpText = "TBD", SortOrder = 2)]
public string OutputColumn;

public OnnxModelInfo ModelInfo;
Re: ModelInfo
I'm not sure I understand. Where does this get set?
Wouldn't it make more sense to have a static method that creates one of these given an Arguments object?
OnnxModelInfo holds all the args that should be inferred from the model but are currently provided by the user. This way it's easy to take it out once the Sonoma API is ready. The creator of OnnxTransform should create it and pass it in. But it would be unnecessary later on.
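The static-factory suggestion above could look like the sketch below; the parameter names, and the idea that OnnxModelInfo carries only user-supplied node metadata, are assumptions about this PR's internals:

```csharp
// Sketch of a static factory that builds OnnxModelInfo from the Arguments
// object, so callers don't set the ModelInfo field by hand. Until Sonoma can
// infer node metadata from the model itself, it comes from the arguments.
public static OnnxModelInfo Create(Arguments args)
{
    return new OnnxModelInfo(
        inputName: args.InputColumn,     // user-supplied, later inferred
        outputName: args.OutputColumn);  // user-supplied, later inferred
}
```

Either way, once Sonoma exposes node info directly, this type (and the factory) can be deleted without touching callers of the transform.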
    },
};

var transformArgs = new OnnxTransform.Arguments() { ModelFile = modelFile, InputColumn = "pixels", OutputColumn = "pixelsOut", ModelInfo = modelMetadata };
Re: pixels
Doesn't this name need to match the name in modelMetadata?
The assumption is that there's only one InputIdvColumn and the model has one TensorInputNode. Same with output. So the names don't have to match, b/c there's 1-1 mapping.
For supporting multiple input/output, we need to do something similar to TF. Either require names to match, or allow users to create a mapping.
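For the multi-input case, the explicit user-supplied mapping option could be as simple as a dictionary from IDV column names to model node names (the names below are illustrative):

```csharp
using System.Collections.Generic;

// Illustrative column-to-node mapping for a hypothetical multi-input model.
// Keys are IDV column names; values are tensor node names inside the model.
var inputMap = new Dictionary<string, string>
{
    { "pixels", "Input3" },   // image input
    { "mask",   "Input4" },   // hypothetical second input
};
```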
/// </summary>
/// <param name="env">Host Environment.</param>
/// <param name="input">Input <see cref="IDataView"/>. This is the output from previous transform or loader.</param>
/// <param name="modelFile">This is the frozen Onnx model file. https://www.tensorflow.org/mobile/prepare_models </param>
Re: https://www.tensorflow.org/mobile/prepare_models
This needs to be updated. The term "frozen" also is only for TF models, isn't it?
ValueGetter<VBuffer<T>> valuegetter = (ref VBuffer<T> dst) =>
{
    var outputTensors = _model.Run(new List<Tensor> { _idvToTensorAdapter.GetTensor() });
Re: _model
Does Microsoft.ML.Scoring take care of disposing this object? Otherwise, we should add it to the disposer in CreateGetters.
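If Microsoft.ML.Scoring does not clean up the model itself, wiring it into the disposer could look roughly like this (a sketch: whether the model type implements IDisposable is exactly the open question here):

```csharp
using System;

// Sketch: register model cleanup with the disposer produced in CreateGetters,
// so the model is released when the cursor is done with it.
Action disposer = () =>
{
    (_model as IDisposable)?.Dispose();   // no-op if the model isn't disposable
};
```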
public void InitializeValueGetters(IRow idvRow)
{
    var type = _idvColumnType.ItemType.RawType;
    _tensorValueGetter = Utils.MarshalInvoke(
Re: _tensorValueGetter
I think it would be better to define a delegate that returns a Tensor, that way instead of having a mutable object, this method could return it and it can be used in line 184 instead of _idvToTensorAdapter.GetTensor().
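One way to realize that suggestion is sketched below. `Tensor`, `IRow`, and `VBuffer` are from the PR; the `Func<Tensor>` shape, `_idvColumnIndex`, and `MakeTensor` are assumptions for illustration:

```csharp
// Instead of storing the result in a mutable _tensorValueGetter field, return
// a delegate that produces the Tensor on demand. The caller invokes it where
// _idvToTensorAdapter.GetTensor() is used today.
public Func<Tensor> CreateTensorGetter(IRow idvRow)
{
    var srcGetter = idvRow.GetGetter<VBuffer<float>>(_idvColumnIndex);
    return () =>
    {
        var buffer = default(VBuffer<float>);
        srcGetter(ref buffer);
        return MakeTensor(buffer);   // hypothetical VBuffer -> Tensor conversion
    };
}
```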
internal sealed class OnnxModel
{
    private static readonly int IgnoredVersion = int.MaxValue;
    private ModelManager _modelManager;
Re: ModelManager
readonly?
The purpose of this PR is to review the overall structure of OnnxTransform as a checkpoint, to make sure all stakeholders agree on the direction.
OnnxTransform V0 works end to end for a sample fully connected Onnx model with one input and one output. The tester code scores one data point with the model.
The project currently lives outside the ml.net src directory because of .NET Core vs .NET Framework complications. The next step is to integrate this project into ml.net.