From 5da49a3ed75afc21611262b717d255d1023d0039 Mon Sep 17 00:00:00 2001
From: Senja Filipi
Date: Mon, 4 Jun 2018 11:45:45 -0700
Subject: [PATCH 1/9] Adding EntryPoints.md and GraphRunner.md

---
 docs/code/EntryPoints.md | 188 +++++++++++++++++++++++++++++++++++++++
 docs/code/GraphRunner.md | 123 +++++++++++++++++++++++++
 2 files changed, 311 insertions(+)
 create mode 100644 docs/code/EntryPoints.md
 create mode 100644 docs/code/GraphRunner.md

diff --git a/docs/code/EntryPoints.md b/docs/code/EntryPoints.md
new file mode 100644
index 0000000000..06ae8312d7
--- /dev/null
+++ b/docs/code/EntryPoints.md
@@ -0,0 +1,188 @@
+# Overview
+
+An 'entry point', is a representation of a ML.Net type in json format and it is used to serialize and deserialize an ML.Net type in JSON.
+It is also one of the ways ML.Net uses to deserialize experiments, and the recommended way to interface with other languages.
+In terms defining experiments w.r.t entry points, experiments are entry points DAGs, and respectively, entry points are experiment graph nodes.
+That's why through the documentaiton, we also refer to them as 'entry points nodes'.
+The graph 'variables', the various values of the experiemnt graph json properties serve to describe the relationship between the entry point nodes.
+The 'variables' are therefore the edges of the DAG.
+
+All of ML.Net entry points are described by their manifest. The manifest is another json object that documents and describes the structure of an entry points.
+Manifests are referenced to understand what an entry point does, and how it should be constructed, in a graph.
+
+This document briefly describes the structure of the entry points, the structure of an entry point manifest, and mentions the ML.Net classes that help construct an entry point
+graph.
+ +## `EntryPoint manifest - the definition of an entry point` + +An example of an entry point manifest object, specifically for the MissingValueIndicator transform, is: + +```javascript + { + "Name": "Transforms.MissingValueIndicator", + "Desc": "Create a boolean output column with the same number of slots as the input column, where the output value is true if the value in the input column is missing.", + "FriendlyName": "NA Indicator Transform", + "ShortName": "NAInd", + "Inputs": [ + { + "Name": "Column", + "Type": { + "Kind": "Array", + "ItemType": { + "Kind": "Struct", + "Fields": [ + { + "Name": "Name", + "Type": "String", + "Desc": "Name of the new column", + "Aliases": [ + "name" + ], + "Required": false, + "SortOrder": 150.0, + "IsNullable": false, + "Default": null + }, + { + "Name": "Source", + "Type": "String", + "Desc": "Name of the source column", + "Aliases": [ + "src" + ], + "Required": false, + "SortOrder": 150.0, + "IsNullable": false, + "Default": null + } + ] + } + }, + "Desc": "New column definition(s) (optional form: name:src)", + "Aliases": [ + "col" + ], + "Required": true, + "SortOrder": 1.0, + "IsNullable": false + }, + { + "Name": "Data", + "Type": "DataView", + "Desc": "Input dataset", + "Required": true, + "SortOrder": 1.0, + "IsNullable": false + } + ], + "Outputs": [ + { + "Name": "OutputData", + "Type": "DataView", + "Desc": "Transformed dataset" + }, + { + "Name": "Model", + "Type": "TransformModel", + "Desc": "Transform model" + } + ], + "InputKind": [ + "ITransformInput" + ], + "OutputKind": [ + "ITransformOutput" + ] + } +``` + +The respective entry point, constructed based on this manifest would be: + +```javascript + { + "Name": "Transforms.MissingValueIndicator", + "Inputs": { + "Column": [ + { + "Name": "Features", + "Source": "Features" + } + ], + "Data": "$data0" + }, + "Outputs": { + "OutputData": "$Output_1528136517433", + "Model": "$TransformModel_1528136517433" + } + } +``` + +## `EntryPointGraph` + +This class 
encapsulates the list of nodes (`EntryPointNode`) and edges
+(`EntryPointVariable` inside a `RunContext`) of the graph.
+
+## `EntryPointNode`
+
+This class represents a node in the graph, and wraps an entry point call. It
+has methods for creating and running entry points. It also has a reference to
+the `RunContext` to allow it to get and set values from `EntryPointVariable`s.
+
+To express the inputs that are set through variables, a set of dictionaries
+are used. The `InputBindingMap` maps an input parameter name to a list of
+`ParameterBinding`s. The `InputMap` maps a `ParameterBinding` to a
+`VariableBinding`. For example, if the JSON looks like this:
+
+```javascript
+'foo': '$bar'
+```
+
+the `InputBindingMap` will have one entry that maps the string "foo" to a list
+that has only one element, a `SimpleParameterBinding` with the name "foo" and
+the `InputMap` will map the `SimpleParameterBinding` to a
+`SimpleVariableBinding` with the name "bar". For a more complicated example,
+let's say we have this JSON:
+
+```javascript
+'foo': [ '$bar[3]', '$baz']
+```
+
+the `InputBindingMap` will have one entry that maps the string "foo" to a list
+that has two elements, an `ArrayIndexParameterBinding` with the name "foo" and
+index 0 and another one with index 1. The `InputMap` will map the first
+`ArrayIndexParameterBinding` to an `ArrayIndexVariableBinding` with name "bar"
+and index 3 and the second `ArrayIndexParameterBinding` to a
+`SimpleVariableBinding` with the name "baz".
+
+For outputs, a node assumes that an output is mapped to a variable, so the
+`OutputMap` is a simple dictionary from string to string.
+
+## `EntryPointVariable`
+
+This class represents an edge in the entry point graph. It has a name, a type
+and a value. Variables can be simple, arrays and/or dictionaries. Currently,
+only data views, file handles, predictor models and transform models are
+allowed as element types for a variable.
+
+## `RunContext`
+
+This class is just a container for all the variables in a graph.
+
+## VariableBinding and Derived Classes
+
+The abstract base class represents a "pointer to a (part of a) variable". It
+is used in conjunction with `ParameterBinding`s to specify inputs to an entry
+point node. The `SimpleVariableBinding` is a pointer to an entire variable,
+the `ArrayIndexVariableBinding` is a pointer to a specific index in an array
+variable, and the `DictionaryKeyVariableBinding` is a pointer to a specific
+key in a dictionary variable.
+
+## ParameterBinding and Derived Classes
+
+The abstract base class represents a "pointer to a (part of a) parameter". It
+parallels the `VariableBinding` hierarchy and it is used to specify the inputs
+to an entry point node. The `SimpleParameterBinding` is a pointer to a
+non-array, non-dictionary parameter, the `ArrayIndexParameterBinding` is a
+pointer to a specific index of an array parameter and the
+`DictionaryKeyParameterBinding` is a pointer to a specific key of a dictionary
+parameter.
\ No newline at end of file
diff --git a/docs/code/GraphRunner.md b/docs/code/GraphRunner.md
new file mode 100644
index 0000000000..a62e9714b2
--- /dev/null
+++ b/docs/code/GraphRunner.md
@@ -0,0 +1,123 @@
+# JSON Graph format
+
+The entry point graph in TLC is an array of _nodes_. Each node is an object with the following fields:
+
+- _name_: string. Required. Name of the entry point.
+- _inputs_: object. Optional. Specifies non-default inputs to the entry point.
+Note that if the entry point has required inputs (which is very common), the _inputs_ field is requred.
+- _outputs_: object. Optional. Specifies the variables that will hold the node's outputs.
+
+## Input and output types
+The following types are supported in JSON graphs:
+
+- _string_. Represented as a JSON string, maps to a C# string.
+- _float_. Represented as a JSON float, maps to a C# float or double.
+- _bool_. Represented as a JSON bool, maps to a C# bool.
+- _enum_. Represented as a JSON string, maps to a C# enum. The allowed values are those of the C# enum (they are also listed in the manifest).
+- _int_. Currently not implemented. Represented as a JSON integer, maps to a C# int or long.
+- _array_ of the above. Represented as a JSON array, maps to a C# array.
+- _dictionary_. Currently not implemented. Represented as a JSON object, maps to a C# `Dictionary`.
+- _component_. Currently not implemented. Represented as a JSON object with 2 fields: _name_:string and _settings_:object.
+
+## Variables
+The following input/output types can not be represented as a JSON value:
+- _DataView_
+- _FileHandle_
+- _TransformModel_
+- _PredictorModel_
+
+These must be passed as _variables_. The variable is represented as a JSON string that begins with "$".
+Note the following rules:
+
+- A variable can appear in the _outputs_ only once per graph. That is, the variable can be 'assigned' only once.
+- If the variable is present in _inputs_ of one node and in the _outputs_ of another node, this signifies the graph 'edge'.
+The same variable can participate in many edges.
+- If the variable is present only in _inputs_, but never in _outputs_, it is a _graph input_. All graph inputs must be provided before
+a graph can be run.
+- The variable has a type, which is the type of inputs (and, optionally, output) that it appears in. If the type of the variable is
+ambiguous, TLC throws an exception.
+- Circular references. The experiment graph is expected to be a DAG. If the circular dependency is detected, TLC throws an exception.
+_Currently, this is done lazily: if we couldn't ever run a node because it's waiting for inputs, we throw._
+
+### Variables for arrays and dictionaries.
+It is allowed to define variables for arrays and dictionaries, as long as the item types are valid variable types (the four types listed above).
+They are treated the same way as regular 'scalar' variables.
+ +If we want to reference an item of the collection, we can use the `[]` syntax: +- `$var[5]` denotes 5th element of an array variable. +- `$var[foo]` and `$var['foo']` both denote the element with key 'foo' of a dictionary variable. +_This is not yet implemented._ + +Conversely, if we want to build a collection (array or dictionary) of variables, we can do it using JSON arrays and objects: +- `["$v1", "$v2", "$v3"]` denotes an array containing 3 variables. +- `{"foo": "$v1", "bar": "$v2"}` denotes a collection containing 2 key-value pairs. +_This is also not yet implemented._ + +## Example of a JSON entry point manifest object, and the respective entry point graph node +Let's consider the following manifest snippet, describing an entry point _'CVSplit.Split'_: +``` + { + "name": "CVSplit.Split", + "desc": "Split the dataset into the specified number of cross-validation folds (train and test sets)", + "inputs": [ + { + "name": "Data", + "type": "DataView", + "desc": "Input dataset", + "required": true + }, + { + "name": "NumFolds", + "type": "Int", + "desc": "Number of folds to split into", + "required": false, + "default": 2 + }, + { + "name": "StratificationColumn", + "type": "String", + "desc": "Stratification column", + "aliases": [ + "strat" + ], + "required": false, + "default": null + } + ], + "outputs": [ + { + "name": "TrainData", + "type": { + "kind": "Array", + "itemType": "DataView" + }, + "desc": "Training data (one dataset per fold)" + }, + { + "name": "TestData", + "type": { + "kind": "Array", + "itemType": "DataView" + }, + "desc": "Testing data (one dataset per fold)" + } + ] + } +``` + +As we can see, the entry point has 3 inputs (one of them required), and 2 outputs. 
+The following is a correct graph containing call to this entry point: +``` +{ + "nodes": [ + { + "name": "CVSplit.Split", + "inputs": { + "Data": "$data1" + }, + "outputs": { + "TrainData": "$cv" + } + }] +} +``` \ No newline at end of file From 63b9fe846359680b8dfffc3fade8ffefa3a652de Mon Sep 17 00:00:00 2001 From: Senja Filipi Date: Tue, 5 Jun 2018 09:35:26 -0700 Subject: [PATCH 2/9] addressing PR feedback --- docs/code/EntryPoints.md | 289 +++++++++++++++++++++++++++------------ docs/code/GraphRunner.md | 39 +++--- 2 files changed, 220 insertions(+), 108 deletions(-) diff --git a/docs/code/EntryPoints.md b/docs/code/EntryPoints.md index 06ae8312d7..fc06fda963 100644 --- a/docs/code/EntryPoints.md +++ b/docs/code/EntryPoints.md @@ -1,120 +1,227 @@ -# Overview +# Entry Points And Helper Classes -An 'entry point', is a representation of a ML.Net type in json format and it is used to serialize and deserialize an ML.Net type in JSON. -It is also one of the ways ML.Net uses to deserialize experiments, and the recommended way to interface with other languages. -In terms defining experiments w.r.t entry points, experiments are entry points DAGs, and respectively, entry points are experiment graph nodes. +## Overview + +An 'entry point', is a representation of a ML.NET type in JSON format. Entry points are used to serialize and deserialize an ML.NET type in JSON. +It is also the recommended way to interface with other languages. +Defined based on entry points, experiments are entry points DAGs, and respectively, entry points are experiment graph nodes. That's why through the documentaiton, we also refer to them as 'entry points nodes'. -The graph 'variables', the various values of the experiemnt graph json properties serve to describe the relationship between the entry point nodes. +The graph 'variables', the various values of the experiment graph JSON properties serve to describe the relationship between the entry point nodes. 
The 'variables' are therefore the edges of the DAG. -All of ML.Net entry points are described by their manifest. The manifest is another json object that documents and describes the structure of an entry points. +All of ML.NET entry points are described by their manifest. The manifest is another JSON object that documents and describes the structure of an entry points. Manifests are referenced to understand what an entry point does, and how it should be constructed, in a graph. -This document briefly describes the structure of the entry points, the structure of an entry point manifest, and mentions the ML.Net classes that help construct an entry point +This document briefly describes the structure of the entry points, the structure of an entry point manifest, and mentions the ML.NET classes that help construct an entry point graph. -## `EntryPoint manifest - the definition of an entry point` +## EntryPoint manifest - the definition of an entry point An example of an entry point manifest object, specifically for the MissingValueIndicator transform, is: ```javascript - { - "Name": "Transforms.MissingValueIndicator", - "Desc": "Create a boolean output column with the same number of slots as the input column, where the output value is true if the value in the input column is missing.", - "FriendlyName": "NA Indicator Transform", - "ShortName": "NAInd", - "Inputs": [ +{ + "Name": "Transforms.ColumnTypeConverter", + "Desc": "Converts a column to a different type, using standard conversions.", + "FriendlyName": "Convert Transform", + "ShortName": "Convert", + "Inputs": [ { - "Name": "Column", - "Type": { - "Kind": "Array", - "ItemType": { - "Kind": "Struct", - "Fields": [ - { - "Name": "Name", - "Type": "String", - "Desc": "Name of the new column", - "Aliases": [ - "name" - ], - "Required": false, - "SortOrder": 150.0, - "IsNullable": false, - "Default": null - }, - { - "Name": "Source", - "Type": "String", - "Desc": "Name of the source column", - "Aliases": [ - "src" - 
], - "Required": false, - "SortOrder": 150.0, - "IsNullable": false, - "Default": null + "Name": "Column", + "Type": { + "Kind": "Array", + "ItemType": { + "Kind": "Struct", + "Fields": [ + { + "Name": "ResultType", + "Type": { + "Kind": "Enum", + "Values": [ + "I1", + "U1", + "I2", + "U2", + "I4", + "U4", + "I8", + "U8", + "R4", + "Num", + "R8", + "TX", + "Text", + "TXT", + "BL", + "Bool", + "TimeSpan", + "TS", + "DT", + "DateTime", + "DZ", + "DateTimeZone", + "UG", + "U16" + ] + }, + "Desc": "The result type", + "Aliases": [ + "type" + ], + "Required": false, + "SortOrder": 150, + "IsNullable": true, + "Default": null + }, + { + "Name": "Range", + "Type": "String", + "Desc": "For a key column, this defines the range of values", + "Aliases": [ + "key" + ], + "Required": false, + "SortOrder": 150, + "IsNullable": false, + "Default": null + }, + { + "Name": "Name", + "Type": "String", + "Desc": "Name of the new column", + "Aliases": [ + "name" + ], + "Required": false, + "SortOrder": 150, + "IsNullable": false, + "Default": null + }, + { + "Name": "Source", + "Type": "String", + "Desc": "Name of the source column", + "Aliases": [ + "src" + ], + "Required": false, + "SortOrder": 150, + "IsNullable": false, + "Default": null + } + ] } - ] - } - }, - "Desc": "New column definition(s) (optional form: name:src)", - "Aliases": [ - "col" - ], - "Required": true, - "SortOrder": 1.0, - "IsNullable": false + }, + "Desc": "New column definition(s) (optional form: name:type:src)", + "Aliases": [ + "col" + ], + "Required": true, + "SortOrder": 1, + "IsNullable": false + }, + { + "Name": "Data", + "Type": "DataView", + "Desc": "Input dataset", + "Required": true, + "SortOrder": 2, + "IsNullable": false + }, + { + "Name": "ResultType", + "Type": { + "Kind": "Enum", + "Values": [ + "I1", + "U1", + "I2", + "U2", + "I4", + "U4", + "I8", + "U8", + "R4", + "Num", + "R8", + "TX", + "Text", + "TXT", + "BL", + "Bool", + "TimeSpan", + "TS", + "DT", + "DateTime", + "DZ", + "DateTimeZone", + 
"UG", + "U16" + ] + }, + "Desc": "The result type", + "Aliases": [ + "type" + ], + "Required": false, + "SortOrder": 2, + "IsNullable": true, + "Default": null }, { - "Name": "Data", - "Type": "DataView", - "Desc": "Input dataset", - "Required": true, - "SortOrder": 1.0, - "IsNullable": false + "Name": "Range", + "Type": "String", + "Desc": "For a key column, this defines the range of values", + "Aliases": [ + "key" + ], + "Required": false, + "SortOrder": 150, + "IsNullable": false, + "Default": null } - ], - "Outputs": [ + ], + "Outputs": [ { - "Name": "OutputData", - "Type": "DataView", - "Desc": "Transformed dataset" + "Name": "OutputData", + "Type": "DataView", + "Desc": "Transformed dataset" }, { - "Name": "Model", - "Type": "TransformModel", - "Desc": "Transform model" + "Name": "Model", + "Type": "TransformModel", + "Desc": "Transform model" } - ], - "InputKind": [ + ], + "InputKind": [ "ITransformInput" - ], - "OutputKind": [ + ], + "OutputKind": [ "ITransformOutput" - ] - } + ] +} ``` The respective entry point, constructed based on this manifest would be: ```javascript - { - "Name": "Transforms.MissingValueIndicator", - "Inputs": { - "Column": [ - { - "Name": "Features", - "Source": "Features" - } - ], - "Data": "$data0" - }, - "Outputs": { - "OutputData": "$Output_1528136517433", - "Model": "$TransformModel_1528136517433" - } - } + { + "Name": "Transforms.ColumnTypeConverter", + "Inputs": { + "Column": [ + { + "Name": "Features", + "Source": "Features" + } + ], + "Data": "$data0", + "ResultType": "R4" + }, + "Outputs": { + "OutputData": "$Convert_Output", + "Model": "$Convert_TransformModel" + } + } ``` ## `EntryPointGraph` @@ -168,7 +275,7 @@ allowed as element types for a variable. This class is just a container for all the variables in a graph. -## VariableBinding and Derived Classes +## `VariableBinding` and Derived Classes The abstract base class represents a "pointer to a (part of a) variable". 
It
 is used in conjunction with `ParameterBinding`s to specify inputs to an entry
@@ -177,7 +284,7 @@ the `ArrayIndexVariableBinding` is a pointer to a specific index in an array
 variable, and the `DictionaryKeyVariableBinding` is a pointer to a specific
 key in a dictionary variable.
 
-## ParameterBinding and Derived Classes
+## `ParameterBinding` and Derived Classes
 
 The abstract base class represents a "pointer to a (part of a) parameter". It
 parallels the `VariableBinding` hierarchy and it is used to specify the inputs
diff --git a/docs/code/GraphRunner.md b/docs/code/GraphRunner.md
index a62e9714b2..074b959dd0 100644
--- a/docs/code/GraphRunner.md
+++ b/docs/code/GraphRunner.md
@@ -1,6 +1,9 @@
 # JSON Graph format
 
-The entry point graph in TLC is an array of _nodes_. Each node is an object with the following fields:
+The entry point graph in TLC is an array of _nodes_. More information about the definition of entry points and classes that help construct entry point graphs
+can be found in the [EntryPoint.md document](./EntryPoints.md).
+
+Each node is an object with the following fields:
 
 - _name_: string. Required. Name of the entry point.
 - _inputs_: object. Optional. Specifies non-default inputs to the entry point.
@@ -10,27 +13,27 @@ Note that if the entry point has required inputs (which is very common), the _in
 ## Input and output types
 The following types are supported in JSON graphs:
 
-- _string_. Represented as a JSON string, maps to a C# string.
-- _float_. Represented as a JSON float, maps to a C# float or double.
-- _bool_. Represented as a JSON bool, maps to a C# bool.
-- _enum_. Represented as a JSON string, maps to a C# enum. The allowed values are those of the C# enum (they are also listed in the manifest).
-- _int_. Currently not implemented. Represented as a JSON integer, maps to a C# int or long.
-- _array_ of the above. Represented as a JSON array, maps to a C# array.
-- _dictionary_. Currently not implemented.
Represented as a JSON object, maps to a C# `Dictionary`.
-- _component_. Currently not implemented. Represented as a JSON object with 2 fields: _name_:string and _settings_:object.
+- `string`. Represented as a JSON string, maps to a C# string.
+- `float`. Represented as a JSON float, maps to a C# float or double.
+- `bool`. Represented as a JSON bool, maps to a C# bool.
+- `enum`. Represented as a JSON string, maps to a C# enum. The allowed values are those of the C# enum (they are also listed in the manifest).
+- `int`. Represented as a JSON integer, maps to a C# int or long.
+- `array` of the above. Represented as a JSON array, maps to a C# array.
+- `dictionary`. Currently not implemented. Represented as a JSON object, maps to a C# `Dictionary`.
+- `component`. Represented as a JSON object with 2 fields: _name_:string and _settings_:object.
 
 ## Variables
 The following input/output types can not be represented as a JSON value:
-- _DataView_
-- _FileHandle_
-- _TransformModel_
-- _PredictorModel_
+- `IDataView`
+- `IFileHandle`
+- `ITransformModel`
+- `IPredictorModel`
 
-These must be passed as _variables_. The variable is represented as a JSON string that begins with "$".
+These must be passed as _variables_. The variable is represented as a JSON string that begins with `$`.
 Note the following rules:
 
 - A variable can appear in the _outputs_ only once per graph. That is, the variable can be 'assigned' only once.
-- If the variable is present in _inputs_ of one node and in the _outputs_ of another node, this signifies the graph 'edge'.
+- If the variable is present in _inputs_ of one node and in the _outputs_ of another node, this signifies a graph 'edge'.
 The same variable can participate in many edges.
 - If the variable is present only in _inputs_, but never in _outputs_, it is a _graph input_. All graph inputs must be provided before
 a graph can be run.
@@ -55,7 +58,8 @@ _This is also not yet implemented._
 
 ## Example of a JSON entry point manifest object, and the respective entry point graph node
 Let's consider the following manifest snippet, describing an entry point _'CVSplit.Split'_:
-```
+
+```javascript
   {
     "name": "CVSplit.Split",
     "desc": "Split the dataset into the specified number of cross-validation folds (train and test sets)",
@@ -107,7 +111,8 @@ Let's consider the following manifest snippet, describing an entry point _'CVSpl
 
 As we can see, the entry point has 3 inputs (one of them required), and 2 outputs.
 The following is a correct graph containing call to this entry point:
-```
+
+```javascript
 {
   "nodes": [
     {
From 73cb7c848da620dfbfa5d6825a5018526f71e3e4 Mon Sep 17 00:00:00 2001
From: Senja Filipi
Date: Tue, 5 Jun 2018 10:22:49 -0700
Subject: [PATCH 3/9] Updating the title of the GraphRunner.md file

---
 docs/code/GraphRunner.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/code/GraphRunner.md b/docs/code/GraphRunner.md
index 074b959dd0..55cf911fee 100644
--- a/docs/code/GraphRunner.md
+++ b/docs/code/GraphRunner.md
@@ -1,4 +1,4 @@
-# JSON Graph format
+# Entry Point JSON Graph format
 
 The entry point graph in TLC is an array of _nodes_. More information about the definition of entry points and classes that help construct entry point graphs
 can be found in the [EntryPoint.md document](./EntryPoints.md).
From a962fe9f87e4eaf29cba7ddae5c6222b5c30a343 Mon Sep 17 00:00:00 2001
From: Senja Filipi
Date: Mon, 11 Jun 2018 16:17:05 -0700
Subject: [PATCH 4/9] adressing Tom's feedback

---
 docs/code/EntryPoints.md | 134 +++++++++------------------------------
 1 file changed, 29 insertions(+), 105 deletions(-)

diff --git a/docs/code/EntryPoints.md b/docs/code/EntryPoints.md
index fc06fda963..b6f2f955e3 100644
--- a/docs/code/EntryPoints.md
+++ b/docs/code/EntryPoints.md
@@ -2,10 +2,11 @@
 
 ## Overview
 
-An 'entry point', is a representation of a ML.NET type in JSON format.
Entry points are used to serialize and deserialize an ML.NET type in JSON.
-It is also the recommended way to interface with other languages.
-Defined based on entry points, experiments are entry points DAGs, and respectively, entry points are experiment graph nodes.
-That's why through the documentaiton, we also refer to them as 'entry points nodes'.
+Entry-points are a way to interface with ML.NET components, by specifying an execution graph of connected inputs and outputs of those components.
+Both the manifest describing available components and their inputs/outputs, and an "experiment" graph description, are expressed in JSON.
+The recommended way of interacting with ML.NET through other programming languages is by composing, and exchanging pipeline or experiment graphs.
+
+Through the documentaiton, we also refer to them as 'entry points nodes', and not just entry points, and that is because they are used as nodes of the experiemnt graphs.
 The graph 'variables', the various values of the experiment graph JSON properties serve to describe the relationship between the entry point nodes.
 The 'variables' are therefore the edges of the DAG.
 
@@ -26,8 +27,7 @@ An example of an entry point manifest object, specifically for the MissingValueI "FriendlyName": "Convert Transform", "ShortName": "Convert", "Inputs": [ - { - "Name": "Column", + { "Name": "Column", "Type": { "Kind": "Array", "ItemType": { @@ -37,73 +37,37 @@ An example of an entry point manifest object, specifically for the MissingValueI "Name": "ResultType", "Type": { "Kind": "Enum", - "Values": [ - "I1", - "U1", - "I2", - "U2", - "I4", - "U4", - "I8", - "U8", - "R4", - "Num", - "R8", - "TX", - "Text", - "TXT", - "BL", - "Bool", - "TimeSpan", - "TS", - "DT", - "DateTime", - "DZ", - "DateTimeZone", - "UG", - "U16" - ] + "Values": [ "I1","I2","U2","I4","U4","I8","U8","R4","Num","R8","TX","Text","TXT","BL","Bool","TimeSpan","TS","DT","DateTime","DZ","DateTimeZone","UG","U16"] }, "Desc": "The result type", - "Aliases": [ - "type" - ], + "Aliases": [ "type" ], "Required": false, "SortOrder": 150, "IsNullable": true, "Default": null }, - { - "Name": "Range", + { "Name": "Range", "Type": "String", "Desc": "For a key column, this defines the range of values", - "Aliases": [ - "key" - ], + "Aliases": [ "key" ], "Required": false, "SortOrder": 150, "IsNullable": false, "Default": null }, - { - "Name": "Name", + { "Name": "Name", "Type": "String", "Desc": "Name of the new column", - "Aliases": [ - "name" - ], + "Aliases": [ "name" ], "Required": false, "SortOrder": 150, "IsNullable": false, "Default": null }, - { - "Name": "Source", + { "Name": "Source", "Type": "String", "Desc": "Name of the source column", - "Aliases": [ - "src" - ], + "Aliases": ["src"], "Required": false, "SortOrder": 150, "IsNullable": false, @@ -113,68 +77,34 @@ An example of an entry point manifest object, specifically for the MissingValueI } }, "Desc": "New column definition(s) (optional form: name:type:src)", - "Aliases": [ - "col" - ], + "Aliases": ["col"], "Required": true, "SortOrder": 1, "IsNullable": false }, - { - "Name": "Data", + { "Name": "Data", "Type": "DataView", 
"Desc": "Input dataset", "Required": true, "SortOrder": 2, "IsNullable": false }, - { - "Name": "ResultType", + { "Name": "ResultType", "Type": { "Kind": "Enum", - "Values": [ - "I1", - "U1", - "I2", - "U2", - "I4", - "U4", - "I8", - "U8", - "R4", - "Num", - "R8", - "TX", - "Text", - "TXT", - "BL", - "Bool", - "TimeSpan", - "TS", - "DT", - "DateTime", - "DZ", - "DateTimeZone", - "UG", - "U16" - ] + "Values": [ "I1","I2","U2","I4","U4","I8","U8","R4","Num","R8","TX","Text","TXT","BL","Bool","TimeSpan","TS","DT","DateTime","DZ","DateTimeZone","UG","U16"] }, "Desc": "The result type", - "Aliases": [ - "type" - ], + "Aliases": ["type" ], "Required": false, "SortOrder": 2, "IsNullable": true, "Default": null }, - { - "Name": "Range", + { "Name": "Range", "Type": "String", "Desc": "For a key column, this defines the range of values", - "Aliases": [ - "key" - ], + "Aliases": ["key"], "Required": false, "SortOrder": 150, "IsNullable": false, @@ -182,10 +112,10 @@ An example of an entry point manifest object, specifically for the MissingValueI } ], "Outputs": [ - { + { "Name": "OutputData", "Type": "DataView", - "Desc": "Transformed dataset" + "Desc": "Transformed dataset" }, { "Name": "Model", @@ -193,12 +123,8 @@ An example of an entry point manifest object, specifically for the MissingValueI "Desc": "Transform model" } ], - "InputKind": [ - "ITransformInput" - ], - "OutputKind": [ - "ITransformOutput" - ] + "InputKind": ["ITransformInput"], + "OutputKind": ["ITransformOutput"] } ``` @@ -208,12 +134,10 @@ The respective entry point, constructed based on this manifest would be: { "Name": "Transforms.ColumnTypeConverter", "Inputs": { - "Column": [ - { - "Name": "Features", - "Source": "Features" - } - ], + "Column": [{ + "Name": "Features", + "Source": "Features" + }], "Data": "$data0", "ResultType": "R4" }, From 12d3537e56f8f5f719cbb6e473c5862f7ac8da67 Mon Sep 17 00:00:00 2001 From: Senja Filipi Date: Tue, 12 Jun 2018 08:14:52 -0700 Subject: [PATCH 5/9] adressing feedback 
--- docs/code/EntryPoints.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/code/EntryPoints.md b/docs/code/EntryPoints.md index b6f2f955e3..cf1eb4f382 100644 --- a/docs/code/EntryPoints.md +++ b/docs/code/EntryPoints.md @@ -4,7 +4,7 @@ Entry-points are a way to interface with ML.NET components, by specifying an execution graph of connected inputs and outputs of those components. Both the manifest describing available components and their inputs/outputs, and an "experiment" graph description, are expressed in JSON. -The recommended way of interacting with ML.NET through other programming languages is by composing, and exchanging pipeline or experiment graphs. +The recommended way of interacting with ML.NET through other, non-.NET programming languages, is by composing, and exchanging pipeline or experiment graphs. Through the documentaiton, we also refer to them as 'entry points nodes', and not just entry points, and that is because they are used as nodes of the experiemnt graphs. The graph 'variables', the various values of the experiment graph JSON properties serve to describe the relationship between the entry point nodes. 
@@ -37,7 +37,7 @@ An example of an entry point manifest object, specifically for the MissingValueI "Name": "ResultType", "Type": { "Kind": "Enum", - "Values": [ "I1","I2","U2","I4","U4","I8","U8","R4","Num","R8","TX","Text","TXT","BL","Bool","TimeSpan","TS","DT","DateTime","DZ","DateTimeZone","UG","U16"] + "Values": [ "I1","I2","U2","I4","U4","I8","U8","R4","Num","R8","TX","Text","TXT","BL","Bool","TimeSpan","TS","DT","DateTime","DZ","DateTimeZone","UG","U16" ] }, "Desc": "The result type", "Aliases": [ "type" ], @@ -67,7 +67,7 @@ An example of an entry point manifest object, specifically for the MissingValueI { "Name": "Source", "Type": "String", "Desc": "Name of the source column", - "Aliases": ["src"], + "Aliases": [ "src" ], "Required": false, "SortOrder": 150, "IsNullable": false, @@ -77,7 +77,7 @@ An example of an entry point manifest object, specifically for the MissingValueI } }, "Desc": "New column definition(s) (optional form: name:type:src)", - "Aliases": ["col"], + "Aliases": [ "col" ], "Required": true, "SortOrder": 1, "IsNullable": false @@ -92,10 +92,10 @@ An example of an entry point manifest object, specifically for the MissingValueI { "Name": "ResultType", "Type": { "Kind": "Enum", - "Values": [ "I1","I2","U2","I4","U4","I8","U8","R4","Num","R8","TX","Text","TXT","BL","Bool","TimeSpan","TS","DT","DateTime","DZ","DateTimeZone","UG","U16"] + "Values": [ "I1","I2","U2","I4","U4","I8","U8","R4","Num","R8","TX","Text","TXT","BL","Bool","TimeSpan","TS","DT","DateTime","DZ","DateTimeZone","UG","U16" ] }, "Desc": "The result type", - "Aliases": ["type" ], + "Aliases": [ "type" ], "Required": false, "SortOrder": 2, "IsNullable": true, @@ -104,7 +104,7 @@ An example of an entry point manifest object, specifically for the MissingValueI { "Name": "Range", "Type": "String", "Desc": "For a key column, this defines the range of values", - "Aliases": ["key"], + "Aliases": [ "key" ], "Required": false, "SortOrder": 150, "IsNullable": false, @@ -123,8 +123,8 @@ An 
example of an entry point manifest object, specifically for the MissingValueI "Desc": "Transform model" } ], - "InputKind": ["ITransformInput"], - "OutputKind": ["ITransformOutput"] + "InputKind": ["ITransformInput" ], + "OutputKind": [ "ITransformOutput" ] } ``` @@ -136,7 +136,7 @@ The respective entry point, constructed based on this manifest would be: "Inputs": { "Column": [{ "Name": "Features", - "Source": "Features" + "Source": "Features" }], "Data": "$data0", "ResultType": "R4" From cca4f43fd61609be410516aa4f38f4bd1a74fb9a Mon Sep 17 00:00:00 2001 From: Senja Filipi Date: Tue, 12 Jun 2018 08:40:18 -0700 Subject: [PATCH 6/9] code formatting for class names --- docs/code/EntryPoints.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/code/EntryPoints.md b/docs/code/EntryPoints.md index cf1eb4f382..ae87872478 100644 --- a/docs/code/EntryPoints.md +++ b/docs/code/EntryPoints.md @@ -18,7 +18,7 @@ graph. ## EntryPoint manifest - the definition of an entry point -An example of an entry point manifest object, specifically for the MissingValueIndicator transform, is: +An example of an entry point manifest object, specifically for the `MissingValueIndicator` transform, is: ```javascript { From 55174f3087c8a6b08d8929ae892359e4890fbdd4 Mon Sep 17 00:00:00 2001 From: Senja Filipi Date: Mon, 18 Jun 2018 16:15:33 -0700 Subject: [PATCH 7/9] Addressing Gal's comments --- docs/code/EntryPoints.md | 29 ++++++++++++++++++++--------- docs/code/GraphRunner.md | 6 +++--- 2 files changed, 23 insertions(+), 12 deletions(-) diff --git a/docs/code/EntryPoints.md b/docs/code/EntryPoints.md index ae87872478..581a01972f 100644 --- a/docs/code/EntryPoints.md +++ b/docs/code/EntryPoints.md @@ -2,23 +2,23 @@ ## Overview -Entry-points are a way to interface with ML.NET components, by specifying an execution graph of connected inputs and outputs of those components. 
+Entry points are a way to interface with ML.NET components, by specifying an execution graph of connected inputs and outputs of those components. Both the manifest describing available components and their inputs/outputs, and an "experiment" graph description, are expressed in JSON. -The recommended way of interacting with ML.NET through other, non-.NET programming languages, is by composing, and exchanging pipeline or experiment graphs. +The recommended way of interacting with ML.NET through other, non-.NET programming languages, is by composing, and exchanging pipelines or experiment graphs. -Through the documentaiton, we also refer to them as 'entry points nodes', and not just entry points, and that is because they are used as nodes of the experiemnt graphs. -The graph 'variables', the various values of the experiment graph JSON properties serve to describe the relationship between the entry point nodes. -The 'variables' are therefore the edges of the DAG. +Through the documentation, we also refer to entry points as 'entry points nodes', and that is because they are the nodes of the graph representing the experiment. +The graph 'variables', the various values of the experiment graph JSON properties, serve to describe the relationship between the entry point nodes. +The 'variables' are therefore the edges of the DAG (Directed Acyclic Graph). All of ML.NET entry points are described by their manifest. The manifest is another JSON object that documents and describes the structure of an entry points. Manifests are referenced to understand what an entry point does, and how it should be constructed, in a graph. -This document briefly describes the structure of the entry points, the structure of an entry point manifest, and mentions the ML.NET classes that help construct an entry point -graph. 
+This document briefly describes the structure of the entry points, the structure of an entry point manifest, and mentions the ML.NET classes that help construct an entry point graph. ## EntryPoint manifest - the definition of an entry point -An example of an entry point manifest object, specifically for the `MissingValueIndicator` transform, is: +The components manifest is build by scanning the ML.Net assemblies through reflection and searching for types having the: `SignatureEntryPointModule` signature in their `LoadableClass` assembly attribute definition. +An example of an entry point manifest object, specifically for the `ColumnTypeConverter` transform, is: ```javascript { @@ -216,4 +216,15 @@ to an entry point node. The `SimpleParameterBinding` is a pointer to a non-array, non-dictionary parameter, the `ArrayIndexParameterBinding` is a pointer to a specific index of an array parameter and the `DictionaryKeyParameterBinding` is a pointer to a specific key of a dictionary -parameter. \ No newline at end of file +parameter. + +## How to create an entry point for an existing ML.Net component + +The steps to take, to create an entry point for an existing ML.Net component, are: +1. Add the `SignatureEntryPointModule` signature to the `LoadableClass` assembly attribute. +2. Create a public static method, that: + a. Takes as input, among others, an object representing the arguments of the component you want to expose. + b. Initializes and run the components, returning one of the nested classes of `Microsoft.ML.Runtime.EntryPoints.CommonOutputs` + c. Is annotated with the `TlcModule.EntryPoint` attribute + +Based on the type of entry point being created, there are further conventions on the name of the method, for example, the Trainers entry points are typically called: 'TrainMultiClass', 'TrainBinary' etc, based on the task. 
\ No newline at end of file diff --git a/docs/code/GraphRunner.md b/docs/code/GraphRunner.md index 55cf911fee..d8be5bb2fd 100644 --- a/docs/code/GraphRunner.md +++ b/docs/code/GraphRunner.md @@ -1,6 +1,6 @@ # Entry Point JSON Graph format -The entry point graph in TLC is an array of _nodes_. More information about the definition of entry points and classes that help construct entry point graphs +The entry point graph in ML.Net is an array of _nodes_. More information about the definition of entry points and classes that help construct entry point graphs can be found in the [EntryPoint.md document](./EntryPoints.md). Each node is an object with the following fields: @@ -38,8 +38,8 @@ The same variable can participate in many edges. - If the variable is present only in _inputs_, but never in _outputs_, it is a _graph input_. All graph inputs must be provided before a graph can be run. - The variable has a type, which is the type of inputs (and, optionally, output) that it appears in. If the type of the variable is -ambiguous, TLC throws an exception. -- Circular references. The experiment graph is expected to be a DAG. If the circular dependency is detected, TLC throws an exception. +ambiguous, ML.Net throws an exception. +- Circular references. The experiment graph is expected to be a DAG. If the circular dependency is detected, ML.Net throws an exception. _Currently, this is done lazily: if we couldn't ever run a node because it's waiting for inputs, we throw._ ### Variables for arrays and dictionaries. From 14a727c48e790c37a22d3cd3e0243ebef944df3c Mon Sep 17 00:00:00 2001 From: Senja Filipi Date: Tue, 19 Jun 2018 08:10:33 -0700 Subject: [PATCH 8/9] Adding an example of an entry point. 
Fixing casing on ML.NET --- docs/code/EntryPoints.md | 9 +++++---- docs/code/GraphRunner.md | 8 ++++---- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/docs/code/EntryPoints.md b/docs/code/EntryPoints.md index 581a01972f..098632ffae 100644 --- a/docs/code/EntryPoints.md +++ b/docs/code/EntryPoints.md @@ -17,7 +17,7 @@ This document briefly describes the structure of the entry points, the structure ## EntryPoint manifest - the definition of an entry point -The components manifest is build by scanning the ML.Net assemblies through reflection and searching for types having the: `SignatureEntryPointModule` signature in their `LoadableClass` assembly attribute definition. +The components manifest is build by scanning the ML.NET assemblies through reflection and searching for types having the: `SignatureEntryPointModule` signature in their `LoadableClass` assembly attribute definition. An example of an entry point manifest object, specifically for the `ColumnTypeConverter` transform, is: ```javascript @@ -218,13 +218,14 @@ pointer to a specific index of an array parameter and the `DictionaryKeyParameterBinding` is a pointer to a specific key of a dictionary parameter. -## How to create an entry point for an existing ML.Net component +## How to create an entry point for an existing ML.NET component -The steps to take, to create an entry point for an existing ML.Net component, are: +The steps to take, to create an entry point for an existing ML.NET component, are: 1. Add the `SignatureEntryPointModule` signature to the `LoadableClass` assembly attribute. 2. Create a public static method, that: a. Takes as input, among others, an object representing the arguments of the component you want to expose. b. Initializes and run the components, returning one of the nested classes of `Microsoft.ML.Runtime.EntryPoints.CommonOutputs` c. 
Is annotated with the `TlcModule.EntryPoint` attribute -Based on the type of entry point being created, there are further conventions on the name of the method, for example, the Trainers entry points are typically called: 'TrainMultiClass', 'TrainBinary' etc, based on the task. \ No newline at end of file +Based on the type of entry point being created, there are further conventions on the name of the method, for example, the Trainers entry points are typically called: 'TrainMultiClass', 'TrainBinary' etc, based on the task. +Look at [OnlineGradientDescent](.././src/Microsoft.ML.StandardLearners/Standard/Online/OnlineGradientDescent.cs) for an example of a component and its entry point. \ No newline at end of file diff --git a/docs/code/GraphRunner.md b/docs/code/GraphRunner.md index d8be5bb2fd..b7fddc9476 100644 --- a/docs/code/GraphRunner.md +++ b/docs/code/GraphRunner.md @@ -1,13 +1,13 @@ # Entry Point JSON Graph format -The entry point graph in ML.Net is an array of _nodes_. More information about the definition of entry points and classes that help construct entry point graphs +The entry point graph in ML.NET is an array of _nodes_. More information about the definition of entry points and classes that help construct entry point graphs can be found in the [EntryPoint.md document](./EntryPoints.md). Each node is an object with the following fields: - _name_: string. Required. Name of the entry point. - _inputs_: object. Optional. Specifies non-default inputs to the entry point. -Note that if the entry point has required inputs (which is very common), the _inputs_ field is requred. +Note that if the entry point has required inputs (which is very common), the _inputs_ field is required. - _outputs_: object. Optional. Specifies the variables that will hold the node's outputs. ## Input and output types @@ -38,8 +38,8 @@ The same variable can participate in many edges. - If the variable is present only in _inputs_, but never in _outputs_, it is a _graph input_. 
All graph inputs must be provided before a graph can be run. - The variable has a type, which is the type of inputs (and, optionally, output) that it appears in. If the type of the variable is -ambiguous, ML.Net throws an exception. -- Circular references. The experiment graph is expected to be a DAG. If the circular dependency is detected, ML.Net throws an exception. +ambiguous, ML.NET throws an exception. +- Circular references. The experiment graph is expected to be a DAG. If the circular dependency is detected, ML.NET throws an exception. _Currently, this is done lazily: if we couldn't ever run a node because it's waiting for inputs, we throw._ ### Variables for arrays and dictionaries. From e9b3a11fcd40ebbc0b142c659f099bfe8b1b508f Mon Sep 17 00:00:00 2001 From: Senja Filipi Date: Tue, 19 Jun 2018 08:13:56 -0700 Subject: [PATCH 9/9] fixing link --- docs/code/EntryPoints.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/code/EntryPoints.md b/docs/code/EntryPoints.md index 098632ffae..dbcc4e6bc9 100644 --- a/docs/code/EntryPoints.md +++ b/docs/code/EntryPoints.md @@ -228,4 +228,4 @@ The steps to take, to create an entry point for an existing ML.NET component, ar c. Is annotated with the `TlcModule.EntryPoint` attribute Based on the type of entry point being created, there are further conventions on the name of the method, for example, the Trainers entry points are typically called: 'TrainMultiClass', 'TrainBinary' etc, based on the task. -Look at [OnlineGradientDescent](.././src/Microsoft.ML.StandardLearners/Standard/Online/OnlineGradientDescent.cs) for an example of a component and its entry point. \ No newline at end of file +Look at [OnlineGradientDescent](../../src/Microsoft.ML.StandardLearners/Standard/Online/OnlineGradientDescent.cs) for an example of a component and its entry point. \ No newline at end of file
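The variable rules described in `GraphRunner.md` (a variable that appears only in _inputs_ is a graph input, and a node that can never run because it is waiting for inputs causes an exception) can be illustrated with a small sketch. This is not ML.NET's implementation: the helper names `collectVariables`, `findGraphInputs`, and `planExecution` are hypothetical, the variable names `$converted` and `$output` are made up for the example, and it assumes that a variable is any string value starting with `$`. Array and dictionary variable bindings are ignored.

```javascript
// Illustrative sketch only; not how ML.NET itself is implemented.
// Assumption: a variable is any string value that starts with "$".
function collectVariables(value, found = new Set()) {
  if (typeof value === "string") {
    if (value.startsWith("$")) found.add(value);
  } else if (Array.isArray(value)) {
    value.forEach((item) => collectVariables(item, found));
  } else if (value !== null && typeof value === "object") {
    Object.values(value).forEach((item) => collectVariables(item, found));
  }
  return found;
}

// Graph inputs: variables read by some node but written by none.
function findGraphInputs(nodes) {
  const read = new Set();
  const written = new Set();
  for (const node of nodes) {
    collectVariables(node.Inputs || {}, read);
    collectVariables(node.Outputs || {}, written);
  }
  return [...read].filter((name) => !written.has(name));
}

// Lazy scheduling: repeatedly run any node whose input variables are all
// available. If nodes remain but none can run (a missing graph input or a
// circular dependency), throw.
function planExecution(nodes, providedInputs) {
  const available = new Set(providedInputs);
  const pending = [...nodes];
  const order = [];
  while (pending.length > 0) {
    const index = pending.findIndex((node) =>
      [...collectVariables(node.Inputs || {})].every((name) => available.has(name))
    );
    if (index === -1) {
      const stuck = pending.map((node) => node.Name).join(", ");
      throw new Error("Nodes waiting for inputs that never arrive: " + stuck);
    }
    const [node] = pending.splice(index, 1);
    collectVariables(node.Outputs || {}).forEach((name) => available.add(name));
    order.push(node.Name);
  }
  return order;
}

// Two-node graph, deliberately listed out of execution order.
const graph = [
  {
    Name: "Transforms.MissingValueIndicator",
    Inputs: { Column: [{ Name: "Features", Source: "Features" }], Data: "$converted" },
    Outputs: { OutputData: "$output" },
  },
  {
    Name: "Transforms.ColumnTypeConverter",
    Inputs: { Column: [{ Name: "Features", Source: "Features" }], Data: "$data0", ResultType: "R4" },
    Outputs: { OutputData: "$converted" },
  },
];

console.log(findGraphInputs(graph));
console.log(planExecution(graph, ["$data0"]));
```

In this sketch the only graph input is `$data0`, and the converter node is scheduled before the indicator node even though it appears second in the array. Detecting cycles as "no node can make progress" mirrors the lazy behavior the document describes, at the cost of not distinguishing a cycle from a missing graph input.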