Commit 5da49a3

committed
Adding EntryPoints.md and GraphRunner.md
1 parent 465b123 commit 5da49a3

File tree

2 files changed

+311
-0
lines changed

docs/code/EntryPoints.md

+188
@@ -0,0 +1,188 @@
# Overview

An 'entry point' is a representation of an ML.Net type in JSON format; it is used to serialize an ML.Net type to JSON and to deserialize it back.
It is also one of the ways ML.Net deserializes experiments, and it is the recommended way to interface with other languages.
In terms of entry points, an experiment is a DAG (directed acyclic graph) of entry points, and each entry point is a node of the experiment graph.
That's why, throughout the documentation, we also refer to them as 'entry point nodes'.
The graph 'variables', which appear as values of the JSON properties in the experiment graph, describe the relationships between the entry point nodes.
The 'variables' are therefore the edges of the DAG.

All ML.Net entry points are described by their manifests. A manifest is another JSON object that documents and describes the structure of an entry point.
Consult the manifest to understand what an entry point does and how it should be constructed in a graph.

This document briefly describes the structure of entry points and of an entry point manifest, and mentions the ML.Net classes that help construct an entry point
graph.

## EntryPoint manifest - the definition of an entry point

An example of an entry point manifest object, specifically for the MissingValueIndicator transform, is:

```javascript
{
    "Name": "Transforms.MissingValueIndicator",
    "Desc": "Create a boolean output column with the same number of slots as the input column, where the output value is true if the value in the input column is missing.",
    "FriendlyName": "NA Indicator Transform",
    "ShortName": "NAInd",
    "Inputs": [
        {
            "Name": "Column",
            "Type": {
                "Kind": "Array",
                "ItemType": {
                    "Kind": "Struct",
                    "Fields": [
                        {
                            "Name": "Name",
                            "Type": "String",
                            "Desc": "Name of the new column",
                            "Aliases": [
                                "name"
                            ],
                            "Required": false,
                            "SortOrder": 150.0,
                            "IsNullable": false,
                            "Default": null
                        },
                        {
                            "Name": "Source",
                            "Type": "String",
                            "Desc": "Name of the source column",
                            "Aliases": [
                                "src"
                            ],
                            "Required": false,
                            "SortOrder": 150.0,
                            "IsNullable": false,
                            "Default": null
                        }
                    ]
                }
            },
            "Desc": "New column definition(s) (optional form: name:src)",
            "Aliases": [
                "col"
            ],
            "Required": true,
            "SortOrder": 1.0,
            "IsNullable": false
        },
        {
            "Name": "Data",
            "Type": "DataView",
            "Desc": "Input dataset",
            "Required": true,
            "SortOrder": 1.0,
            "IsNullable": false
        }
    ],
    "Outputs": [
        {
            "Name": "OutputData",
            "Type": "DataView",
            "Desc": "Transformed dataset"
        },
        {
            "Name": "Model",
            "Type": "TransformModel",
            "Desc": "Transform model"
        }
    ],
    "InputKind": [
        "ITransformInput"
    ],
    "OutputKind": [
        "ITransformOutput"
    ]
}
```
98+
99+
The respective entry point, constructed based on this manifest would be:
100+
101+
```javascript
102+
{
103+
"Name": "Transforms.MissingValueIndicator",
104+
"Inputs": {
105+
"Column": [
106+
{
107+
"Name": "Features",
108+
"Source": "Features"
109+
}
110+
],
111+
"Data": "$data0"
112+
},
113+
"Outputs": {
114+
"OutputData": "$Output_1528136517433",
115+
"Model": "$TransformModel_1528136517433"
116+
}
117+
}
118+
```
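
As a rough illustration of how a manifest guides node construction, the following Python sketch checks that a node supplies every input the manifest marks as required and uses only declared input and output names. The helper `check_node` is hypothetical and is not part of ML.Net, and the manifest is trimmed to the fields the check reads.

```python
import json

# Trimmed copy of the manifest above: only the fields this check reads.
MANIFEST = json.loads("""
{
  "Name": "Transforms.MissingValueIndicator",
  "Inputs": [
    {"Name": "Column", "Required": true},
    {"Name": "Data", "Required": true}
  ],
  "Outputs": [
    {"Name": "OutputData"},
    {"Name": "Model"}
  ]
}
""")

# The entry point node shown above.
NODE = json.loads("""
{
  "Name": "Transforms.MissingValueIndicator",
  "Inputs": {
    "Column": [{"Name": "Features", "Source": "Features"}],
    "Data": "$data0"
  },
  "Outputs": {
    "OutputData": "$Output_1528136517433",
    "Model": "$TransformModel_1528136517433"
  }
}
""")


def check_node(manifest, node):
    """Hypothetical helper: list the mismatches between a node and its manifest."""
    problems = []
    if node.get("Name") != manifest["Name"]:
        problems.append("node name does not match manifest name")
    declared_inputs = {i["Name"] for i in manifest["Inputs"]}
    required_inputs = {i["Name"] for i in manifest["Inputs"] if i.get("Required")}
    declared_outputs = {o["Name"] for o in manifest["Outputs"]}
    given_inputs = set(node.get("Inputs", {}))
    given_outputs = set(node.get("Outputs", {}))
    for name in sorted(required_inputs - given_inputs):
        problems.append(f"missing required input: {name}")
    for name in sorted(given_inputs - declared_inputs):
        problems.append(f"unknown input: {name}")
    for name in sorted(given_outputs - declared_outputs):
        problems.append(f"unknown output: {name}")
    return problems


print(check_node(MANIFEST, NODE))  # prints []
```

An empty list means the node is consistent with the trimmed manifest; the sketch does not validate value types.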
119+
120+
## `EntryPointGraph`
121+
122+
This class encapsulates the list of nodes (`EntryPointNode`) and edges
123+
(`EntryPointVariable` inside a `RunContext`) of the graph.
124+
125+
## `EntryPointNode`
126+
127+
This class represents a node in the graph, and wraps an entry point call. It
128+
has methods for creating and running entry points. It also has a reference to
129+
the `RunContext` to allow it to get and set values from `EntryPointVariable`s.
130+
131+
To express the inputs that are set through variables, a set of dictionaries
132+
are used. The `InputBindingMap` maps an input parameter name to a list of
133+
`ParameterBinding`s. The `InputMap` maps a `ParameterBinding` to a
134+
`VariableBinding`. For example, if the JSON looks like this:
135+
136+
```javascript
137+
'foo': '$bar'
138+
```
139+
140+
the `InputBindingMap` will have one entry that maps the string "foo" to a list
141+
that has only one element, a `SimpleParameterBinding` with the name "foo" and
142+
the `InputMap` will map the `SimpleParameterBinding` to a
143+
`SimpleVariableBinding` with the name "bar". For a more complicated example,
144+
let's say we have this JSON:
145+
146+
```javascript
147+
'foo': [ '$bar[3]', '$baz']
148+
```

the `InputBindingMap` will have one entry that maps the string "foo" to a list
that has two elements, an `ArrayIndexParameterBinding` with the name "foo" and
index 0 and another one with index 1. The `InputMap` will map the first
`ArrayIndexParameterBinding` to an `ArrayIndexVariableBinding` with name "bar"
and index 3 and the second `ArrayIndexParameterBinding` to a
`SimpleVariableBinding` with the name "baz".

For outputs, a node assumes that an output is mapped to a variable, so the
`OutputMap` is a simple dictionary from string to string.

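The two maps described above can be mimicked in a few lines of Python. This is only an analogue for illustration, not the ML.Net C# implementation: the dataclasses reuse the class names from this document, while `parse_variable` and `bind_inputs` are hypothetical helpers, and dictionary bindings are left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleParameterBinding:
    name: str


@dataclass(frozen=True)
class ArrayIndexParameterBinding:
    name: str
    index: int


@dataclass(frozen=True)
class SimpleVariableBinding:
    name: str


@dataclass(frozen=True)
class ArrayIndexVariableBinding:
    name: str
    index: int


_INDEXED = re.compile(r"^\$(\w+)\[(\d+)\]$")
_SIMPLE = re.compile(r"^\$(\w+)$")


def parse_variable(text):
    """Hypothetical helper: turn "$bar" or "$bar[3]" into a variable binding."""
    match = _INDEXED.match(text)
    if match:
        return ArrayIndexVariableBinding(match.group(1), int(match.group(2)))
    match = _SIMPLE.match(text)
    if match:
        return SimpleVariableBinding(match.group(1))
    raise ValueError(f"not a variable reference: {text!r}")


def is_variable(value):
    return isinstance(value, str) and value.startswith("$")


def bind_inputs(inputs):
    """Build (input_binding_map, input_map) for the inputs that are set through variables."""
    input_binding_map = {}  # parameter name -> list of parameter bindings
    input_map = {}          # parameter binding -> variable binding
    for name, value in inputs.items():
        if is_variable(value):
            param = SimpleParameterBinding(name)
            input_binding_map[name] = [param]
            input_map[param] = parse_variable(value)
        elif isinstance(value, list) and value and all(is_variable(v) for v in value):
            params = []
            for index, item in enumerate(value):
                param = ArrayIndexParameterBinding(name, index)
                params.append(param)
                input_map[param] = parse_variable(item)
            input_binding_map[name] = params
        # Literal (non-variable) inputs need no binding and are skipped here.
    return input_binding_map, input_map


binding_map, input_map = bind_inputs({"foo": ["$bar[3]", "$baz"]})
print(binding_map["foo"])
print(input_map[ArrayIndexParameterBinding("foo", 1)])
```

Frozen dataclasses are used so that the parameter bindings are hashable and can serve as dictionary keys, mirroring the role of `InputMap`.
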
## `EntryPointVariable`

This class represents an edge in the entry point graph. It has a name, a type
and a value. Variables can be simple, arrays and/or dictionaries. Currently,
only data views, file handles, predictor models and transform models are
allowed as element types for a variable.

## `RunContext`

This class is just a container for all the variables in a graph.

## VariableBinding and Derived Classes

The abstract base class represents a "pointer to a (part of a) variable". It
is used in conjunction with `ParameterBinding`s to specify inputs to an entry
point node. The `SimpleVariableBinding` is a pointer to an entire variable,
the `ArrayIndexVariableBinding` is a pointer to a specific index in an array
variable, and the `DictionaryKeyVariableBinding` is a pointer to a specific
key in a dictionary variable.

## ParameterBinding and Derived Classes

The abstract base class represents a "pointer to a (part of a) parameter". It
parallels the `VariableBinding` hierarchy and it is used to specify the inputs
to an entry point node. The `SimpleParameterBinding` is a pointer to a
non-array, non-dictionary parameter, the `ArrayIndexParameterBinding` is a
pointer to a specific index of an array parameter and the
`DictionaryKeyParameterBinding` is a pointer to a specific key of a dictionary
parameter.

docs/code/GraphRunner.md

+123
@@ -0,0 +1,123 @@
# JSON Graph format

The entry point graph in TLC is an array of _nodes_. Each node is an object with the following fields:

- _name_: string. Required. Name of the entry point.
- _inputs_: object. Optional. Specifies non-default inputs to the entry point.
  Note that if the entry point has required inputs (which is very common), the _inputs_ field is required.
- _outputs_: object. Optional. Specifies the variables that will hold the node's outputs.

## Input and output types
The following types are supported in JSON graphs:

- _string_. Represented as a JSON string, maps to a C# string.
- _float_. Represented as a JSON float, maps to a C# float or double.
- _bool_. Represented as a JSON bool, maps to a C# bool.
- _enum_. Represented as a JSON string, maps to a C# enum. The allowed values are those of the C# enum (they are also listed in the manifest).
- _int_. Currently not implemented. Represented as a JSON integer, maps to a C# int or long.
- _array_ of the above. Represented as a JSON array, maps to a C# array.
- _dictionary_. Currently not implemented. Represented as a JSON object, maps to a C# `Dictionary<string,T>`.
- _component_. Currently not implemented. Represented as a JSON object with 2 fields: _name_:string and _settings_:object.

## Variables
The following input/output types cannot be represented as a JSON value:
- _DataView_
- _FileHandle_
- _TransformModel_
- _PredictorModel_

These must be passed as _variables_. A variable is represented as a JSON string that begins with "$".
Note the following rules:

- A variable can appear in the _outputs_ only once per graph. That is, the variable can be 'assigned' only once.
- If a variable is present in the _inputs_ of one node and in the _outputs_ of another node, this signifies a graph 'edge'.
  The same variable can participate in many edges.
- If a variable is present only in _inputs_, but never in _outputs_, it is a _graph input_. All graph inputs must be provided before
  a graph can be run.
- A variable has a type, which is the type of the inputs (and, optionally, the output) that it appears in. If the type of the variable is
  ambiguous, TLC throws an exception.
- Circular references. The experiment graph is expected to be a DAG. If a circular dependency is detected, TLC throws an exception.
  _Currently, this is done lazily: if we couldn't ever run a node because it's waiting for inputs, we throw._

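The lazy cycle check in the last rule can be pictured with a small Python sketch. It is an assumption-laden illustration rather than the TLC scheduler: `schedule` and `referenced_variables` are hypothetical names, nodes are plain dictionaries using the lowercase field names from this document, "running" a node only marks its outputs as available, and the node names `A`, `B`, `X`, `Y` are placeholders rather than real entry points.

```python
def referenced_variables(value):
    """Yield the variable names referenced by one input or output value."""
    if isinstance(value, str) and value.startswith("$"):
        # "$bar[3]" refers to (an element of) the variable "bar".
        yield value[1:].split("[", 1)[0]
    elif isinstance(value, list):
        for item in value:
            yield from referenced_variables(item)


def schedule(nodes, graph_inputs):
    """Return node names in an order they could run; raise if some node can never run."""
    available = set(graph_inputs)
    pending = list(nodes)
    order = []
    while pending:
        waiting = []
        for node in pending:
            needed = {
                name
                for value in node.get("inputs", {}).values()
                for name in referenced_variables(value)
            }
            if needed <= available:
                # Stand-in for running the node: its outputs become available.
                order.append(node["name"])
                for value in node.get("outputs", {}).values():
                    available.update(referenced_variables(value))
            else:
                waiting.append(node)
        if len(waiting) == len(pending):
            # A full pass made no progress: a cycle or a missing graph input.
            names = ", ".join(node["name"] for node in waiting)
            raise ValueError(f"nodes can never run: {names}")
        pending = waiting
    return order


NODES = [
    {"name": "B", "inputs": {"Data": "$a"}, "outputs": {"OutputData": "$b"}},
    {"name": "A", "inputs": {"Data": "$data"}, "outputs": {"OutputData": "$a"}},
]
print(schedule(NODES, ["data"]))  # prints ['A', 'B']
```

Because the check only notices that no node made progress, a sketch like this cannot tell a true cycle from a graph input that was never supplied; both surface as the same error.
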
### Variables for arrays and dictionaries
Variables may be defined for arrays and dictionaries, as long as the item types are valid variable types (the four types listed above).
They are treated the same way as regular 'scalar' variables.

To reference an item of the collection, use the `[]` syntax:
- `$var[5]` denotes the element at index 5 of an array variable.
- `$var[foo]` and `$var['foo']` both denote the element with key 'foo' of a dictionary variable.
  _This is not yet implemented._

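To make the reference syntax concrete, here is a minimal Python sketch of a parser for the three forms listed above. The function name `parse_reference` and the returned `(name, key)` tuple are hypothetical, and the sketch assumes that an all-digit key means an array index, which the text above does not state explicitly.

```python
import re

# "$name" optionally followed by a single "[key]" suffix.
_REFERENCE = re.compile(r"^\$(?P<name>\w+)(?:\[(?P<key>[^\[\]]+)\])?$")


def parse_reference(text):
    """Return (variable_name, key); key is None, an int index, or a string key."""
    match = _REFERENCE.match(text)
    if match is None:
        raise ValueError(f"not a variable reference: {text!r}")
    name, key = match.group("name"), match.group("key")
    if key is None:
        return (name, None)          # whole variable
    if key.isdecimal():
        return (name, int(key))      # assumed: digits mean an array index
    if len(key) >= 2 and key[0] == key[-1] == "'":
        key = key[1:-1]              # 'foo' and foo denote the same key
    return (name, key)


for ref in ["$var", "$var[5]", "$var[foo]", "$var['foo']"]:
    print(ref, "->", parse_reference(ref))
```
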
Conversely, to build a collection (array or dictionary) of variables, use JSON arrays and objects:
- `["$v1", "$v2", "$v3"]` denotes an array containing 3 variables.
- `{"foo": "$v1", "bar": "$v2"}` denotes a dictionary containing 2 key-value pairs.
  _This is also not yet implemented._

## Example of a JSON entry point manifest object, and the respective entry point graph node
Let's consider the following manifest snippet, describing an entry point _'CVSplit.Split'_:
```
{
  "name": "CVSplit.Split",
  "desc": "Split the dataset into the specified number of cross-validation folds (train and test sets)",
  "inputs": [
    {
      "name": "Data",
      "type": "DataView",
      "desc": "Input dataset",
      "required": true
    },
    {
      "name": "NumFolds",
      "type": "Int",
      "desc": "Number of folds to split into",
      "required": false,
      "default": 2
    },
    {
      "name": "StratificationColumn",
      "type": "String",
      "desc": "Stratification column",
      "aliases": [
        "strat"
      ],
      "required": false,
      "default": null
    }
  ],
  "outputs": [
    {
      "name": "TrainData",
      "type": {
        "kind": "Array",
        "itemType": "DataView"
      },
      "desc": "Training data (one dataset per fold)"
    },
    {
      "name": "TestData",
      "type": {
        "kind": "Array",
        "itemType": "DataView"
      },
      "desc": "Testing data (one dataset per fold)"
    }
  ]
}
```

As we can see, the entry point has 3 inputs (one of them required), and 2 outputs.
The following is a correct graph containing a call to this entry point:
```
{
  "nodes": [
    {
      "name": "CVSplit.Split",
      "inputs": {
        "Data": "$data1"
      },
      "outputs": {
        "TrainData": "$cv"
      }
    }
  ]
}
```
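
Applying the variable rules to this graph by hand gives one graph input (`data1`) and one assigned variable (`cv`). The Python sketch below does the same bookkeeping mechanically; `analyze` and `variables_in` are hypothetical helpers written for this illustration, not TLC functions, and type checking of variables is not attempted.

```python
import json
from collections import Counter

# The example graph from above.
GRAPH = json.loads("""
{
  "nodes": [
    {
      "name": "CVSplit.Split",
      "inputs": {"Data": "$data1"},
      "outputs": {"TrainData": "$cv"}
    }
  ]
}
""")


def variables_in(value):
    """Yield variable names found in a "$name" string or a list of such strings."""
    if isinstance(value, str) and value.startswith("$"):
        yield value[1:]
    elif isinstance(value, list):
        for item in value:
            yield from variables_in(item)


def analyze(graph):
    """Return (graph_inputs, unconsumed_outputs); raise if a variable is assigned twice."""
    consumed = set()
    assigned = Counter()
    for node in graph["nodes"]:
        for value in node.get("inputs", {}).values():
            consumed.update(variables_in(value))
        for value in node.get("outputs", {}).values():
            assigned.update(variables_in(value))
    duplicates = sorted(name for name, count in assigned.items() if count > 1)
    if duplicates:
        raise ValueError(f"variables assigned more than once: {duplicates}")
    graph_inputs = sorted(consumed - set(assigned))
    unconsumed_outputs = sorted(set(assigned) - consumed)
    return graph_inputs, unconsumed_outputs


print(analyze(GRAPH))  # prints (['data1'], ['cv'])
```

Here `data1` must be supplied before the graph can run, and `cv` is produced but not consumed by any other node in the graph.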
