Commit acec961

author
Ivan Matantsev
committed
merge with master
2 parents 055abae + 129b47c commit acec961

602 files changed (+7821, -7504 lines changed)

Microsoft.ML.sln

Lines changed: 7 additions & 0 deletions
@@ -260,6 +260,12 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Microsoft.Data.DataView", "
 EndProject
 Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "RemoteExecutorConsoleApp", "test\RemoteExecutorConsoleApp\RemoteExecutorConsoleApp.csproj", "{5E920CAC-5A28-42FB-936E-49C472130953}"
 EndProject
+Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Microsoft.ML.Ensemble", "Microsoft.ML.Ensemble", "{AD7058C9-5608-49A8-BE23-58C33A74EE91}"
+ProjectSection(SolutionItems) = preProject
+pkg\Microsoft.ML.Ensemble\Microsoft.ML.Ensemble.nupkgproj = pkg\Microsoft.ML.Ensemble\Microsoft.ML.Ensemble.nupkgproj
+pkg\Microsoft.ML.Ensemble\Microsoft.ML.Ensemble.symbols.nupkgproj = pkg\Microsoft.ML.Ensemble\Microsoft.ML.Ensemble.symbols.nupkgproj
+EndProjectSection
+EndProject
 Global
 GlobalSection(SolutionConfigurationPlatforms) = preSolution
 Debug|Any CPU = Debug|Any CPU
@@ -1026,6 +1032,7 @@ Global
 {85D0CAFD-2FE8-496A-88C7-585D35B94243} = {09EADF06-BE25-4228-AB53-95AE3E15B530}
 {31D38B21-102B-41C0-9E0A-2FE0BF68D123} = {D3D38B03-B557-484D-8348-8BADEE4DF592}
 {5E920CAC-5A28-42FB-936E-49C472130953} = {AED9C836-31E3-4F3F-8ABC-929555D3F3C4}
+{AD7058C9-5608-49A8-BE23-58C33A74EE91} = {D3D38B03-B557-484D-8348-8BADEE4DF592}
 EndGlobalSection
 GlobalSection(ExtensibilityGlobals) = postSolution
 SolutionGuid = {41165AF1-35BB-4832-A189-73060F82B01D}

build/BranchInfo.props

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 <Project>
 <PropertyGroup>
 <MajorVersion>0</MajorVersion>
-<MinorVersion>11</MinorVersion>
+<MinorVersion>12</MinorVersion>
 <PatchVersion>0</PatchVersion>
 <PreReleaseLabel>preview</PreReleaseLabel>
 </PropertyGroup>

docs/code/IDataViewDesignPrinciples.md

Lines changed: 5 additions & 5 deletions
@@ -47,7 +47,7 @@ only when needed to satisfy a local request for information.
 The IDataView design fulfills the following design requirements:

 * **General schema**: Each view carries schema information, which specifies
-the names and types of the view's columns, together with metadata associated
+the names and types of the view's columns, together with annotations associated
 with the columns. The system is optimized for a reasonably small number of
 columns (hundreds). See [here](#basics).

@@ -112,14 +112,14 @@ The IDataView system design does *not* include the following:
 * **Multi-view schema information**: There is no direct support for specifying
 cross-view schema information, for example, that certain columns are primary
 keys, and that there are foreign key relationships among tables. However,
-the column metadata support, together with conventions, may be used to
+the column annotation support, together with conventions, may be used to
 represent such information.

 * **Standard ML schema**: The IDataView system does not define, nor prescribe,
 standard ML schema representation. For example, it does not dictate
 representation of nor distinction between different semantic interpretations
 of columns, such as label, feature, score, weight, etc. However, the column
-metadata support, together with conventions, may be used to represent such
+annotation support, together with conventions, may be used to represent such
 interpretations.

 * **Row count**: A view is not required to provide its row count. The
@@ -149,7 +149,7 @@ The IDataView system design does *not* include the following:

 IDataView has general schema support, in that a view can have an arbitrary
 number of columns, each having an associated name, index, data type, and
-optional metadata.
+optional annotation.

 Column names are case sensitive. Multiple columns can share the same name, in
 which case, one of the columns hides the others, in the sense that the name
@@ -177,7 +177,7 @@ The set of standard types will likely be expanded over time.
 The IDataView type system is specified in a separate document, *IDataView Type
 System Specification*.

-IDataView provides a general mechanism for associating semantic metadata with
+IDataView provides a general mechanism for associating semantic annotations with
 columns, such as designating sets of score columns, names associated with the
 individual slots of a vector-valued column, values associated with a key type
 column, whether a column's data is normalized, etc.

docs/code/IDataViewImplementation.md

Lines changed: 8 additions & 8 deletions
@@ -313,10 +313,10 @@ are initialized using the pseudo-random number generator in an `IHost` that
 changes from one to another. But, that's a bit nit-picky.

 Note also: when we say functionally identical we include everything about it:
-not just the data, but the schema, its metadata, the implementation of
+not just the data, but the schema, its annotations, the implementation of
 shuffling, etc. For this reason, while serializing the data *model* has
 guarantees of consistency, serializing the *data* has no such guarantee: if
-you serialize data using the text saver, practically all metadata (except slot
+you serialize data using the text saver, practically all annotations (except slot
 names) will be completely lost, which can have implications on how some
 transforms and downstream processes work. Or: if you serialize data using the
 binary saver, suddenly it may become shufflable whereas it may not have been
@@ -475,7 +475,7 @@ helpful).

 The schema contains information about the columns. As we see in [the design
 principles](IDataViewDesignPrinciples.md), it has index, data type, and
-optional metadata.
+optional annotations.

 While *programmatically* accesses to an `IDataView` are by index, from a
 user's perspective the indices are by name; most training algorithms
@@ -498,20 +498,20 @@ things like key-types and vector-types, when returned, should not be created
 in the function itself (thereby creating a new object every time), but rather
 stored somewhere and returned.

-## Metadata
+## Annotations

-Since metadata is *optional*, one is not obligated to necessarily produce it,
+Since annotations are *optional*, one is not obligated to necessarily produce it,
 or conform to any particular schemas for any particular kinds (beyond, say,
 the obvious things like making sure that the types and values are consistent).
 However, the flip side of that freedom given to *producers*, is that
 *consumers* are obligated, when processing a data view input, to react
-gracefully when metadata of a certain kind is absent, or not in a form that
-one expects. One should *never* fail when input metadata is in a form one does
+gracefully when an annotation of a certain kind is absent, or not in a form that
+one expects. One should *never* fail when input annotations are in a form one does
 not expect.

 To give a practical example of this: many transforms, learners, or other
 components that process `IDataView`s will do something with the slot names,
-but when the `SlotNames` metadata kind for a given column is either absent,
+but when the `SlotNames` annotation kind for a given column is either absent,
 *or* not of the right type (vectors of strings), *or* not of the right size
 (same length vectors as the input), the behavior is not to throw or yield
 errors or do anything of the kind, but to simply say, "oh, I don't really have
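
Reviewer note: the consumer-side rule in the hunk above (treat an absent, wrongly typed, or wrongly sized `SlotNames` annotation as "no slot names" rather than as an error) can be sketched in a language-agnostic way. The Python below is only an illustration under stated assumptions: annotations are modelled as a plain mapping from annotation kind to value, and `usable_slot_names` is a hypothetical helper, not part of the ML.NET API.

```python
from typing import Any, Mapping, Optional, Sequence


def usable_slot_names(
    annotations: Mapping[str, Any], vector_length: int
) -> Optional[Sequence[str]]:
    """Return the slot names if they are present and well formed, else None.

    Hypothetical helper: it never raises on unexpected input, mirroring the
    "react gracefully" obligation placed on consumers.
    """
    names = annotations.get("SlotNames")
    if names is None:
        return None  # annotation kind is absent
    if not isinstance(names, (list, tuple)):
        return None  # not a vector at all
    if len(names) != vector_length:
        return None  # wrong size for the column it describes
    if not all(isinstance(name, str) for name in names):
        return None  # not a vector of text values
    return names


# A consumer falls back to generic behaviour instead of failing.
names = usable_slot_names({"SlotNames": ["age", "height"]}, vector_length=3)
labels = list(names) if names is not None else [f"slot{i}" for i in range(3)]
```

Returning `None` for every malformed case keeps the decision about the fallback with the caller, which is one reasonable way to honour the "never fail" requirement.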

docs/code/IDataViewTypeSystem.md

Lines changed: 9 additions & 9 deletions
@@ -63,7 +63,7 @@ components. At a high level, it is analogous to the .Net interface
 While `IEnumerable<T>` is a sequence of objects of type `T`, `IDataView` is a
 sequence of rows. An `IDataView` object has an associated `ISchema` object
 that defines the `IDataView`'s columns, including their names, types, indices,
-and associated metadata. Each row of the `IDataView` has a value for each
+and associated annotations. Each row of the `IDataView` has a value for each
 column defined by the schema.

 Just as `IEnumerable<T>` has an associated enumerator interface, namely
@@ -224,29 +224,29 @@ to a dense representation having the suppressed entries filled in with the
 entries are emphatically *not* the missing/`NA` value of the item type, unless
 the missing and default values are identical, as they are for key types.

-### Metadata
+### Annotations

-A column in an `ISchema` can have additional column-wide information, known as
-metadata. For each string value, known as a metadata kind, a column may have a
-value associated with that metadata kind. The value also has an associated
+A column in an `DataViewSchema` can have additional column-wide information, known as
+annotations. For each string value, known as an annotation kind, a column may have a
+value associated with that annotation kind. The value also has an associated
 type, which is a compatible column type.

 For example:

 * A column may indicate that it is normalized, by providing a `BL` valued
-piece of metadata named `IsNormalized`.
+annotation named `IsNormalized`.

 * A column whose type is `V<R4,17>`, meaning a vector of length 17 whose items
-are single-precision floating-point values, might have `SlotNames` metadata
+are single-precision floating-point values, might have `SlotNames` annotation
 of type `V<TX,17>`, meaning a vector of length 17 whose items are text.

 * A column produced by a scorer may have several pieces of associated
-metadata, indicating the "scoring column group id" that it belongs to, what
+annotations, indicating the "scoring column group id" that it belongs to, what
 kind of scorer produced the column (for example, binary classification), and the
 precise semantics of the column (for example, predicted label, raw score,
 probability).

-The `DataViewSchema` class, including the annotations API, is fully specified in
+The `DataViewSchema` class, including the annotations API, is fully specified in
 another document.

 ## Text Type
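
Reviewer note: the annotation model described in the hunk above (each annotation kind is a string key, and each value carries its own type) might be pictured roughly as follows. This is a minimal Python sketch, not the `DataViewSchema` API: the `Column` class, its `annotate` and `annotation` methods, and the column name `Features` are all hypothetical; only the kind names and the type strings `BL`, `V<R4,17>`, and `V<TX,17>` come from the documentation text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class Column:
    """Hypothetical stand-in for a schema column with optional annotations."""

    name: str
    type_name: str
    # annotation kind -> (type of the value, the value itself)
    annotations: Dict[str, Tuple[str, Any]] = field(default_factory=dict)

    def annotate(self, kind: str, value_type: str, value: Any) -> None:
        """Attach a typed value under the given annotation kind."""
        self.annotations[kind] = (value_type, value)

    def annotation(self, kind: str) -> Optional[Tuple[str, Any]]:
        """Return (type, value) for the kind, or None when it is absent."""
        return self.annotations.get(kind)


# A length-17 vector of single-precision floats, as in the example above.
features = Column(name="Features", type_name="V<R4,17>")
features.annotate("IsNormalized", "BL", True)
features.annotate("SlotNames", "V<TX,17>", [f"f{i}" for i in range(17)])
```

Storing the value type next to the value reflects the statement that an annotation value "has an associated type, which is a compatible column type".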
