| 1 | +# ML.NET 0.2 Release Notes |
| 2 | + |
| 3 | +We would like to thank the community for its engagement so far and for helping |
| 4 | +us shape ML.NET. |
| 5 | + |
| 6 | +Today we are releasing ML.NET 0.2. This release focuses on addressing |
| 7 | +questions and issues, adding clustering to the list of supported machine |
| 8 | +learning tasks, training models on in-memory data, simplifying model |
| 9 | +validation, and more. |
| 10 | + |
| 11 | +### Installation |
| 12 | + |
| 13 | +ML.NET supports Windows, macOS, and Linux. See [supported OS versions of .NET |
| 14 | +Core |
| 15 | +2.0](https://github.com/dotnet/core/blob/master/release-notes/2.0/2.0-supported-os.md) |
| 16 | +for more details. |
| 17 | + |
| 18 | +You can install the ML.NET NuGet package from the .NET CLI using: |
| 19 | +``` |
| 20 | +dotnet add package Microsoft.ML |
| 21 | +``` |
| 22 | + |
| 23 | +From the NuGet Package Manager Console: |
| 24 | +``` |
| 25 | +Install-Package Microsoft.ML |
| 26 | +``` |
| 27 | + |
| 28 | +### Release Notes |
| 29 | + |
| 30 | +Below are some of the highlights from this release. |
| 31 | + |
| 32 | +* Added clustering to the list of supported machine learning tasks |
| 33 | + |
| 34 | + * Clustering is an unsupervised learning task that groups sets of items |
| 35 | + based on their features. It identifies which items are more similar to |
| 36 | + each other than to other items. This might be useful in scenarios such as |
| 37 | + organizing news articles into groups based on their topics, segmenting |
| 38 | + users based on their shopping habits, and grouping viewers based on |
| 39 | + their taste in movies. |
| 40 | + |
| 41 | + * ML.NET 0.2 exposes `KMeansPlusPlusClusterer`, which implements [K-Means++ |
| 42 | + clustering](http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf) |
| 43 | + with [Yinyang K-means |
| 44 | + acceleration](https://www.microsoft.com/en-us/research/publication/yinyang-k-means-a-drop-in-replacement-of-the-classic-k-means-with-consistent-speedup/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fpubs%2Fdefault.aspx%3Fid%3D252149). |
| 45 | + [This |
| 46 | + test](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/ClusteringTests.cs) |
| 47 | + shows how to use it (from |
| 48 | + [#222](https://github.com/dotnet/machinelearning/pull/222)). |
| 49 | + |
| 50 | +* Train on in-memory data objects with `CollectionDataSource`, in addition to |
| 51 | + loading data from a file. ML.NET 0.1 enabled loading data from a delimited |
| 52 | + text file. `CollectionDataSource` in ML.NET 0.2 adds the ability to use a |
| 53 | + collection of objects as the input to a `LearningPipeline`. See sample usage |
| 54 | + [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/CollectionDataSourceTests.cs#L133) |
| 55 | + (from [#106](https://github.com/dotnet/machinelearning/pull/106)). |
| 56 | + |
| 57 | +* Easier model validation with cross-validation and train-test |
| 58 | + |
| 59 | + * [Cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) |
| 60 | + is an approach to estimating how well your model performs statistically. |
| 61 | + It does not require a separate test dataset, but rather uses your |
| 62 | + training data to test your model (it partitions the data so different |
| 63 | + data is used for training and testing, and it does this multiple times). |
| 64 | + [Here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L51) |
| 65 | + is an example for doing cross-validation (from |
| 66 | + [#212](https://github.com/dotnet/machinelearning/pull/212)). |
| 67 | + |
| 68 | + * Train-test is a shortcut for testing your model on a separate dataset. |
| 69 | + See example usage |
| 70 | + [here](https://github.com/dotnet/machinelearning/blob/78810563616f3fcb0b63eb8a50b8b2e62d9d65fc/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs#L36). |
| 71 | + |
| 72 | + * Note that the `LearningPipeline` is prepared the same way in both cases. |
| 73 | + |
| 74 | +* Speed improvement for predictions: by not creating a parallel cursor for |
| 75 | + dataviews that only have one element, we get a significant speed-up for |
| 76 | + predictions (see |
| 77 | + [#179](https://github.com/dotnet/machinelearning/issues/179) for a few |
| 78 | + measurements). |
| 79 | + |
| 80 | +* Updated `TextLoader` API: the `TextLoader` API is now code-generated and was |
| 81 | + updated to take explicit declarations for the columns in the data, which is |
| 82 | + required in some scenarios. See |
| 83 | + [#142](https://github.com/dotnet/machinelearning/pull/142). |
| 84 | + |
| 85 | +* Added daily NuGet builds of the project: daily NuGet builds of ML.NET are |
| 86 | + now available |
| 87 | + [here](https://dotnet.myget.org/feed/dotnet-core/package/nuget/Microsoft.ML). |
| 88 | + |
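To make the clustering highlight above more concrete, here is a minimal Python sketch of the general K-Means++ idea: pick initial centers with probability proportional to squared distance from the centers already chosen, then refine them with plain K-means iterations. This is not the ML.NET API and not the `KMeansPlusPlusClusterer` implementation (it omits the Yinyang acceleration entirely). The names `kmeans_pp_seeds` and `kmeans` are hypothetical and exist only for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """Pick k initial centers using K-Means++ style seeding (illustrative)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center,
        # so far-away points are more likely to become the next center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # All points coincide with existing centers; fall back to uniform choice.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means refinement on top of the seeding above (no acceleration)."""
    rng = random.Random(seed)
    centers = kmeans_pp_seeds(points, k, rng)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        assignments = [
            min(range(k), key=lambda i: squared_distance(p, centers[i]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centers[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignments


# Two well-separated groups of 2-D points (made-up sample data).
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sample_centers, sample_assignments = kmeans(sample, k=2)
```

Each entry of `sample_assignments` is the index of the cluster that the corresponding point was placed in. Which index a group receives depends on the random seeding.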
| 89 | +Additional issues closed in this milestone can be found [here](https://github.com/dotnet/machinelearning/milestone/1?closed=1). |
| 90 | + |
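The cross-validation highlight above describes partitioning the data so that different rows are used for training and testing, repeated several times. A minimal Python sketch of that partitioning follows. It is not the ML.NET API; `k_fold_splits`, `cross_validate`, `train_mean`, and `mean_abs_error` are hypothetical names, and the "model" is just the mean of the training values, chosen to keep the example self-contained.

```python
def k_fold_splits(rows, num_folds):
    """Yield (train, test) pairs; every row lands in exactly one test fold.

    Assumes len(rows) >= num_folds so that no test fold is empty.
    """
    for fold in range(num_folds):
        test = [row for i, row in enumerate(rows) if i % num_folds == fold]
        train = [row for i, row in enumerate(rows) if i % num_folds != fold]
        yield train, test


def cross_validate(rows, num_folds, train_fn, score_fn):
    """Train and score once per fold; return the mean score and per-fold scores."""
    scores = []
    for train, test in k_fold_splits(rows, num_folds):
        model = train_fn(train)
        scores.append(score_fn(model, test))
    return sum(scores) / len(scores), scores


def train_mean(train):
    """Toy 'model': predict the mean of the training values."""
    return sum(train) / len(train)


def mean_abs_error(model, test):
    """Mean absolute error of the constant prediction on the held-out fold."""
    return sum(abs(y - model) for y in test) / len(test)


# Made-up sample values; each fold holds out two of the ten rows.
values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
average_error, fold_errors = cross_validate(values, 5, train_mean, mean_abs_error)
```

Train-test evaluation is the single-split special case: train once on one dataset and score once on a separate one.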
| 91 | +### Acknowledgements |
| 92 | + |
| 93 | +Shoutout to tincann, rantri, yamachu, pkulikov, Sorrien, v-tsymbalistyi, Ky7m, |
| 94 | +forki, jessebenson, mfaticaearnin, and the ML.NET team for their contributions |
| 95 | +as part of this release! |