|
1 |
| -# Clustering Iris flowers (F#) |
| 1 | +# Clustering Iris Data |
2 | 2 |
|
3 | 3 | | ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
|
4 | 4 | |----------------|-------------------|-------------------------------|-------------|-----------|---------------------|---------------------------|-----------------------------|
|
5 |
| -| v0.7 | Dynamic API | README.md needs update | Console app | .txt file | Clustering Iris flowers | Clustering | K-means++ | |
| 5 | +| v0.9 | Dynamic API | Up-to-date | Console app | .txt file | Clustering Iris flowers | Clustering | K-means++ | |
6 | 6 |
|
7 | 7 | In this introductory sample, you'll see how to use [ML.NET](https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet) to divide iris flowers into different groups that correspond to different types of iris. In the world of machine learning, this task is known as **clustering**.
|
8 | 8 |
|
@@ -30,59 +30,73 @@ To solve this problem, first we will build and train an ML model. Then we will u
|
30 | 30 |
|
31 | 31 | ### 1. Build model
|
32 | 32 |
|
33 |
| -Building a model includes: uploading data (`iris-full.txt` with `TextLoader`), transforming the data so it can be used effectively by an ML algorithm (with `ConcatEstimator`), and choosing a learning algorithm (`KMeansPlusPlusTrainer`). All of those steps are stored in a `EstimatorChain`: |
| 33 | +Building a model involves loading the data (`iris-full.txt` with `TextLoader`), transforming it so that an ML algorithm can use it effectively (with `Concatenate`), and choosing a learning algorithm (`KMeans`). All of those steps are stored in `trainingPipeline`: |
| 34 | + |
34 | 35 | ```fsharp
|
35 |
| - // LearningPipeline holds all steps of the learning process: data, transforms, learners. |
| 36 | + // STEP 1: Common data loading configuration |
| 37 | + let textLoader = |
| 38 | + mlContext.Data.CreateTextReader( |
| 39 | + hasHeader = true, |
| 40 | + separatorChar = '\t', |
| 41 | + columns = |
| 42 | + [| |
| 43 | + TextLoader.Column("Label", Nullable DataKind.R4, 0) |
| 44 | + TextLoader.Column("SepalLength", Nullable DataKind.R4, 1) |
| 45 | + TextLoader.Column("SepalWidth", Nullable DataKind.R4, 2) |
| 46 | + TextLoader.Column("PetalLength", Nullable DataKind.R4, 3) |
| 47 | + TextLoader.Column("PetalWidth", Nullable DataKind.R4, 4) |
| 48 | + |] |
| 49 | + ) |
| 50 | +
|
| 51 | + let fullData = textLoader.Read dataPath |
36 | 52 |
|
37 |
| - //1. Create ML.NET context/environment |
38 |
| - use env = new LocalEnvironment() |
39 |
| -
|
40 |
| - //2. Create DataReader with data schema mapped to file's columns |
41 |
| - let reader = |
42 |
| - TextLoader( |
43 |
| - env, |
44 |
| - TextLoader.Arguments( |
45 |
| - Separator = "tab", |
46 |
| - HasHeader = true, |
47 |
| - Column = |
48 |
| - [| |
49 |
| - TextLoader.Column("Label", Nullable DataKind.R4, 0) |
50 |
| - TextLoader.Column("SepalLength", Nullable DataKind.R4, 1) |
51 |
| - TextLoader.Column("SepalWidth", Nullable DataKind.R4, 2) |
52 |
| - TextLoader.Column("PetalLength", Nullable DataKind.R4, 3) |
53 |
| - TextLoader.Column("PetalWidth", Nullable DataKind.R4, 4) |
54 |
| - |] |
55 |
| - ) |
56 |
| - ) |
57 |
| -
|
58 |
| - //Load training data |
59 |
| - let trainingDataView = MultiFileSource(DataPath) |> reader.Read |
| 53 | + //Split dataset in two parts: TrainingDataset (80%) and TestDataset (20%) |
| 54 | + let struct(trainingDataView, testingDataView) = mlContext.Clustering.TrainTestSplit(fullData, testFraction = 0.2) |
| 55 | +
|
| 56 | + //STEP 2: Process data transformations in pipeline |
| 57 | + let dataProcessPipeline = mlContext.Transforms.Concatenate("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth") |
| 58 | +
|
| 59 | + // (Optional) Peek data in training DataView after applying the ProcessPipeline's transformations |
| 60 | + Common.ConsoleHelper.peekDataViewInConsole<IrisData> mlContext trainingDataView dataProcessPipeline 10 |> ignore |
| 61 | + Common.ConsoleHelper.peekVectorColumnDataInConsole mlContext "Features" trainingDataView dataProcessPipeline 10 |> ignore |
| 62 | +
|
| 63 | + // STEP 3: Create and train the model |
| 64 | + let trainer = mlContext.Clustering.Trainers.KMeans(features = "Features", clustersCount = 3) |
| 65 | +
|
| 66 | + let modelBuilder = |
| 67 | + Common.ModelBuilder.create mlContext dataProcessPipeline |
| 68 | + |> Common.ModelBuilder.addTrainer trainer |
| 69 | +
|
| 70 | + let trainedModel = |
| 71 | + modelBuilder |
| 72 | + |> Common.ModelBuilder.train trainingDataView |
60 | 73 | ```
|
| 74 | + |
61 | 75 | ### 2. Train model
|
62 |
| -Training the model is a process of running the chosen algorithm on the given data. It is implemented in the `Fit()` method from the Estimator object. To perform training we just call the method and provide our data. |
63 |
| -```fsharp |
64 |
| - let model = |
65 |
| - env |
66 |
| - |> Pipeline.concatEstimator "Features" [| "SepalLength"; "SepalWidth"; "PetalLength"; "PetalWidth" |] |
67 |
| - |> Pipeline.append (KMeansPlusPlusTrainer(env, "Features", clustersCount = 3)) |
68 |
| - |> Pipeline.fit trainingDataView |
| 76 | +Training the model means running the chosen algorithm on the given data. To perform training, call the `Fit()` method. |
69 | 77 |
|
| 78 | +```fsharp |
| 79 | + let trainedModel = |
| 80 | + modelBuilder |
| 81 | + |> Common.ModelBuilder.train trainingDataView |
70 | 82 | ```
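
To make "running the chosen algorithm on the given data" more concrete, here is a minimal sketch of the basic k-means loop in plain F#. It does not use ML.NET and is not the library's implementation; the names `Point`, `squaredDistance`, `nearestCentroid`, `step` and `fit` are hypothetical, and the K-means++ initialisation is left out (the starting centroids are simply passed in).

```fsharp
// Illustrative only: plain F#, no ML.NET. All names are hypothetical.
type Point = float[]

/// Squared Euclidean distance between two points of equal length.
let squaredDistance (a: Point) (b: Point) =
    Array.fold2 (fun acc x y -> acc + (x - y) * (x - y)) 0.0 a b

/// Index of the centroid closest to the given point.
let nearestCentroid (centroids: Point[]) (p: Point) =
    centroids
    |> Array.mapi (fun i c -> i, squaredDistance c p)
    |> Array.minBy snd
    |> fst

/// One k-means step: assign every point to its nearest centroid,
/// then move each centroid to the mean of the points assigned to it.
let step (points: Point[]) (centroids: Point[]) : Point[] =
    let assigned = points |> Array.groupBy (nearestCentroid centroids) |> dict
    centroids
    |> Array.mapi (fun i c ->
        match assigned.TryGetValue i with
        | true, members ->
            Array.init c.Length (fun d -> members |> Array.averageBy (fun m -> m.[d]))
        | _ -> c) // a centroid that attracted no points stays where it is

/// Repeat the step until the centroids stop moving or the budget runs out.
let rec fit maxIterations points centroids =
    let next = step points centroids
    if maxIterations <= 1 || next = centroids then next
    else fit (maxIterations - 1) points next
```

For example, `fit 10 [| [|0.0|]; [|1.0|]; [|10.0|]; [|11.0|] |] [| [|0.0|]; [|10.0|] |]` settles on centroids at 0.5 and 10.5.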
|
71 | 83 | ### 3. Consume model
|
72 | 84 | After the model is built and trained, we can use the `Predict()` API to predict the cluster for an iris flower and calculate the distance from the given flower parameters to each cluster (each centroid of a cluster).
|
73 | 85 |
|
74 | 86 | ```fsharp
|
75 |
| - let sampleIrisData = |
76 |
| - { |
| 87 | + let sampleIrisData = |
| 88 | + { |
77 | 89 | SepalLength = 3.3f
|
78 | 90 | SepalWidth = 1.6f
|
79 | 91 | PetalLength = 0.2f
|
80 |
| - PetalWidth = 5.1f |
| 92 | + PetalWidth = 5.1f |
81 | 93 | }
|
82 | 94 |
|
83 |
| - let predictionFunc = loadedModel.MakePredictionFunction<IrisData, IrisPrediction> env |
84 |
| - let prediction = predictionFunc.Predict sampleIrisData |
| 95 | + //Create the clusters: Create data files and plot a chart |
| 96 | + let prediction = |
| 97 | + Common.ModelScorer.create mlContext |
| 98 | + |> Common.ModelScorer.loadModelFromZipFile modelPath |
| 99 | + |> Common.ModelScorer.predictSingle sampleIrisData |
85 | 100 |
|
86 |
| - printfn "Clusters assigned for setosa flowers: %d" prediction.SelectedClusterId |
87 |
| -``` |
| 101 | + printfn "Cluster assigned for setosa flowers: %d" prediction.SelectedClusterId |
88 | 102 | ```
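
The prediction described above comes down to measuring the distance from the flower to each centroid and picking the closest one. The sketch below shows that idea in plain F#, without ML.NET. The `ClusterPrediction` record, the `predict` function and the centroid values are assumptions made up for illustration, and cluster ids are zero-based here, which may differ from what the library reports.

```fsharp
// Illustrative only: plain F#, no ML.NET. Types, names and centroid values are made up.
type ClusterPrediction =
    { SelectedClusterId: int   // zero-based index in this sketch
      Distances: float[] }     // squared distance to each centroid

/// Squared Euclidean distance between two feature vectors of equal length.
let squaredDistance (a: float[]) (b: float[]) =
    Array.fold2 (fun acc x y -> acc + (x - y) * (x - y)) 0.0 a b

/// Compute the distance to every centroid and select the nearest one.
let predict (centroids: float[][]) (features: float[]) =
    let distances = centroids |> Array.map (squaredDistance features)
    let closest =
        distances |> Array.mapi (fun i d -> i, d) |> Array.minBy snd |> fst
    { SelectedClusterId = closest; Distances = distances }

// Hypothetical centroids over (SepalLength, SepalWidth, PetalLength, PetalWidth).
let centroids =
    [| [| 5.0; 3.4; 1.5; 0.2 |]
       [| 5.9; 2.8; 4.3; 1.3 |]
       [| 6.6; 3.0; 5.6; 2.0 |] |]

let samplePrediction = predict centroids [| 5.1; 3.5; 1.4; 0.2 |]
printfn "Cluster: %d, distances: %A" samplePrediction.SelectedClusterId samplePrediction.Distances
```

With these made-up centroids, the sample flower is closest to the first centroid, so `SelectedClusterId` is 0.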
|