Created sample for text normalizing API. #3133

Merged 2 commits on Apr 1, 2019
@@ -0,0 +1,57 @@
using System;
using System.Collections.Generic;
using System.Text;

namespace Microsoft.ML.Samples.Dynamic
{
    public static class NormalizeText
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
            // as well as the source of randomness.
            var mlContext = new MLContext();

            // Create an empty data sample list. The 'NormalizeText' API does not require training data as
            // the estimator ('TextNormalizingEstimator') created by 'NormalizeText' API is not a trainable estimator.
            // The empty list is only needed to pass input schema to the pipeline.
            var emptySamples = new List<TextData>();

            // Convert sample list to an empty IDataView.
            var emptyDataView = mlContext.Data.LoadFromEnumerable(emptySamples);

            // A pipeline for normalizing text.
            var normTextPipeline = mlContext.Transforms.Text.NormalizeText("NormalizedText", "Text",
                Transforms.Text.TextNormalizingEstimator.CaseMode.Lower,
                keepDiacritics: false,
                keepPunctuations: false,
                keepNumbers: false);

            // Fit to data.
            var normTextTransformer = normTextPipeline.Fit(emptyDataView);

            // Create the prediction engine to get the normalized text from the input text/string.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData, TransformedTextData>(normTextTransformer);

            // Call the prediction API.
            var data = new TextData() { Text = "ML.NET's NormalizeText API changes the case of the TEXT and removes/keeps diâcrîtîcs, punctuations, and/or numbers (123)." };
            var prediction = predictionEngine.Predict(data);

            // Print the normalized text.
            Console.WriteLine($"Normalized Text: {prediction.NormalizedText}");

            // Expected output:
            //   Normalized Text: mlnets normalizetext api changes the case of the text and removeskeeps diacritics punctuations andor numbers
        }

        public class TextData
        {
            public string Text { get; set; }
        }

        public class TransformedTextData : TextData
        {
            public string NormalizedText { get; set; }
        }
    }
}

Good job!

7 changes: 7 additions & 0 deletions src/Microsoft.ML.Transforms/Text/TextCatalog.cs
@@ -93,6 +93,13 @@ internal static TokenizingByCharactersEstimator TokenizeIntoCharactersAsKeys(thi
        /// <param name="keepDiacritics">Whether to keep diacritical marks or remove them.</param>
        /// <param name="keepPunctuations">Whether to keep punctuation marks or remove them.</param>
        /// <param name="keepNumbers">Whether to keep numbers or remove them.</param>
        /// <example>
        /// <format type="text/markdown">
        /// <![CDATA[
        /// [!code-csharp[NormalizeText](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/NormalizeText.cs)]
        /// ]]>
        /// </format>
        /// </example>
        public static TextNormalizingEstimator NormalizeText(this TransformsCatalog.TextTransforms catalog,
            string outputColumnName,
            string inputColumnName = null,