Cleaning and Fixing public API for set of learners. #2765
Merged
Changes from 14 commits (21 commits in total, all by zeahmed):

19d25e2 Scrubbing FieldAwareFactorizationMachine learner.
b89c63b Addressed reviewers' comments.
0bc0c92 Fixed entrypoint json.
4051d6b Fixed entrypoint json.
17f83a9 Merge branch 'learners_check' of https://github.com/zeahmed/machinele…
7d4e7ed Resolved merged conflicts.
3f7cf53 Resolved a compilation bug.
30faca7 Addressed reviewers' comments.
c2746e2 Resolved merged conflicts.
003b6a0 Addressed reviewers' comments.
b16b1e3 Fixing and cleaning public API surface.
f43e8d3 Resolved merged conflicts.
d1a24f8 Resolved merged conflicts.
0039fc6 Resolved merged conflicts.
69d654e Resolved merged conflicts.
8235b0a Resolved merged conflicts.
03957a0 Addressed reviewers' comments.
5c139a1 Updated entrypoint list file.
06977ed Merge remote-tracking branch 'upstream/master' into learners_check_2
88a14c4 Addressed reviewers' comments.
14f6b76 Addressed reviewers' comments.
```diff
@@ -7,59 +7,64 @@ public static class RandomTrainer
     {
         public static void Example()
         {
-            // Downloading the dataset from github.com/dotnet/machinelearning.
-            // This will create a sentiment.tsv file in the filesystem.
-            // You can open this file, if you want to see the data.
-            string dataFile = SamplesUtils.DatasetUtils.DownloadSentimentDataset()[0];
+            // Create a new context for ML.NET operations. It can be used for exception tracking and logging,
+            // as a catalog of available operations and as the source of randomness.
+            var mlContext = new MLContext(seed: 1);
+
+            // Download and featurize the dataset.
+            var dataFiles = SamplesUtils.DatasetUtils.DownloadSentimentDataset();
+            var trainFile = dataFiles[0];
+            var testFile = dataFiles[1];
 
             // A preview of the data.
             // Sentiment SentimentText
             // 0 " :Erm, thank you. "
             // 1 ==You're cool==
 
-            // Create a new context for ML.NET operations. It can be used for exception tracking and logging,
-            // as a catalog of available operations and as the source of randomness.
-            var mlContext = new MLContext(seed: 1);
-
-            // Step 1: Load the data as an IDataView.
-            // First, we define the loader: specify the data columns and where to find them in the text file.
-            var loader = mlContext.Data.CreateTextLoader(
+            // Step 1: Read the data as an IDataView.
+            // First, we define the reader: specify the data columns and where to find them in the text file.
+            var reader = mlContext.Data.CreateTextLoader(
                 columns: new[]
                     {
                         new TextLoader.Column("Sentiment", DataKind.Single, 0),
                         new TextLoader.Column("SentimentText", DataKind.String, 1)
                     },
                 hasHeader: true
             );
 
-            // Load the data
-            var data = loader.Load(dataFile);
-
-            // Split it between training and test data
-            var trainTestData = mlContext.BinaryClassification.TrainTestSplit(data);
+            // Read the data
+            var trainData = reader.Load(trainFile);
```
Review comment on the lines above: nit.: revert #Resolved
```diff
 
             // Step 2: Pipeline
             // Featurize the text column through the FeaturizeText API.
             // Then append a binary classifier, setting the "Label" column as the label of the dataset, and
             // the "Features" column produced by FeaturizeText as the features column.
             var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", "SentimentText")
-                .AppendCacheCheckpoint(mlContext) // Add a data-cache step within a pipeline.
+                .AppendCacheCheckpoint(mlContext)
                 .Append(mlContext.BinaryClassification.Trainers.Random());
 
             // Step 3: Train the pipeline
-            var trainedPipeline = pipeline.Fit(trainTestData.TrainSet);
+            var trainedPipeline = pipeline.Fit(trainData);
 
             // Step 4: Evaluate on the test set
-            var transformedData = trainedPipeline.Transform(trainTestData.TestSet);
+            var transformedData = trainedPipeline.Transform(reader.Load(testFile));
             var evalMetrics = mlContext.BinaryClassification.Evaluate(transformedData, label: "Sentiment");
 
             // Step 5: Inspect the output
-            Console.WriteLine("Accuracy: " + evalMetrics.Accuracy);
+            SamplesUtils.ConsoleUtils.PrintMetrics(evalMetrics);
 
             // We expect an output probability close to 0.5 as the Random trainer outputs a random prediction.
             // Regardless of the input features, the trainer will predict either positive or negative label with equal probability.
-            // Expected output (close to 0.5):
-            // Accuracy: 0.588235294117647
+            // Expected output: (close to 0.5):
+
+            // Accuracy: 0.56
+            // AUC: 0.57
+            // F1 Score: 0.60
+            // Negative Precision: 0.57
+            // Negative Recall: 0.44
+            // Positive Precision: 0.55
+            // Positive Recall: 0.67
+            // LogLoss: 1.53
+            // LogLossReduction: -53.37
+            // Entropy: 1.00
         }
     }
 }
```
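The sample's closing comment argues that accuracy should land near 0.5 because the Random trainer's predictions ignore the features. Below is a minimal standalone C# sketch of that argument. It does not use ML.NET and makes no claim about how RandomTrainer is implemented; `RandomBaseline`, `Accuracy`, `MakeLabels`, `F1` and `Demo` are hypothetical names invented for illustration. If each prediction is a fair coin flip independent of the label, it is correct with probability 0.5·p + 0.5·(1 − p) = 0.5 for any positive rate p, so even imbalanced labels give accuracy near 0.5. On a small test set the estimate is noisier, so values such as 0.56 or 0.588 are plausible. The `F1` helper also cross-checks the printed metrics: 2·0.55·0.67 / (0.55 + 0.67) ≈ 0.604, consistent with the "F1 Score: 0.60" line.

```csharp
using System;

// Illustrative baseline only, NOT ML.NET's RandomTrainer implementation.
// It mimics the behavior described in the sample's comment: predictions ignore
// the features and pick either label with equal probability.
public static class RandomBaseline
{
    // Fraction of coin-flip predictions that agree with the true labels.
    public static double Accuracy(bool[] labels, int seed)
    {
        if (labels.Length == 0)
            throw new ArgumentException("labels must not be empty", nameof(labels));

        var rng = new Random(seed);
        int correct = 0;
        foreach (bool label in labels)
        {
            bool prediction = rng.NextDouble() < 0.5; // no feature is ever read
            if (prediction == label) correct++;
        }
        return (double)correct / labels.Length;
    }

    // Hypothetical helper: builds labels where roughly `positiveRate` of them are positive.
    public static bool[] MakeLabels(int count, double positiveRate, int seed)
    {
        var rng = new Random(seed);
        var labels = new bool[count];
        for (int i = 0; i < count; i++)
            labels[i] = rng.NextDouble() < positiveRate;
        return labels;
    }

    // F1 is the harmonic mean of precision and recall.
    public static double F1(double precision, double recall) =>
        2 * precision * recall / (precision + recall);

    public static void Demo()
    {
        // About 70% positive labels: a coin flip is still right about half the time.
        // The label seed (123) differs from the prediction seed (1) on purpose.
        bool[] large = MakeLabels(10_000, 0.7, seed: 123);
        bool[] small = MakeLabels(50, 0.7, seed: 123);
        Console.WriteLine($"Accuracy on 10,000 examples: {Accuracy(large, seed: 1):F3}");
        Console.WriteLine($"Accuracy on 50 examples:     {Accuracy(small, seed: 1):F3}");
        Console.WriteLine($"F1 from printed precision/recall: {F1(0.55, 0.67):F2}");
    }
}
```

Calling `RandomBaseline.Demo()` prints both accuracy estimates and the F1 value; the exact accuracies depend on the seeds. Keep the label seed and the prediction seed different: reusing one seed would make both random streams identical and correlate predictions with labels, which breaks the independence the argument relies on.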
Review comment: load. Artidoro had a whole PR to replace Read with Load. #Resolved