Scrubbing FieldAwareFactorizationMachine learner. #2730
Merged
Commits (10):
19d25e2 Scrubbing FieldAwareFactorizationMachine learner. (zeahmed)
b89c63b Addressed reviewers' comments. (zeahmed)
0bc0c92 Fixed entrypoint json. (zeahmed)
4051d6b Fixed entrypoint json. (zeahmed)
17f83a9 Merge branch 'learners_check' of https://github.com/zeahmed/machinele… (zeahmed)
7d4e7ed Resolved merged conflicts. (zeahmed)
3f7cf53 Resolved a compilation bug. (zeahmed)
30faca7 Addressed reviewers' comments. (zeahmed)
c2746e2 Resolved merged conflicts. (zeahmed)
003b6a0 Addressed reviewers' comments. (zeahmed)
71 changes: 0 additions & 71 deletions
docs/samples/Microsoft.ML.Samples/Dynamic/FieldAwareFactorizationMachine.cs
This file was deleted.
74 changes: 74 additions & 0 deletions
...rosoft.ML.Samples/Dynamic/Trainers/BinaryClassification/FieldAwareFactorizationMachine.cs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
using System; | ||
using System.Linq; | ||
using Microsoft.ML.Data; | ||
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public static class FFMBinaryClassification | ||
{ | ||
public static void Example() | ||
{ | ||
// Create a new context for ML.NET operations. It can be used for exception tracking and logging, | ||
// as a catalog of available operations and as the source of randomness. | ||
var mlContext = new MLContext(); | ||
|
||
// Download and featurize the dataset. | ||
var dataviews = SamplesUtils.DatasetUtils.LoadFeaturizedSentimentDataset(mlContext); | ||
var trainData = dataviews[0]; | ||
var testData = dataviews[1]; | ||
|
||
// ML.NET doesn't cache data sets by default. Therefore, if one reads a data set from a file and accesses it many times, it can be slow due to | ||
// expensive featurization and disk operations. When the data fits into memory, a solution is to cache it in memory. Caching is especially | ||
// helpful for iterative algorithms that need many passes over the data. The field-aware factorization machine trainer is one such algorithm, so we cache. Inserting a | ||
// cache step in a pipeline is also possible; see the construction of the pipeline below. | ||
trainData = mlContext.Data.Cache(trainData); | ||
|
||
// Step 2: Pipeline | ||
// Create the 'FieldAwareFactorizationMachine' binary classifier, setting the "Sentiment" column as the label of the dataset, and | ||
// the "Features" column as the features column. | ||
var pipeline = new EstimatorChain<ITransformer>().AppendCacheCheckpoint(mlContext) | ||
.Append(mlContext.BinaryClassification.Trainers. | ||
FieldAwareFactorizationMachine(labelColumnName: "Sentiment", featureColumnNames: new[] { "Features" })); | ||
|
||
// Fit the model. | ||
var model = pipeline.Fit(trainData); | ||
|
||
// Let's get the model parameters from the model. | ||
var modelParams = model.LastTransformer.Model; | ||
|
||
// Let's inspect the model parameters. | ||
var featureCount = modelParams.FeatureCount; | ||
var fieldCount = modelParams.FieldCount; | ||
var latentDim = modelParams.LatentDimension; | ||
var linearWeights = modelParams.GetLinearWeights(); | ||
var latentWeights = modelParams.GetLatentWeights(); | ||
|
||
Console.WriteLine("The feature count is: " + featureCount); | ||
Console.WriteLine("The number of fields is: " + fieldCount); | ||
Console.WriteLine("The latent dimension is: " + latentDim); | ||
Console.WriteLine("The linear weights of some of the features are: " + | ||
string.Concat(Enumerable.Range(1, 10).Select(i => $"{linearWeights[i]:F4} "))); | ||
Console.WriteLine("The weights of some of the latent features are: " + | ||
string.Concat(Enumerable.Range(1, 10).Select(i => $"{latentWeights[i]:F4} "))); | ||
|
||
// The feature count is: 9374 | ||
// The number of fields is: 1 | ||
// The latent dimension is: 20 | ||
// The linear weights of some of the features are: 0.0196 0.0000 -0.0045 -0.0205 0.0000 0.0032 0.0682 0.0091 -0.0151 0.0089 | ||
// The weights of some of the latent features are: 0.3316 0.2140 0.0752 0.0908 -0.0495 -0.0810 0.0761 0.0966 0.0090 -0.0962 | ||
|
||
// Evaluate how the model is doing on the test data. | ||
var dataWithPredictions = model.Transform(testData); | ||
|
||
var metrics = mlContext.BinaryClassification.Evaluate(dataWithPredictions, "Sentiment"); | ||
SamplesUtils.ConsoleUtils.PrintMetrics(metrics); | ||
|
||
// Accuracy: 0.72 | ||
// AUC: 0.75 | ||
// F1 Score: 0.74 | ||
// Negative Precision: 0.75 | ||
// Negative Recall: 0.67 | ||
// Positive Precision: 0.70 | ||
// Positive Recall: 0.78 | ||
} | ||
} | ||
} |
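The sample above prints the feature count, field count, latent dimension, and the two weight arrays, but not how they combine into a score. The following is a minimal sketch of the textbook field-aware factorization machine score (linear term plus pairwise term), not ML.NET's implementation. The class name `FfmScoreSketch`, its parameters, and in particular the flattened layout of `latentWeights` are assumptions made for illustration; the real model may add a bias term or pad the latent dimension internally.

```csharp
using System;

public static class FfmScoreSketch
{
    // Raw (pre-sigmoid) score of one example under the textbook FFM formula.
    // ASSUMPTION: latentWeights is flattened so that the latent vector of
    // feature j used against field f starts at (j * fieldCount + f) * latentDim.
    public static float Score(
        int[] featureIndex,    // global index of each active feature
        int[] fieldOf,         // field that each active feature belongs to
        float[] featureValue,  // value of each active feature
        float[] linearWeights,
        float[] latentWeights,
        int fieldCount,
        int latentDim)
    {
        float score = 0f;

        // Linear part: sum over active features of weight * value.
        for (int k = 0; k < featureIndex.Length; k++)
            score += linearWeights[featureIndex[k]] * featureValue[k];

        // Pairwise part: for each pair (a, b), use a's latent vector for b's
        // field and b's latent vector for a's field.
        for (int a = 0; a < featureIndex.Length; a++)
        {
            for (int b = a + 1; b < featureIndex.Length; b++)
            {
                int va = (featureIndex[a] * fieldCount + fieldOf[b]) * latentDim;
                int vb = (featureIndex[b] * fieldCount + fieldOf[a]) * latentDim;

                float dot = 0f;
                for (int d = 0; d < latentDim; d++)
                    dot += latentWeights[va + d] * latentWeights[vb + d];

                score += dot * featureValue[a] * featureValue[b];
            }
        }
        return score;
    }

    public static void Example()
    {
        // Two features, two fields, latent dimension 2 (toy numbers).
        var latent = new[] { 1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f };
        float score = Score(
            new[] { 0, 1 }, new[] { 0, 1 }, new[] { 1f, 2f },
            new[] { 0.5f, -0.25f }, latent, fieldCount: 2, latentDim: 2);

        // Linear part is 0.5 * 1 + (-0.25) * 2 = 0; pairwise part is
        // (3 * 5 + 4 * 6) * 1 * 2 = 78.
        Console.WriteLine(score);
    }
}
```

With a single field, as in the printed output above ("The number of fields is: 1"), every pair uses the same latent vector per feature, so the pairwise term reduces to that of a plain factorization machine.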
83 changes: 83 additions & 0 deletions
...mples/Dynamic/Trainers/BinaryClassification/FieldAwareFactorizationMachinewWithOptions.cs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
using System; | ||
using System.Linq; | ||
using Microsoft.ML.Data; | ||
using Microsoft.ML.FactorizationMachine; | ||
|
||
namespace Microsoft.ML.Samples.Dynamic | ||
{ | ||
public static class FFMBinaryClassificationWithOptions | ||
{ | ||
public static void Example() | ||
{ | ||
// Create a new context for ML.NET operations. It can be used for exception tracking and logging, | ||
// as a catalog of available operations and as the source of randomness. | ||
var mlContext = new MLContext(); | ||
|
||
// Download and featurize the dataset. | ||
var dataviews = SamplesUtils.DatasetUtils.LoadFeaturizedSentimentDataset(mlContext); | ||
var trainData = dataviews[0]; | ||
var testData = dataviews[1]; | ||
|
||
// ML.NET doesn't cache data sets by default. Therefore, if one reads a data set from a file and accesses it many times, it can be slow due to | ||
// expensive featurization and disk operations. When the data fits into memory, a solution is to cache it in memory. Caching is especially | ||
// helpful for iterative algorithms that need many passes over the data. The field-aware factorization machine trainer is one such algorithm, so we cache. Inserting a | ||
// cache step in a pipeline is also possible; see the construction of the pipeline below. | ||
trainData = mlContext.Data.Cache(trainData); | ||
|
||
// Step 2: Pipeline | ||
// Create the 'FieldAwareFactorizationMachine' binary classifier, setting the "Sentiment" column as the label of the dataset, and | ||
// the "Features" column as the features column. | ||
var pipeline = new EstimatorChain<ITransformer>().AppendCacheCheckpoint(mlContext) | ||
.Append(mlContext.BinaryClassification.Trainers. | ||
FieldAwareFactorizationMachine( | ||
new FieldAwareFactorizationMachineTrainer.Options | ||
{ | ||
FeatureColumn = "Features", | ||
LabelColumn = "Sentiment", | ||
LearningRate = 0.1f, | ||
Iterations = 10 | ||
})); | ||
|
||
// Fit the model. | ||
var model = pipeline.Fit(trainData); | ||
|
||
// Let's get the model parameters from the model. | ||
var modelParams = model.LastTransformer.Model; | ||
|
||
// Let's inspect the model parameters. | ||
var featureCount = modelParams.FeatureCount; | ||
var fieldCount = modelParams.FieldCount; | ||
var latentDim = modelParams.LatentDimension; | ||
var linearWeights = modelParams.GetLinearWeights(); | ||
var latentWeights = modelParams.GetLatentWeights(); | ||
|
||
Console.WriteLine("The feature count is: " + featureCount); | ||
Console.WriteLine("The number of fields is: " + fieldCount); | ||
Console.WriteLine("The latent dimension is: " + latentDim); | ||
Console.WriteLine("The linear weights of some of the features are: " + | ||
string.Concat(Enumerable.Range(1, 10).Select(i => $"{linearWeights[i]:F4} "))); | ||
|
||
Console.WriteLine("The weights of some of the latent features are: " + | ||
string.Concat(Enumerable.Range(1, 10).Select(i => $"{latentWeights[i]:F4} "))); | ||
|
||
// The feature count is: 9374 | ||
// The number of fields is: 1 | ||
// The latent dimension is: 20 | ||
// The linear weights of some of the features are: 0.0410 0.0000 -0.0078 -0.0285 0.0000 0.0114 0.1313 0.0183 -0.0224 0.0166 | ||
// The weights of some of the latent features are: -0.0326 0.1127 0.0621 0.1446 0.2038 0.1608 0.2084 0.0141 0.2458 -0.0625 | ||
|
||
// Evaluate how the model is doing on the test data. | ||
var dataWithPredictions = model.Transform(testData); | ||
|
||
var metrics = mlContext.BinaryClassification.Evaluate(dataWithPredictions, "Sentiment"); | ||
SamplesUtils.ConsoleUtils.PrintMetrics(metrics); | ||
|
||
// Accuracy: 0.78 | ||
// AUC: 0.81 | ||
// F1 Score: 0.78 | ||
// Negative Precision: 0.78 | ||
// Negative Recall: 0.78 | ||
// Positive Precision: 0.78 | ||
// Positive Recall: 0.78 | ||
} | ||
} | ||
} |
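Both samples print an F1 score next to positive precision and positive recall. As a quick consistency check, F1 is the harmonic mean of those two numbers, so it can be recomputed from the printed values. The helper below is a hypothetical sketch (the name `F1Check` is not part of ML.NET), and because the printed metrics are rounded to two decimals the recomputed value only matches up to rounding.

```csharp
using System;

public static class F1Check
{
    // Harmonic mean of precision and recall; returns 0 when both are 0.
    public static double F1(double precision, double recall)
    {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void Example()
    {
        // Default-settings run: positive precision 0.70, positive recall 0.78.
        Console.WriteLine(F1(0.70, 0.78).ToString("F2"));

        // Run with custom options: positive precision 0.78, positive recall 0.78.
        Console.WriteLine(F1(0.78, 0.78).ToString("F2"));
    }
}
```

For the first run this gives 2 * 0.70 * 0.78 / 1.48, roughly 0.738, which agrees with the reported 0.74; for the second run precision equals recall, so F1 equals both.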
I think right now I can do the following:
Evaluate(model.Transform(Data)) -> AUC = X1
Evaluate(model.Transform(Data)) -> AUC = X2
X1 != X2
Which is awful.
Can you check `MatrixFactorizationPredictor` and how it handles arrays?
I also don't understand the `GetFeatureCount()` functions; why can't I just do `modelParams.FeatureCount`? #Resolved
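One possible reading of this comment is that a getter which hands out the model's internal weight array lets a caller modify the model between two `Evaluate` calls, which would explain differing AUC values. Under that assumption, a common remedy is to copy arrays on the way in and on the way out, and to expose counts as read-only properties. The sketch below uses a hypothetical class, `WeightHolderSketch`; it is not the ML.NET model-parameters type and does not claim to show how the PR resolved the comment.

```csharp
using System;

public sealed class WeightHolderSketch
{
    private readonly float[] _linearWeights;

    public WeightHolderSketch(float[] linearWeights)
    {
        if (linearWeights == null)
            throw new ArgumentNullException(nameof(linearWeights));

        // Copy on the way in so later edits to the caller's array cannot
        // change this object's state.
        _linearWeights = (float[])linearWeights.Clone();
    }

    // A read-only property is enough for a value that never changes.
    public int FeatureCount => _linearWeights.Length;

    // Copy on the way out so callers cannot mutate the internal array.
    public float[] GetLinearWeights() => (float[])_linearWeights.Clone();

    public static void Example()
    {
        var holder = new WeightHolderSketch(new[] { 0.1f, 0.2f, 0.3f });

        float[] copy = holder.GetLinearWeights();
        copy[0] = 99f; // Only the caller's copy changes.

        Console.WriteLine(holder.GetLinearWeights()[0]);
        Console.WriteLine(holder.FeatureCount);
    }
}
```

The trade-off is an allocation per call; for large weight arrays an immutable view such as `IReadOnlyList<float>` is another option.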