Added a test showing example of text classification using TensorFlow in ML.Net #2302
Changes from 11 commits
47b757a
8415c9a
3e4bbcd
2d83c15
8397ac0
0eb434b
fdc0868
daa4333
0cc516e
ddbd9da
57e730c
18f5f78
f984e0f
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -141,5 +141,22 @@ public static ValueMappingEstimator<TInputType, TOutputType> ValueMap<TInputType | |
IEnumerable<TOutputType> values, | ||
params (string source, string name)[] columns) | ||
=> new ValueMappingEstimator<TInputType, TOutputType>(CatalogUtils.GetEnvironment(catalog), keys, values, columns); | ||
|
||
/// <summary> | ||
/// Maps the <paramref name="columns.input"/> using the keys in the dictionary to the values of the dictionary, i.e. | ||
/// a value 'x' in the <paramref name="columns.input"/> would be mapped to the value stored in dictionary[x]. | ||
/// In this case, the <paramref name="lookupMap"/> is used to build the dictionary, where <paramref name="keyColumn"/> | ||
/// and <paramref name="valueColumn"/> specify the keys and values of the dictionary respectively. | ||
/// </summary> | ||
/// <param name="catalog">The categorical transform's catalog</param> | ||
/// <param name="lookupMap">An instance of <see cref="IDataView"/> that contains the key and value columns.</param> | ||
/// <param name="keyColumn">Name of the key column in <paramref name="lookupMap"/>.</param> | ||
/// <param name="valueColumn">Name of the value column in <paramref name="lookupMap"/>.</param> | ||
/// <param name="columns">The columns to apply this transform on.</param> | ||
/// <returns></returns> | ||
public static ValueMappingEstimator ValueMap( | ||
this TransformsCatalog.ConversionTransforms catalog, | ||
IDataView lookupMap, string keyColumn, string valueColumn, params (string input, string output)[] columns) | ||
@TomFinley @sfilipi - is this consistent with the order we've decided on with #2064? #Resolved
Any thoughts guys? I saw the method above has same pattern so I followed that.
(string outputColumnName, string inputColumnName) You'll see that if you update to latest. |
||
=> new ValueMappingEstimator(CatalogUtils.GetEnvironment(catalog), lookupMap, keyColumn, valueColumn, columns); | ||
} | ||
} |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
090706417EC29D91EEEABC5C25576374A86426CF25F27556C0EED4FD815D814C4F09FA7389ED8F614E4B34BF6438B9AE0ADA402BEA7CC9441446AB783A6F187D |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
5359609DDF69D66474F720D6A1ED669942FEB6842096CFC3EAF44B84FA3F2F659829778446BD3C7C83871F7293CA481AC4732DF6DC7921ADA100B459E37198BD |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
49DB72CDD8D10B78BB1CD17A058DF508E04B38BD287FF53EB9173A48D3994E11741B1EE6C9108303739819845F2F9D777EE3E767D737C24DB3A28B67FF68C951 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,8 +11,10 @@ | |
using Microsoft.ML.ImageAnalytics; | ||
using Microsoft.ML.RunTests; | ||
using Microsoft.ML.Transforms; | ||
using Microsoft.ML.Transforms.Conversions; | ||
using Microsoft.ML.Transforms.Normalizers; | ||
using Microsoft.ML.Transforms.TensorFlow; | ||
using Microsoft.ML.Transforms.Text; | ||
using Xunit; | ||
|
||
namespace Microsoft.ML.Scenarios | ||
|
@@ -846,5 +848,59 @@ public void TensorFlowTransformCifarInvalidShape() | |
} | ||
Assert.True(thrown); | ||
} | ||
|
||
/// <summary> | ||
/// Class to hold features and predictions. | ||
/// </summary> | ||
public class TensorFlowSentiment | ||
{ | ||
public string Sentiment_Text; | ||
[VectorType(600)] | ||
public int[] Features; | ||
[VectorType(2)] | ||
public float[] Prediction; | ||
} | ||
|
||
[ConditionalFact(typeof(Environment), nameof(Environment.Is64BitProcess))] | ||
public void TensorFlowSentimentClassificationTest() | ||
The test is going to fail as Microsoft.ML.TensorFlow.TestModels nuget is not updated yet. #Resolved |
||
{ | ||
var mlContext = new MLContext(seed: 1, conc: 1); | ||
var data = new[] { new TensorFlowSentiment() { Sentiment_Text = "this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert is an amazing actor and now the same being director father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also to the two little boy's that played the of norman and paul they were just brilliant children are often left out of the list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all" } }; | ||
var dataView = mlContext.Data.ReadFromEnumerable(data); | ||
|
||
var lookupMap = mlContext.Data.ReadFromTextFile(@"sentiment_model/imdb_word_index.csv", | ||
columns: new[] | ||
{ | ||
new TextLoader.Column("Words", DataKind.TX, 0), | ||
new TextLoader.Column("Ids", DataKind.I4, 1), | ||
}, | ||
separatorChar: ',' | ||
); | ||
|
||
// We cannot resize a variable-length vector to a fixed-length vector in ML.NET. | ||
// The trick here is to create two pipelines. | ||
// The first pipeline, 'dataPipe', tokenizes the string into words and maps each word to an integer which is an index in the dictionary. | ||
// Then this integer vector is retrieved from the pipeline and resized to a fixed length. | ||
// The second pipeline, 'tfEnginePipe', takes the resized integer vector, passes it to TensorFlow, and gets the classification scores. | ||
var estimator = mlContext.Transforms.Text.TokenizeWords("Sentiment_Text", "TokenizedWords") | ||
if you rebase to latest, you'll have to swap those. #Resolved |
||
.Append(mlContext.Transforms.Conversion.ValueMap(lookupMap, "Words", "Ids", new[] { ("TokenizedWords", "Features") })); | ||
We are calling the How would this work for the 'Transform' API... where the testData has 2 rows (say) row1 -> "Hi" -> dimension 50 (say) Will it work ?
Here we are not training the TF model at all. It is just the prediction pipeline. For the case you are mentioning, it would require the same resize operation on dataview instead of single prediction.
I understand we are not training the TF model. The Fit() for the TFTransform would not do anything in this example. I wanted to know if the re-size operation on dataview would be supported -- If it is supported, can we add it to the unit test with at least 2 rows of text data + use of the This test case does single prediction (use of
This is actually not in the scope of this test. I will try to add more training related test later on but not in this PR because of the scope. |
||
var dataPipe = estimator.Fit(dataView) | ||
is there a particular reason why we have |
||
.CreatePredictionEngine<TensorFlowSentiment, TensorFlowSentiment>(mlContext); | ||
|
||
// For an explanation of how the `sentiment_model` was created, | ||
// see https://github.com/dotnet/machinelearning-testdata/blob/master/Microsoft.ML.TensorFlow.TestModels/sentiment_model/README.md | ||
string modelLocation = @"sentiment_model"; | ||
so this TF model takes as input a vector of floats. Am i right ? Perhaps we should add a comment how the model was created etc. #Resolved |
||
var tfEnginePipe = mlContext.Transforms.ScoreTensorFlowModel(modelLocation, new[] { "Features" }, new[] { "Prediction/Softmax" }) | ||
.Append(mlContext.Transforms.CopyColumns(("Prediction/Softmax", "Prediction"))) | ||
.Fit(dataView) | ||
.CreatePredictionEngine<TensorFlowSentiment, TensorFlowSentiment>(mlContext); | ||
|
||
var processedData = dataPipe.Predict(data[0]); | ||
Array.Resize(ref processedData.Features, 600); | ||
var prediction = tfEnginePipe.Predict(processedData); | ||
|
||
Assert.Equal(2, prediction.Prediction.Length); | ||
Can we verify that the predictions were somewhat correct? #Resolved |
||
Assert.InRange(prediction.Prediction[1], 0.650032759 - 0.01, 0.650032759 + 0.01); | ||
Please use Assert.Equal. If there are only two prediction values, can we check them all? #Resolved
What do you mean by Assert.Equal? Here we are checking the range within particular threshold.
ok, I got you what you meant with Assert.Equal. I actually want to check if my values are in range e.g. 0.64 <= prediction <= 0.66 which I cannot do with Assert.Equal, can I?
You have tolerance in Assert.Equal. #Resolved
It uses number of decimal places which is not applicable here.
We could trim 0.650032759 to 0.65, if we're comparing as ± 0.01. |
||
} | ||
} | ||
} |
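For readers following the threads on the resize step and on the range assertion, here is a minimal, self-contained sketch of both ideas using only the base class library (`Array.Resize`, `Math.Abs`, `Math.Round`). The class and method names (`ResizeSketch`, `PadOrTruncate`, `WithinTolerance`, `EqualAfterRounding`) are hypothetical and not part of this PR, the token ids are made up, and the rounding comparison is only an assumption about how a decimal-places check behaves, not a statement about xUnit internals.

```csharp
using System;

public static class ResizeSketch
{
    // Hypothetical helper: pads with zeros or truncates so the result has exactly `length` entries.
    public static int[] PadOrTruncate(int[] tokens, int length)
    {
        var copy = (int[])tokens.Clone();   // leave the caller's array untouched
        Array.Resize(ref copy, length);     // added slots are default(int), i.e. 0
        return copy;
    }

    // Range check equivalent to expected - tolerance <= actual <= expected + tolerance.
    public static bool WithinTolerance(double expected, double actual, double tolerance)
        => Math.Abs(expected - actual) <= tolerance;

    // Assumed behaviour of a decimal-places comparison: round both sides, then compare.
    public static bool EqualAfterRounding(double expected, double actual, int decimals)
        => Math.Round(expected, decimals) == Math.Round(actual, decimals);

    public static void Main()
    {
        var padded = PadOrTruncate(new[] { 14, 22, 16 }, 6);
        Console.WriteLine(string.Join(",", padded));             // 14,22,16,0,0,0

        // 0.6449 is within 0.01 of 0.65, but rounds to 0.64 at two decimals.
        Console.WriteLine(WithinTolerance(0.65, 0.6449, 0.01));  // True
        Console.WriteLine(EqualAfterRounding(0.65, 0.6449, 2));  // False
    }
}
```

The last two lines show why a range check and a decimal-places check can disagree near a rounding boundary, which appears to be the concern raised in the thread.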
Can you add "a value x in the input would be mapped to value stored in dictionary[x]"? #Resolved
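The dictionary[x] semantics requested above can be illustrated without ML.NET. The sketch below is only an analogy built on a plain `Dictionary<string, int>`: the names `LookupSketch` and `MapTokens` are hypothetical, the word ids are made up rather than taken from `imdb_word_index.csv`, and dropping unknown words is an assumption of this sketch, not a description of how `ValueMappingEstimator` treats missing keys.

```csharp
using System;
using System.Collections.Generic;

public static class LookupSketch
{
    // Splits on spaces and maps each token x to lookup[x]; tokens without an entry are skipped.
    public static int[] MapTokens(string text, IReadOnlyDictionary<string, int> lookup)
    {
        var ids = new List<int>();
        foreach (var token in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (lookup.TryGetValue(token, out var id))
                ids.Add(id);
        }
        return ids.ToArray();
    }

    public static void Main()
    {
        // Illustrative ids only.
        var lookup = new Dictionary<string, int> { ["this"] = 1, ["film"] = 2, ["brilliant"] = 3 };

        var ids = MapTokens("this film was brilliant", lookup);
        Console.WriteLine(string.Join(",", ids)); // 1,2,3 ("was" has no entry and is skipped)
    }
}
```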