Added a test showing example of text classification using TensorFlow in ML.Net #2302

Merged
merged 13 commits into from
Jan 30, 2019
2 changes: 1 addition & 1 deletion build/Dependencies.props
@@ -21,7 +21,7 @@
<SystemDrawingCommonPackageVersion>4.5.0</SystemDrawingCommonPackageVersion>
<SystemIOFileSystemAccessControl>4.5.0</SystemIOFileSystemAccessControl>
<SystemSecurityPrincipalWindows>4.5.0</SystemSecurityPrincipalWindows>
-<TensorFlowVersion>1.10.0</TensorFlowVersion>
+<TensorFlowVersion>1.12.0</TensorFlowVersion>
</PropertyGroup>

<!-- Code Analyzer Dependencies -->
17 changes: 17 additions & 0 deletions src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs
@@ -140,5 +140,22 @@ public static ValueMappingEstimator<TInputType, TOutputType> ValueMap<TInputType
IEnumerable<TOutputType> values,
params (string outputColumnName, string inputColumnName)[] columns)
=> new ValueMappingEstimator<TInputType, TOutputType>(CatalogUtils.GetEnvironment(catalog), keys, values, columns);

/// <summary>
/// Maps the <paramref name="columns.input"/> using the keys in a dictionary to the values of that dictionary, i.e.
/// a value 'x' in the <paramref name="columns.input"/> is mapped to the value stored in dictionary[x].
/// In this case, the <paramref name="lookupMap"/> is used to build the dictionary, where <paramref name="keyColumn"/>
/// and <paramref name="valueColumn"/> specify the keys and values of the dictionary, respectively.
wschin (Member) commented on Jan 29, 2019:

Can you add that a value x in the input would be mapped to the value stored in dictionary[x]? #Resolved

/// </summary>
/// <param name="catalog">The categorical transform's catalog</param>
/// <param name="lookupMap">An instance of <see cref="IDataView"/> that contains the key and value columns.</param>
/// <param name="keyColumn">Name of the key column in <paramref name="lookupMap"/>.</param>
/// <param name="valueColumn">Name of the value column in <paramref name="lookupMap"/>.</param>
/// <param name="columns">The columns to apply this transform on.</param>
/// <returns></returns>
public static ValueMappingEstimator ValueMap(
this TransformsCatalog.ConversionTransforms catalog,
IDataView lookupMap, string keyColumn, string valueColumn, params (string outputColumnName, string inputColumnName)[] columns)
=> new ValueMappingEstimator(CatalogUtils.GetEnvironment(catalog), lookupMap, keyColumn, valueColumn, columns);
}
}
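
Note on the mapping semantics described in the summary above: the following is a minimal, framework-free sketch of "a value x is mapped to dictionary[x]". It is not the ML.NET implementation. `ValueMapSketch`, `BuildLookup`, and `MapTokens` are hypothetical names, the word ids are made up for illustration, and mapping unknown words to 0 is an assumption rather than documented `ValueMappingEstimator` behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ValueMapSketch
{
    // Hypothetical helper: builds the dictionary from (key, value) rows,
    // playing the role of the key and value columns of the lookup map.
    // ToDictionary throws if the same key appears twice.
    public static Dictionary<string, int> BuildLookup(IEnumerable<(string Key, int Value)> rows)
        => rows.ToDictionary(r => r.Key, r => r.Value);

    // Maps every token x to lookup[x]. Unknown tokens fall back to 'missing'
    // (assumption: 0); the real transform may treat them differently.
    public static int[] MapTokens(string[] tokens, IReadOnlyDictionary<string, int> lookup, int missing = 0)
        => tokens.Select(t => lookup.TryGetValue(t, out var id) ? id : missing).ToArray();
}
```

For example, with a lookup built from `("this", 11)` and `("film", 19)`, the tokens `"this"`, `"film"`, `"zzz"` map to `11`, `19`, and the fallback `0`.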

This file was deleted.

@@ -0,0 +1 @@
090706417EC29D91EEEABC5C25576374A86426CF25F27556C0EED4FD815D814C4F09FA7389ED8F614E4B34BF6438B9AE0ADA402BEA7CC9441446AB783A6F187D

This file was deleted.

@@ -0,0 +1 @@
5359609DDF69D66474F720D6A1ED669942FEB6842096CFC3EAF44B84FA3F2F659829778446BD3C7C83871F7293CA481AC4732DF6DC7921ADA100B459E37198BD

This file was deleted.

@@ -0,0 +1 @@
49DB72CDD8D10B78BB1CD17A058DF508E04B38BD287FF53EB9173A48D3994E11741B1EE6C9108303739819845F2F9D777EE3E767D737C24DB3A28B67FF68C951
2 changes: 1 addition & 1 deletion test/Microsoft.ML.Tests/Microsoft.ML.Tests.csproj
@@ -46,7 +46,7 @@
<NativeAssemblyReference Condition="'$(OS)' != 'Windows_NT'" Include="tensorflow_framework" />
</ItemGroup>
<ItemGroup>
-<PackageReference Include="Microsoft.ML.TensorFlow.TestModels" Version="0.0.6-test" />
+<PackageReference Include="Microsoft.ML.TensorFlow.TestModels" Version="0.0.7-test" />
<PackageReference Include="Microsoft.ML.Onnx.TestModels" Version="0.0.2-test" />
</ItemGroup>
</Project>
@@ -11,8 +11,10 @@
using Microsoft.ML.ImageAnalytics;
using Microsoft.ML.RunTests;
using Microsoft.ML.Transforms;
using Microsoft.ML.Transforms.Conversions;
using Microsoft.ML.Transforms.Normalizers;
using Microsoft.ML.Transforms.TensorFlow;
using Microsoft.ML.Transforms.Text;
using Xunit;

namespace Microsoft.ML.Scenarios
@@ -846,5 +848,59 @@ public void TensorFlowTransformCifarInvalidShape()
}
Assert.True(thrown);
}

/// <summary>
/// Class to hold features and predictions.
/// </summary>
public class TensorFlowSentiment
{
public string Sentiment_Text;
[VectorType(600)]
public int[] Features;
[VectorType(2)]
public float[] Prediction;
}

[ConditionalFact(typeof(Environment), nameof(Environment.Is64BitProcess))]
public void TensorFlowSentimentClassificationTest()
zeahmed (Contributor, Author) commented on Jan 29, 2019:

The test is going to fail because the Microsoft.ML.TensorFlow.TestModels NuGet package has not been updated yet. #Resolved

{
var mlContext = new MLContext(seed: 1, conc: 1);
var data = new[] { new TensorFlowSentiment() { Sentiment_Text = "this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert is an amazing actor and now the same being director father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also to the two little boy's that played the of norman and paul they were just brilliant children are often left out of the list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all" } };
var dataView = mlContext.Data.ReadFromEnumerable(data);

var lookupMap = mlContext.Data.ReadFromTextFile(@"sentiment_model/imdb_word_index.csv",
columns: new[]
{
new TextLoader.Column("Words", DataKind.TX, 0),
new TextLoader.Column("Ids", DataKind.I4, 1),
},
separatorChar: ','
);

// We cannot resize a variable-length vector to a fixed-length vector in ML.NET.
// The trick here is to create two pipelines.
// The first pipeline, 'dataPipe', tokenizes the string into words and maps each word to an integer, which is its index in the dictionary.
// This integer vector is then retrieved from the pipeline and resized to a fixed length.
// The second pipeline, 'tfEnginePipe', takes the resized integer vector, passes it to TensorFlow, and gets the classification scores.
var estimator = mlContext.Transforms.Text.TokenizeWords("TokenizedWords", "Sentiment_Text")
.Append(mlContext.Transforms.Conversion.ValueMap(lookupMap, "Words", "Ids", new[] { ("Features", "TokenizedWords") }));
var dataPipe = estimator.Fit(dataView)
abgoswam (Member) commented on Jan 29, 2019:

dataPipe

Is there a particular reason why dataPipe and tfEnginePipe are separate? #Resolved

zeahmed (Contributor, Author) replied:

Added comments in the code.

.CreatePredictionEngine<TensorFlowSentiment, TensorFlowSentiment>(mlContext);

// For an explanation of how the `sentiment_model` was created,
// see https://github.com/dotnet/machinelearning-testdata/blob/master/Microsoft.ML.TensorFlow.TestModels/sentiment_model/README.md
string modelLocation = @"sentiment_model";
abgoswam (Member) commented on Jan 29, 2019:

sentiment_model

So this TF model takes a vector of floats as input. Am I right?

Perhaps we should add a comment on how the model was created, etc. #Resolved

zeahmed (Contributor, Author) replied:

No, it takes integers as input.

var tfEnginePipe = mlContext.Transforms.ScoreTensorFlowModel(modelLocation, new[] { "Prediction/Softmax" }, new[] { "Features" })
.Append(mlContext.Transforms.CopyColumns(("Prediction", "Prediction/Softmax")))
.Fit(dataView)
.CreatePredictionEngine<TensorFlowSentiment, TensorFlowSentiment>(mlContext);

var processedData = dataPipe.Predict(data[0]);
Array.Resize(ref processedData.Features, 600);
var prediction = tfEnginePipe.Predict(processedData);

Assert.Equal(2, prediction.Prediction.Length);
eerhardt (Member) commented on Jan 29, 2019:

Can we verify that the predictions were somewhat correct? #Resolved

Assert.InRange(prediction.Prediction[1], 0.650032759 - 0.01, 0.650032759 + 0.01);
wschin (Member) commented on Jan 29, 2019:

Please use Assert.Equal. If there are only two prediction values, can we check them all? #Resolved

zeahmed (Contributor, Author) replied:

What do you mean by Assert.Equal? Here we are checking that the value falls within a range around a particular threshold.
There is no need to check the other value. These are probabilities.

zeahmed (Contributor, Author) replied:

OK, I see what you meant by Assert.Equal. I actually want to check whether my value is in a range, e.g. 0.64 <= prediction <= 0.66, which I cannot do with Assert.Equal, can I?
Also, I find InRange more readable than the alternatives when asserting thresholds.

wschin (Member) commented on Jan 29, 2019:

You have tolerance in Assert.Equal. #Resolved

zeahmed (Contributor, Author) replied:

It uses a number of decimal places, which is not applicable here.

justinormont (Contributor) commented on Jan 30, 2019:

We could trim 0.650032759 to 0.65, if we're comparing as ± 0.01.
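
To make the difference discussed in this thread concrete, here is a small sketch comparing a ± tolerance check with a decimal-places comparison. It deliberately avoids xUnit: `ToleranceSketch`, `WithinTolerance`, and `EqualAtPrecision` are hypothetical helpers, and the rounding-based comparison is only modeled on the "number of decimal places" behavior mentioned above, not copied from any library.

```csharp
using System;

public static class ToleranceSketch
{
    // Range-style check: true when actual lies in [expected - tolerance, expected + tolerance].
    public static bool WithinTolerance(double actual, double expected, double tolerance)
        => Math.Abs(actual - expected) <= tolerance;

    // Precision-style check (assumed model): round both values to the given
    // number of decimal places and compare the rounded results.
    public static bool EqualAtPrecision(double actual, double expected, int decimals)
        => Math.Round(expected, decimals) == Math.Round(actual, decimals);
}
```

With an expected value of 0.65, an actual value of 0.6449 passes the ± 0.01 range check but fails the two-decimal comparison, because it rounds to 0.64. The two checks are therefore not interchangeable for this assertion.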

}
}
}
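
The resize step between the two pipelines relies on the behavior of `Array.Resize`: a shorter vector is padded with zeros and a longer one is truncated. A minimal sketch of that step in isolation follows; `FixedLengthSketch` and `ToFixedLength` are hypothetical names, and copying the input first is a choice made here so the caller's array is left untouched (the test above resizes the field in place).

```csharp
using System;

public static class FixedLengthSketch
{
    // Returns a copy of 'ids' with exactly 'length' elements:
    // extra slots are filled with 0, surplus elements are dropped.
    public static int[] ToFixedLength(int[] ids, int length)
    {
        var copy = (int[])ids.Clone();
        Array.Resize(ref copy, length);
        return copy;
    }
}
```

For example, `ToFixedLength(new[] { 1, 2, 3 }, 5)` yields `1, 2, 3, 0, 0`, and `ToFixedLength(new[] { 1, 2, 3 }, 2)` yields `1, 2`. Whether zero is the right padding id depends on how the model's vocabulary was built, which this page does not state.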