RandomizedPCA Anomaly Detection fraud detection sample #589


Merged: 15 commits, Aug 2, 2019

Conversation

colbylwilliams (Member)

@bamurtaugh (Member)

Looks good when I run it.

However, a couple of weeks ago we completed the migration to ML.NET version 1.2.0 and preview version 0.14.0. I noticed the Directory.Build.props file in this project still uses MicrosoftMLVersion 1.0.0 and MicrosoftMLPreviewVersion 0.12.0. Can you please verify that everything also works with the latest versions?
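
For reference, a minimal sketch of what the updated version properties in Directory.Build.props might look like, assuming the property names mentioned above and omitting whatever else the file contains:

<Project>
  <PropertyGroup>
    <!-- Bump to the versions the repo migrated to (1.2.0 stable, 0.14.0 preview). -->
    <MicrosoftMLVersion>1.2.0</MicrosoftMLVersion>
    <MicrosoftMLPreviewVersion>0.14.0</MicrosoftMLPreviewVersion>
  </PropertyGroup>
</Project>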

@CESARDELATORRE (Contributor)

Hey @colbylwilliams, when possible, can you switch to the latest version of ML.NET (v1.2) and make sure the sample works properly? Then we can do a final test and merge it. :)

if (!File.Exists(Path.Combine(trainOutput, "testData.csv")) ||
    !File.Exists(Path.Combine(trainOutput, "randomizedPca.zip")))
{
    Console.WriteLine("***** YOU NEED TO RUN THE TRAINING PROJECT FIRST *****");
Member

Is there a reason we don't already include a copy of randomizedPca.zip for the user (i.e., is the file too large to upload to GitHub)? For the other samples, I believe we include a copy of the model/files produced by training, so the user doesn't have to train first to get the predictor to work.

Member Author

I based this on how it was done in the BinaryClassification version of this sample. It actually looks like that sample does commit the model into the input directory of its Predictor project. However, as mentioned above, both the test data CSV and the model are generated by the Trainer project (in both samples), and the BinaryClassification version doesn't commit that CSV either. So even in the BinaryClassification sample, the Trainer project has to be run before the Predictor project will work.

I'm happy to follow your guidance here. If we want to commit the model and test dataset into the Predictor project, I can definitely make the change, but we should probably do it for both samples. I believe the longer-term goal is to have all these samples run as a single project rather than separate Trainer/Predictor projects. But again, that's a change that should be made in both CreditCardFraudDetection samples and may be outside the scope of this PR.

Let me know how you want to move forward.

Member

@CESARDELATORRE Would love to get your thoughts here!

@CESARDELATORRE (Contributor), Aug 2, 2019

This concrete sample has two projects. The predictor/scoring project should have the model .zip file plus the test dataset for doing multiple predictions, so it works out of the box without running the training project.

But the training project doesn't need to have the model .zip file, since it will generate it after training; and since the code is split into two projects, it is even better that the training project doesn't have it.

About the dataset: we're only committing/pushing a dataset .zip file (instead of the .csv files directly) because this concrete dataset is larger than 100 MB, which GitHub doesn't allow; the dataset .zip file is a bit smaller than 100 MB.

But for the predictor/scoring project, I think we could include both the model .zip and the test dataset, so a user could just try predictions, if desired, and it would work out of the box instead of raising an error saying that you first need to run the training project.

In any case, we can merge it as it is now and change those details while reviewing it further. It is not critical. :)
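
For context, a minimal sketch of that out-of-the-box flow in the predictor project, assuming the model .zip and test dataset are committed under an Input folder. The TransactionObservation/TransactionPrediction classes, column layout, and file paths here are illustrative placeholders, not the sample's actual names:

using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input/output classes; the sample's real schema has more columns.
public class TransactionObservation
{
    [LoadColumn(0)] public float Time { get; set; }
    [LoadColumn(29)] public float Amount { get; set; }
}

public class TransactionPrediction
{
    public bool PredictedLabel { get; set; }
    public float Score { get; set; }
}

class Program
{
    static void Main()
    {
        var mlContext = new MLContext(seed: 1);

        // Load the committed model instead of requiring a prior training run.
        ITransformer model = mlContext.Model.Load("Input/randomizedPca.zip", out var inputSchema);

        // Score the committed test dataset in bulk...
        IDataView testData = mlContext.Data.LoadFromTextFile<TransactionObservation>(
            "Input/testData.csv", separatorChar: ',', hasHeader: true);
        IDataView scored = model.Transform(testData);

        // ...or predict a single observation.
        var engine = mlContext.Model.CreatePredictionEngine<TransactionObservation, TransactionPrediction>(model);
        var prediction = engine.Predict(new TransactionObservation { Time = 0f, Amount = 0f });
        Console.WriteLine($"Anomaly: {prediction.PredictedLabel}, score: {prediction.Score}");
    }
}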

Contributor

Btw, never mind. For this case it might be better if the scoring/client app copies the model so it picks up the latest training. We'll change the code so it only copies the model .zip and the test dataset; the scoring project doesn't need the training dataset, and git ignores the files that are currently being copied. But this is a minor improvement for clarity.

Thanks! 👍
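
As a rough sketch of that narrower copy step (the folder names and paths are assumptions, not the sample's actual layout), the scoring app would copy only the two files it needs from the training output:

using System;
using System.IO;

class CopyTrainingArtifacts
{
    static void Main()
    {
        // Illustrative paths only; the sample's real folder layout may differ.
        string trainOutput = Path.Combine("..", "Trainer", "assets", "output");
        string scoringInput = Path.Combine("assets", "input");
        Directory.CreateDirectory(scoringInput);

        // Copy only the trained model and the test dataset; the scoring
        // project doesn't need the (much larger) training dataset.
        foreach (string fileName in new[] { "randomizedPca.zip", "testData.csv" })
        {
            string source = Path.Combine(trainOutput, fileName);
            if (File.Exists(source))
            {
                File.Copy(source, Path.Combine(scoringInput, fileName), overwrite: true);
            }
            else
            {
                Console.WriteLine($"Missing training output: {source}. Run the Trainer project first.");
            }
        }
    }
}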
