Skip to content

Sample for IID spike and changepoint detection using time series stateful prediction engine. #1762

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Dec 3, 2018

Conversation

codemzs
Copy link
Member

@codemzs codemzs commented Nov 28, 2018

Sample for IID spike and changepoint detection using time series stateful prediction engine. Fixes #978

@montebhoover
Copy link
Contributor

@sfilipi is added to the review. #Closed

@@ -88,5 +92,92 @@ public static void IidChangePointDetectorTransform()
// 7 0 7.00 0.50 0.00
// 7 0 7.00 0.50 0.00
}

// This example creates a time series (list of Data with the i-th element corresponding to the i-th time slot).
Copy link
Contributor

@montebhoover montebhoover Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking to @ @sfilipi I realized that this will automatically go into the docs site, showing both samples on the docs for IidChangePointDetector because of the xml tag here:

/// [!code-csharp[MF](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/IidChangePointDetectorTransform.cs)]
. So maybe we will want this in a separate file and choose which example we show on the docs site. #Resolved

Copy link
Member Author

@codemzs codemzs Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this supposed to be a sample for how to use IidChangePointDetector? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was just mentioning that everything we add to this file will show up verbatim in the docs site on the page for IidChangePointDetector. Since this adds a second sample to the file, we now have two samples on the docs site - just checking if that is what we actually want.


In reply to: 237313709 [](ancestors = 237313709)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll sync offline with you on this.


In reply to: 237596871 [](ancestors = 237596871,237313709)

Console.WriteLine("{0}\t{1}\t{2:0.00}\t{3:0.00}", 10, spikePrediction.Prediction[0],
spikePrediction.Prediction[1], spikePrediction.Prediction[2]);

// Checkpoint the model.
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Checkpoint the model. [](start = 12, length = 24)

Why we doing this?
I mean
A)
public void CheckPoint(IHostEnvironment env, string modelPath)
has zero comments.
B) can we spell out in what cases you want to call this function? and what it for? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are checkpointing the model so that we can reload it with the updated saved from the predictions. Yep, I have addressed that as part of documentation.


In reply to: 237327645 [](ancestors = 237327645)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the documentation? I wanted to see if I need to add anything for Martingale, P-Value, etc.


In reply to: 237702797 [](ancestors = 237702797,237327645)

Copy link
Member Author

@codemzs codemzs Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its in this PR, check files under src/Microsoft.ML.TimeSeries #Resolved

// Anomaly spike detection.
var prediction = engine.Predict(new IidSpikeData(5));
Console.WriteLine("{0}\t{1}\t{2:0.00}\t{3:0.00}", 5, prediction.Prediction[0],
prediction.Prediction[1], prediction.Prediction[2]);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No offense, but this is awful.
How I'm suppose to know size of array I need to pass?
How I'm suppose to know what is meaning behind index 0, index 1, index 2?

Why this not wrap in a proper class with properties? #WontFix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Zeeshan made a note of this in the PR for prediction engine, and mentioned that it would require a bit of extra refactoring so he would make it in a separate PR. I'll let him elaborate.


In reply to: 237329405 [](ancestors = 237329405)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are supposed to know the size of the array you will need to pass same way as you would need to know the info about output columns produced by other trainers and transforms in ML.NET.


In reply to: 237595774 [](ancestors = 237595774,237329405)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still want to do the future work to wrap this output vector column in a class with named properties, or even turning it into three scalar output columns?


In reply to: 237625679 [](ancestors = 237625679,237595774,237329405)

var ml = new MLContext();

// Generate sample series data with a change
const int size = 16;
Copy link
Contributor

@montebhoover montebhoover Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use the C# convention of PascalCase for constants? #Resolved

};

// Time Series model.
ITransformer model = new IidChangePointEstimator(ml, args).Fit(dataView);
Copy link
Contributor

@GalOshri GalOshri Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the IidChangePointEstimator be accessed through MLContext instead? #Resolved

Copy link
Contributor

@montebhoover montebhoover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few things I wanted to note for future work, or at least evaluate them and decide if they are worth prioritizing: #1794

/// <param name="env">Usually <see cref="MLContext"/></param>
/// <param name="ignoreMissingColumns"></param>
/// <param name="inputSchemaDefinition"></param>
/// <param name="outputSchemaDefinition"></param>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you either fill it with description or just remove from public facing API? #Resolved

/// <param name="ignoreMissingColumns"></param>
/// <param name="inputSchemaDefinition"></param>
/// <param name="outputSchemaDefinition"></param>
/// <returns></returns>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove #Resolved

/// </summary>
/// <typeparam name="TSrc">Class describing input schema to the model.</typeparam>
/// <typeparam name="TDst">Class describing the output schema of the prediction.</typeparam>
/// <param name="transformer">The time series pipeline in the form of a <see cref="ITransformer"/></param>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

< [](start = 105, length = 2)

add . #Resolved

/// Checkpoints <see cref="TimeSeriesPredictionFunction{TSrc, TDst}"/> to disk with the updated
/// state.
/// </summary>
/// <param name="env">Usually <see cref="MLContext"/></param>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

< [](start = 60, length = 2)

add . #Resolved

using Microsoft.ML.Core.Data;
using Microsoft.ML.TimeSeries;
using System.IO;
using Microsoft.ML.Data;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctrl+r+g, aka sort them :) #Resolved

// Checkpoint the model.
var modelPath = "temp.zip";
engine.CheckPoint(ml, modelPath);

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you trying to show up here, but I'm not sure it would be clear to user.

Can you do following:
Train model.
Score it with data A.
Checkpoint at some moment.
Score it with data B.
Load model with check point.
Score it with data B.

To show up what we preserve state.

Right now you have following:
Train model.
Score it with data A.
Checkpoint at some moment.
Load model with check point.
Score it with data B.

It's great, but I think it just bring confusion to the sample, like why we checkpoint and load, why we doing this? In my suggestion I think it clearly shows what you can preserve state, and show you reason why you want to do that.
#Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this change in IID Change Point detector example.


In reply to: 238025595 [](ancestors = 238025595)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sfilipi
Copy link
Member

sfilipi commented Dec 1, 2018

using System.Linq;

the samples are a bit too long. The API samples should illustrate just the API usage; not a full end-to-end example of how to do time-series.

I think as they are , without prediction they are better. You can add a full sample with prediction to the machinelearning-samples repo.

If asked, i would not merge this PR at all, and contribute the full work to that repo. #Resolved


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/IidSpikeDetectorTransform.cs:2 in 96aa6ec. [](commit_id = 96aa6ec, deletion_comment = False)

@codemzs
Copy link
Member Author

codemzs commented Dec 3, 2018

using System.Linq;

I'll sync offline with you on this.


In reply to: 443404897 [](ancestors = 443404897)


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/IidSpikeDetectorTransform.cs:2 in 96aa6ec. [](commit_id = 96aa6ec, deletion_comment = False)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codemzs codemzs merged commit f40fb02 into dotnet:master Dec 3, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants