XML documentation for Time Series #3444

Merged
merged 5 commits into from
Apr 21, 2019
Changes from 2 commits
26 changes: 26 additions & 0 deletions docs/api-reference/time-series-scorer.md
@@ -0,0 +1,26 @@
### Anomaly Scorer
Once the raw score at a timestamp is computed, it is fed to the anomaly scorer component to calculate the final anomaly score at that timestamp.
Two statistics are involved in this scorer: the p-value and the martingale score.

#### Spike detection based on p-value
The p-value score indicates the p-value of the current computed raw score according to a distribution of raw scores.
Here, the distribution is estimated based on the most recent raw score values, up to a certain depth back in the history.
More specifically, this distribution is estimated using [kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation)
with the Gaussian [kernels](https://en.wikipedia.org/wiki/Kernel_(statistics)#In_non-parametric_statistics) of adaptive bandwidth.
The p-value score is always in $[0, 1]$, and the lower its value, the more likely the current point is an outlier (also known as a spike).
If the p-value score falls below $1 - \frac{\text{confidence}}{100}$, the associated timestamp may get a non-zero alert value in spike detection, which means a spike point is detected.
Note that $\text{confidence}$ is defined in the signatures of [DetectChangePointBySsa](xref:Microsoft.ML.TimeSeriesCatalog.DetectChangePointBySsa(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.ErrorFunction,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double))
and [DetectIidChangePoint](xref:Microsoft.ML.TimeSeriesCatalog.DetectIidChangePoint(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double)).
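
As a small worked example, the threshold above can be evaluated for a concrete setting. The value $\text{confidence} = 95$ is chosen here only for illustration:

$$
1 - \frac{\text{confidence}}{100} = 1 - \frac{95}{100} = 0.05
$$

Raising $\text{confidence}$ to 99 shrinks the threshold to $0.01$.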


#### Change point detection based on martingale score
The martingale score is an extra level of scoring that is built upon the p-value scores.
The idea is based on the [Exchangeability Martingales](https://arxiv.org/pdf/1204.3251.pdf) that detect a change of distribution over a stream of i.i.d. values.
In short, the value of the martingale score starts increasing significantly when a sequence of small p-values is detected in a row; this indicates a change in the distribution of the underlying data generation process.
Thus, the martingale score is used for change point detection.
Given a sequence of the most recently observed p-values, $p_1, \dots, p_n$, the martingale score is computed as $s(p_1, \dots, p_n) = \prod_{i=1}^n \beta(p_i)$.
There are two choices of $\beta$: $\beta(p) = \epsilon p^{\epsilon - 1}$ for $0 < \epsilon < 1$ or $\beta(p) = \int_{0}^1 \epsilon p^{\epsilon - 1} d\epsilon$.
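
As a worked aside, the integral form of $\beta$ can be evaluated in closed form. The derivation below is a sketch that assumes $0 < p < 1$ and writes $a = \ln p$ (a symbol introduced here only for the derivation), using integration by parts:

$$
\int_{0}^{1} \epsilon p^{\epsilon - 1} d\epsilon
= \frac{1}{p} \int_{0}^{1} \epsilon e^{a \epsilon} d\epsilon
= \frac{1}{p} \cdot \frac{a e^{a} - e^{a} + 1}{a^{2}}
= \frac{1 - p + p \ln p}{p (\ln p)^{2}}
$$

For instance, at $p = 0.05$ the integral form gives $\beta(0.05) \approx 1.78$, while the power form with an illustrative $\epsilon = 0.1$ gives $\beta(0.05) = 0.1 \cdot 0.05^{-0.9} \approx 1.48$. Both factors are greater than 1, which is why a run of small p-values makes the product grow.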

If the martingale score exceeds $s(q_1, \dots, q_n)$ where $q_i=1 - \frac{\text{confidence}}{100}$, the associated timestamp may get a non-zero alert value for change point detection.
Note that $\text{confidence}$ is defined in the signatures of [DetectChangePointBySsa](xref:Microsoft.ML.TimeSeriesCatalog.DetectChangePointBySsa(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.ErrorFunction,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double)) or
[DetectIidChangePoint](xref:Microsoft.ML.TimeSeriesCatalog.DetectIidChangePoint(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double)).
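
Because every $q_i$ takes the same value $q = 1 - \frac{\text{confidence}}{100}$, the threshold collapses to a single power. The following is a worked sketch, not a statement about how the library evaluates it:

$$
s(q, \dots, q) = \prod_{i=1}^{n} \beta(q) = \beta(q)^{n},
\qquad
\ln s(q, \dots, q) = n \ln \beta(q)
$$

With illustrative values that are assumptions of this example ($\text{confidence} = 95$, the power form $\beta(p) = \epsilon p^{\epsilon - 1}$ with $\epsilon = 0.1$, and $n = 10$), we get $q = 0.05$, $\beta(q) \approx 1.48$, and a threshold of $\beta(0.05)^{10} \approx 51$.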
24 changes: 12 additions & 12 deletions src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs
@@ -10,14 +10,14 @@ namespace Microsoft.ML
public static class TimeSeriesCatalog
{
/// <summary>
/// Create a new instance of <see cref="IidChangePointEstimator"/> that detects a change of in an
/// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables"> independent identically distributed (i.i.d.)</a> time series.
/// Detection is based on adaptive kernel density estimations and martingale scores.
/// Create <see cref="IidChangePointEstimator"/>, which predicts change points in an
/// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">independent identically distributed (i.i.d.)</a>
/// time series based on adaptive kernel density estimations and martingale scores.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
/// Column is a vector of type double and size 4. The vector contains Alert, Raw Score, P-Value and Martingale score as first four values.</param>
/// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="inputColumnName">Name of column to transform. The column data must be <see cref="System.Single"/>. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="confidence">The confidence for change point detection in the range [0, 100].</param>
/// <param name="changeHistoryLength">The length of the sliding window on p-values for computing the martingale score.</param>
/// <param name="martingale">The martingale used for scoring.</param>
@@ -34,13 +34,13 @@ public static IidChangePointEstimator DetectIidChangePoint(this TransformsCatalo
=> new IidChangePointEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, confidence, changeHistoryLength, inputColumnName, martingale, eps);

/// <summary>
/// Create a new instance of <see cref="IidSpikeEstimator"/> that detects a spike in an
/// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">independent identically distributed (i.i.d.)</a> time series.
/// Detection is based on adaptive kernel density estimations and martingale scores.
/// Create <see cref="IidSpikeEstimator"/>, which predicts spikes in
/// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">independent identically distributed (i.i.d.)</a>
/// time series based on adaptive kernel density estimations and martingale scores.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param>
/// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="inputColumnName">Name of column to transform. The column data must be <see cref="System.Single"/>. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="confidence">The confidence for spike detection in the range [0, 100].</param>
/// <param name="pvalueHistoryLength">The size of the sliding window for computing the p-value.</param>
/// <param name="side">The argument that determines whether to detect positive or negative anomalies, or both.</param>
@@ -56,13 +56,13 @@ public static IidSpikeEstimator DetectIidSpike(this TransformsCatalog catalog, s
=> new IidSpikeEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, confidence, pvalueHistoryLength, inputColumnName, side);

/// <summary>
/// Create a new instance of <see cref="SsaChangePointEstimator"/> for detecting a change in a time series signal
/// Create <see cref="SsaChangePointEstimator"/>, which predicts change points in time series
/// using <a href="https://en.wikipedia.org/wiki/Singular_spectrum_analysis">Singular Spectrum Analysis (SSA)</a>.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.
/// Column is a vector of type double and size 4. The vector contains Alert, Raw Score, P-Value and Martingale score as first four values.</param>
/// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="inputColumnName">Name of column to transform. The column data must be <see cref="System.Single"/>. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="confidence">The confidence for change point detection in the range [0, 100].</param>
/// <param name="trainingWindowSize">The number of points from the beginning of the sequence used for training.</param>
/// <param name="changeHistoryLength">The size of the sliding window for computing the p-value.</param>
@@ -94,12 +94,12 @@ public static SsaChangePointEstimator DetectChangePointBySsa(this TransformsCata
});

/// <summary>
/// Create a new instance of <see cref="SsaSpikeEstimator"/> for detecting a spike in a time series signal
/// Create <see cref="SsaSpikeEstimator"/>, which predicts spikes in time series
/// using <a href="https://en.wikipedia.org/wiki/Singular_spectrum_analysis">Singular Spectrum Analysis (SSA)</a>.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="outputColumnName">Name of the column resulting from the transformation of <paramref name="inputColumnName"/>.</param>
/// <param name="inputColumnName">Name of column to transform. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.
/// <param name="inputColumnName">Name of column to transform. The column data must be <see cref="System.Single"/>. If set to <see langword="null"/>, the value of the <paramref name="outputColumnName"/> will be used as source.</param>
/// <param name="confidence">The confidence for spike detection in the range [0, 100].</param>
/// <param name="pvalueHistoryLength">The size of the sliding window for computing the p-value.</param>
/// <param name="trainingWindowSize">The number of points from the beginning of the sequence used for training.</param>
2 changes: 1 addition & 1 deletion src/Microsoft.ML.TimeSeries/IidAnomalyDetectionBase.cs
@@ -23,7 +23,7 @@ public class IidAnomalyDetectionBaseWrapper : IStatefulTransformer, ICanSaveMode
bool ITransformer.IsRowToRowMapper => ((ITransformer)InternalTransform).IsRowToRowMapper;

/// <summary>
/// Creates a clone of the transfomer. Used for taking the snapshot of the state.
/// Creates a clone of the transformer. Used for taking the snapshot of the state.
/// </summary>
/// <returns></returns>
IStatefulTransformer IStatefulTransformer.Clone() => InternalTransform.Clone();
35 changes: 32 additions & 3 deletions src/Microsoft.ML.TimeSeries/IidChangePointDetector.cs
@@ -191,10 +191,39 @@ private static IRowMapper Create(IHostEnvironment env, ModelLoadContext ctx, Dat
}

/// <summary>
/// The <see cref="IEstimator{ITransformer}"/> for detecting a signal change on an
/// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables"> independent identically distributed (i.i.d.)</a> time series.
/// Detection is based on adaptive kernel density estimation and martingales.
/// The <see cref="IEstimator{TTransformer}"/> to detect a signal change on an
/// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables"> independent identically distributed (i.i.d.)</a>
/// time series based on adaptive kernel density estimation and martingales.
/// </summary>
/// <remarks>
/// <format type="text/markdown"><![CDATA[
/// To create this estimator, use [DetectIidChangePoint](xref:Microsoft.ML.TimeSeriesCatalog.DetectIidChangePoint(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double)).
///
/// ### Input and Output Columns
/// There is only one input column and its type is <xref:System.Single>.
///
/// | Output Column Name | Column Type | Description |
/// | -- | -- | -- |
/// | All input columns | Any | All input columns pass through unmodified. |
/// | `Prediction` | Known-sized vector of <xref:System.Double> | It contains the alert level (a non-zero value means a change point), score, p-value, and martingale value. |
///
/// ### Estimator Characteristics
/// | | |
/// | -- | -- |
/// | Machine learning task | Time series analysis |
/// | Is normalization required? | No |
/// | Is caching required? | No |
/// | Required NuGet in addition to Microsoft.ML | Microsoft.ML.TimeSeries |
///
/// ### Training Algorithm Details
/// This trainer assumes that data points collected in the considered time series are independently sampled from the same distribution (independent identically distributed).
/// Thus, the value at the current timestamp can be viewed as the predicted value, raw score, at the next timestamp in expectation.
///
/// [!include[io](~/../docs/samples/docs/api-reference/time-series-scorer.md)]
/// ]]>
/// </format>
/// </remarks>
/// <seealso cref="Microsoft.ML.TimeSeriesCatalog.DetectIidChangePoint(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.MartingaleType,System.Double)" />
public sealed class IidChangePointEstimator : TrivialEstimator<IidChangePointDetector>
{
/// <summary>
35 changes: 32 additions & 3 deletions src/Microsoft.ML.TimeSeries/IidSpikeDetector.cs
@@ -171,10 +171,39 @@ private static IRowMapper Create(IHostEnvironment env, ModelLoadContext ctx, Dat
}

/// <summary>
/// The <see cref="IEstimator{ITransformer}"/> for detecting a signal spike on an
/// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables"> independent identically distributed (i.i.d.)</a> time series.
/// Detection is based on adaptive kernel density estimation.
/// The <see cref="IEstimator{TTransformer}"/> to detect a signal spike on an
/// <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables"> independent identically distributed (i.i.d.)</a>
/// time series based on adaptive kernel density estimation.
/// </summary>
/// <remarks>
/// <format type="text/markdown"><![CDATA[
/// To create this estimator, use [DetectIidSpike](xref:Microsoft.ML.TimeSeriesCatalog.DetectIidSpike(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.AnomalySide)).
///
/// ### Input and Output Columns
/// There is only one input column and its type is <xref:System.Single>.
///
/// | Output Column Name | Column Type | Description |
/// | -- | -- | -- |
/// | All input columns | Any | All input columns pass through unmodified. |
/// | `Prediction` | Known-sized vector of <xref:System.Double> | It contains the alert level (a non-zero value means a spike), score, and p-value. |
///
/// ### Estimator Characteristics
/// | | |
/// | -- | -- |
/// | Machine learning task | Time series analysis |
/// | Is normalization required? | No |
/// | Is caching required? | No |
/// | Required NuGet in addition to Microsoft.ML | Microsoft.ML.TimeSeries |
///
/// ### Training Algorithm Details
/// This trainer assumes that data points collected in the considered time series are independently sampled from the same distribution (independent identically distributed).
/// Thus, the value at the current timestamp can be viewed as the predicted value, raw score, at the next timestamp in expectation.
///
/// [!include[io](~/../docs/samples/docs/api-reference/time-series-scorer.md)]
/// ]]>
/// </format>
/// </remarks>
/// <seealso cref="Microsoft.ML.TimeSeriesCatalog.DetectIidSpike(Microsoft.ML.TransformsCatalog,System.String,System.String,System.Int32,System.Int32,Microsoft.ML.Transforms.TimeSeries.AnomalySide)" />
public sealed class IidSpikeEstimator : TrivialEstimator<IidSpikeDetector>
{
/// <summary>
4 changes: 2 additions & 2 deletions src/Microsoft.ML.TimeSeries/SsaAnomalyDetectionBase.cs
@@ -92,7 +92,7 @@ public class SsaAnomalyDetectionBaseWrapper : IStatefulTransformer, ICanSaveMode
bool ITransformer.IsRowToRowMapper => ((ITransformer)InternalTransform).IsRowToRowMapper;

/// <summary>
/// Creates a clone of the transfomer. Used for taking the snapshot of the state.
/// Creates a clone of the transformer. Used for taking the snapshot of the state.
/// </summary>
/// <returns></returns>
IStatefulTransformer IStatefulTransformer.Clone() => InternalTransform.Clone();
@@ -340,7 +340,7 @@ private protected override void InitializeAnomalyDetector()

private protected override double ComputeRawAnomalyScore(ref Single input, FixedSizeQueue<Single> windowedBuffer, long iteration)
{
// Get the prediction for the next point opn the series
// Get the prediction for the next point in the series
Single expectedValue = 0;
_model.PredictNext(ref expectedValue);
