Updated doc comments and renamed public types #5153

Merged
merged 6 commits on May 26, 2020

8 changes: 4 additions & 4 deletions docs/api-reference/time-series-root-cause-localization.md
@@ -1,7 +1,7 @@
At Mircosoft, we develop a decision tree based root cause localization method which helps to find out the root causes for an anomaly incident at a specific timestamp incrementally.
At Microsoft, we have developed a decision tree based root cause localization method, which helps find the root causes of an anomaly incident at a specific timestamp incrementally.

## Multi-Dimensional Root Cause Localization
It's a common case that one measure is collected with many dimensions (*e.g.*, Province, ISP) whose values are categorical(*e.g.*, Beijing or Shanghai for dimension Province). When a measure's value deviates from its expected value, this measure encounters anomalies. In such case, operators would like to localize the root cause dimension combinations rapidly and accurately. Multi-dimensional root cause localization is critical to troubleshoot and mitigate such case.
It's a common case that one measure is collected with many dimensions (*e.g.*, Province, ISP) whose values are categorical (*e.g.*, Beijing or Shanghai for dimension Province). When a measure's value deviates from its expected value, the measure encounters an anomaly. In such cases, users would like to localize the root cause dimension combinations rapidly and accurately. Multi-dimensional root cause localization is critical for troubleshooting and mitigating such cases.

## Algorithm

@@ -13,7 +13,7 @@ The decision tree based root cause localization method is unsupervised, which me

### Decision Tree

[Decision tree](https://en.wikipedia.org/wiki/Decision_tree) algorithm chooses the highest information gain to split or construct a decision tree.  We use it to choose the dimension which contributes the most to the anomaly. Following are some concepts used in decision tree.
The [Decision tree](https://en.wikipedia.org/wiki/Decision_tree) algorithm chooses the split with the highest information gain when constructing a decision tree. We use it to choose the dimension which contributes the most to the anomaly. Below are some concepts used in decision trees.

#### Information Entropy

@@ -30,7 +30,7 @@ $$Gain(D, a) = Ent(D) - \sum_{v=1}^{|V|} \frac{|D^v|}{|D|} Ent(D^v)$$

Where $Ent(D^v)$ is the entropy of the set of points in $D$ for which dimension $a$ equals $v$, $|D|$ is the total number of points in dataset $D$, and $|D^v|$ is the number of points in $D$ for which dimension $a$ equals $v$.

For all aggregated dimensions, we calculate the information for each dimension. The greater the reduction in this uncertainty, the more information is gained about D from dimension $a$.
For all aggregated dimensions, we calculate the information gain for each dimension. The greater the reduction in this uncertainty, the more information is gained about $D$ from dimension $a$.
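
To make the formula concrete, the following is a minimal illustrative sketch (not part of this PR) that computes $Ent(D)$ and $Gain(D, a)$ for a set of points labeled as anomalous or not and split by one categorical dimension:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InformationGainSketch
{
    // Ent(D): entropy of the anomaly labels over a set of points.
    public static double Entropy(IEnumerable<bool> labels)
    {
        var list = labels.ToList();
        if (list.Count == 0) return 0;
        double p = list.Count(isAnomaly => isAnomaly) / (double)list.Count;
        double Term(double x) => x <= 0 ? 0 : -x * Math.Log(x, 2);
        return Term(p) + Term(1 - p);
    }

    // Gain(D, a) = Ent(D) - sum over v of (|D^v| / |D|) * Ent(D^v),
    // where v ranges over the values of dimension a.
    public static double InformationGain(
        IReadOnlyList<(string DimensionValue, bool IsAnomaly)> points)
    {
        double total = Entropy(points.Select(p => p.IsAnomaly));
        double weighted = points
            .GroupBy(p => p.DimensionValue)
            .Sum(group => (group.Count() / (double)points.Count)
                          * Entropy(group.Select(p => p.IsAnomaly)));
        return total - weighted;
    }
}
```

The dimension with the largest gain is the one chosen to split on, mirroring the selection step described above.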

#### Entropy Gain Ratio

@@ -7,6 +7,7 @@ namespace Samples.Dynamic
{
public static class LocalizeRootCause
{
// In the root cause detection input, this string identifies an aggregation as opposed to a dimension value.
private static string AGG_SYMBOL = "##SUM##";
public static void Example()
{
@@ -34,63 +35,63 @@ public static void Example()
//Score: 0.26670448876705927, Path: DataCenter, Direction: Up, Dimension:[Country, UK] [DeviceType, ##SUM##] [DataCenter, DC1]
}

private static List<Point> GetPoints()
private static List<TimeSeriesPoint> GetPoints()
{
List<Point> points = new List<Point>();
List<TimeSeriesPoint> points = new List<TimeSeriesPoint>();
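// Note: each TimeSeriesPoint below appears to carry (value, expected value, is-anomaly flag, dimension values);
// rows whose DeviceType or DataCenter equals AGG_SYMBOL represent aggregations over that dimension.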

Dictionary<string, Object> dic1 = new Dictionary<string, Object>();
dic1.Add("Country", "UK");
dic1.Add("DeviceType", "Laptop");
dic1.Add("DataCenter", "DC1");
points.Add(new Point(200, 100, true, dic1));
points.Add(new TimeSeriesPoint(200, 100, true, dic1));

Dictionary<string, Object> dic2 = new Dictionary<string, Object>();
dic2.Add("Country", "UK");
dic2.Add("DeviceType", "Mobile");
dic2.Add("DataCenter", "DC1");
points.Add(new Point(1000, 100, true, dic2));
points.Add(new TimeSeriesPoint(1000, 100, true, dic2));

Dictionary<string, Object> dic3 = new Dictionary<string, Object>();
dic3.Add("Country", "UK");
dic3.Add("DeviceType", AGG_SYMBOL);
dic3.Add("DataCenter", "DC1");
points.Add(new Point(1200, 200, true, dic3));
points.Add(new TimeSeriesPoint(1200, 200, true, dic3));

Dictionary<string, Object> dic4 = new Dictionary<string, Object>();
dic4.Add("Country", "UK");
dic4.Add("DeviceType", "Laptop");
dic4.Add("DataCenter", "DC2");
points.Add(new Point(100, 100, false, dic4));
points.Add(new TimeSeriesPoint(100, 100, false, dic4));

Dictionary<string, Object> dic5 = new Dictionary<string, Object>();
dic5.Add("Country", "UK");
dic5.Add("DeviceType", "Mobile");
dic5.Add("DataCenter", "DC2");
points.Add(new Point(200, 200, false, dic5));
points.Add(new TimeSeriesPoint(200, 200, false, dic5));

Dictionary<string, Object> dic6 = new Dictionary<string, Object>();
dic6.Add("Country", "UK");
dic6.Add("DeviceType", AGG_SYMBOL);
dic6.Add("DataCenter", "DC2");
points.Add(new Point(300, 300, false, dic6));
points.Add(new TimeSeriesPoint(300, 300, false, dic6));

Dictionary<string, Object> dic7 = new Dictionary<string, Object>();
dic7.Add("Country", "UK");
dic7.Add("DeviceType", AGG_SYMBOL);
dic7.Add("DataCenter", AGG_SYMBOL);
points.Add(new Point(1500, 500, true, dic7));
points.Add(new TimeSeriesPoint(1500, 500, true, dic7));

Dictionary<string, Object> dic8 = new Dictionary<string, Object>();
dic8.Add("Country", "UK");
dic8.Add("DeviceType", "Laptop");
dic8.Add("DataCenter", AGG_SYMBOL);
points.Add(new Point(300, 200, true, dic8));
points.Add(new TimeSeriesPoint(300, 200, true, dic8));

Dictionary<string, Object> dic9 = new Dictionary<string, Object>();
dic9.Add("Country", "UK");
dic9.Add("DeviceType", "Mobile");
dic9.Add("DataCenter", AGG_SYMBOL);
points.Add(new Point(1200, 300, true, dic9));
points.Add(new TimeSeriesPoint(1200, 300, true, dic9));

return points;
}
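
For orientation, the collapsed part of this sample (the body of `Example()`) builds a `RootCauseLocalizationInput` from these points and invokes the detector. The following is a hedged sketch of that call, reconstructed from the public API rather than copied from this diff; the timestamp, anomaly dimension, and beta value are illustrative assumptions:

```csharp
// Illustrative sketch (not copied from this diff); assumes the file's usual usings
// (System, System.Collections.Generic, Microsoft.ML, Microsoft.ML.TimeSeries).
var mlContext = new MLContext();
DateTime timestamp = new DateTime(2020, 3, 23);   // assumed anomaly timestamp

// The anomaly incident to localize: Country fixed to UK, the other dimensions aggregated.
Dictionary<string, Object> anomalyDimension = new Dictionary<string, Object>
{
    { "Country", "UK" },
    { "DeviceType", AGG_SYMBOL },
    { "DataCenter", AGG_SYMBOL }
};

var input = new RootCauseLocalizationInput(
    timestamp,
    anomalyDimension,
    new List<MetricSlice> { new MetricSlice(timestamp, GetPoints()) },
    AggregateType.Sum,
    AGG_SYMBOL);

RootCause prediction = mlContext.AnomalyDetection.LocalizeRootCause(input, beta: 0.5);
// The returned RootCause holds the localized items; the expected output above
// (Path: DataCenter, Dimension [DataCenter, DC1]) shows one such item.
```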
14 changes: 9 additions & 5 deletions src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs
@@ -134,7 +134,7 @@ public static SsaSpikeEstimator DetectSpikeBySsa(this TransformsCatalog catalog,
/// <param name="windowSize">The size of the sliding window for computing spectral residual.</param>
/// <param name="backAddWindowSize">The number of points to add to the back of the training window. No more than <paramref name="windowSize"/>; usually keep the default value.</param>
/// <param name="lookaheadWindowSize">The number of previous points used in prediction. No more than <paramref name="windowSize"/>; usually keep the default value.</param>
/// <param name="averageingWindowSize">The size of sliding window to generate a saliency map for the series. No more than <paramref name="windowSize"/>, usually keep default value.</param>
/// <param name="averagingWindowSize">The size of sliding window to generate a saliency map for the series. No more than <paramref name="windowSize"/>, usually keep default value.</param>
/// <param name="judgementWindowSize">The size of sliding window to calculate the anomaly score for each data point. No more than <paramref name="windowSize"/>.</param>
/// <param name="threshold">The threshold to determine an anomaly; a score larger than the threshold is considered an anomaly. Should be in (0,1).</param>
/// <example>
@@ -145,8 +145,8 @@ public static SrCnnAnomalyEstimator DetectAnomalyBySrCnn(this TransformsCatalog
/// </format>
/// </example>
public static SrCnnAnomalyEstimator DetectAnomalyBySrCnn(this TransformsCatalog catalog, string outputColumnName, string inputColumnName,
int windowSize = 64, int backAddWindowSize = 5, int lookaheadWindowSize = 5, int averageingWindowSize = 3, int judgementWindowSize = 21, double threshold = 0.3)
=> new SrCnnAnomalyEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, windowSize, backAddWindowSize, lookaheadWindowSize, averageingWindowSize, judgementWindowSize, threshold, inputColumnName);
int windowSize = 64, int backAddWindowSize = 5, int lookaheadWindowSize = 5, int averagingWindowSize = 3, int judgementWindowSize = 21, double threshold = 0.3)
=> new SrCnnAnomalyEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, windowSize, backAddWindowSize, lookaheadWindowSize, averagingWindowSize, judgementWindowSize, threshold, inputColumnName);
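
For context, a hedged usage sketch of this estimator follows (not part of this diff); the `TimeSeriesData` class and the "Prediction" column name are assumptions:

```csharp
// Illustrative usage sketch; TimeSeriesData and the column names are assumptions.
using System.Collections.Generic;
using Microsoft.ML;

public class TimeSeriesData { public float Value { get; set; } }

public static class SrCnnStreamingSketch
{
    public static IDataView Run(IEnumerable<TimeSeriesData> series)
    {
        var ml = new MLContext();
        IDataView data = ml.Data.LoadFromEnumerable(series);

        // Uses the renamed averagingWindowSize parameter; the defaults are spelled out for clarity.
        var estimator = ml.Transforms.DetectAnomalyBySrCnn(
            outputColumnName: "Prediction",
            inputColumnName: nameof(TimeSeriesData.Value),
            windowSize: 64,
            backAddWindowSize: 5,
            lookaheadWindowSize: 5,
            averagingWindowSize: 3,
            judgementWindowSize: 21,
            threshold: 0.3);

        // Fit and apply; "Prediction" holds the per-point SrCnn anomaly scores.
        return estimator.Fit(data).Transform(data);
    }
}
```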

/// <summary>
/// Create <see cref="SrCnnEntireAnomalyDetector"/>, which detects timeseries anomalies for the entire input using the SRCNN algorithm.
@@ -156,7 +156,7 @@ public static SrCnnAnomalyEstimator DetectAnomalyBySrCnn(this TransformsCatalog
/// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>.
/// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param>
/// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param>
/// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param>
/// <param name="threshold">The threshold to determine an anomaly. An anomaly is detected when the calculated SR raw score for a given point is greater than the set threshold. This threshold must fall within [0,1], and its default value is 0.3.</param>
/// <param name="batchSize">Divide the input data into batches to fit the SrCnn model.
/// When set to -1, the whole input is used to fit the model instead of batch by batch; when set to a positive integer, that number is used as the batch size.
/// Must be -1 or a positive integer no less than 12. Default value is 1024.</param>
@@ -165,6 +165,7 @@ public static SrCnnAnomalyEstimator DetectAnomalyBySrCnn(this TransformsCatalog
/// When set to AnomalyOnly, the output vector would be a 3-element Double vector of (IsAnomaly, RawScore, Mag).
/// When set to AnomalyAndExpectedValue, the output vector would be a 4-element Double vector of (IsAnomaly, RawScore, Mag, ExpectedValue).
/// When set to AnomalyAndMargin, the output vector would be a 7-element Double vector of (IsAnomaly, AnomalyScore, Mag, ExpectedValue, BoundaryUnit, UpperBoundary, LowerBoundary).
/// The RawScore is output by SR to determine whether a point is an anomaly or not. Under AnomalyAndMargin mode, when a point is an anomaly, an AnomalyScore is calculated according to the sensitivity setting.
/// Default value is AnomalyOnly.</param>
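
To make these modes concrete, here is a hedged usage sketch (not part of this PR); the `DoubleSeriesData` class and the column names are assumptions:

```csharp
// Illustrative usage sketch; DoubleSeriesData and the column names are assumptions.
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.TimeSeries;

public class DoubleSeriesData { public double Value { get; set; } }

public static class SrCnnEntireSketch
{
    public static IDataView Run(MLContext ml, IEnumerable<DoubleSeriesData> series)
    {
        IDataView data = ml.Data.LoadFromEnumerable(series);

        // With AnomalyAndMargin, each row of "Prediction" carries the 7 values listed above.
        return ml.AnomalyDetection.DetectEntireAnomalyBySrCnn(
            data,
            outputColumnName: "Prediction",
            inputColumnName: nameof(DoubleSeriesData.Value),
            threshold: 0.3,
            batchSize: 1024,
            detectMode: SrCnnDetectMode.AnomalyAndMargin);
    }
}
```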
/// <example>
/// <format type="text/markdown">
@@ -182,7 +183,10 @@ public static IDataView DetectEntireAnomalyBySrCnn(this AnomalyDetectionCatalog
/// </summary>
/// <param name="catalog">The anomaly detection catalog.</param>
/// <param name="src">Root cause's input. The data is an instance of <see cref="Microsoft.ML.TimeSeries.RootCauseLocalizationInput"/>.</param>
/// <param name="beta">Beta is a weight parameter for user to choose. It is used when score is calculated for each root cause item. The range of beta should be in [0,1]. For a larger beta, root cause point which has a large difference between value and expected value will get a high score. On the contrary, for a small beta, root cause items which has a high relative change will get a high score.</param>
/// <param name="beta">Beta is a weight parameter for the user to choose.
/// It is used when the score is calculated for each root cause item. The range of beta should be in [0,1].
/// For a larger beta, root cause items which have a large difference between value and expected value will get a high score.
/// For a small beta, root cause items which have a high relative change will get a low score.</param>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[