Commit 3cbd97a

Updated doc comments and renamed public types (#5153)
* Updated doc comments
* Added comment in samples
* Renamed variables
* Updated paper link
* Updated parameter description and comments
* Addressed review comments
1 parent ad07320 commit 3cbd97a

7 files changed, +89 -84 lines changed

docs/api-reference/time-series-root-cause-localization.md

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
1-
At Mircosoft, we develop a decision tree based root cause localization method which helps to find out the root causes for an anomaly incident at a specific timestamp incrementally.
1+
At Microsoft, we have developed a decision tree based root cause localization method which helps to find out the root causes for an anomaly incident at a specific timestamp incrementally.
22

33
## Multi-Dimensional Root Cause Localization
4-
It's a common case that one measure is collected with many dimensions (*e.g.*, Province, ISP) whose values are categorical(*e.g.*, Beijing or Shanghai for dimension Province). When a measure's value deviates from its expected value, this measure encounters anomalies. In such case, operators would like to localize the root cause dimension combinations rapidly and accurately. Multi-dimensional root cause localization is critical to troubleshoot and mitigate such case.
4+
It's a common case that one measure is collected with many dimensions (*e.g.*, Province, ISP) whose values are categorical(*e.g.*, Beijing or Shanghai for dimension Province). When a measure's value deviates from its expected value, this measure encounters anomalies. In such case, users would like to localize the root cause dimension combinations rapidly and accurately. Multi-dimensional root cause localization is critical to troubleshoot and mitigate such case.
55

66
## Algorithm
77

@@ -13,7 +13,7 @@ The decision tree based root cause localization method is unsupervised, which me
1313

1414
### Decision Tree
1515

16-
[Decision tree](https://en.wikipedia.org/wiki/Decision_tree) algorithm chooses the highest information gain to split or construct a decision tree.  We use it to choose the dimension which contributes the most to the anomaly. Following are some concepts used in decision tree.
16+
The [Decision tree](https://en.wikipedia.org/wiki/Decision_tree) algorithm chooses the highest information gain to split or construct a decision tree.  We use it to choose the dimension which contributes the most to the anomaly. Below are some concepts used in decision trees.
1717

1818
#### Information Entropy
1919

@@ -30,7 +30,7 @@ $$Gain(D, a) = Ent(D) - \sum_{v=1}^{|V|} \frac{|D^V|}{|D |} Ent(D^v) $$
3030

3131
Where $Ent(D^v)$ is the entropy of set points in D for which dimension $a$ is equal to $v$, $|D|$ is the total number of points in dataset $D$. $|D^V|$ is the total number of points in dataset $D$ for which dimension $a$ is equal to $v$.
3232

33-
For all aggregated dimensions, we calculate the information for each dimension. The greater the reduction in this uncertainty, the more information is gained about D from dimension $a$.
33+
For all aggregated dimensions, we calculate the information for each dimension. The greater the reduction in this uncertainty, the more information is gained about $D$ from dimension $a$.
3434

3535
#### Entropy Gain Ratio
3636
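
As a worked illustration of the information gain formula quoted in the hunk above (not part of the commit), assume $Ent$ is the Shannon entropy with a base-2 logarithm over the anomalous/normal labels. Take a hypothetical dataset $D$ of four points: two anomalous points with DataCenter = DC1 and two normal points with DataCenter = DC2, with one Laptop point and one Mobile point in each data center.

$$Ent(D) = -\tfrac{2}{4}\log_2\tfrac{2}{4} - \tfrac{2}{4}\log_2\tfrac{2}{4} = 1$$

$$Gain(D, \text{DataCenter}) = 1 - \left(\tfrac{2}{4}\cdot 0 + \tfrac{2}{4}\cdot 0\right) = 1$$

$$Gain(D, \text{DeviceType}) = 1 - \left(\tfrac{2}{4}\cdot 1 + \tfrac{2}{4}\cdot 1\right) = 0$$

Under these assumptions, splitting on DataCenter removes all uncertainty about which points are anomalous, while splitting on DeviceType removes none, so DataCenter would be picked as the dimension that contributes most to the anomaly.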

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/TimeSeries/LocalizeRootCause.cs

Lines changed: 12 additions & 11 deletions
@@ -7,6 +7,7 @@ namespace Samples.Dynamic
77
{
88
public static class LocalizeRootCause
99
{
10+
// In the root cause detection input, this string identifies an aggregation as opposed to a dimension value"
1011
private static string AGG_SYMBOL = "##SUM##";
1112
public static void Example()
1213
{
@@ -34,63 +35,63 @@ public static void Example()
3435
//Score: 0.26670448876705927, Path: DataCenter, Direction: Up, Dimension:[Country, UK] [DeviceType, ##SUM##] [DataCenter, DC1]
3536
}
3637

37-
private static List<Point> GetPoints()
38+
private static List<TimeSeriesPoint> GetPoints()
3839
{
39-
List<Point> points = new List<Point>();
40+
List<TimeSeriesPoint> points = new List<TimeSeriesPoint>();
4041

4142
Dictionary<string, Object> dic1 = new Dictionary<string, Object>();
4243
dic1.Add("Country", "UK");
4344
dic1.Add("DeviceType", "Laptop");
4445
dic1.Add("DataCenter", "DC1");
45-
points.Add(new Point(200, 100, true, dic1));
46+
points.Add(new TimeSeriesPoint(200, 100, true, dic1));
4647

4748
Dictionary<string, Object> dic2 = new Dictionary<string, Object>();
4849
dic2.Add("Country", "UK");
4950
dic2.Add("DeviceType", "Mobile");
5051
dic2.Add("DataCenter", "DC1");
51-
points.Add(new Point(1000, 100, true, dic2));
52+
points.Add(new TimeSeriesPoint(1000, 100, true, dic2));
5253

5354
Dictionary<string, Object> dic3 = new Dictionary<string, Object>();
5455
dic3.Add("Country", "UK");
5556
dic3.Add("DeviceType", AGG_SYMBOL);
5657
dic3.Add("DataCenter", "DC1");
57-
points.Add(new Point(1200, 200, true, dic3));
58+
points.Add(new TimeSeriesPoint(1200, 200, true, dic3));
5859

5960
Dictionary<string, Object> dic4 = new Dictionary<string, Object>();
6061
dic4.Add("Country", "UK");
6162
dic4.Add("DeviceType", "Laptop");
6263
dic4.Add("DataCenter", "DC2");
63-
points.Add(new Point(100, 100, false, dic4));
64+
points.Add(new TimeSeriesPoint(100, 100, false, dic4));
6465

6566
Dictionary<string, Object> dic5 = new Dictionary<string, Object>();
6667
dic5.Add("Country", "UK");
6768
dic5.Add("DeviceType", "Mobile");
6869
dic5.Add("DataCenter", "DC2");
69-
points.Add(new Point(200, 200, false, dic5));
70+
points.Add(new TimeSeriesPoint(200, 200, false, dic5));
7071

7172
Dictionary<string, Object> dic6 = new Dictionary<string, Object>();
7273
dic6.Add("Country", "UK");
7374
dic6.Add("DeviceType", AGG_SYMBOL);
7475
dic6.Add("DataCenter", "DC2");
75-
points.Add(new Point(300, 300, false, dic6));
76+
points.Add(new TimeSeriesPoint(300, 300, false, dic6));
7677

7778
Dictionary<string, Object> dic7 = new Dictionary<string, Object>();
7879
dic7.Add("Country", "UK");
7980
dic7.Add("DeviceType", AGG_SYMBOL);
8081
dic7.Add("DataCenter", AGG_SYMBOL);
81-
points.Add(new Point(1500, 500, true, dic7));
82+
points.Add(new TimeSeriesPoint(1500, 500, true, dic7));
8283

8384
Dictionary<string, Object> dic8 = new Dictionary<string, Object>();
8485
dic8.Add("Country", "UK");
8586
dic8.Add("DeviceType", "Laptop");
8687
dic8.Add("DataCenter", AGG_SYMBOL);
87-
points.Add(new Point(300, 200, true, dic8));
88+
points.Add(new TimeSeriesPoint(300, 200, true, dic8));
8889

8990
Dictionary<string, Object> dic9 = new Dictionary<string, Object>();
9091
dic9.Add("Country", "UK");
9192
dic9.Add("DeviceType", "Mobile");
9293
dic9.Add("DataCenter", AGG_SYMBOL);
93-
points.Add(new Point(1200, 300, true, dic9));
94+
points.Add(new TimeSeriesPoint(1200, 300, true, dic9));
9495

9596
return points;
9697
}

src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs

Lines changed: 9 additions & 5 deletions
@@ -134,7 +134,7 @@ public static SsaSpikeEstimator DetectSpikeBySsa(this TransformsCatalog catalog,
134134
/// <param name="windowSize">The size of the sliding window for computing spectral residual.</param>
135135
/// <param name="backAddWindowSize">The number of points to add back of training window. No more than <paramref name="windowSize"/>, usually keep default value.</param>
136136
/// <param name="lookaheadWindowSize">The number of pervious points used in prediction. No more than <paramref name="windowSize"/>, usually keep default value.</param>
137-
/// <param name="averageingWindowSize">The size of sliding window to generate a saliency map for the series. No more than <paramref name="windowSize"/>, usually keep default value.</param>
137+
/// <param name="averagingWindowSize">The size of sliding window to generate a saliency map for the series. No more than <paramref name="windowSize"/>, usually keep default value.</param>
138138
/// <param name="judgementWindowSize">The size of sliding window to calculate the anomaly score for each data point. No more than <paramref name="windowSize"/>.</param>
139139
/// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Should be in (0,1)</param>
140140
/// <example>
@@ -145,8 +145,8 @@ public static SsaSpikeEstimator DetectSpikeBySsa(this TransformsCatalog catalog,
145145
/// </format>
146146
/// </example>
147147
public static SrCnnAnomalyEstimator DetectAnomalyBySrCnn(this TransformsCatalog catalog, string outputColumnName, string inputColumnName,
148-
int windowSize = 64, int backAddWindowSize = 5, int lookaheadWindowSize = 5, int averageingWindowSize = 3, int judgementWindowSize = 21, double threshold = 0.3)
149-
=> new SrCnnAnomalyEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, windowSize, backAddWindowSize, lookaheadWindowSize, averageingWindowSize, judgementWindowSize, threshold, inputColumnName);
148+
int windowSize = 64, int backAddWindowSize = 5, int lookaheadWindowSize = 5, int averagingWindowSize = 3, int judgementWindowSize = 21, double threshold = 0.3)
149+
=> new SrCnnAnomalyEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, windowSize, backAddWindowSize, lookaheadWindowSize, averagingWindowSize, judgementWindowSize, threshold, inputColumnName);
150150

151151
/// <summary>
152152
/// Create <see cref="SrCnnEntireAnomalyDetector"/>, which detects timeseries anomalies for entire input using SRCNN algorithm.
@@ -156,7 +156,7 @@ public static SrCnnAnomalyEstimator DetectAnomalyBySrCnn(this TransformsCatalog
156156
/// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>.
157157
/// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param>
158158
/// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param>
159-
/// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param>
159+
/// <param name="threshold">The threshold to determine an anomaly. An anomaly is detected when the calculated SR raw score for a given point is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3.</param>
160160
/// <param name="batchSize">Divide the input data into batches to fit srcnn model.
161161
/// When set to -1, use the whole input to fit model instead of batch by batch, when set to a positive integer, use this number as batch size.
162162
/// Must be -1 or a positive integer no less than 12. Default value is 1024.</param>
@@ -165,6 +165,7 @@ public static SrCnnAnomalyEstimator DetectAnomalyBySrCnn(this TransformsCatalog
165165
/// When set to AnomalyOnly, the output vector would be a 3-element Double vector of (IsAnomaly, RawScore, Mag).
166166
/// When set to AnomalyAndExpectedValue, the output vector would be a 4-element Double vector of (IsAnomaly, RawScore, Mag, ExpectedValue).
167167
/// When set to AnomalyAndMargin, the output vector would be a 7-element Double vector of (IsAnomaly, AnomalyScore, Mag, ExpectedValue, BoundaryUnit, UpperBoundary, LowerBoundary).
168+
/// The RawScore is output by SR to determine whether a point is an anomaly or not, under AnomalyAndMargin mode, when a point is an anomaly, an AnomalyScore will be calculated according to sensitivity setting.
168169
/// Default value is AnomalyOnly.</param>
169170
/// <example>
170171
/// <format type="text/markdown">
@@ -182,7 +183,10 @@ public static IDataView DetectEntireAnomalyBySrCnn(this AnomalyDetectionCatalog
182183
/// </summary>
183184
/// <param name="catalog">The anomaly detection catalog.</param>
184185
/// <param name="src">Root cause's input. The data is an instance of <see cref="Microsoft.ML.TimeSeries.RootCauseLocalizationInput"/>.</param>
185-
/// <param name="beta">Beta is a weight parameter for user to choose. It is used when score is calculated for each root cause item. The range of beta should be in [0,1]. For a larger beta, root cause point which has a large difference between value and expected value will get a high score. On the contrary, for a small beta, root cause items which has a high relative change will get a high score.</param>
186+
/// <param name="beta">Beta is a weight parameter for user to choose.
187+
/// It is used when score is calculated for each root cause item. The range of beta should be in [0,1].
188+
/// For a larger beta, root cause items which have a large difference between value and expected value will get a high score.
189+
/// For a small beta, root cause items which have a high relative change will get a low score.</param>
186190
/// <example>
187191
/// <format type="text/markdown">
188192
/// <![CDATA[
