Updated doc comments and renamed public types #5153

harishsk · 2020-05-22T17:39:39Z

@gvashishtha left a few doc comments on the earlier anomaly and time series PRs. I am addressing them here. I have copied the comments I could not address into this PR.
Please review the changes and help me with the questions.

Since this is part of the public API, I have also taken the liberty to rename the following types and names:

Name AggType to AggregateType
Name AggSymbol to AggregateSymbol
Type Point to TimeSeriesPoint

Please review whether these renames are appropriate.

harishsk · 2020-05-22T17:42:45Z

src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs

@@ -156,7 +156,7 @@ public static class TimeSeriesCatalog
        /// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>.
        /// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param>
        /// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param>
-        /// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param>
+        /// <param name="threshold">The threshold to determine anomaly. Scores larger than the threshold are considered as anomalies. This value must fall between [0,1]. Default value is 0.3.</param>


Comment from @gvashishtha on the closed PR:
"Definition of threshold refers to "score," which I don't see defined anywhere"

Can you please add a note in comments on how to explain score here? I can make the fix. #Resolved

The suggestion from mstfbl looks good to me. There is a similar comment in the function above. #Resolved

harishsk · 2020-05-22T17:52:09Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/TimeSeries/LocalizeRootCause.cs

@@ -7,6 +7,7 @@ namespace Samples.Dynamic
 {
    public static class LocalizeRootCause
    {
+        // This is the string defined as the aggregation symbol in the AnomalyDimension and point dimension.


Comment from @gvashishtha in the other PR:

What is AGG_SYMBOL for here? I notice that on line 19, you have both AGG_SYMBOL and AggregateType.Sum, and that some of the points have AGG_SYMBOL passed in instead of strings like "DC1." Can you add a few comments explaining what AGG_SYMBOL is and why it is used?

I have copied the comment from RootCauseLocalizationType.cs but it is still not fully clear. Can you please share what the comment in both places should be?
#Resolved

Suggestion: "In the root cause detection input, it identifies an aggregation as opposed to a dimension value" #Resolved
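To make the suggested wording concrete, here is a minimal, self-contained sketch of how an aggregation symbol can be told apart from an ordinary dimension value. It deliberately does not use the ML.NET types; the class name, the `AggregatedDimensions` helper, and the "##SUM##" marker value are all assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class AggregateSymbolSketch
{
    // Hypothetical marker string. Any value that cannot collide with a
    // real dimension value (such as "DC1" or "UK") would serve the same purpose.
    public const string AggSymbol = "##SUM##";

    // Returns the names of the dimensions whose value is the aggregation
    // symbol, i.e. the dimensions that were aggregated over for this point.
    public static List<string> AggregatedDimensions(IReadOnlyDictionary<string, object> dimensions)
    {
        var result = new List<string>();
        foreach (var pair in dimensions)
        {
            if (pair.Value is string s && s == AggSymbol)
                result.Add(pair.Key);
        }
        return result;
    }

    public static void Main()
    {
        // A point for country "UK" and data center "DC1", aggregated over all device types.
        var point = new Dictionary<string, object>
        {
            ["Country"] = "UK",
            ["DeviceType"] = AggSymbol,
            ["DataCenter"] = "DC1",
        };
        Console.WriteLine(string.Join(", ", AggregatedDimensions(point)));
    }
}
```

In this sketch, a point whose "DeviceType" is the aggregation symbol stands for the aggregate across every device type rather than for one specific device type, which is the distinction the comment in the sample is meant to convey.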

codecov · 2020-05-22T19:19:54Z

Codecov Report

Merging #5153 into master will increase coverage by 0.00%.
The diff coverage is 81.81%.

@@           Coverage Diff           @@
##           master    #5153   +/-   ##
=======================================
  Coverage   75.79%   75.79%           
=======================================
  Files         993      993           
  Lines      180955   180955           
  Branches    19486    19486           
=======================================
+ Hits       137151   137157    +6     
+ Misses      38514    38510    -4     
+ Partials     5290     5288    -2

Flag	Coverage Δ
#Debug	`75.79% <81.81%> (+<0.01%)`	⬆️
#production	`71.71% <76.47%> (+<0.01%)`	⬆️
#test	`88.89% <100.00%> (ø)`

Impacted Files	Coverage Δ
...rc/Microsoft.ML.TimeSeries/SRCNNAnomalyDetector.cs	`30.90% <ø> (ø)`
...crosoft.ML.TimeSeries/RootCauseLocalizationType.cs	`51.19% <70.00%> (ø)`
src/Microsoft.ML.TimeSeries/RootCauseAnalyzer.cs	`56.44% <78.26%> (ø)`
src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs	`95.00% <100.00%> (ø)`
...crosoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs	`99.53% <100.00%> (ø)`
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs	`89.58% <0.00%> (+0.32%)`	⬆️
...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs	`62.25% <0.00%> (+2.64%)`	⬆️

mstfbl

Related to anomaly detection: the line below can now be updated, since the paper has been published here:

machinelearning/src/Microsoft.ML.TimeSeries/SRCNNAnomalyDetector.cs

Line 224 in e3ca7e0

/// * Link to the KDD 2019 paper will be updated after it goes public.

mstfbl · 2020-05-22T20:51:12Z

src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs

@@ -156,7 +156,7 @@ public static class TimeSeriesCatalog
        /// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>.
        /// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param>
        /// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param>
-        /// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param>
+        /// <param name="threshold">The threshold to determine anomaly. Scores larger than the threshold are considered as anomalies. This value must fall between [0,1]. Default value is 0.3.</param>


I believe "score" here is the anomaly score calculated for each data point. It is the circled value here, where the anomaly score is calculated for each time-series chunk:

This is the paper for this SRCNN anomaly detection model. As such, I propose the following change:

Suggested change

-        /// <param name="threshold">The threshold to determine anomaly. Scores larger than the threshold are considered as anomalies. This value must fall between [0,1]. Default value is 0.3.</param>
+        /// <param name="threshold">The threshold to determine an anomaly. An anomaly is detected when the calculated anomaly score for a given time-series chunk is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3.</param>

#Resolved

mengaims · 2020-05-25T08:52:44Z

src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs

@@ -156,7 +156,7 @@ public static class TimeSeriesCatalog
        /// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>.
        /// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param>
        /// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param>
-        /// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param>
+        /// <param name="threshold">The threshold to determine an anomaly. An anomaly is detected when the calculated anomaly score for a given time-series chunk is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3.</param>


/// The threshold to determine an anomaly. An anomaly is detected when the calculated anomaly score for a given time-series chunk is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3.

The "score" here refers to the "raw score", which is the score output by SR for each point. There is also an "anomaly score" under AnomalyAndMargin mode, which is calculated according to the user's sensitivity setting when a point is detected as an anomaly by SR. So I would suggest this line be:
/// The threshold to determine an anomaly. An anomaly is detected when the calculated SR raw score for a given point is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3. #Resolved
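As a rough illustration of the wording proposed above, the check can be sketched as a strict comparison of a point's SR raw score against the threshold. This is only a sketch of the documented behavior, not the detector's actual code; the class and method names are hypothetical, and the range check simply mirrors the [0,1] constraint stated in the doc comment.

```csharp
using System;

public static class ThresholdSketch
{
    // Returns true when the raw score is strictly greater than the threshold.
    // The default of 0.3 matches the default stated in the doc comment.
    public static bool IsAnomaly(double rawScore, double threshold = 0.3)
    {
        // The doc comment requires the threshold to lie in [0,1].
        if (threshold < 0 || threshold > 1)
            throw new ArgumentOutOfRangeException(nameof(threshold), "Must be in [0,1].");

        return rawScore > threshold;
    }

    public static void Main()
    {
        Console.WriteLine(IsAnomaly(0.45)); // raw score above the default threshold
        Console.WriteLine(IsAnomaly(0.10)); // raw score below the default threshold
    }
}
```

Note that, per the comment above, the anomaly score reported under AnomalyAndMargin mode is a separate quantity derived from the sensitivity setting, so it is not the value being compared here.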

mengaims · 2020-05-25T08:56:06Z

    /// When set to AnomalyAndMargin, the output vector would be a 7-element Double vector of (IsAnomaly, AnomalyScore, Mag, ExpectedValue, BoundaryUnit, UpperBoundary, LowerBoundary).

We could add a line here to explain the difference between RawScore and AnomalyScore: the RawScore is output by SR to determine whether a point is an anomaly. Under AnomalyAndMargin mode, when a point is an anomaly, an AnomalyScore is calculated according to the sensitivity setting. #Resolved

Refers to: src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs:167 in 20b72f7.

harishsk requested review from mengaims and suxi-ms May 22, 2020 17:39

harishsk requested a review from a team as a code owner May 22, 2020 17:39

harishsk requested review from gvashishtha and natke May 22, 2020 17:40

harishsk commented May 22, 2020

harishsk changed the title ~~Updated doc comments~~ Updated doc comments and renamed public types May 22, 2020

harishsk requested a review from yaeldekel May 22, 2020 19:58

mstfbl reviewed May 22, 2020

harishsk added 5 commits May 24, 2020 09:19

Updated doc comments

e630240

Added comment in samples

1e0edd0

Renamed variables

0fb9dc9

Updated paper link

5af4d81

Updated parameter description and comments

20b72f7

mengaims reviewed May 25, 2020

Addressed review comments

5e9904a

mengaims approved these changes May 26, 2020

mstfbl approved these changes May 26, 2020

harishsk merged commit 3cbd97a into dotnet:master May 26, 2020

ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
