-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Updated doc comments and renamed public types #5153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -156,7 +156,7 @@ public static class TimeSeriesCatalog | |||
/// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>. | |||
/// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param> | |||
/// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param> | |||
/// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param> | |||
/// <param name="threshold">The threshold to determine anomaly. Scores larger than the threshold are considered as anomalies. This value must fall between [0,1]. Default value is 0.3.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment from @gvashishtha on the closed PR:
"Definition of threshold refers to "score," which I don't see defined anywhere"
Can you please add a note in comments on how to explain score here? I can make the fix. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion from msftbl looks good to me. There is a similar comment in the function above #Resolved
@@ -7,6 +7,7 @@ namespace Samples.Dynamic | |||
{ | |||
public static class LocalizeRootCause | |||
{ | |||
// This is the string defined as the aggregation symbol in the AnomalyDimension and point dimension. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment from @gvashishtha in the other PR:
What is AGG_SYMBOL for here? I notice that on line 19, you have both AGG_SYMBOL and AggregateType.Sum, and that some of the points have AGG_SYMBOL passed in instead of strings like "DC1."
Can you add a few comments explaining what AGG_SYMBOL is and why it is used?
I have copied the comment from RootCauseLocalizationType.cs but it is still not fully clear. Can you please share what the comment in both places should be?
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: "In the root cause detection input, it identifies an aggregation as opposed to a dimension value" #Resolved
Codecov Report
@@ Coverage Diff @@
## master #5153 +/- ##
=======================================
Coverage 75.79% 75.79%
=======================================
Files 993 993
Lines 180955 180955
Branches 19486 19486
=======================================
+ Hits 137151 137157 +6
+ Misses 38514 38510 -4
+ Partials 5290 5288 -2
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to Anomaly Detection, this line below can change, as the paper is published here:
/// * Link to the KDD 2019 paper will be updated after it goes public. |
@@ -156,7 +156,7 @@ public static class TimeSeriesCatalog | |||
/// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>. | |||
/// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param> | |||
/// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param> | |||
/// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param> | |||
/// <param name="threshold">The threshold to determine anomaly. Scores larger than the threshold are considered as anomalies. This value must fall between [0,1]. Default value is 0.3.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe "score" here is the anomaly score calculated for each data point. It is the circled value here, where the anomaly score is calculated for each time-series chucks:
This is the paper of this SRCNN anomaly detection model. As such I propose the following change:
/// <param name="threshold">The threshold to determine anomaly. Scores larger than the threshold are considered as anomalies. This value must fall between [0,1]. Default value is 0.3.</param> | |
/// <param name="threshold">The threshold to determine an anomaly. An anomaly is detected when the calculated anomaly score for a given time-series chunk is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3.</param> | |
``` #Resolved |
@@ -156,7 +156,7 @@ public static class TimeSeriesCatalog | |||
/// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>. | |||
/// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param> | |||
/// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param> | |||
/// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param> | |||
/// <param name="threshold">The threshold to determine an anomaly. An anomaly is detected when the calculated anomaly score for a given time-series chunk is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// The threshold to determine an anomaly. An anomaly is detected when the calculated anomaly score for a given time-series chunk is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3. [](start = 8, length = 263)
The "score" here refers to "raw score", this is the score output by SR for each point. And there is an "anomaly score" under AnomalyAndMargin mode, this is a score calculated according to user's sensitivity setting when a point is detected as an anomaly by SR. So I would suggest this line to be:
/// The threshold to determine an anomaly. An anomaly is detected when the calculated SR raw score for a given point is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3. #Resolved
We could add a line here to explain the difference between RawScore and AnomalyScore: The RawScore is output by SR to determine whether a point is an anomaly or not, under AnomalyAndMargin mode, when a point is an anomaly, an AnomalyScore will be calculated according to sensitivity setting. #Resolved Refers to: src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs:167 in 20b72f7. [](commit_id = 20b72f7, deletion_comment = False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gvashishtha Left a few doc comments on the earlier anomaly and time series PRs. I am addressing them here. The comments I could not address, I have copied them here.
Please review the changes and help me with the questions.
Since this is part of the public API, I have also taken the liberty to rename the following types and names:
AggType
toAggregateType
AggSymbol
toAggregateSymbol
Point
toTimeSeriesPoint
Please review whether these renames are appropriate.