
Updated doc comments and renamed public types #5153


Merged
merged 6 commits into from
May 26, 2020

@harishsk (Contributor) commented May 22, 2020

@gvashishtha left a few doc comments on the earlier anomaly detection and time series PRs. I am addressing them here. I have copied over the comments I could not address.
Please review the changes and help me with the questions.

Since this is part of the public API, I have also taken the liberty of renaming the following types and names:

  • Rename AggType to AggregateType
  • Rename AggSymbol to AggregateSymbol
  • Rename type Point to TimeSeriesPoint

Please review whether these renames are appropriate.

@harishsk harishsk requested review from mengaims and suxi-ms May 22, 2020 17:39
@harishsk harishsk requested a review from a team as a code owner May 22, 2020 17:39
@harishsk harishsk requested review from gvashishtha and natke May 22, 2020 17:40
```diff
@@ -156,7 +156,7 @@ public static class TimeSeriesCatalog
 /// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>.
 /// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param>
 /// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param>
-/// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param>
+/// <param name="threshold">The threshold to determine anomaly. Scores larger than the threshold are considered as anomalies. This value must fall between [0,1]. Default value is 0.3.</param>
```
@harishsk (Contributor Author) May 22, 2020

Comment from @gvashishtha on the closed PR:
"Definition of threshold refers to 'score,' which I don't see defined anywhere."

Can you please add a note in the comments explaining what score means here? I can make the fix. #Resolved

@klausmh (Contributor) May 23, 2020

The suggestion from @mstfbl looks good to me. There is a similar comment in the function above. #Resolved

```diff
@@ -7,6 +7,7 @@ namespace Samples.Dynamic
 {
     public static class LocalizeRootCause
     {
+        // This is the string defined as the aggregation symbol in the AnomalyDimension and point dimension.
```
@harishsk (Contributor Author) May 22, 2020

Comment from @gvashishtha in the other PR:

What is AGG_SYMBOL for here? I notice that on line 19, you have both AGG_SYMBOL and AggregateType.Sum, and that some of the points have AGG_SYMBOL passed in instead of strings like "DC1".

Can you add a few comments explaining what AGG_SYMBOL is and why it is used?

I have copied the comment from RootCauseLocalizationType.cs, but it is still not fully clear. Can you please share what the comment in both places should be?
#Resolved

@klausmh (Contributor) May 23, 2020

Suggestion: "In the root cause detection input, it identifies an aggregation as opposed to a dimension value" #Resolved
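To make the suggested wording concrete, here is a minimal sketch of how a dimension value equal to the aggregation symbol can be told apart from an ordinary dimension value such as "DC1". It assumes each point's dimensions are a string-to-string dictionary and that the symbol is the placeholder string "##SUM##"; both are assumptions, and `AggregateSymbolSketch`, `IsAggregated`, and `AggregatedDimensions` are hypothetical names, not part of the ML.NET API.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; these helpers are not part of the ML.NET API.
public static class AggregateSymbolSketch
{
    // Assumed placeholder value for the aggregation symbol.
    public const string AGG_SYMBOL = "##SUM##";

    // True when a dimension value marks an aggregation instead of a concrete value such as "DC1".
    public static bool IsAggregated(string dimensionValue) =>
        dimensionValue == AGG_SYMBOL;

    // Lists the dimension names whose value is the aggregation symbol.
    public static List<string> AggregatedDimensions(IDictionary<string, string> dimension) =>
        dimension.Where(kv => IsAggregated(kv.Value))
                 .Select(kv => kv.Key)
                 .ToList();
}
```

For a point whose dimensions are DataCenter = "DC1" and DeviceType = AGG_SYMBOL, `AggregatedDimensions` returns only "DeviceType". Under this sketch's interpretation, that point carries an aggregated value across device types for data center DC1, rather than a value for one specific device type.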

@harishsk harishsk changed the title Updated doc comments Updated doc comments and renamed public types May 22, 2020
@codecov bot commented May 22, 2020

Codecov Report

Merging #5153 into master will increase coverage by 0.00%.
The diff coverage is 81.81%.

```diff
@@           Coverage Diff           @@
##           master    #5153   +/-   ##
=======================================
  Coverage   75.79%   75.79%           
=======================================
  Files         993      993           
  Lines      180955   180955           
  Branches    19486    19486           
=======================================
+ Hits       137151   137157    +6     
+ Misses      38514    38510    -4     
+ Partials     5290     5288    -2     
```
| Flag | Coverage Δ |
|------|------------|
| #Debug | 75.79% <81.81%> (+<0.01%) ⬆️ |
| #production | 71.71% <76.47%> (+<0.01%) ⬆️ |
| #test | 88.89% <100.00%> (ø) |
| Impacted Files | Coverage Δ |
|----------------|------------|
| ...rc/Microsoft.ML.TimeSeries/SRCNNAnomalyDetector.cs | 30.90% <ø> (ø) |
| ...crosoft.ML.TimeSeries/RootCauseLocalizationType.cs | 51.19% <70.00%> (ø) |
| src/Microsoft.ML.TimeSeries/RootCauseAnalyzer.cs | 56.44% <78.26%> (ø) |
| src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs | 95.00% <100.00%> (ø) |
| ...crosoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs | 99.53% <100.00%> (ø) |
| ...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs | 89.58% <0.00%> (+0.32%) ⬆️ |
| ...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs | 62.25% <0.00%> (+2.64%) ⬆️ |

@harishsk harishsk requested a review from yaeldekel May 22, 2020 19:58
@mstfbl (Contributor) left a comment

Related to anomaly detection, the line below can now be updated, since the paper has been published:

/// * Link to the KDD 2019 paper will be updated after it goes public.

```diff
@@ -156,7 +156,7 @@ public static class TimeSeriesCatalog
 /// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>.
 /// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param>
 /// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param>
-/// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param>
+/// <param name="threshold">The threshold to determine anomaly. Scores larger than the threshold are considered as anomalies. This value must fall between [0,1]. Default value is 0.3.</param>
```
@mstfbl (Contributor) May 22, 2020

I believe "score" here is the anomaly score calculated for each data point. It is the circled value in the figure below, where an anomaly score is calculated for each time-series chunk:
[Figure: anomaly_score]
This is the paper describing this SRCNN anomaly detection model. As such, I propose the following change:

Suggested change
```diff
-/// <param name="threshold">The threshold to determine anomaly. Scores larger than the threshold are considered as anomalies. This value must fall between [0,1]. Default value is 0.3.</param>
+/// <param name="threshold">The threshold to determine an anomaly. An anomaly is detected when the calculated anomaly score for a given time-series chunk is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3.</param>
```
#Resolved

```diff
@@ -156,7 +156,7 @@ public static class TimeSeriesCatalog
 /// <param name="outputColumnName">Name of the column resulting from data processing of <paramref name="inputColumnName"/>.
 /// The column data is a vector of <see cref="System.Double"/>. The length of this vector varies depending on <paramref name="detectMode"/>.</param>
 /// <param name="inputColumnName">Name of column to process. The column data must be <see cref="System.Double"/>.</param>
-/// <param name="threshold">The threshold to determine anomaly, score larger than the threshold is considered as anomaly. Must be in [0,1]. Default value is 0.3.</param>
+/// <param name="threshold">The threshold to determine an anomaly. An anomaly is detected when the calculated anomaly score for a given time-series chunk is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3.</param>
```
@mengaims (Contributor) May 25, 2020

/// The threshold to determine an anomaly. An anomaly is detected when the calculated anomaly score for a given time-series chunk is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3.

The "score" here refers to the "raw score", which is the score SR outputs for each point. There is also an "anomaly score" in AnomalyAndMargin mode, which is calculated from the user's sensitivity setting when SR detects a point as an anomaly. So I would suggest changing this line to:
/// The threshold to determine an anomaly. An anomaly is detected when the calculated SR raw score for a given point is more than the set threshold. This threshold must fall between [0,1], and its default value is 0.3. #Resolved
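The rule in the suggested doc comment can be read as a one-line comparison. The sketch below is a hypothetical illustration of that wording only, not the ML.NET implementation; `SrThresholdSketch` and `IsAnomaly` are invented names, and treating "more than" as a strict comparison is an interpretation of the comment text.

```csharp
using System;

// Hypothetical sketch of the rule in the suggested doc comment; not the ML.NET implementation.
public static class SrThresholdSketch
{
    // Returns true when the SR raw score for a point is strictly greater than the threshold.
    public static bool IsAnomaly(double rawScore, double threshold = 0.3)
    {
        // The doc comment requires the threshold to fall in [0, 1].
        if (threshold < 0.0 || threshold > 1.0)
        {
            throw new ArgumentOutOfRangeException(nameof(threshold), "Threshold must be in [0, 1].");
        }

        return rawScore > threshold;
    }
}
```

With the default threshold of 0.3, a point whose raw score is exactly 0.3 is not flagged, while any raw score above 0.3 is. The separate "anomaly score" from AnomalyAndMargin mode plays no part in this comparison.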

@mengaims (Contributor) commented May 25, 2020

    /// When set to AnomalyAndMargin, the output vector would be a 7-element Double vector of (IsAnomaly, AnomalyScore, Mag, ExpectedValue, BoundaryUnit, UpperBoundary, LowerBoundary).

We could add a line here to explain the difference between RawScore and AnomalyScore: RawScore is output by SR to determine whether or not a point is an anomaly. In AnomalyAndMargin mode, when a point is an anomaly, an AnomalyScore is also calculated according to the sensitivity setting. #Resolved


Refers to: src/Microsoft.ML.TimeSeries/ExtensionsCatalog.cs:167 in 20b72f7.
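The 7-element AnomalyAndMargin output vector quoted above can be unpacked into named fields, which also shows that its second slot is AnomalyScore rather than the SR raw score. This is a minimal sketch, assuming the column has already been read into a `double[]` and that IsAnomaly is encoded as a nonzero value (an assumption); `AnomalyAndMarginResult` and `FromVector` are hypothetical names, not ML.NET types.

```csharp
using System;

// Hypothetical helper, not an ML.NET type. Field order follows the doc comment quoted above:
// (IsAnomaly, AnomalyScore, Mag, ExpectedValue, BoundaryUnit, UpperBoundary, LowerBoundary).
public sealed class AnomalyAndMarginResult
{
    public bool IsAnomaly { get; }
    public double AnomalyScore { get; }
    public double Mag { get; }
    public double ExpectedValue { get; }
    public double BoundaryUnit { get; }
    public double UpperBoundary { get; }
    public double LowerBoundary { get; }

    private AnomalyAndMarginResult(double[] v)
    {
        // Assumption: IsAnomaly is stored as a double, with any nonzero value meaning "anomaly".
        IsAnomaly = v[0] != 0.0;
        AnomalyScore = v[1];
        Mag = v[2];
        ExpectedValue = v[3];
        BoundaryUnit = v[4];
        UpperBoundary = v[5];
        LowerBoundary = v[6];
    }

    // Validates the vector length before mapping positions to named fields.
    public static AnomalyAndMarginResult FromVector(double[] vector)
    {
        if (vector == null)
        {
            throw new ArgumentNullException(nameof(vector));
        }
        if (vector.Length != 7)
        {
            throw new ArgumentException("Expected a 7-element AnomalyAndMargin vector.", nameof(vector));
        }
        return new AnomalyAndMarginResult(vector);
    }
}
```

Checking the length up front makes a mismatch between the detect mode and the expected layout fail loudly instead of silently misreading fields.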

@mengaims (Contributor) left a comment

:shipit:

@harishsk harishsk merged commit 3cbd97a into dotnet:master May 26, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022