add root cause localization transformer #4925
Merged
Changes from 32 commits
51 commits by suxi-ms
d5ee205 add root cause localization transformer
f727a79 add test cases
92de1dc revert sln changes
798289c add evaluation
f2e128d temp save for internal review
51569e3 rename function
59c6e89 temp save bottom up points for switch desktop
29216e0 update from laptop
69da330 save for add test
e1c5432 add root cause localization algorithm
3a1d1c5 add root cause localization algorithm
8f97602 print score, path and directions in sample
48123f4 merge with master
c47302f extract root cause analyzer
b07ad28 refine code
c729877 merge with master
ebbdb0d update for algorithm
0d43b0d add evaluatin
5778eed some refine for code
c9ed044 fix some typo
e440f25 remove unused code
feba6f4 reformat code
686831c updates
ddc8a36 update from review
475ee8a read double for beta
8d874ca remove SignatureDataTransform constructor
0674ab3 update
4c5b8fb update
08d607c remove white space
c688233 refine internal logic
98637db update
4ff2ed1 update
c22ad50 updated test
ea7ddbe update score
547aef2 update variable name
8d17c3c add some comments
66b614a refine internal function
12e7e18 handle for infinity and nan
e213615 rename the algorithm by removing DT
30915cd Update src/Microsoft.ML.TimeSeries/RootCauseAnalyzer.cs
fda4ec7 fix type
620ef58 add an else branch when delta is negative
ae5722f Merge branch 'master' of https://github.com/suxi-ms/machinelearning
7f89fea update model signature
42dcbc2 update rca interface by removing transformer
9893fad add more documents
c831e43 update
16f5b33 update
9cd8739 update the constructor
f80c200 update comments
7c1c348 fix typo
47 changes: 47 additions & 0 deletions
docs/api-reference/time-series-root-cause-localization-dt.md
At Microsoft, we have developed a decision-tree-based root cause localization method that helps find the root causes of an anomaly incident incrementally.

## Multi-Dimensional Root Cause Localization
It's common for a measure to be collected across many dimensions (*e.g.*, Province, ISP) whose values are categorical (*e.g.*, Beijing or Shanghai for the dimension Province). When a measure's value deviates from its expected value, the measure is anomalous. In such cases, operators want to localize the root-cause dimension combinations quickly and accurately. Multi-dimensional root cause localization is critical for troubleshooting and mitigating such incidents.

## Algorithm

The decision-tree-based root cause localization method is unsupervised, which means no training step is needed. It consists of the following major steps:
(1) Find the best dimension, that is, the one that best separates anomalous from normal data, using a decision tree criterion based on entropy gain and entropy gain ratio.
(2) Find the top anomalous points for the selected dimension.

### Decision Tree

A [decision tree](https://en.wikipedia.org/wiki/Decision_tree) algorithm splits on the attribute with the highest information gain when constructing the tree. We use the same criterion to choose the dimension that contributes most to the anomaly. The following concepts are used in the decision tree.

#### Information Entropy

Information [entropy](https://en.wikipedia.org/wiki/Entropy_(information_theory)) is a measure of disorder or uncertainty. It can also be thought of as a measure of purity: the lower the value, the purer the dataset $D$.

$$Ent(D) = - \sum_{k=1}^{|y|} p_k\log_2(p_k) $$

where $p_k$ is the proportion of points in $D$ that belong to class $k$, and $|y|$ is the number of classes. In our case there are only two classes, anomalous points and normal points, so $|y| = 2$.

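For illustration, here is a minimal C# sketch (not the ML.NET implementation) of the two-class entropy above, computed from the anomaly and normal counts of a set of points. The class `EntropySketch` and its `Entropy` method are hypothetical names.

```csharp
using System;

public static class EntropySketch
{
    // Hypothetical helper: Ent(D) for a two-class set (anomaly vs. normal),
    // given the number of points in each class.
    public static double Entropy(int anomalyCount, int normalCount)
    {
        int total = anomalyCount + normalCount;
        if (total == 0)
        {
            return 0.0; // An empty set carries no uncertainty.
        }

        double entropy = 0.0;
        foreach (int count in new[] { anomalyCount, normalCount })
        {
            if (count == 0)
            {
                continue; // By convention, 0 * log2(0) = 0.
            }

            double p = (double)count / total; // p_k
            entropy -= p * Math.Log(p, 2);    // -p_k * log2(p_k)
        }
        return entropy;
    }
}
```

For example, `EntropySketch.Entropy(3, 3)` evaluates to 1 (up to floating-point rounding), the most impure two-class split, while `EntropySketch.Entropy(4, 0)` returns 0 because the set is pure.
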
#### Information Gain
[Information gain](https://en.wikipedia.org/wiki/Information_gain_in_decision_trees) is a metric that measures the reduction of this disorder in the target class given additional information about it. Mathematically, it can be written as:

$$Gain(D, a) = Ent(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|} Ent(D^v) $$

where $D^v$ is the subset of points in $D$ for which dimension $a$ takes its $v$-th value, $Ent(D^v)$ is the entropy of that subset, $|D|$ is the total number of points in $D$, $|D^v|$ is the number of points in $D^v$, and $V$ is the number of distinct values of dimension $a$.

We calculate the information gain for each aggregated dimension. The greater the reduction in uncertainty, the more information dimension $a$ provides about $D$.

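A corresponding sketch of $Gain(D, a)$, again hypothetical and not taken from ML.NET: each point is reduced to the value it has for dimension $a$ and whether it is anomalous, points are grouped by that value to form the subsets $D^v$, and the weighted subset entropies are subtracted from $Ent(D)$.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InformationGainSketch
{
    // Two-class entropy of a set of anomaly flags (true = anomaly).
    public static double Entropy(IReadOnlyCollection<bool> labels)
    {
        if (labels.Count == 0)
        {
            return 0.0;
        }

        double entropy = 0.0;
        foreach (int count in new[] { labels.Count(l => l), labels.Count(l => !l) })
        {
            if (count == 0)
            {
                continue; // 0 * log2(0) is taken as 0.
            }
            double p = (double)count / labels.Count;
            entropy -= p * Math.Log(p, 2);
        }
        return entropy;
    }

    // Gain(D, a): each point is (value of dimension a, anomaly flag).
    public static double Gain(IReadOnlyList<(string Value, bool IsAnomaly)> points)
    {
        var all = points.Select(p => p.IsAnomaly).ToList();

        // Sum of |D^v| / |D| * Ent(D^v) over the distinct values v of dimension a.
        double weighted = 0.0;
        foreach (var group in points.GroupBy(p => p.Value))
        {
            var subset = group.Select(p => p.IsAnomaly).ToList();
            weighted += (double)subset.Count / points.Count * Entropy(subset);
        }
        return Entropy(all) - weighted;
    }
}
```

With the points (DC1, anomaly), (DC1, anomaly), (DC2, normal), (DC2, normal), splitting by data center yields two pure subsets, so the gain equals $Ent(D) = 1$. A dimension whose every value mixes anomalous and normal points equally yields a gain of 0.
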
#### Entropy Gain Ratio

Information gain is biased toward dimensions with a large number of distinct values. The [information gain ratio](https://en.wikipedia.org/wiki/Information_gain_ratio) is a modification that reduces this bias:

$$Ratio(D, a) = \frac{Gain(D,a)}{IV(a)} $$

where the intrinsic value (IV) is the entropy of the split induced by dimension $a$:

$$IV(a) = -\sum_{v=1}^{V}\frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|} $$

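The intrinsic value and gain ratio can be sketched the same way. This hypothetical helper takes a precomputed gain and the subset sizes $|D^v|$. Returning 0 when $IV(a) = 0$ (a dimension with a single value) is an assumption made here to avoid division by zero; the text above does not specify that case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GainRatioSketch
{
    // IV(a): entropy of the split itself, computed from the sizes |D^v|
    // of the subsets that dimension a induces.
    public static double IntrinsicValue(IReadOnlyList<int> subsetSizes)
    {
        double total = subsetSizes.Sum();
        double iv = 0.0;
        foreach (int size in subsetSizes)
        {
            if (size == 0)
            {
                continue; // Empty subsets contribute nothing.
            }
            double fraction = size / total; // |D^v| / |D|
            iv -= fraction * Math.Log(fraction, 2);
        }
        return iv;
    }

    // Ratio(D, a) = Gain(D, a) / IV(a). Assumption: return 0 when IV(a) = 0.
    public static double GainRatio(double gain, IReadOnlyList<int> subsetSizes)
    {
        double iv = IntrinsicValue(subsetSizes);
        return iv > 0.0 ? gain / iv : 0.0;
    }
}
```

A dimension that splits 4 points into 4 singleton subsets has $IV(a) = 2$, while one that splits them into two subsets of 2 has $IV(a) = 1$, so dividing by $IV(a)$ penalizes the many-valued dimension.
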
In our strategy, we first loop over all the aggregated dimensions and keep those whose entropy gain is above the mean entropy gain. From the filtered dimensions, we then select the one with the highest entropy gain ratio as the best dimension. Dimensions that have only one anomalous value are also included in this calculation.

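The selection strategy described above can be sketched as follows, assuming the gain and gain ratio have already been computed for each dimension. The tuple shape and the `SelectBestDimension` name are hypothetical. This sketch keeps dimensions whose gain is at least the mean (so a tie at the mean is kept) and falls back to all dimensions if floating-point rounding in the mean leaves none.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BestDimensionSketch
{
    // Returns the name of the dimension with the highest gain ratio among those
    // whose gain is at least the mean gain, or null if there are no dimensions.
    public static string SelectBestDimension(
        IReadOnlyList<(string Name, double Gain, double GainRatio)> scores)
    {
        if (scores.Count == 0)
        {
            return null;
        }

        // Step 1: filter by the mean gain.
        double meanGain = scores.Average(s => s.Gain);
        var candidates = scores.Where(s => s.Gain >= meanGain).ToList();
        if (candidates.Count == 0)
        {
            // Guard against rounding in the mean pushing every gain below it.
            candidates = scores.ToList();
        }

        // Step 2: pick the highest gain ratio. OrderByDescending is stable,
        // so ties keep their input order.
        return candidates.OrderByDescending(s => s.GainRatio).First().Name;
    }
}
```

For example, with (gain, gain ratio) values A (0.8, 0.4), B (0.7, 0.6), and C (0.0, 0.0), the mean gain is 0.5, so A and B are kept and B is selected for its higher gain ratio, even though A has the larger gain.
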
> [!Note]
> 1. Because the algorithm depends on the input data, incorrect or incomplete input points will lead to unexpected results.
> 2. Currently, the algorithm localizes the root cause incrementally, which means that at most one dimension and its values are detected per call. To find all the dimensions that contribute to the anomaly, call this API recursively.
159 changes: 159 additions & 0 deletions
docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/TimeSeries/LocalizeRootCauseByDT.cs
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.TimeSeries;

namespace Samples.Dynamic
{
    public static class LocalizeRootCauseByDT
    {
        private static string AGG_SYMBOL = "##SUM##";

        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext();

            // Create an empty list as the dataset. The 'LocalizeRootCauseByDT' API does not
            // require training data, as the estimator it creates is not a trainable
            // estimator. The empty list is only needed to pass the input schema to the pipeline.
            var emptySamples = new List<RootCauseLocalizationData>();

            // Convert the sample list to an empty IDataView.
            var emptyDataView = mlContext.Data.LoadFromEnumerable(emptySamples);

            // A pipeline for localizing the root cause.
            var localizePipeline = mlContext.Transforms.LocalizeRootCauseByDT(nameof(RootCauseLocalizationTransformedData.RootCause), nameof(RootCauseLocalizationData.Input));

            // Fit to data.
            var localizeTransformer = localizePipeline.Fit(emptyDataView);

            // Create the prediction engine to get the root cause result from the input data.
            var predictionEngine = mlContext.Model.CreatePredictionEngine<RootCauseLocalizationData,
                RootCauseLocalizationTransformedData>(localizeTransformer);

            // Call the prediction API.
            DateTime timestamp = GetTimestamp();
            var data = new RootCauseLocalizationData(timestamp, GetAnomalyDimension(), new List<MetricSlice>() { new MetricSlice(timestamp, GetPoints()) }, AggregateType.Sum, AGG_SYMBOL);

            var prediction = predictionEngine.Predict(data);

            // Print the localization result.
            int count = 0;
            foreach (RootCauseItem item in prediction.RootCause.Items)
            {
                count++;
                Console.WriteLine($"Root cause item #{count} ...");
                Console.WriteLine($"Score: {item.Score}, Path: {String.Join(" ", item.Path)}, Direction: {item.Direction}, Dimension:{String.Join(" ", item.Dimension)}");
            }

            // Expected output:
            // Root cause item #1 ...
            // Score: 1, Path: DataCenter, Direction: Up, Dimension:[Country, UK] [DeviceType, ##SUM##] [DataCenter, DC1]
        }

        private static List<Point> GetPoints()
        {
            List<Point> points = new List<Point>();

            Dictionary<string, Object> dic1 = new Dictionary<string, Object>();
            dic1.Add("Country", "UK");
            dic1.Add("DeviceType", "Laptop");
            dic1.Add("DataCenter", "DC1");
            points.Add(new Point(200, 100, true, dic1));

            Dictionary<string, Object> dic2 = new Dictionary<string, Object>();
            dic2.Add("Country", "UK");
            dic2.Add("DeviceType", "Mobile");
            dic2.Add("DataCenter", "DC1");
            points.Add(new Point(1000, 100, true, dic2));

            Dictionary<string, Object> dic3 = new Dictionary<string, Object>();
            dic3.Add("Country", "UK");
            dic3.Add("DeviceType", AGG_SYMBOL);
            dic3.Add("DataCenter", "DC1");
            points.Add(new Point(1200, 200, true, dic3));

            Dictionary<string, Object> dic4 = new Dictionary<string, Object>();
            dic4.Add("Country", "UK");
            dic4.Add("DeviceType", "Laptop");
            dic4.Add("DataCenter", "DC2");
            points.Add(new Point(100, 100, false, dic4));

            Dictionary<string, Object> dic5 = new Dictionary<string, Object>();
            dic5.Add("Country", "UK");
            dic5.Add("DeviceType", "Mobile");
            dic5.Add("DataCenter", "DC2");
            points.Add(new Point(200, 200, false, dic5));

            Dictionary<string, Object> dic6 = new Dictionary<string, Object>();
            dic6.Add("Country", "UK");
            dic6.Add("DeviceType", AGG_SYMBOL);
            dic6.Add("DataCenter", "DC2");
            points.Add(new Point(300, 300, false, dic6));

            Dictionary<string, Object> dic7 = new Dictionary<string, Object>();
            dic7.Add("Country", "UK");
            dic7.Add("DeviceType", AGG_SYMBOL);
            dic7.Add("DataCenter", AGG_SYMBOL);
            points.Add(new Point(1500, 500, true, dic7));

            Dictionary<string, Object> dic8 = new Dictionary<string, Object>();
            dic8.Add("Country", "UK");
            dic8.Add("DeviceType", "Laptop");
            dic8.Add("DataCenter", AGG_SYMBOL);
            points.Add(new Point(300, 200, true, dic8));

            Dictionary<string, Object> dic9 = new Dictionary<string, Object>();
            dic9.Add("Country", "UK");
            dic9.Add("DeviceType", "Mobile");
            dic9.Add("DataCenter", AGG_SYMBOL);
            points.Add(new Point(1200, 300, true, dic9));

            return points;
        }

        private static Dictionary<string, Object> GetAnomalyDimension()
        {
            Dictionary<string, Object> dim = new Dictionary<string, Object>();
            dim.Add("Country", "UK");
            dim.Add("DeviceType", AGG_SYMBOL);
            dim.Add("DataCenter", AGG_SYMBOL);

            return dim;
        }

        private static DateTime GetTimestamp()
        {
            return new DateTime(2020, 3, 23, 0, 0, 0);
        }

        private class RootCauseLocalizationData
        {
            [RootCauseLocalizationInputType]
            public RootCauseLocalizationInput Input { get; set; }

            public RootCauseLocalizationData()
            {
                Input = null;
            }

            public RootCauseLocalizationData(DateTime anomalyTimestamp, Dictionary<string, Object> anomalyDimensions, List<MetricSlice> slices, AggregateType aggregateType, string aggregateSymbol)
            {
                Input = new RootCauseLocalizationInput(anomalyTimestamp, anomalyDimensions, slices, aggregateType,
                    aggregateSymbol);
            }
        }

        private class RootCauseLocalizationTransformedData
        {
            [RootCauseType()]
            public RootCause RootCause { get; set; }

            public RootCauseLocalizationTransformedData()
            {
                RootCause = null;
            }
        }
    }
}
typo: localizing #Resolved
updated #Resolved