Commit 2ed30fd

talboren, GlebBerjoskin, shahargl, VladimirFilonov, and Matvey-Kuk authored
feat: incidents (#1388)
Signed-off-by: Tal <[email protected]>
Signed-off-by: Matvey Kukuy <[email protected]>
Signed-off-by: Vladimir Filonov <[email protected]>
Co-authored-by: GlebBerjoskin <[email protected]>
Co-authored-by: Shahar Glazner <[email protected]>
Co-authored-by: Vladimir Filonov <[email protected]>
Co-authored-by: Matvey Kukuy <[email protected]>
Co-authored-by: Matvey Kukuy <[email protected]>
1 parent ce96e1f commit 2ed30fd

59 files changed, +3625 -224 lines changed

.github/workflows/test-pr-e2e.yml

+4
@@ -8,6 +8,10 @@ on:
       - 'keep-ui/**'
       - 'tests/**'
 
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref }}
+  cancel-in-progress: true
+
 env:
   PYTHON_VERSION: 3.11
   STORAGE_MANAGER_DIRECTORY: /tmp/storage-manager

.github/workflows/test-pr.yml

+3
@@ -6,6 +6,9 @@ on:
   pull_request:
     paths:
       - 'keep/**'
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref }}
+  cancel-in-progress: true
 # MySQL server and Elasticsearch for testing
 env:
   PYTHON_VERSION: 3.11

LICENSE

+5-2
@@ -1,7 +1,10 @@
-MIT License
-
 Copyright (c) 2024 Keep
 
+Portions of this software are licensed as follows:
+
+* All content that resides under the "ee/" directory of this repository, if that directory exists, is licensed under the license defined in "ee/LICENSE".
+* Content outside of the above mentioned directories or restrictions above is available under the "MIT" license as defined below.
+
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights

docker/Dockerfile.dev.api

+1-1
@@ -23,4 +23,4 @@ ENV PATH="/venv/bin:${PATH}"
 ENV VIRTUAL_ENV="/venv"
 
 
-ENTRYPOINT ["gunicorn", "keep.api.api:get_app", "--bind" , "0.0.0.0:8080" , "--workers", "1" , "-k" , "uvicorn.workers.UvicornWorker", "-c", "./keep/api/config.py", "--reload"]
+CMD ["gunicorn", "keep.api.api:get_app", "--bind" , "0.0.0.0:8080" , "--workers", "1" , "-k" , "uvicorn.workers.UvicornWorker", "-c", "./keep/api/config.py", "--reload"]

ee/LICENSE

+35
@@ -0,0 +1,35 @@
+The Keep Enterprise Edition (EE) license (the Enterprise License)
+Copyright (c) 2024-present Keep Alerting LTD
+
+With regard to the Keep Software:
+
+This software and associated documentation files (the "Software") may only be
+used in production, if you (and any entity that you represent) have agreed to,
+and are in compliance with, the Keep Subscription Terms of Service, available
+(if not available, it's impossible to comply)
+at https://www.keephq.dev/terms-of-service (the "The Enterprise Terms”), or other
+agreement governing the use of the Software, as agreed by you and Keep,
+and otherwise have a valid Keep Enterprise Edition subscription for the
+correct number of user seats. Subject to the foregoing sentence, you are free to
+modify this Software and publish patches to the Software. You agree that Keep
+and/or its licensors (as applicable) retain all right, title and interest in and
+to all such modifications and/or patches, and all such modifications and/or
+patches may only be used, copied, modified, displayed, distributed, or otherwise
+exploited with a valid Keep Enterprise Edition subscription for the correct
+number of user seats. You agree that Keep and/or its licensors (as applicable) retain
+all right, title and interest in and to all such modifications. You are not
+granted any other rights beyond what is expressly stated herein. Subject to the
+foregoing, it is forbidden to copy, merge, publish, distribute, sublicense,
+and/or sell the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+For all third party components incorporated into the Keep Software, those
+components are licensed under the original license provided by the owner of the
+applicable component.

ee/experimental/__init__.py

Whitespace-only changes.

ee/experimental/incident_utils.py

+148
@@ -0,0 +1,148 @@
+import numpy as np
+import pandas as pd
+import networkx as nx
+
+from typing import List
+
+from keep.api.models.db.alert import Alert
+
+
+def mine_incidents(alerts: List[Alert], incident_sliding_window_size: int=6*24*60*60, statistic_sliding_window_size: int=60*60,
+                   jaccard_threshold: float=0.0, fingerprint_threshold: int=1):
+    """
+    Mine incidents from alerts.
+    """
+
+    alert_dict = {
+        'fingerprint': [alert.fingerprint for alert in alerts],
+        'timestamp': [alert.timestamp for alert in alerts],
+    }
+    alert_df = pd.DataFrame(alert_dict)
+    mined_incidents = shape_incidents(alert_df, 'fingerprint', incident_sliding_window_size, statistic_sliding_window_size,
+                                      jaccard_threshold, fingerprint_threshold)
+
+    return [
+        {
+            "incident_fingerprint": incident['incident_fingerprint'],
+            "alerts": [alert for alert in alerts if alert.fingerprint in incident['alert_fingerprints']],
+        }
+        for incident in mined_incidents
+    ]
+
+
+def get_batched_alert_counts(alerts: pd.DataFrame, unique_alert_identifier: str, sliding_window_size: int) -> np.ndarray:
+    """
+    Get the number of alerts in a sliding window.
+    """
+
+    resampled_alert_counts = alerts.set_index('timestamp').resample(
+        f'{sliding_window_size//2}s')[unique_alert_identifier].value_counts().unstack(fill_value=0)
+    rolling_counts = resampled_alert_counts.rolling(
+        window=f'{sliding_window_size}s', min_periods=1).sum()
+    alert_counts = rolling_counts.to_numpy()
+
+    return alert_counts
+
+
+def get_batched_alert_occurrences(alerts: pd.DataFrame, unique_alert_identifier: str, sliding_window_size: int) -> np.ndarray:
+    """
+    Get the occurrence of alerts in a sliding window.
+    """
+
+    alert_counts = get_batched_alert_counts(
+        alerts, unique_alert_identifier, sliding_window_size)
+    alert_occurences = np.where(alert_counts > 0, 1, 0)
+
+    return alert_occurences
+
+
+def get_jaccard_scores(P_a: np.ndarray, P_aa: np.ndarray) -> np.ndarray:
+    """
+    Calculate the Jaccard similarity scores between alerts.
+    """
+
+    P_a_matrix = P_a[:, None] + P_a
+    union_matrix = P_a_matrix - P_aa
+
+    with np.errstate(divide='ignore', invalid='ignore'):
+        jaccard_matrix = np.where(union_matrix != 0, P_aa / union_matrix, 0)
+
+    np.fill_diagonal(jaccard_matrix, 1)
+
+    return jaccard_matrix
+
+
+def get_alert_jaccard_matrix(alerts: pd.DataFrame, unique_alert_identifier: str, sliding_window_size: int) -> np.ndarray:
+    """
+    Calculate the Jaccard similarity scores between alerts.
+    """
+
+    alert_occurrences = get_batched_alert_occurrences(
+        alerts, unique_alert_identifier, sliding_window_size)
+    alert_probabilities = np.mean(alert_occurrences, axis=0)
+    joint_alert_occurrences = np.dot(alert_occurrences.T, alert_occurrences)
+    pairwise_alert_probabilities = joint_alert_occurrences / \
+        alert_occurrences.shape[0]
+
+    return get_jaccard_scores(alert_probabilities, pairwise_alert_probabilities)
+
+
+def build_graph_from_occurrence(occurrence_row: pd.DataFrame, jaccard_matrix: np.ndarray, unique_alert_identifiers: List[str],
+                                jaccard_threshold: float = 0.05) -> nx.Graph:
+    """
+    Build a weighted graph using alert occurrence matrix and Jaccard coefficients.
+    """
+
+    present_indices = np.where(occurrence_row > 0)[0]
+
+    G = nx.Graph()
+
+    for idx in present_indices:
+        alert_desc = unique_alert_identifiers[idx]
+        G.add_node(alert_desc)
+
+    for i in present_indices:
+        for j in present_indices:
+            if i != j and jaccard_matrix[i, j] >= jaccard_threshold:
+                alert_i = unique_alert_identifiers[i]
+                alert_j = unique_alert_identifiers[j]
+                G.add_edge(alert_i, alert_j, weight=jaccard_matrix[i, j])
+
+    return G
+
+def shape_incidents(alerts: pd.DataFrame, unique_alert_identifier: str, incident_sliding_window_size: int, statistic_sliding_window_size: int,
+                    jaccard_threshold: float = 0.2, fingerprint_threshold: int = 5) -> List[dict]:
+    """
+    Shape incidents from alerts.
+    """
+
+    incidents = []
+    incident_number = 0
+
+    resampled_alert_counts = alerts.set_index('timestamp').resample(
+        f'{incident_sliding_window_size//2}s')[unique_alert_identifier].value_counts().unstack(fill_value=0)
+    jaccard_matrix = get_alert_jaccard_matrix(
+        alerts, unique_alert_identifier, statistic_sliding_window_size)
+
+    for idx in range(resampled_alert_counts.shape[0]):
+        graph = build_graph_from_occurrence(
+            resampled_alert_counts.iloc[idx], jaccard_matrix, resampled_alert_counts.columns, jaccard_threshold=jaccard_threshold)
+        max_component = max(nx.connected_components(graph), key=len)
+
+        min_starts_at = resampled_alert_counts.index[idx]
+        max_starts_at = min_starts_at + \
+            pd.Timedelta(seconds=incident_sliding_window_size)
+
+        local_alerts = alerts[(alerts['timestamp'] >= min_starts_at) & (
+            alerts['timestamp'] <= max_starts_at)]
+        local_alerts = local_alerts[local_alerts[unique_alert_identifier].isin(
+            max_component)]
+
+        if len(max_component) > fingerprint_threshold:
+
+            incidents.append({
+                'incident_fingerprint': f'Incident #{incident_number}',
+                'alert_fingerprints': local_alerts[unique_alert_identifier].unique().tolist(),
+            })
+
+    return incidents
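
Note on ee/experimental/incident_utils.py: the file scores each pair of alert fingerprints as P(a and b) / (P(a) + P(b) - P(a and b)), which is the Jaccard similarity of the sets of time windows in which each fingerprint fired. It then links fingerprints whose score meets a threshold and keeps the largest connected component. The following is a minimal standard-library sketch of that idea, not the commit's implementation. It assumes fixed, non-overlapping windows instead of the resample-plus-rolling windows used in the diff, and it builds a single graph rather than one per incident window. The helper names (`jaccard`, `windows_by_fingerprint`, `largest_component`) and the sample events are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def windows_by_fingerprint(events, window_seconds):
    """Map each fingerprint to the set of window indices in which it fired."""
    seen = {}
    for fingerprint, timestamp in events:
        seen.setdefault(fingerprint, set()).add(int(timestamp // window_seconds))
    return seen


def largest_component(nodes, edges):
    """Return the largest connected component of an undirected graph (iterative DFS)."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    best, visited = set(), set()
    for start in adjacency:
        if start in visited:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        visited |= component
        if len(component) > len(best):
            best = component
    return best  # empty set when the graph has no nodes


# Hypothetical (fingerprint, unix-seconds) events; not taken from the commit.
events = [
    ("db-latency", 10), ("api-5xx", 40),
    ("db-latency", 3700), ("api-5xx", 3650),
    ("disk-full", 7300),
]
seen = windows_by_fingerprint(events, window_seconds=3600)
scores = {
    (a, b): jaccard(seen[a], seen[b]) for a, b in combinations(sorted(seen), 2)
}
# Keep only pairs whose similarity meets an illustrative threshold of 0.5.
edges = [pair for pair, score in scores.items() if score >= 0.5]
group = largest_component(seen.keys(), edges)
```

In this sample, `db-latency` and `api-5xx` fire in the same two hourly windows, so they end up in one group, while `disk-full` stays on its own. The sketch returns an empty set for an empty graph; that guard is a choice made here for illustration.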

keep-ui/app/ai/ai.tsx

+149
@@ -0,0 +1,149 @@
+"use client";
+import { Card, List, ListItem, Title, Subtitle } from "@tremor/react";
+import { useAIStats } from "utils/hooks/useAIStats";
+import { useSession } from "next-auth/react";
+import { getApiURL } from "utils/apiUrl";
+import { toast } from "react-toastify";
+import { useEffect, useState, useRef, FormEvent } from "react";
+
+export default function Ai() {
+  const { data: aistats, isLoading } = useAIStats();
+  const { data: session } = useSession();
+  const [text, setText] = useState("");
+  const [newText, setNewText] = useState("Mine incidents");
+  const [animate, setAnimate] = useState(false);
+  const onlyOnce = useRef(false);
+
+  useEffect(() => {
+    let index = 0;
+
+    const interval = setInterval(() => {
+      setText(newText.slice(0, index + 1));
+      index++;
+
+      if (index === newText.length) {
+        clearInterval(interval);
+      }
+    }, 100);
+
+    return () => {
+      clearInterval(interval);
+    };
+  }, [newText]);
+
+  const mineIncidents = async (e: FormEvent) => {
+    e.preventDefault();
+    setAnimate(true);
+    setNewText("Mining 🚀🚀🚀 ...");
+    const apiUrl = getApiURL();
+    const response = await fetch(`${apiUrl}/incidents/mine`, {
+      method: "POST",
+      headers: {
+        Authorization: `Bearer ${session?.accessToken}`,
+        "Content-Type": "application/json",
+      },
+      body: JSON.stringify({
+      }),
+    });
+    if (!response.ok) {
+      toast.error(
+        "Failed to mine incidents, please contact us if this issue persists."
+      );
+    }
+    setAnimate(false);
+    setNewText("Mine incidents");
+  };
+
+  return (
+    <main className="p-4 md:p-10 mx-auto max-w-full">
+      <div className="flex justify-between items-center">
+        <div>
+          <Title>AI Correlation</Title>
+          <Subtitle>
+            Correlating alerts to incidents based on past alerts, incidents, and
+            the other data.
+          </Subtitle>
+        </div>
+      </div>
+      <Card className="mt-10 p-4 md:p-10 mx-auto">
+        <div>
+          <div className="prose-2xl">👋 You are almost there!</div>
+          AI Correlation is coming soon. Make sure you have enough data collected to prepare.
+          <div className="max-w-md mt-10 flex justify-items-start justify-start">
+            <List>
+              <ListItem>
+                <span>
+                  Connect an incident source to dump incidents, or create 10
+                  incidents manually
+                </span>
+                <span>
+                  {aistats?.incidents_count &&
+                  aistats?.incidents_count >= 10 ? (
+                    <div></div>
+                  ) : (
+                    <div></div>
+                  )}
+                </span>
+              </ListItem>
+              <ListItem>
+                <span>Collect 100 alerts</span>
+                <span>
+                  {aistats?.alerts_count && aistats?.alerts_count >= 100 ? (
+                    <div></div>
+                  ) : (
+                    <div></div>
+                  )}
+                </span>
+              </ListItem>
+              <ListItem>
+                <span>Collect alerts for more than 3 days</span>
+                <span>
+                  {aistats?.first_alert_datetime && new Date(aistats.first_alert_datetime) < new Date(Date.now() - 3 * 24 * 60 * 60 * 1000) ? (
+                    <div></div>
+                  ) : (
+                    <div></div>
+                  )}
+                </span>
+              </ListItem>
+            </List>
+          </div>
+          {(aistats?.is_mining_enabled && <button
+            className={
+              (animate && "animate-pulse") +
+              " w-full text-white mt-10 pt-2 pb-2 pr-2 rounded-xl transition-all duration-500 bg-gradient-to-tl from-amber-800 via-amber-600 to-amber-400 bg-size-200 bg-pos-0 hover:bg-pos-100"
+            }
+            onClick={ mineIncidents }
+          ><div className="flex flex-row p-2">
+            <div className="p-2">
+              {animate && <svg
+                className="animate-spin h-6 w-6 text-white"
+                xmlns="http://www.w3.org/2000/svg"
+                fill="none"
+                viewBox="0 0 24 24"
+              >
+                <circle
+                  className="opacity-25"
+                  cx="12"
+                  cy="12"
+                  r="10"
+                  stroke="currentColor"
+                  stroke-width="4"
+                ></circle>
+                <path
+                  className="opacity-75"
+                  fill="currentColor"
+                  d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"
+                ></path>
+              </svg>}
+              {!animate && <svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" strokeWidth={1.5} stroke="currentColor" className="w-6 h-6">
+                <path strokeLinecap="round" strokeLinejoin="round" d="M4.26 10.147a60.438 60.438 0 0 0-.491 6.347A48.62 48.62 0 0 1 12 20.904a48.62 48.62 0 0 1 8.232-4.41 60.46 60.46 0 0 0-.491-6.347m-15.482 0a50.636 50.636 0 0 0-2.658-.813A59.906 59.906 0 0 1 12 3.493a59.903 59.903 0 0 1 10.399 5.84c-.896.248-1.783.52-2.658.814m-15.482 0A50.717 50.717 0 0 1 12 13.489a50.702 50.702 0 0 1 7.74-3.342M6.75 15a.75.75 0 1 0 0-1.5.75.75 0 0 0 0 1.5Zm0 0v-3.675A55.378 55.378 0 0 1 12 8.443m-7.007 11.55A5.981 5.981 0 0 0 6.75 15.75v-1.5" />
+              </svg>}
+            </div>
+            <div className="pt-2">{text}</div>
+          </div>
+          </button>)}
+        </div>
+      </Card>
+    </main>
+  );
+}

keep-ui/app/ai/model.ts

+6
@@ -0,0 +1,6 @@
+export interface AIStats {
+  alerts_count: number;
+  incidents_count: number;
+  first_alert_datetime?: Date;
+  is_mining_enabled: boolean;
+}

keep-ui/app/ai/page.tsx

+11
@@ -0,0 +1,11 @@
+import AI from "./ai";
+
+export default function Page() {
+  return <AI />;
+}
+
+export const metadata = {
+  title: "Keep - AI Correlation",
+  description:
+    "Correlate Alerts and Incidents with AI to identify patterns and trends.",
+};
