@@ -188,20 +188,15 @@ or flight stats for any of the featured destination or origin airports.


[[example-clientips]]
- ==== Finding suspicious client IPs by using scripted metrics
+ ==== Finding suspicious client IPs

- With {transforms}, you can use
- {ref}/search-aggregations-metrics-scripted-metric-aggregation.html[scripted
- metric aggregations] on your data. These aggregations are flexible and make
- it possible to perform very complex processing. Let's use scripted metrics to
- identify suspicious client IPs in the web log sample dataset.
-
- We transform the data such that the new index contains the sum of bytes and the
- number of distinct URLs, agents, incoming requests by location, and geographic
- destinations for each client IP. We also use a scripted field to count the
- specific types of HTTP responses that each client IP receives. Ultimately, the
- example below transforms web log data into an entity centric index where the
- entity is `clientip`.
+ In this example, we use the web log sample dataset to identify suspicious client
+ IPs. We transform the data such that the new index contains the sum of bytes and
+ the number of distinct URLs, agents, incoming requests by location, and
+ geographic destinations for each client IP. We also use filter aggregations to
+ count the specific types of HTTP responses that each client IP receives.
+ Ultimately, the example below transforms web log data into an entity-centric
+ index where the entity is `clientip`.

[source,console]
----------------------------------
@@ -230,30 +225,17 @@ PUT _transform/suspicious_client_ips
      "agent_dc": { "cardinality": { "field": "agent.keyword" }},
      "geo.dest_dc": { "cardinality": { "field": "geo.dest" }},
      "responses.total": { "value_count": { "field": "timestamp" }},
-     "responses.counts": { <4>
-       "scripted_metric": {
-         "init_script": "state.responses = ['error':0L,'success':0L,'other':0L]",
-         "map_script": """
-           def code = doc['response.keyword'].value;
-           if (code.startsWith('5') || code.startsWith('4')) {
-             state.responses.error += 1;
-           } else if (code.startsWith('2')) {
-             state.responses.success += 1;
-           } else {
-             state.responses.other += 1;
-           }
-           """,
-         "combine_script": "state.responses",
-         "reduce_script": """
-           def counts = ['error': 0L, 'success': 0L, 'other': 0L];
-           for (responses in states) {
-             counts.error += responses['error'];
-             counts.success += responses['success'];
-             counts.other += responses['other'];
-           }
-           return counts;
-           """
-       }
+     "success" : { <4>
+       "filter": {
+         "term": { "response" : "200"}}
+     },
+     "error404" : {
+       "filter": {
+         "term": { "response" : "404"}}
+     },
+     "error503" : {
+       "filter": {
+         "term": { "response" : "503"}}
      },
      "timestamp.min": { "min": { "field": "timestamp" }},
      "timestamp.max": { "max": { "field": "timestamp" }},
@@ -277,11 +259,13 @@ PUT _transform/suspicious_client_ips
to synchronize the source and destination indices. The worst case
ingestion delay is 60 seconds.
<3> The data is grouped by the `clientip` field.
- <4> This `scripted_metric` performs a distributed operation on the web log data
- to count specific types of HTTP responses (error, success, and other).
+ <4> Filter aggregation that counts the occurrences of successful (`200`)
+ responses in the `response` field. The following two aggregations (`error404`
+ and `error503`) count the error responses by error codes. A standalone preview
+ of these aggregations is shown after this callout list.
<5> This `bucket_script` calculates the duration of the `clientip` access based
on the results of the aggregation.

+
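If you want to preview what these filter aggregations count before creating the
{transform}, you can run them in an ordinary search against the source data. The
sketch below assumes the web log sample data is in the default
`kibana_sample_data_logs` index that Kibana creates for this dataset; adjust the
index name if yours differs:

[source,console]
----------------------------------
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggregations": {
    "success":  { "filter": { "term": { "response": "200" } } },
    "error404": { "filter": { "term": { "response": "404" } } },
    "error503": { "filter": { "term": { "response": "503" } } }
  }
}
----------------------------------
// TEST[skip:setup kibana sample data]

Each aggregation returns a `doc_count`. In the {transform}, the same counts are
computed per `clientip` group rather than across the whole index.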
After you create the {transform}, you must start it:

[source,console]
@@ -290,6 +274,7 @@ POST _transform/suspicious_client_ips/_start
----------------------------------
// TEST[skip:setup kibana sample data]

+
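While it runs, you can optionally check the {transform}'s progress and document
counts by retrieving its statistics:

[source,console]
----------------------------------
GET _transform/suspicious_client_ips/_stats
----------------------------------
// TEST[skip:setup kibana sample data]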
Shortly thereafter, the first results should be available in the destination
index:

@@ -299,6 +284,7 @@ GET sample_weblogs_by_clientip/_search
----------------------------------
// TEST[skip:setup kibana sample data]

+

The search result shows you data like this for each client IP:

[source,js]
@@ -313,22 +299,20 @@ The search result shows you data like this for each client IP:
        "src_dc" : 2.0,
        "dest_dc" : 2.0
      },
+     "success" : 2,
+     "error404" : 0,
+     "error503" : 0,
      "clientip" : "0.72.176.46",
      "agent_dc" : 2.0,
      "bytes_sum" : 4422.0,
      "responses" : {
-       "total" : 2.0,
-       "counts" : {
-         "other" : 0,
-         "success" : 2,
-         "error" : 0
-       }
+       "total" : 2.0
      },
      "url_dc" : 2.0,
      "timestamp" : {
        "duration_ms" : 5.2191698E8,
-       "min" : "2019-11-25T07:51:57.333Z",
-       "max" : "2019-12-01T08:50:34.313Z"
+       "min" : "2020-03-16T07:51:57.333Z",
+       "max" : "2020-03-22T08:50:34.313Z"
      }
    }
  }
@@ -337,11 +321,12 @@ The search result shows you data like this for each client IP:
// NOTCONSOLE

NOTE: Like other Kibana sample data sets, the web log sample dataset contains
- timestamps relative to when you installed it, including timestamps in the future.
- The {ctransform} will pick up the data points once they are in the past. If you
- installed the web log sample dataset some time ago, you can uninstall and
+ timestamps relative to when you installed it, including timestamps in the
+ future. The {ctransform} will pick up the data points once they are in the past.
+ If you installed the web log sample dataset some time ago, you can uninstall and
reinstall it and the timestamps will change.

+

This {transform} makes it easier to answer questions such as:

* Which client IPs are transferring the most data? (See the sketch below.)
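For the first question, a minimal sketch is to sort the destination index by the
`bytes_sum` field that this {transform} computes and read off the top client
IPs:

[source,console]
----------------------------------
GET sample_weblogs_by_clientip/_search
{
  "size": 3,
  "sort": [ { "bytes_sum": "desc" } ]
}
----------------------------------
// TEST[skip:setup kibana sample data]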