Commit 00eaa0e

[DOCS] Changes scripted metric to filter aggs in transforms example (#54167)
1 parent 6e025c1 commit 00eaa0e

1 file changed

+35
-50
lines changed

docs/reference/transform/examples.asciidoc

+35-50
@@ -188,20 +188,15 @@ or flight stats for any of the featured destination or origin airports.
 
 
 [[example-clientips]]
-==== Finding suspicious client IPs by using scripted metrics
+==== Finding suspicious client IPs
 
-With {transforms}, you can use
-{ref}/search-aggregations-metrics-scripted-metric-aggregation.html[scripted
-metric aggregations] on your data. These aggregations are flexible and make
-it possible to perform very complex processing. Let's use scripted metrics to
-identify suspicious client IPs in the web log sample dataset.
-
-We transform the data such that the new index contains the sum of bytes and the
-number of distinct URLs, agents, incoming requests by location, and geographic
-destinations for each client IP. We also use a scripted field to count the
-specific types of HTTP responses that each client IP receives. Ultimately, the
-example below transforms web log data into an entity centric index where the
-entity is `clientip`.
+In this example, we use the web log sample dataset to identify suspicious client
+IPs. We transform the data such that the new index contains the sum of bytes and
+the number of distinct URLs, agents, incoming requests by location, and
+geographic destinations for each client IP. We also use filter aggregations to
+count the specific types of HTTP responses that each client IP receives.
+Ultimately, the example below transforms web log data into an entity centric
+index where the entity is `clientip`.
 
 [source,console]
 ----------------------------------
@@ -230,30 +225,17 @@ PUT _transform/suspicious_client_ips
 "agent_dc": { "cardinality": { "field": "agent.keyword" }},
 "geo.dest_dc": { "cardinality": { "field": "geo.dest" }},
 "responses.total": { "value_count": { "field": "timestamp" }},
-"responses.counts": { <4>
-"scripted_metric": {
-"init_script": "state.responses = ['error':0L,'success':0L,'other':0L]",
-"map_script": """
-def code = doc['response.keyword'].value;
-if (code.startsWith('5') || code.startsWith('4')) {
-state.responses.error += 1 ;
-} else if(code.startsWith('2')) {
-state.responses.success += 1;
-} else {
-state.responses.other += 1;
-}
-""",
-"combine_script": "state.responses",
-"reduce_script": """
-def counts = ['error': 0L, 'success': 0L, 'other': 0L];
-for (responses in states) {
-counts.error += responses['error'];
-counts.success += responses['success'];
-counts.other += responses['other'];
-}
-return counts;
-"""
-}
+"success" : { <4>
+"filter": {
+"term": { "response" : "200"}}
+},
+"error404" : {
+"filter": {
+"term": { "response" : "404"}}
+},
+"error503" : {
+"filter": {
+"term": { "response" : "503"}}
 },
 "timestamp.min": { "min": { "field": "timestamp" }},
 "timestamp.max": { "max": { "field": "timestamp" }},
@@ -277,11 +259,13 @@ PUT _transform/suspicious_client_ips
 to synchronize the source and destination indices. The worst case
 ingestion delay is 60 seconds.
 <3> The data is grouped by the `clientip` field.
-<4> This `scripted_metric` performs a distributed operation on the web log data
-to count specific types of HTTP responses (error, success, and other).
+<4> Filter aggregation that counts the occurrences of successful (`200`)
+responses in the `response` field. The following two aggregations (`error404`
+and `error503`) count the error responses by error codes.
 <5> This `bucket_script` calculates the duration of the `clientip` access based
 on the results of the aggregation.
 
+
 After you create the {transform}, you must start it:
 
 [source,console]
@@ -290,6 +274,7 @@ POST _transform/suspicious_client_ips/_start
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
+
 Shortly thereafter, the first results should be available in the destination
 index:
 
@@ -299,6 +284,7 @@ GET sample_weblogs_by_clientip/_search
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
+
 The search result shows you data like this for each client IP:
 
 [source,js]
@@ -313,22 +299,20 @@ The search result shows you data like this for each client IP:
 "src_dc" : 2.0,
 "dest_dc" : 2.0
 },
+"success" : 2,
+"error404" : 0,
+"error503" : 0,
 "clientip" : "0.72.176.46",
 "agent_dc" : 2.0,
 "bytes_sum" : 4422.0,
 "responses" : {
-"total" : 2.0,
-"counts" : {
-"other" : 0,
-"success" : 2,
-"error" : 0
-}
+"total" : 2.0
 },
 "url_dc" : 2.0,
 "timestamp" : {
 "duration_ms" : 5.2191698E8,
-"min" : "2019-11-25T07:51:57.333Z",
-"max" : "2019-12-01T08:50:34.313Z"
+"min" : "2020-03-16T07:51:57.333Z",
+"max" : "2020-03-22T08:50:34.313Z"
 }
 }
 }
@@ -337,11 +321,12 @@ The search result shows you data like this for each client IP:
 // NOTCONSOLE
 
 NOTE: Like other Kibana sample data sets, the web log sample dataset contains
-timestamps relative to when you installed it, including timestamps in the future.
-The {ctransform} will pick up the data points once they are in the past. If you
-installed the web log sample dataset some time ago, you can uninstall and
+timestamps relative to when you installed it, including timestamps in the
+future. The {ctransform} will pick up the data points once they are in the past.
+If you installed the web log sample dataset some time ago, you can uninstall and
 reinstall it and the timestamps will change.
 
+
 This {transform} makes it easier to answer questions such as:
 
 * Which client IPs are transferring the most amounts of data?
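
To illustrate what the three filter aggregations introduced in this commit compute, here is a minimal Python sketch. It is not Elasticsearch code and does not call any Elasticsearch API. It assumes a few made-up log records with `clientip` and `response` fields, and it counts the `200`, `404`, and `503` responses per client IP, mirroring the `success`, `error404`, and `error503` fields in the diff. The function name `count_responses_by_clientip` and the sample records are hypothetical.

```python
from collections import defaultdict

# Maps each output field to the exact response code its filter matches,
# mirroring the three term filters added in the diff.
FILTERS = {"success": "200", "error404": "404", "error503": "503"}


def count_responses_by_clientip(docs):
    """Group docs by clientip and count the docs matching each filter."""
    counts = defaultdict(lambda: {name: 0 for name in FILTERS})
    for doc in docs:
        # Accessing the bucket creates it, so every client IP appears
        # in the result even when none of the filters match.
        bucket = counts[doc["clientip"]]
        for name, code in FILTERS.items():
            if doc["response"] == code:
                bucket[name] += 1
    return dict(counts)


# Hypothetical sample records, for illustration only.
sample_docs = [
    {"clientip": "0.72.176.46", "response": "200"},
    {"clientip": "0.72.176.46", "response": "200"},
    {"clientip": "10.0.0.1", "response": "404"},
    {"clientip": "10.0.0.1", "response": "503"},
    {"clientip": "10.0.0.1", "response": "500"},
]

result = count_responses_by_clientip(sample_docs)
```

The removed scripted metric grouped every code starting with `4` or `5` under `error` and everything that was neither that nor `2xx` under `other`. The filters match exact codes only, so a response such as `500` in the sample above is counted in none of the three fields.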
