@@ -54,18 +54,18 @@ POST _transform/_preview
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
-<1> This is the destination index for the {dataframe}. It is ignored by
+<1> This is the destination index for the {transform}. It is ignored by
 `_preview`.
-<2> Two `group_by` fields have been selected. This means the {dataframe} will
-contain a unique row per `user` and `customer_id` combination. Within this
-dataset both these fields are unique. By including both in the {dataframe} it
+<2> Two `group_by` fields have been selected. This means the {transform} will
+contain a unique row per `user` and `customer_id` combination. Within this
+dataset both these fields are unique. By including both in the {transform} it
 gives more context to the final results.
 
 NOTE: In the example above, condensed JSON formatting has been used for easier
 readability of the pivot object.
 
-The preview {transforms} API enables you to see the layout of the
-{dataframe} in advance, populated with some sample values. For example:
+The preview {transforms} API enables you to see the layout of the
+{transform} in advance, populated with some sample values. For example:
 
 [source,js]
 ----------------------------------
@@ -86,7 +86,7 @@ The preview {transforms} API enables you to see the layout of the
 ----------------------------------
 // NOTCONSOLE
 
-This {dataframe} makes it easier to answer questions such as:
+This {transform} makes it easier to answer questions such as:
 
 * Which customers spend the most?
 
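The two-field `group_by` pivot that the hunk above documents can be sketched as a plain request body. This is a minimal illustration, not code from the commit: the field names come from the Kibana eCommerce sample data referenced by the docs, and the `order_count` aggregation is an assumed example added for completeness.

```python
# Sketch (assumption, not from the commit) of a pivot preview body with two
# `group_by` fields: each unique (user, customer_id) combination becomes one
# row in the transform's output. The `order_count` aggregation is illustrative.
import json

preview_body = {
    "source": {"index": "kibana_sample_data_ecommerce"},
    "pivot": {
        "group_by": {
            "user": {"terms": {"field": "user"}},
            "customer_id": {"terms": {"field": "customer_id"}},
        },
        "aggregations": {
            "order_count": {"value_count": {"field": "order_id"}},
        },
    },
}

# A body like this would be sent to `POST _transform/_preview`.
print(json.dumps(preview_body, indent=2))
```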
@@ -154,7 +154,7 @@ POST _transform/_preview
 // TEST[skip:setup kibana sample data]
 
 <1> Filter the source data to select only flights that were not cancelled.
-<2> This is the destination index for the {dataframe}. It is ignored by
+<2> This is the destination index for the {transform}. It is ignored by
 `_preview`.
 <3> The data is grouped by the `Carrier` field which contains the airline name.
 <4> This `bucket_script` performs calculations on the results that are returned
@@ -181,7 +181,7 @@ carrier:
 ----------------------------------
 // NOTCONSOLE
 
-This {dataframe} makes it easier to answer questions such as:
+This {transform} makes it easier to answer questions such as:
 
 * Which air carrier has the most delays as a percentage of flight time?
 
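The `bucket_script` mentioned in the hunks above combines the results of sibling aggregations within each bucket. The effect can be mimicked outside Elasticsearch; this is a hedged sketch of the "delay as a percentage of flight time" idea for one hypothetical carrier bucket, with made-up values, not the commit's actual script.

```python
# Illustrative only (not Elasticsearch code): a `bucket_script` pipeline
# aggregation computes a value per bucket from other aggregation results.
# The field names and numbers here are invented for the example.
bucket = {
    "flight_time_min": {"value": 12000.0},  # total flight minutes in bucket
    "delay_min": {"value": 600.0},          # total delay minutes in bucket
}

# Equivalent of a script like: params.delay / params.flight_time * 100
delay_pct = bucket["delay_min"]["value"] / bucket["flight_time_min"]["value"] * 100
print(delay_pct)  # → 5.0
```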
@@ -207,21 +207,20 @@ entity is `clientip`.
 
 [source,console]
 ----------------------------------
-POST _transform/_preview
+PUT _transform/suspicious_client_ips
 {
   "source": {
-    "index": "kibana_sample_data_logs",
-    "query": { <1>
-      "range" : {
-        "timestamp" : {
-          "gte" : "now-30d/d"
-        }
-      }
-    }
+    "index": "kibana_sample_data_logs"
   },
-  "dest" : { <2>
+  "dest" : { <1>
     "index" : "sample_weblogs_by_clientip"
-  },
+  },
+  "sync" : { <2>
+    "time": {
+      "field": "timestamp",
+      "delay": "60s"
+    }
+  },
   "pivot": {
     "group_by": { <3>
       "clientip": { "terms": { "field": "clientip" } }
@@ -275,58 +274,82 @@ POST _transform/_preview
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
-<1> This range query limits the {transform} to documents that are within the
-last 30 days at the point in time the {transform} checkpoint is processed. For
-batch {transforms} this occurs once.
-<2> This is the destination index for the {dataframe}. It is ignored by
-`_preview`.
-<3> The data is grouped by the `clientip` field.
-<4> This `scripted_metric` performs a distributed operation on the web log data
+<1> This is the destination index for the {transform}.
+<2> Configures the {transform} to run continuously. It uses the `timestamp` field
+to synchronize the source and destination indices. The worst case
+ingestion delay is 60 seconds.
+<3> The data is grouped by the `clientip` field.
+<4> This `scripted_metric` performs a distributed operation on the web log data
 to count specific types of HTTP responses (error, success, and other).
-<5> This `bucket_script` calculates the duration of the `clientip` access based
+<5> This `bucket_script` calculates the duration of the `clientip` access based
 on the results of the aggregation.
 
-The preview shows you that the new index would contain data like this for each
-client IP:
+After you create the {transform}, you must start it:
+
+[source,console]
+----------------------------------
+POST _transform/suspicious_client_ips/_start
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+Shortly thereafter, the first results should be available in the destination
+index:
+
+[source,console]
+----------------------------------
+GET sample_weblogs_by_clientip/_search
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+The search result shows you data like this for each client IP:
 
 [source,js]
 ----------------------------------
-{
-  "preview" : [
-    {
-      "geo" : {
-        "src_dc" : 12.0,
-        "dest_dc" : 9.0
-      },
-      "clientip" : "0.72.176.46",
-      "agent_dc" : 3.0,
-      "responses" : {
-        "total" : 14.0,
-        "counts" : {
-          "other" : 0,
-          "success" : 14,
-          "error" : 0
+"hits" : [
+  {
+    "_index" : "sample_weblogs_by_clientip",
+    "_id" : "MOeHH_cUL5urmartKj-b5UQAAAAAAAAA",
+    "_score" : 1.0,
+    "_source" : {
+      "geo" : {
+        "src_dc" : 2.0,
+        "dest_dc" : 2.0
+      },
+      "clientip" : "0.72.176.46",
+      "agent_dc" : 2.0,
+      "bytes_sum" : 4422.0,
+      "responses" : {
+        "total" : 2.0,
+        "counts" : {
+          "other" : 0,
+          "success" : 2,
+          "error" : 0
+        }
+      },
+      "url_dc" : 2.0,
+      "timestamp" : {
+        "duration_ms" : 5.2191698E8,
+        "min" : "2019-11-25T07:51:57.333Z",
+        "max" : "2019-12-01T08:50:34.313Z"
+      }
         }
-      },
-      "bytes_sum" : 74808.0,
-      "timestamp" : {
-        "duration_ms" : 4.919943239E9,
-        "min" : "2019-06-17T07:51:57.333Z",
-        "max" : "2019-08-13T06:31:00.572Z"
-      },
-      "url_dc" : 11.0
-    },
-    ...
-  }
-----------------------------------
+  }
+]
+----------------------------------
 // NOTCONSOLE
 
-This {dataframe} makes it easier to answer questions such as:
+NOTE: Like other Kibana sample data sets, the web log sample dataset contains
+timestamps relative to when you installed it, including timestamps in the future.
+The {ctransform} will pick up the data points once they are in the past. If you
+installed the web log sample dataset some time ago, you can uninstall and
+reinstall it and the timestamps will change.
+
+This {transform} makes it easier to answer questions such as:
 
 * Which client IPs are transferring the most amounts of data?
 
 * Which client IPs are interacting with a high number of different URLs?
-
+ 
 * Which client IPs have high error rates?
-
+ 
 * Which client IPs are interacting with a high number of destination countries?
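The change in the last hunks replaces the 30-day `range` query with a `sync.time` block, turning the batch example into a continuous transform. A minimal sketch of that configuration as a request body, with the aggregations elided (the full body is in the diff itself):

```python
# Sketch of the continuous transform config this commit switches the example
# to: `sync.time` makes the transform run continuously, synchronizing on the
# `timestamp` field with a 60s worst-case ingestion delay. Aggregations from
# the full example are omitted here for brevity.
import json

transform_config = {
    "source": {"index": "kibana_sample_data_logs"},
    "dest": {"index": "sample_weblogs_by_clientip"},
    "sync": {"time": {"field": "timestamp", "delay": "60s"}},
    "pivot": {
        "group_by": {"clientip": {"terms": {"field": "clientip"}}},
    },
}

# A body like this would be sent with `PUT _transform/suspicious_client_ips`,
# followed by `POST _transform/suspicious_client_ips/_start`.
print(json.dumps(transform_config["sync"], indent=2))
```

Note the design point the diff encodes: a continuous transform no longer needs the source-side `range` filter, because checkpointing against `timestamp` bounds each run.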