Commit b17cfc9

author
Hendrik Muhs
committed
[Transform][DOCS]rewrite client ip example to use continuous transform (elastic#49822)
adapt the transform example for suspicious client ips to use continuous transform
1 parent cd3744c commit b17cfc9

1 file changed

+83
-60
lines changed

docs/reference/transform/examples.asciidoc

+83-60
@@ -54,18 +54,18 @@ POST _transform/_preview
5454
----------------------------------
5555
// TEST[skip:setup kibana sample data]
5656

57-
<1> This is the destination index for the {dataframe}. It is ignored by
57+
<1> This is the destination index for the {transform}. It is ignored by
5858
`_preview`.
59-
<2> Two `group_by` fields have been selected. This means the {dataframe} will
60-
contain a unique row per `user` and `customer_id` combination. Within this
61-
dataset both these fields are unique. By including both in the {dataframe} it
59+
<2> Two `group_by` fields have been selected. This means the {transform} will
60+
contain a unique row per `user` and `customer_id` combination. Within this
61+
dataset both these fields are unique. By including both in the {transform} it
6262
gives more context to the final results.
6363

6464
NOTE: In the example above, condensed JSON formatting has been used for easier
6565
readability of the pivot object.
6666

67-
The preview {transforms} API enables you to see the layout of the
68-
{dataframe} in advance, populated with some sample values. For example:
67+
The preview {transforms} API enables you to see the layout of the
68+
{transform} in advance, populated with some sample values. For example:
6969

7070
[source,js]
7171
----------------------------------
@@ -86,7 +86,7 @@ The preview {transforms} API enables you to see the layout of the
8686
----------------------------------
8787
// NOTCONSOLE
8888

89-
This {dataframe} makes it easier to answer questions such as:
89+
This {transform} makes it easier to answer questions such as:
9090

9191
* Which customers spend the most?
9292

@@ -154,7 +154,7 @@ POST _transform/_preview
154154
// TEST[skip:setup kibana sample data]
155155

156156
<1> Filter the source data to select only flights that were not cancelled.
157-
<2> This is the destination index for the {dataframe}. It is ignored by
157+
<2> This is the destination index for the {transform}. It is ignored by
158158
`_preview`.
159159
<3> The data is grouped by the `Carrier` field which contains the airline name.
160160
<4> This `bucket_script` performs calculations on the results that are returned
@@ -181,7 +181,7 @@ carrier:
181181
----------------------------------
182182
// NOTCONSOLE
183183

184-
This {dataframe} makes it easier to answer questions such as:
184+
This {transform} makes it easier to answer questions such as:
185185

186186
* Which air carrier has the most delays as a percentage of flight time?
187187

@@ -207,21 +207,20 @@ entity is `clientip`.
207207

208208
[source,console]
209209
----------------------------------
210-
POST _transform/_preview
210+
PUT _transform/suspicious_client_ips
211211
{
212212
"source": {
213-
"index": "kibana_sample_data_logs",
214-
"query": { <1>
215-
"range" : {
216-
"timestamp" : {
217-
"gte" : "now-30d/d"
218-
}
219-
}
220-
}
213+
"index": "kibana_sample_data_logs"
221214
},
222-
"dest" : { <2>
215+
"dest" : { <1>
223216
"index" : "sample_weblogs_by_clientip"
224-
},
217+
},
218+
"sync" : { <2>
219+
"time": {
220+
"field": "timestamp",
221+
"delay": "60s"
222+
}
223+
},
225224
"pivot": {
226225
"group_by": { <3>
227226
"clientip": { "terms": { "field": "clientip" } }
@@ -275,58 +274,82 @@ POST _transform/_preview
275274
----------------------------------
276275
// TEST[skip:setup kibana sample data]
277276

278-
<1> This range query limits the {transform} to documents that are within the
279-
last 30 days at the point in time the {transform} checkpoint is processed. For
280-
batch {transforms} this occurs once.
281-
<2> This is the destination index for the {dataframe}. It is ignored by
282-
`_preview`.
283-
<3> The data is grouped by the `clientip` field.
284-
<4> This `scripted_metric` performs a distributed operation on the web log data
277+
<1> This is the destination index for the {transform}.
278+
<2> Configures the {transform} to run continuously. It uses the `timestamp` field
279+
to synchronize the source and destination indices. The worst case
280+
ingestion delay is 60 seconds.
281+
<3> The data is grouped by the `clientip` field.
282+
<4> This `scripted_metric` performs a distributed operation on the web log data
285283
to count specific types of HTTP responses (error, success, and other).
286-
<5> This `bucket_script` calculates the duration of the `clientip` access based
284+
<5> This `bucket_script` calculates the duration of the `clientip` access based
287285
on the results of the aggregation.
288286

289-
The preview shows you that the new index would contain data like this for each
290-
client IP:
287+
After you create the {transform}, you must start it:
288+
289+
[source,console]
290+
----------------------------------
291+
POST _transform/suspicious_client_ips/_start
292+
----------------------------------
293+
// TEST[skip:setup kibana sample data]
294+
295+
Shortly thereafter, the first results should be available in the destination
296+
index:
297+
298+
[source,console]
299+
----------------------------------
300+
GET sample_weblogs_by_clientip/_search
301+
----------------------------------
302+
// TEST[skip:setup kibana sample data]
303+
304+
The search result shows you data like this for each client IP:
291305

292306
[source,js]
293307
----------------------------------
294-
{
295-
"preview" : [
296-
{
297-
"geo" : {
298-
"src_dc" : 12.0,
299-
"dest_dc" : 9.0
300-
},
301-
"clientip" : "0.72.176.46",
302-
"agent_dc" : 3.0,
303-
"responses" : {
304-
"total" : 14.0,
305-
"counts" : {
306-
"other" : 0,
307-
"success" : 14,
308-
"error" : 0
308+
"hits" : [
309+
{
310+
"_index" : "sample_weblogs_by_clientip",
311+
"_id" : "MOeHH_cUL5urmartKj-b5UQAAAAAAAAA",
312+
"_score" : 1.0,
313+
"_source" : {
314+
"geo" : {
315+
"src_dc" : 2.0,
316+
"dest_dc" : 2.0
317+
},
318+
"clientip" : "0.72.176.46",
319+
"agent_dc" : 2.0,
320+
"bytes_sum" : 4422.0,
321+
"responses" : {
322+
"total" : 2.0,
323+
"counts" : {
324+
"other" : 0,
325+
"success" : 2,
326+
"error" : 0
327+
}
328+
},
329+
"url_dc" : 2.0,
330+
"timestamp" : {
331+
"duration_ms" : 5.2191698E8,
332+
"min" : "2019-11-25T07:51:57.333Z",
333+
"max" : "2019-12-01T08:50:34.313Z"
334+
}
309335
}
310-
},
311-
"bytes_sum" : 74808.0,
312-
"timestamp" : {
313-
"duration_ms" : 4.919943239E9,
314-
"min" : "2019-06-17T07:51:57.333Z",
315-
"max" : "2019-08-13T06:31:00.572Z"
316-
},
317-
"url_dc" : 11.0
318-
},
319-
...
320-
}
321-
----------------------------------
336+
}
337+
]
338+
----------------------------------
322339
// NOTCONSOLE
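
As a sanity check on the sample hit above, `duration_ms` should equal the gap between `timestamp.max` and `timestamp.min`. The following Python sketch recomputes that value outside Elasticsearch. The helper name `duration_ms` and the parsing format string are illustrative assumptions and are not part of the transform definition.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the sample hit (UTC, millisecond precision).
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def duration_ms(ts_min: str, ts_max: str) -> float:
    """Return the gap between two UTC timestamps in milliseconds."""
    start = datetime.strptime(ts_min, TS_FORMAT)
    end = datetime.strptime(ts_max, TS_FORMAT)
    # Dividing two timedeltas yields a plain float.
    return (end - start) / timedelta(milliseconds=1)


# Values copied from the sample search hit.
result = duration_ms("2019-11-25T07:51:57.333Z", "2019-12-01T08:50:34.313Z")
print(result)  # 521916980.0, i.e. 5.2191698E8 as shown in the hit
```

The result matches the `duration_ms` field in the hit, which is consistent with the `bucket_script` subtracting the minimum timestamp from the maximum.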
323340

324-
This {dataframe} makes it easier to answer questions such as:
341+
NOTE: Like other Kibana sample data sets, the web log sample dataset contains
342+
timestamps relative to when you installed it, including timestamps in the future.
343+
The {ctransform} will pick up the data points once they are in the past. If you
344+
installed the web log sample dataset some time ago, you can uninstall and
345+
reinstall it and the timestamps will change.
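
The interaction between future timestamps and the `delay` setting can be pictured with a simplified model: a document becomes eligible for processing once its `timestamp` lies at least `delay` in the past. The Python sketch below implements only that model. `is_eligible` is a hypothetical helper, the dates are made up for illustration, and the actual checkpoint logic in Elasticsearch is more involved.

```python
from datetime import datetime, timedelta, timezone

# Matches the "delay": "60s" setting in the transform above.
DEFAULT_DELAY = timedelta(seconds=60)


def is_eligible(doc_ts: datetime, now: datetime,
                delay: timedelta = DEFAULT_DELAY) -> bool:
    """Simplified model: a document is picked up once it is older than `delay`."""
    return doc_ts <= now - delay


# Hypothetical "current time" for the illustration.
now = datetime(2019, 12, 1, 9, 0, 0, tzinfo=timezone.utc)

past_doc = now - timedelta(hours=1)       # well in the past: eligible
recent_doc = now - timedelta(seconds=10)  # inside the 60s delay: not yet
future_doc = now + timedelta(days=3)      # sample data in the future: not yet

print(is_eligible(past_doc, now))    # True
print(is_eligible(recent_doc, now))  # False
print(is_eligible(future_doc, now))  # False
```

Under this model, the future-dated sample documents appear in the destination index only after wall-clock time has moved past their timestamps by the configured delay.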
346+
347+
This {transform} makes it easier to answer questions such as:
325348

326349
* Which client IPs are transferring the most amounts of data?
327350

328351
* Which client IPs are interacting with a high number of different URLs?
329-
352+
330353
* Which client IPs have high error rates?
331-
354+
332355
* Which client IPs are interacting with a high number of destination countries?
