Painless: Add an Ingest Script Processor Example #32302
@@ -0,0 +1,80 @@
[[painless-context-examples]]
=== Context examples

To run the examples, first index the sample seat data into Elasticsearch by
following the steps below. The examples must be run in the order of the table
in the <<painless-contexts, previous section>> to work correctly.

. Download the
https://download.elastic.co/demos/painless/contexts/seats.json[seat data]. This
data set contains booking information for a collection of plays. Each document
represents a single seat for a play at a particular theater on a specific date
and time.
+
Each document contains the following fields:
+
`theatre` ({es_version}/keyword.html[`keyword`])::
The name of the theater the play is in.
`play` ({es_version}/text.html[`text`])::
The name of the play.
`actors` ({es_version}/text.html[`text`])::
A list of actors in the play.
`row` ({es_version}/number.html[`integer`])::
The row of the seat.
`number` ({es_version}/number.html[`integer`])::
The number of the seat within a row.
`cost` ({es_version}/number.html[`double`])::
The cost of the ticket for the seat.
`sold` ({es_version}/boolean.html[`boolean`])::
Whether or not the seat is sold.
`datetime` ({es_version}/date.html[`date`])::
The date and time of the play as a date object.
`date` ({es_version}/keyword.html[`keyword`])::
The date of the play as a keyword.
`time` ({es_version}/keyword.html[`keyword`])::
The time of the play as a keyword.

. {es_version}/running-elasticsearch.html[Start Elasticsearch].
+
NOTE: These examples assume Elasticsearch and Kibana are running locally. To use
the Console editor with a remote Kibana instance, click the settings icon and
enter the Console URL. To submit a cURL request to a remote Elasticsearch
instance, edit the request URL.

. Create {es_version}/mapping.html[mappings] for the sample data:
+
[source,js]
----
PUT /seats
{
  "mappings": {
    "seat": {
      "properties": {
        "theatre": { "type": "keyword" },
        "play": { "type": "text" },
        "actors": { "type": "text" },
        "row": { "type": "integer" },
        "number": { "type": "integer" },
        "cost": { "type": "double" },
        "sold": { "type": "boolean" },
        "datetime": { "type": "date" },
        "date": { "type": "keyword" },
        "time": { "type": "keyword" }
      }
    }
  }
}
----
// CONSOLE

. Run the <<painless-ingest-processor-context, ingest>> example. This creates
the `seats` ingest pipeline that processes each document as the seat data is
uploaded.

. Upload the seat data:
+
[source,sh]
----
curl -XPOST "localhost:9200/seats/seat/_bulk?pipeline=seats" -H "Content-Type: application/x-ndjson" --data-binary "@/<local-file-path>/seats.json"
----
// NOTCONSOLE
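
If you prefer to script the upload rather than use cURL, the following is a
minimal sketch that builds the same bulk request with the JDK 11
`java.net.http` client. It assumes an unsecured cluster at
`http://localhost:9200`; the `SeatUploader` class and its `bulkRequest` method
are hypothetical names used only for illustration.

[source,java]
----
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper that mirrors the cURL upload command.
// Assumes an unsecured cluster listening on localhost:9200.
class SeatUploader {
    static final String BASE_URL = "http://localhost:9200";

    // Builds a bulk request that routes every document through the
    // "seats" ingest pipeline.
    static HttpRequest bulkRequest(HttpRequest.BodyPublisher body) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/seats/seat/_bulk?pipeline=seats"))
                .header("Content-Type", "application/x-ndjson")
                .POST(body)
                .build();
    }

    // Usage: java SeatUploader.java /<local-file-path>/seats.json
    public static void main(String[] args) throws Exception {
        HttpRequest request = bulkRequest(HttpRequest.BodyPublishers.ofFile(Path.of(args[0])));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Prints the HTTP status code; the response body reports per-item errors.
        System.out.println(response.statusCode());
    }
}
----

Keeping the request construction in its own method makes it easy to inspect
the URL and headers without contacting a cluster.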

@@ -27,7 +27,7 @@ to modify documents upon insertion.
{ref}/mapping-type-field.html[`ctx['_type']`]::
Modify this to change the type for the current document.

`ctx` (`Map`)::
Modify the values in the `Map/List` structure to add, modify, or delete
the fields of a document.

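Because `ctx` is a `Map`, these changes behave as they would on a plain Java
`Map`. The following sketch models a seat document as the nested `Map`/`List`
structure that `ctx` exposes and applies each kind of change. It is an
illustration in Java rather than Painless; the `CtxSketch` class and the
`processed` field are hypothetical.

[source,java]
----
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a seat document modeled as the nested Map/List
// structure that ctx exposes to an ingest script.
class CtxSketch {
    @SuppressWarnings("unchecked")
    static void process(Map<String, Object> ctx) {
        ctx.put("processed", true);                         // add a new field
        ctx.put("sold", true);                              // modify an existing field
        ((List<Object>) ctx.get("actors")).add("Jane Doe"); // modify a nested List
        ctx.remove("time");                                 // delete a field
    }

    public static void main(String[] args) {
        Map<String, Object> ctx = new HashMap<>();
        ctx.put("sold", false);
        ctx.put("time", "3:00PM");
        ctx.put("actors", new ArrayList<>(List.of("John Smith")));
        process(ctx);
        System.out.println(ctx);
    }
}
----

In a Painless ingest script, the equivalent statements would look roughly like
`ctx.processed = true;`, `ctx.sold = true;`, `ctx.actors.add('Jane Doe');`,
and `ctx.remove('time');`.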
@@ -38,4 +38,145 @@ void::

*API*

The standard <<painless-api-reference, Painless API>> is available.

*Example*

To run this example, first follow the steps in
<<painless-context-examples, context examples>>.

The seat data contains:

* A date `String` in the format `YYYY-MM-DD`, where the month and the day can
each have one or two digits.
* A time `String` in the format `HH:MM*`, where the hours and the minutes can
each have one or two digits, and `*` is either `AM` or `PM`.

An ingest script processor converts these date and time `Strings` into a value
suitable for a `date` field and stores the result in each document's
`datetime` field.

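For comparison, the same conversion can be sketched in plain Java with
`java.time` pattern parsing in place of a hand-written `split` function. This
is a swapped-in technique, not the script this example uses. `SeatDateTime` and
`toEpochMillis` are hypothetical names, and the sketch assumes the same fixed
`+08:00` offset as the script and the `Locale.US` spellings `AM` and `PM`.
Pattern parsing also maps `12:00PM` to noon and `12:00AM` to midnight.

[source,java]
----
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical sketch: converts a seat's date and time Strings, such as
// "2018-4-1" and "3:00PM", to epoch milliseconds with java.time patterns
// instead of a hand-written split function.
class SeatDateTime {
    // Single-letter M, d, and h accept one or two digits; "a" reads AM or PM.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-M-d h:mma", Locale.US);

    static long toEpochMillis(String date, String time) {
        LocalDateTime local = LocalDateTime.parse(date.trim() + " " + time.trim(), FORMAT);
        // Assumes the same fixed +08:00 offset that the ingest script appends.
        return local.toInstant(ZoneOffset.ofHours(8)).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2018-4-1", "3:00PM")); // prints 1522566000000
    }
}
----

Here the pattern letters handle the zero padding and the 12-hour arithmetic
that the script performs by hand.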
The following script is used as the ingest script processor:

[source,Painless]
----
String[] split(String s, char d) {                                   <1>
    int count = 0;

    for (char c : s.toCharArray()) {                                 <2>
        if (c == d) {
            ++count;
        }
    }

    if (count == 0) {                                                <3>
        return new String[] {s};
    }

    String[] r = new String[count + 1];                              <4>
    int i0 = 0, i1 = 0;
    count = 0;

    for (char c : s.toCharArray()) {                                 <5>
        if (c == d) {
            r[count++] = s.substring(i0, i1);
            i0 = i1 + 1;
        }

        ++i1;
    }

    r[count] = s.substring(i0, i1);                                  <6>

    return r;
}

String[] dateSplit = split(ctx.date, (char)"-");                     <7>
String year = dateSplit[0].trim();
String month = dateSplit[1].trim();

if (month.length() == 1) {                                           <8>
    month = "0" + month;
}

String day = dateSplit[2].trim();

if (day.length() == 1) {                                             <9>
    day = "0" + day;
}

boolean pm = ctx.time.substring(ctx.time.length() - 2).equals("PM"); <10>
String[] timeSplit = split(                                          <11>
        ctx.time.substring(0, ctx.time.length() - 2), (char)":");
int hours = Integer.parseInt(timeSplit[0].trim());
int minutes = Integer.parseInt(timeSplit[1].trim());

hours %= 12;                                                         <12>

if (pm) {
    hours += 12;
}

String dts = year + "-" + month + "-" + day + "T" +                  <13>
        (hours < 10 ? "0" + hours : "" + hours) + ":" +
        (minutes < 10 ? "0" + minutes : "" + minutes) +
        ":00+08:00";

ZonedDateTime dt = ZonedDateTime.parse(                              <14>
        dts, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
ctx.datetime = dt.getLong(ChronoField.INSTANT_SECONDS)*1000L;        <15>
----
<1> Creates a `split` <<painless-functions, function>> to split a
    <<string-type, `String`>> type value using a <<primitive-types, `char`>>
    type value as the delimiter. This is useful for pulling out the individual
    pieces of the date and time `Strings` from the original seat data.
<2> The first pass through each `char` in the `String` counts the delimiters,
    which determines how many new `Strings` the original is split into.
<3> Returns the original `String` if there are no instances of the delimiting
    `char`.
<4> Creates an <<array-type, array type>> value to collect the split `Strings`,
    sized by the number of `char` delimiters found in the first pass.
<5> The second pass through each `char` in the `String` collects each split
    substring into the array type value of `Strings`.
<6> Collects the last substring into the array type value of `Strings`.
<7> Uses the `split` function to separate the date `String` from the seat data
    into year, month, and day `Strings`. Note the use of a `String` type value
    to `char` type value <<string-character-casting, cast>> as part of the
    second argument since character literals do not exist.
<8> Prepends the <<string-literals, string literal>> `"0"` value to a
    single-digit month since the format of the seat data allows for this case.
<9> Prepends the <<string-literals, string literal>> `"0"` value to a
    single-digit day since the format of the seat data allows for this case.
<10> Sets the <<primitive-types, `boolean` type>>
     <<painless-variables, variable>> to `true` if the time `String` is a time
     in the afternoon or evening.
<11> Uses the `split` function to separate the time `String` from the seat data
     into hours and minutes `Strings`. Note the use of the `substring` method to
     remove the `AM` or `PM` portion of the time `String`. Also note the use of
     a `String` type value to `char` type value
     <<string-character-casting, cast>> as part of the second argument since
     character literals do not exist.
<12> Converts the hours to a 24-hour clock. Taking the remainder of dividing by
     the <<integer-literals, integer literal>> `12` turns `12 AM` into hour `0`,
     and adding `12` to an afternoon or evening value moves it into the range
     `12` to `23`, so `12 PM` stays at `12`.
<13> Builds an ISO 8601 date and time `String` with a fixed `+08:00` offset
     that existing API methods can parse.
<14> Creates a `ZonedDateTime` <<reference-types, reference type>> value by using
     the API method `parse` to parse the new date and time `String`.
<15> Sets the datetime field `datetime` to the number of milliseconds since the
     epoch, computed from the seconds returned by the API method `getLong`. Note
     the use of the `ctx` ingest processor context variable to set the field
     `datetime`. Use `ctx` to manipulate each document's fields as the document
     is indexed.

Submit the following request to create the `seats` ingest pipeline:

[source,js]
----
PUT /_ingest/pipeline/seats
{
  "description": "update datetime for seats",
  "processors": [
    {
      "script": {
        "source": "String[] split(String s, char d) { int count = 0; for (char c : s.toCharArray()) { if (c == d) { ++count; } } if (count == 0) { return new String[] {s}; } String[] r = new String[count + 1]; int i0 = 0, i1 = 0; count = 0; for (char c : s.toCharArray()) { if (c == d) { r[count++] = s.substring(i0, i1); i0 = i1 + 1; } ++i1; } r[count] = s.substring(i0, i1); return r; } String[] dateSplit = split(ctx.date, (char)\"-\"); String year = dateSplit[0].trim(); String month = dateSplit[1].trim(); if (month.length() == 1) { month = \"0\" + month; } String day = dateSplit[2].trim(); if (day.length() == 1) { day = \"0\" + day; } boolean pm = ctx.time.substring(ctx.time.length() - 2).equals(\"PM\"); String[] timeSplit = split(ctx.time.substring(0, ctx.time.length() - 2), (char)\":\"); int hours = Integer.parseInt(timeSplit[0].trim()); int minutes = Integer.parseInt(timeSplit[1].trim()); hours %= 12; if (pm) { hours += 12; } String dts = year + \"-\" + month + \"-\" + day + \"T\" + (hours < 10 ? \"0\" + hours : \"\" + hours) + \":\" + (minutes < 10 ? \"0\" + minutes : \"\" + minutes) + \":00+08:00\"; ZonedDateTime dt = ZonedDateTime.parse(dts, DateTimeFormatter.ISO_OFFSET_DATE_TIME); ctx.datetime = dt.getLong(ChronoField.INSTANT_SECONDS)*1000L;"
      }
    }
  ]
}
----
// CONSOLE
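
Because Painless syntax closely follows Java, the script's `split` function
can be checked outside Elasticsearch. The following is a minimal Java port for
that purpose only; `SplitCheck` is a hypothetical class name, and the port is
not part of the pipeline.

[source,java]
----
// Hypothetical Java port of the Painless split function, for testing only.
class SplitCheck {
    static String[] split(String s, char d) {
        int count = 0;

        // First pass: count the delimiters.
        for (char c : s.toCharArray()) {
            if (c == d) {
                ++count;
            }
        }

        // No delimiter: the whole String is the only piece.
        if (count == 0) {
            return new String[] {s};
        }

        // n delimiters produce n + 1 pieces.
        String[] r = new String[count + 1];
        int i0 = 0, i1 = 0;
        count = 0;

        // Second pass: copy each piece that ends at a delimiter.
        for (char c : s.toCharArray()) {
            if (c == d) {
                r[count++] = s.substring(i0, i1);
                i0 = i1 + 1;
            }
            ++i1;
        }

        // The last piece runs to the end of the String.
        r[count] = s.substring(i0, i1);
        return r;
    }
}
----

Unlike `String.split`, which takes a regular expression and drops trailing
empty strings, this function keeps every piece, so `split("2018-4-", '-')`
returns `["2018", "4", ""]`.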