Painless: Add an Ingest Script Processor Example #32302

Merged (23 commits) on Aug 9, 2018

1 change: 1 addition & 0 deletions buildSrc/src/main/resources/checkstyle_suppressions.xml
@@ -686,6 +686,7 @@
<suppress files="modules[/\\]lang-expression[/\\]src[/\\]main[/\\]java[/\\]org[/\\]elasticsearch[/\\]script[/\\]expression[/\\]ExpressionScriptEngine.java" checks="LineLength" />
<suppress files="modules[/\\]lang-expression[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]script[/\\]expression[/\\]MoreExpressionTests.java" checks="LineLength" />
<suppress files="modules[/\\]lang-expression[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]script[/\\]expression[/\\]StoredExpressionTests.java" checks="LineLength" />
<suppress files="modules[/\\]lang-painless[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]painless[/\\]ContextExampleTests.java" checks="LineLength" />
<suppress files="modules[/\\]reindex[/\\]src[/\\]main[/\\]java[/\\]org[/\\]elasticsearch[/\\]index[/\\]reindex[/\\]TransportUpdateByQueryAction.java" checks="LineLength" />
<suppress files="plugins[/\\]analysis-icu[/\\]src[/\\]main[/\\]java[/\\]org[/\\]elasticsearch[/\\]index[/\\]analysis[/\\]IcuCollationTokenFilterFactory.java" checks="LineLength" />
<suppress files="plugins[/\\]analysis-icu[/\\]src[/\\]main[/\\]java[/\\]org[/\\]elasticsearch[/\\]index[/\\]analysis[/\\]IcuFoldingTokenFilterFactory.java" checks="LineLength" />
6 changes: 4 additions & 2 deletions docs/painless/painless-contexts.asciidoc
@@ -14,6 +14,8 @@ specialized code may define new ways to use a Painless script.
|====
| Name | Painless Documentation
| Elasticsearch Documentation
| Ingest processor | <<painless-ingest-processor-context, Painless Documentation>>
| {ref}/script-processor.html[Elasticsearch Documentation]
| Update | <<painless-update-context, Painless Documentation>>
| {ref}/docs-update.html[Elasticsearch Documentation]
| Update by query | <<painless-update-by-query-context, Painless Documentation>>
@@ -44,12 +46,12 @@ specialized code may define new ways to use a Painless script.
| {ref}/search-aggregations-metrics-scripted-metric-aggregation.html[Elasticsearch Documentation]
| Bucket aggregation | <<painless-bucket-agg-context, Painless Documentation>>
| {ref}/search-aggregations-pipeline-bucket-script-aggregation.html[Elasticsearch Documentation]
| Ingest processor | <<painless-ingest-processor-context, Painless Documentation>>
| {ref}/script-processor.html[Elasticsearch Documentation]
| Watcher condition | <<painless-watcher-condition-context, Painless Documentation>>
| {xpack-ref}/condition-script.html[Elasticsearch Documentation]
| Watcher transform | <<painless-watcher-transform-context, Painless Documentation>>
| {xpack-ref}/transform-script.html[Elasticsearch Documentation]
|====

include::painless-contexts/painless-context-examples.asciidoc[]

include::painless-contexts/index.asciidoc[]
4 changes: 2 additions & 2 deletions docs/painless/painless-contexts/index.asciidoc
@@ -1,3 +1,5 @@
include::painless-ingest-processor-context.asciidoc[]

include::painless-update-context.asciidoc[]

include::painless-update-by-query-context.asciidoc[]
@@ -28,8 +30,6 @@ include::painless-metric-agg-reduce-context.asciidoc[]

include::painless-bucket-agg-context.asciidoc[]

include::painless-ingest-processor-context.asciidoc[]

include::painless-watcher-condition-context.asciidoc[]

include::painless-watcher-transform-context.asciidoc[]
80 changes: 80 additions & 0 deletions docs/painless/painless-contexts/painless-context-examples.asciidoc
@@ -0,0 +1,80 @@
[[painless-context-examples]]
=== Context examples

To run the examples, index the sample seat data into Elasticsearch. The examples
must be run sequentially to work correctly.

. Download the
https://download.elastic.co/demos/painless/contexts/seats.json[seat data]. This
data set contains booking information for a collection of plays. Each document
represents a single seat for a play at a particular theater on a specific date
and time.
+
Each document contains the following fields:
+
`theatre` ({ref}/keyword.html[`keyword`])::
The name of the theater the play is in.
`play` ({ref}/text.html[`text`])::
The name of the play.
`actors` ({ref}/text.html[`text`])::
A list of actors in the play.
`row` ({ref}/number.html[`integer`])::
The row of the seat.
`number` ({ref}/number.html[`integer`])::
The number of the seat within a row.
`cost` ({ref}/number.html[`double`])::
The cost of the ticket for the seat.
`sold` ({ref}/boolean.html[`boolean`])::
Whether or not the seat is sold.
`datetime` ({ref}/date.html[`date`])::
The date and time of the play as a date object.
`date` ({ref}/keyword.html[`keyword`])::
The date of the play as a keyword.
`time` ({ref}/keyword.html[`keyword`])::
The time of the play as a keyword.
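+
As a quick illustration of these fields (the values below are hypothetical and
not taken from `seats.json`), a raw seat document might look like this; the
`datetime` field is populated by the ingest pipeline created in a later step:
+
[source,js]
----
{
  "theatre": "Example Theatre",
  "play": "Example Play",
  "actors": [ "First Actor", "Second Actor" ],
  "row": 1,
  "number": 7,
  "cost": 5.5,
  "sold": false,
  "date": "2018-4-1",
  "time": "3:00PM"
}
----
// NOTCONSOLE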

. {defguide}/running-elasticsearch.html[Start] Elasticsearch. Note these
examples assume Elasticsearch and Kibana are running locally. To use the Console
editor with a remote Kibana instance, click the settings icon and enter the
Console URL. To submit a cURL request to a remote Elasticsearch instance, edit
the request URL.

. Create {ref}/mapping.html[mappings] for the sample data:
+
[source,js]
----
PUT /seats
{
"mappings": {
"seat": {
"properties": {
"theatre": { "type": "keyword" },
"play": { "type": "text" },
"actors": { "type": "text" },
"row": { "type": "integer" },
"number": { "type": "integer" },
"cost": { "type": "double" },
"sold": { "type": "boolean" },
"datetime": { "type": "date" },
"date": { "type": "keyword" },
"time": { "type": "keyword" }
}
}
}
}
----
+
// CONSOLE
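+
As an optional check (not part of the original example), you can confirm the
mapping was applied:
+
[source,js]
----
GET /seats/_mapping
----
// NOTCONSOLE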

. Run the <<painless-ingest-processor-context, ingest processor context>>
example. This sets up a script ingest processor used on each document as the
seat data is indexed.

. Index the seat data:
+
[source,js]
----
curl -XPOST localhost:9200/seats/seat/_bulk?pipeline=seats -H "Content-Type: application/x-ndjson" --data-binary "@/<local-file-path>/seats.json"
----
// NOTCONSOLE
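+
As another optional check (not part of the original steps), refresh the index
and verify that the documents were indexed:
+
[source,js]
----
POST /seats/_refresh

GET /seats/_count
----
// NOTCONSOLE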

docs/painless/painless-contexts/painless-ingest-processor-context.asciidoc
@@ -27,7 +27,7 @@ to modify documents upon insertion.
{ref}/mapping-type-field.html[`ctx['_type']`]::
Modify this to change the type for the current document.

`ctx` (`Map`, read-only)::
`ctx` (`Map`)::
Modify the values in the `Map/List` structure to add, modify, or delete
the fields of a document.

@@ -38,4 +38,158 @@ void::

*API*

The standard <<painless-api-reference, Painless API>> is available.

*Example*

To run this example, first follow the steps in
<<painless-context-examples, context examples>>.

The seat data contains:

* A date in the format `YYYY-MM-DD` where the second digit of both month and day
is optional.
* A time in the format `HH:MM*` where the second digit of both hours and minutes
is optional. The star (`*`) represents either the `String` `AM` or `PM`.

The following ingest script processes the date and time `Strings` and stores the
result in a `datetime` field.

[source,Painless]
----
String[] split(String s, char d) { <1>
int count = 0;
for (char c : s.toCharArray()) { <2>
if (c == d) {
++count;
}
}
if (count == 0) {
return new String[] {s}; <3>
}
String[] r = new String[count + 1]; <4>
int i0 = 0, i1 = 0;
count = 0;
for (char c : s.toCharArray()) { <5>
if (c == d) {
r[count++] = s.substring(i0, i1);
i0 = i1 + 1;
}
++i1;
}
r[count] = s.substring(i0, i1); <6>
return r;
}
String[] dateSplit = split(ctx.date, (char)"-"); <7>
String year = dateSplit[0].trim();
String month = dateSplit[1].trim();
if (month.length() == 1) { <8>
month = "0" + month;
}
String day = dateSplit[2].trim();
if (day.length() == 1) { <9>
day = "0" + day;
}
boolean pm = ctx.time.substring(ctx.time.length() - 2).equals("PM"); <10>
String[] timeSplit = split(
ctx.time.substring(0, ctx.time.length() - 2), (char)":"); <11>
int hours = Integer.parseInt(timeSplit[0].trim());
int minutes = Integer.parseInt(timeSplit[1].trim());
if (pm) { <12>
hours += 12;
}
String dts = year + "-" + month + "-" + day + "T" +
(hours < 10 ? "0" + hours : "" + hours) + ":" +
(minutes < 10 ? "0" + minutes : "" + minutes) +
":00+08:00"; <13>
ZonedDateTime dt = ZonedDateTime.parse(
dts, DateTimeFormatter.ISO_OFFSET_DATE_TIME); <14>
ctx.datetime = dt.getLong(ChronoField.INSTANT_SECONDS)*1000L; <15>
----
<1> Creates a `split` <<painless-functions, function>> to split a
    <<string-type, `String`>> type value using a <<primitive-types, `char`>>
    type value as the delimiter. This is needed to pull the individual pieces
    of the date and time `Strings` out of the original seat data.
<2> The first pass through each `char` in the `String` collects how many new
`Strings` the original is split into.
<3> Returns the original `String` if there are no instances of the delimiting
`char`.
<4> Creates an <<array-type, array type>> value to collect the split `Strings`
into based on the number of `char` delimiters found in the first pass.
<5> The second pass through each `char` in the `String` collects each split
substring into an array type value of `Strings`.
<6> Collects the last substring into the array type value of `Strings`.
<7> Uses the `split` function to separate the date `String` from the seat data
into year, month, and day `Strings`.
Note::
* The use of a `String` type value to `char` type value
<<string-character-casting, cast>> as part of the second argument since
character literals do not exist.
* The use of the `ctx` ingest processor context variable to retrieve the
data from the `date` field.
<8> Appends the <<string-literals, string literal>> `"0"` value to a single
digit month since the format of the seat data allows for this case.
<9> Appends the <<string-literals, string literal>> `"0"` value to a single
digit day since the format of the seat data allows for this case.
<10> Sets the <<primitive-types, `boolean`>> type
<<painless-variables, variable>> to `true` if the time `String` is a time
in the afternoon or evening.
Note::
* The use of the `ctx` ingest processor context variable to retrieve the
data from the `time` field.
<11> Uses the `split` function to separate the time `String` from the seat data
into hours and minutes `Strings`.
Note::
* The use of the `substring` method to remove the `AM` or `PM` portion of
the time `String`.
* The use of a `String` type value to `char` type value
<<string-character-casting, cast>> as part of the second argument since
character literals do not exist.
* The use of the `ctx` ingest processor context variable to retrieve the
data from the `time` field.
<12> If the time `String` is an afternoon or evening value, adds the
<<integer-literals, integer literal>> `12` to the existing hours to convert
to 24-hour time.
<13> Builds a new datetime `String` in ISO-8601 format that is parsable
using existing API methods.
<14> Creates a `ZonedDateTime` <<reference-types, reference type>> value by using
the API method `parse` to parse the new time `String`.
<15> Sets the `datetime` field to the number of milliseconds since the epoch,
computed from the epoch seconds returned by the API method `getLong`.
Note::
* The use of the `ctx` ingest processor context variable to set the field
`datetime`. Manipulate each document's fields with the `ctx` variable as
each document is indexed.
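
As a concrete illustration (the input values here are hypothetical), a document
with a `date` of `2018-4-1` and a `time` of `7:30PM` is processed as follows:
the month and day are padded to `04` and `01`, `12` is added to the hours
because the time ends in `PM`, the script builds the `String`
`2018-04-01T19:30:00+08:00`, and the corresponding epoch-millisecond value is
stored in the `datetime` field.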

Submit the following request:

[source,js]
----
PUT /_ingest/pipeline/seats
{
"description": "update datetime for seats",
"processors": [
{
"script": {
"source": "String[] split(String s, char d) { int count = 0; for (char c : s.toCharArray()) { if (c == d) { ++count; } } if (count == 0) { return new String[] {s}; } String[] r = new String[count + 1]; int i0 = 0, i1 = 0; count = 0; for (char c : s.toCharArray()) { if (c == d) { r[count++] = s.substring(i0, i1); i0 = i1 + 1; } ++i1; } r[count] = s.substring(i0, i1); return r; } String[] dateSplit = split(ctx.date, (char)\"-\"); String year = dateSplit[0].trim(); String month = dateSplit[1].trim(); if (month.length() == 1) { month = \"0\" + month; } String day = dateSplit[2].trim(); if (day.length() == 1) { day = \"0\" + day; } boolean pm = ctx.time.substring(ctx.time.length() - 2).equals(\"PM\"); String[] timeSplit = split(ctx.time.substring(0, ctx.time.length() - 2), (char)\":\"); int hours = Integer.parseInt(timeSplit[0].trim()); int minutes = Integer.parseInt(timeSplit[1].trim()); if (pm) { hours += 12; } String dts = year + \"-\" + month + \"-\" + day + \"T\" + (hours < 10 ? \"0\" + hours : \"\" + hours) + \":\" + (minutes < 10 ? \"0\" + minutes : \"\" + minutes) + \":00+08:00\"; ZonedDateTime dt = ZonedDateTime.parse(dts, DateTimeFormatter.ISO_OFFSET_DATE_TIME); ctx.datetime = dt.getLong(ChronoField.INSTANT_SECONDS)*1000L;"
}
}
]
}
----
// CONSOLE
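
To spot-check the pipeline before bulk indexing the seat data, you can simulate
it against a single hypothetical document (the values below are illustrative
and not part of the original example). The response should contain a `datetime`
value computed from the `date` and `time` fields:

[source,js]
----
POST /_ingest/pipeline/seats/_simulate
{
  "docs": [
    {
      "_source": {
        "date": "2018-4-1",
        "time": "7:30PM"
      }
    }
  ]
}
----
// NOTCONSOLE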
2 changes: 1 addition & 1 deletion docs/painless/painless-keywords.asciidoc
@@ -5,7 +5,7 @@ Keywords are reserved tokens for built-in language features.

*Errors*

If a keyword is used as an <<painless-identifiers, identifier>>.
* If a keyword is used as an <<painless-identifiers, identifier>>.

*Keywords*
