Commit 31eac83

Painless: Add an Ingest Script Processor Example (#32302)
This commit adds two pieces. The first is a small set of documentation with instructions on how to get set up to run context examples. This requires a download, similar to how Kibana works for some of its examples. The second is an ingest processor example that uses the downloaded data. More examples will follow, ideally one per PR. This commit also adds a set of tests that exercise each script individually as a unit test.
1 parent 6fa3f7a commit 31eac83

7 files changed: +555 -7 lines changed

buildSrc/src/main/resources/checkstyle_suppressions.xml

Lines changed: 1 addition & 0 deletions
@@ -720,6 +720,7 @@
 <suppress files="modules[/\\]lang-expression[/\\]src[/\\]main[/\\]java[/\\]org[/\\]elasticsearch[/\\]script[/\\]expression[/\\]ExpressionScriptEngine.java" checks="LineLength" />
 <suppress files="modules[/\\]lang-expression[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]script[/\\]expression[/\\]MoreExpressionTests.java" checks="LineLength" />
 <suppress files="modules[/\\]lang-expression[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]script[/\\]expression[/\\]StoredExpressionTests.java" checks="LineLength" />
+<suppress files="modules[/\\]lang-painless[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]painless[/\\]ContextExampleTests.java" checks="LineLength" />
 <suppress files="modules[/\\]reindex[/\\]src[/\\]main[/\\]java[/\\]org[/\\]elasticsearch[/\\]index[/\\]reindex[/\\]TransportUpdateByQueryAction.java" checks="LineLength" />
 <suppress files="plugins[/\\]analysis-icu[/\\]src[/\\]main[/\\]java[/\\]org[/\\]elasticsearch[/\\]index[/\\]analysis[/\\]IcuCollationTokenFilterFactory.java" checks="LineLength" />
 <suppress files="plugins[/\\]analysis-icu[/\\]src[/\\]main[/\\]java[/\\]org[/\\]elasticsearch[/\\]index[/\\]analysis[/\\]IcuFoldingTokenFilterFactory.java" checks="LineLength" />

docs/painless/painless-contexts.asciidoc

Lines changed: 4 additions & 2 deletions
@@ -14,6 +14,8 @@ specialized code may define new ways to use a Painless script.
 |====
 | Name | Painless Documentation
 | Elasticsearch Documentation
+| Ingest processor | <<painless-ingest-processor-context, Painless Documentation>>
+| {ref}/script-processor.html[Elasticsearch Documentation]
 | Update | <<painless-update-context, Painless Documentation>>
 | {ref}/docs-update.html[Elasticsearch Documentation]
 | Update by query | <<painless-update-by-query-context, Painless Documentation>>
@@ -44,12 +46,12 @@ specialized code may define new ways to use a Painless script.
 | {ref}/search-aggregations-metrics-scripted-metric-aggregation.html[Elasticsearch Documentation]
 | Bucket aggregation | <<painless-bucket-agg-context, Painless Documentation>>
 | {ref}/search-aggregations-pipeline-bucket-script-aggregation.html[Elasticsearch Documentation]
-| Ingest processor | <<painless-ingest-processor-context, Painless Documentation>>
-| {ref}/script-processor.html[Elasticsearch Documentation]
 | Watcher condition | <<painless-watcher-condition-context, Painless Documentation>>
 | {xpack-ref}/condition-script.html[Elasticsearch Documentation]
 | Watcher transform | <<painless-watcher-transform-context, Painless Documentation>>
 | {xpack-ref}/transform-script.html[Elasticsearch Documentation]
 |====
 
+include::painless-contexts/painless-context-examples.asciidoc[]
+
 include::painless-contexts/index.asciidoc[]

docs/painless/painless-contexts/index.asciidoc

Lines changed: 2 additions & 2 deletions
@@ -1,3 +1,5 @@
+include::painless-ingest-processor-context.asciidoc[]
+
 include::painless-update-context.asciidoc[]
 
 include::painless-update-by-query-context.asciidoc[]
@@ -28,8 +30,6 @@ include::painless-metric-agg-reduce-context.asciidoc[]
 
 include::painless-bucket-agg-context.asciidoc[]
 
-include::painless-ingest-processor-context.asciidoc[]
-
 include::painless-watcher-condition-context.asciidoc[]
 
 include::painless-watcher-transform-context.asciidoc[]

docs/painless/painless-contexts/painless-context-examples.asciidoc

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+[[painless-context-examples]]
+=== Context examples
+
+To run the examples, index the sample seat data into Elasticsearch. The examples
+must be run sequentially to work correctly.
+
+. Download the
+https://download.elastic.co/demos/painless/contexts/seats.json[seat data]. This
+data set contains booking information for a collection of plays. Each document
+represents a single seat for a play at a particular theater on a specific date
+and time.
++
+Each document contains the following fields:
++
+`theatre` ({ref}/keyword.html[`keyword`])::
+The name of the theater the play is in.
+`play` ({ref}/text.html[`text`])::
+The name of the play.
+`actors` ({ref}/text.html[`text`])::
+A list of actors in the play.
+`row` ({ref}/number.html[`integer`])::
+The row of the seat.
+`number` ({ref}/number.html[`integer`])::
+The number of the seat within a row.
+`cost` ({ref}/number.html[`double`])::
+The cost of the ticket for the seat.
+`sold` ({ref}/boolean.html[`boolean`])::
+Whether or not the seat is sold.
+`datetime` ({ref}/date.html[`date`])::
+The date and time of the play as a date object.
+`date` ({ref}/keyword.html[`keyword`])::
+The date of the play as a keyword.
+`time` ({ref}/keyword.html[`keyword`])::
+The time of the play as a keyword.
+
+. {defguide}/running-elasticsearch.html[Start] Elasticsearch. Note these
+examples assume Elasticsearch and Kibana are running locally. To use the Console
+editor with a remote Kibana instance, click the settings icon and enter the
+Console URL. To submit a cURL request to a remote Elasticsearch instance, edit
+the request URL.
+
+. Create {ref}/mapping.html[mappings] for the sample data:
++
+[source,js]
+----
+PUT /seats
+{
+  "mappings": {
+    "seat": {
+      "properties": {
+        "theatre": { "type": "keyword" },
+        "play": { "type": "text" },
+        "actors": { "type": "text" },
+        "row": { "type": "integer" },
+        "number": { "type": "integer" },
+        "cost": { "type": "double" },
+        "sold": { "type": "boolean" },
+        "datetime": { "type": "date" },
+        "date": { "type": "keyword" },
+        "time": { "type": "keyword" }
+      }
+    }
+  }
+}
+----
++
+// CONSOLE
+
+. Run the <<painless-ingest-processor-context, ingest processor context>>
+example. This sets up a script ingest processor used on each document as the
+seat data is indexed.
+
+. Index the seat data:
++
+[source,js]
+----
+curl -XPOST localhost:9200/seats/seat/_bulk?pipeline=seats -H "Content-Type: application/x-ndjson" --data-binary "@/<local-file-path>/seats.json"
+----
+// NOTCONSOLE
+
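
The indexing step above posts `seats.json` to the `_bulk` endpoint as newline-delimited JSON. As a rough illustration of what such a body looks like, the sketch below builds one from in-memory documents. The class name `SeatBulkBody`, the helper `bulkBody`, and the seat values are hypothetical and are not taken from the commit or from the real data file; the empty `{"index":{}}` action line assumes the index and type come from the request URL, as they do in the cURL command.

```java
import java.util.List;

// Hypothetical helper, not part of the commit: assembles a _bulk request body.
final class SeatBulkBody {

    /**
     * Emits one action line followed by one source line per document.
     * Every line, including the last, ends with a newline, which the
     * bulk API requires.
     */
    static String bulkBody(List<String> jsonDocs) {
        StringBuilder sb = new StringBuilder();
        for (String doc : jsonDocs) {
            sb.append("{\"index\":{}}\n"); // index and type are taken from the URL
            sb.append(doc).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up seat document using the field names from the mapping above.
        // "datetime" is left out because the ingest pipeline is expected to set it.
        String seat = "{\"theatre\":\"Example Theatre\",\"play\":\"Example Play\","
            + "\"actors\":[\"Actor One\",\"Actor Two\"],\"row\":1,\"number\":7,"
            + "\"cost\":30.0,\"sold\":false,\"date\":\"2018-4-1\",\"time\":\"5:00PM\"}";
        System.out.print(bulkBody(List.of(seat)));
    }
}
```

The output of `main` could be saved to a file and sent with the same cURL command in place of the downloaded data when trying the pipeline on a single document.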

docs/painless/painless-contexts/painless-ingest-processor-context.asciidoc

Lines changed: 156 additions & 2 deletions
@@ -27,7 +27,7 @@ to modify documents upon insertion.
 {ref}/mapping-type-field.html[`ctx['_type']`]::
 Modify this to change the type for the current document.
 
-`ctx` (`Map`, read-only)::
+`ctx` (`Map`)::
 Modify the values in the `Map/List` structure to add, modify, or delete
 the fields of a document.
 
@@ -38,4 +38,158 @@ void::
 
 *API*
 
-The standard <<painless-api-reference, Painless API>> is available.
+The standard <<painless-api-reference, Painless API>> is available.
+
+*Example*
+
+To run this example, first follow the steps in
+<<painless-context-examples, context examples>>.
+
+The seat data contains:
+
+* A date in the format `YYYY-MM-DD` where the second digit of both month and day
+is optional.
+* A time in the format HH:MM* where the second digit of both hours and minutes
+is optional. The star (*) represents either the `String` `AM` or `PM`.
+
+The following ingest script processes the date and time `Strings` and stores the
+result in a `datetime` field.
+
+[source,Painless]
+----
+String[] split(String s, char d) { <1>
+    int count = 0;
+
+    for (char c : s.toCharArray()) { <2>
+        if (c == d) {
+            ++count;
+        }
+    }
+
+    if (count == 0) {
+        return new String[] {s}; <3>
+    }
+
+    String[] r = new String[count + 1]; <4>
+    int i0 = 0, i1 = 0;
+    count = 0;
+
+    for (char c : s.toCharArray()) { <5>
+        if (c == d) {
+            r[count++] = s.substring(i0, i1);
+            i0 = i1 + 1;
+        }
+
+        ++i1;
+    }
+
+    r[count] = s.substring(i0, i1); <6>
+
+    return r;
+}
+
+String[] dateSplit = split(ctx.date, (char)"-"); <7>
+String year = dateSplit[0].trim();
+String month = dateSplit[1].trim();
+
+if (month.length() == 1) { <8>
+    month = "0" + month;
+}
+
+String day = dateSplit[2].trim();
+
+if (day.length() == 1) { <9>
+    day = "0" + day;
+}
+
+boolean pm = ctx.time.substring(ctx.time.length() - 2).equals("PM"); <10>
+String[] timeSplit = split(
+        ctx.time.substring(0, ctx.time.length() - 2), (char)":"); <11>
+int hours = Integer.parseInt(timeSplit[0].trim());
+int minutes = Integer.parseInt(timeSplit[1].trim());
+
+if (pm) { <12>
+    hours += 12;
+}
+
+String dts = year + "-" + month + "-" + day + "T" +
+        (hours < 10 ? "0" + hours : "" + hours) + ":" +
+        (minutes < 10 ? "0" + minutes : "" + minutes) +
+        ":00+08:00"; <13>
+
+ZonedDateTime dt = ZonedDateTime.parse(
+         dts, DateTimeFormatter.ISO_OFFSET_DATE_TIME); <14>
+ctx.datetime = dt.getLong(ChronoField.INSTANT_SECONDS)*1000L; <15>
+----
+<1> Creates a `split` <<painless-functions, function>> to split a
+<<string-type, `String`>> type value using a <<primitive-types, `char`>>
+type value as the delimiter. This is used to
+pull out the individual pieces of the date and time `Strings` from the
+original seat data.
+<2> The first pass through each `char` in the `String` counts how many new
+`Strings` the original is split into.
+<3> Returns the original `String` if there are no instances of the delimiting
+`char`.
+<4> Creates an <<array-type, array type>> value to collect the split `Strings`
+into based on the number of `char` delimiters found in the first pass.
+<5> The second pass through each `char` in the `String` collects each split
+substring into an array type value of `Strings`.
+<6> Collects the last substring into the array type value of `Strings`.
+<7> Uses the `split` function to separate the date `String` from the seat data
+into year, month, and day `Strings`.
+Note::
+* The use of a `String` type value to `char` type value
+<<string-character-casting, cast>> as part of the second argument since
+character literals do not exist.
+* The use of the `ctx` ingest processor context variable to retrieve the
+data from the `date` field.
+<8> Prepends the <<string-literals, string literal>> `"0"` value to a single
+digit month since the format of the seat data allows for this case.
+<9> Prepends the <<string-literals, string literal>> `"0"` value to a single
+digit day since the format of the seat data allows for this case.
+<10> Sets the <<primitive-types, `boolean` type>>
+<<painless-variables, variable>> to `true` if the time `String` is a time
+in the afternoon or evening.
+Note::
+* The use of the `ctx` ingest processor context variable to retrieve the
+data from the `time` field.
+<11> Uses the `split` function to separate the time `String` from the seat data
+into hours and minutes `Strings`.
+Note::
+* The use of the `substring` method to remove the `AM` or `PM` portion of
+the time `String`.
+* The use of a `String` type value to `char` type value
+<<string-character-casting, cast>> as part of the second argument since
+character literals do not exist.
+* The use of the `ctx` ingest processor context variable to retrieve the
+data from the `time` field.
+<12> If the time `String` is an afternoon or evening value, adds the
+<<integer-literals, integer literal>> `12` to the existing hours to move to
+a 24-hour based time.
+<13> Builds a new time `String` that is parsable using existing API methods.
+<14> Creates a `ZonedDateTime` <<reference-types, reference type>> value by using
+the API method `parse` to parse the new time `String`.
+<15> Sets the datetime field `datetime` to the number of milliseconds retrieved
+from the API method `getLong`.
+Note::
+* The use of the `ctx` ingest processor context variable to set the field
+`datetime`. Manipulate each document's fields with the `ctx` variable as
+each document is indexed.
+
+Submit the following request:
+
+[source,js]
+----
+PUT /_ingest/pipeline/seats
+{
+    "description": "update datetime for seats",
+    "processors": [
+      {
+        "script": {
+          "source": "String[] split(String s, char d) { int count = 0; for (char c : s.toCharArray()) { if (c == d) { ++count; } } if (count == 0) { return new String[] {s}; } String[] r = new String[count + 1]; int i0 = 0, i1 = 0; count = 0; for (char c : s.toCharArray()) { if (c == d) { r[count++] = s.substring(i0, i1); i0 = i1 + 1; } ++i1; } r[count] = s.substring(i0, i1); return r; } String[] dateSplit = split(ctx.date, (char)\"-\"); String year = dateSplit[0].trim(); String month = dateSplit[1].trim(); if (month.length() == 1) { month = \"0\" + month; } String day = dateSplit[2].trim(); if (day.length() == 1) { day = \"0\" + day; } boolean pm = ctx.time.substring(ctx.time.length() - 2).equals(\"PM\"); String[] timeSplit = split(ctx.time.substring(0, ctx.time.length() - 2), (char)\":\"); int hours = Integer.parseInt(timeSplit[0].trim()); int minutes = Integer.parseInt(timeSplit[1].trim()); if (pm) { hours += 12; } String dts = year + \"-\" + month + \"-\" + day + \"T\" + (hours < 10 ? \"0\" + hours : \"\" + hours) + \":\" + (minutes < 10 ? \"0\" + minutes : \"\" + minutes) + \":00+08:00\"; ZonedDateTime dt = ZonedDateTime.parse(dts, DateTimeFormatter.ISO_OFFSET_DATE_TIME); ctx.datetime = dt.getLong(ChronoField.INSTANT_SECONDS)*1000L;"
+        }
+      }
+    ]
+}
+----
+// CONSOLE
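
For readers who want to try the date and time normalization outside Elasticsearch, the following is a minimal plain-Java sketch of the same steps, not the committed script. The class `SeatDateTime` and its helpers are hypothetical names. It makes one deliberate change that is an assumption rather than something the commit does: hours are mapped with `hours % 12 + (pm ? 12 : 0)`. The script adds 12 whenever the suffix is `PM`, so a value such as `12:30PM` would become hour 24 and fail to parse, and `12:05AM` would stay at hour 12. Whether the seat data contains such times is not stated here.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoField;

// Hypothetical stand-alone version of the ingest script's logic.
final class SeatDateTime {

    /** Prepends "0" to a one-digit month or day. */
    static String pad2(String value) {
        return value.length() == 1 ? "0" + value : value;
    }

    /** Maps a 12-hour clock value to 0-23 (assumed variation, see lead-in). */
    static int to24Hour(int hours, boolean pm) {
        return hours % 12 + (pm ? 12 : 0);
    }

    /** Converts a seat date ("2018-4-1") and time ("5:00PM") to epoch milliseconds. */
    static long toEpochMillis(String date, String time) {
        String[] dateSplit = date.split("-");
        String year = dateSplit[0].trim();
        String month = pad2(dateSplit[1].trim());
        String day = pad2(dateSplit[2].trim());

        // The last two characters are the AM/PM marker; the rest is H:MM or HH:MM.
        boolean pm = time.substring(time.length() - 2).equals("PM");
        String[] timeSplit = time.substring(0, time.length() - 2).split(":");
        int hours = to24Hour(Integer.parseInt(timeSplit[0].trim()), pm);
        int minutes = Integer.parseInt(timeSplit[1].trim());

        // Same fixed +08:00 offset as the script.
        String dts = year + "-" + month + "-" + day + "T"
            + String.format("%02d:%02d", hours, minutes) + ":00+08:00";

        ZonedDateTime dt = ZonedDateTime.parse(dts, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        return dt.getLong(ChronoField.INSTANT_SECONDS) * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2018-4-1", "5:00PM"));
    }
}
```

Plain Java has character literals and `String.split`, so the hand-written `split` function and the `(char)"-"` cast from the Painless version are not needed in this sketch.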

docs/painless/painless-keywords.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ Keywords are reserved tokens for built-in language features.
 
 *Errors*
 
-If a keyword is used as an <<painless-identifiers, identifier>>.
+* If a keyword is used as an <<painless-identifiers, identifier>>.
 
 *Keywords*
 