Drops the inline callouts from the docs, i.e. writing `<1>` anywhere other than at the end of a line. Asciidoctor doesn't support them, and we'd very much like to move to Asciidoctor for generating the docs because it is actively maintained.
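For context, here is the difference, using a line from one of the hunks below. This is an inline callout, which Asciidoctor rejects:

----
AS (name:chararray, type:chararray <1>, year: chararray);
----

and this is the end-of-line form it accepts:

----
AS (name:chararray, type:chararray, year: chararray); <1>
----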
-<2> Hive column `date` mapped in {es} to `@timestamp`
-<3> Hive column `url` mapped in {es} to `url_123`
+<1> Hive column `date` mapped in {es} to `@timestamp`; Hive column `url` mapped in {es} to `url_123`
TIP: Hive is case **insensitive** while {es} is not. The loss of information can create invalid queries (as the column in Hive might not match the one in {es}). To avoid this, {eh} will always convert Hive column names to lower-case.
This being said, it is recommended to use the default Hive style and use upper-case names only for Hive commands and avoid mixed-case names.
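(Not part of this diff, but for reference: the mapping the callouts above describe comes from the table declaration earlier in the docs, roughly along these lines — the column list here is abridged and illustrative, while `es.mapping.names` and the storage handler class are the real ones.)

[source,sql]
----
CREATE EXTERNAL TABLE artists (
    date    TIMESTAMP,
    name    STRING,
    url     STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists',
              'es.mapping.names' = 'date:@timestamp, url:url_123');
----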
-<1> 'spark' artifact. Notice the `-20` part of the suffix which indicates the Spark version compatible with the artifact. Use `20` for Spark 2.0+ and `13` for Spark 1.3-1.6.
-<2> Notice the `_2.10` suffix which indicates the Scala version compatible with the artifact. Currently it is the same as the version used by Spark itself.
+<1> 'spark' artifact. Notice the `-20` part of the suffix which indicates the
+Spark version compatible with the artifact. Use `20` for Spark 2.0+ and `13` for
+Spark 1.3-1.6. Notice the `_2.10` suffix which indicates the Scala version
+compatible with the artifact. Currently it is the same as the version used by
+Spark itself.
The Spark connector framework is the most sensitive to version incompatibilities. For your convenience, a version compatibility matrix has been provided below:
[cols="2,2,10",options="header",]
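(Again outside the diff, a sketch of how those suffixes combine into dependency coordinates — `org.elasticsearch` is the group id {eh} publishes under; the version shown is only an example.)

[source,xml]
----
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark-20_2.10</artifactId> <!-- Spark 2.0+, Scala 2.10 -->
  <version>5.0.0</version> <!-- example version -->
</dependency>
----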
@@ -89,7 +92,7 @@ The Spark connector framework is the most sensitive to version incompatibilities
-<2> Pig column `date` mapped in {es} to `@timestamp`
-<3> Pig column `uRL` mapped in {es} to `url`
+<1> Pig column `date` mapped in {es} to `@timestamp`; Pig column `uRL` mapped in {es} to `url`
TIP: Since {eh} 2.1, the Pig schema case sensitivity is preserved to {es} and back.
@@ -185,11 +184,13 @@ A = LOAD 'src/test/resources/artists.dat' USING PigStorage()
-- transform data
B = FOREACH A GENERATE name, TOTUPLE(url, picture) AS links;
-- save the result to Elasticsearch
-STORE B INTO 'radio/artists'<1> USING org.elasticsearch.hadoop.pig.EsStorage(<2>);
+STORE B INTO 'radio/artists'<1>
+USING org.elasticsearch.hadoop.pig.EsStorage(); <2>
----
<1> {es} resource (index and type) associated with the given storage
-<2> additional configuration parameters can be passed here - in this case the defaults are used
+<2> additional configuration parameters can be passed inside the `()` - in this
+case the defaults are used
For cases where the id (or other metadata fields like +ttl+ or +timestamp+) of the document needs to be specified, one can do so by setting the appropriate <<cfg-mapping, mapping>>, namely +es.mapping.id+. Following the previous example, to indicate to {es} to use the field +id+ as the document id, update the +Storage+ configuration:
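(The code block this paragraph introduces sits outside the hunks shown here; in the published docs it is a one-liner roughly like:)

[source,sql]
----
STORE B INTO 'radio/artists'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.mapping.id=id');
----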
@@ -219,9 +220,9 @@ IMPORTANT: Make sure the data is properly encoded, in `UTF-8`. The field content
[source,sql]
----
-A = LOAD '/resources/artists.json' USING PigStorage() AS (json:chararray<1>);"
+A = LOAD '/resources/artists.json' USING PigStorage() AS (json:chararray);" <1>
STORE B INTO 'radio/artists'
-USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true'<2>...);
+USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true'...); <2>
----
<1> Load the (JSON) data as a single field (`json`)
@@ -235,8 +236,9 @@ One can index the data to a different resource, depending on the 'row' being rea
[source,sql]
----
A = LOAD 'src/test/resources/media.dat' USING PigStorage()
-AS (name:chararray, type:chararray <1>, year: chararray);
-STORE B INTO 'my-collection-{type}/doc'<2> USING org.elasticsearch.hadoop.pig.EsStorage();
+AS (name:chararray, type:chararray, year: chararray); <1>
+STORE B INTO 'my-collection-{type}/doc' <2>
+USING org.elasticsearch.hadoop.pig.EsStorage();
----
<1> Tuple field used by the resource pattern. Any of the declared fields can be used.
@@ -262,8 +264,8 @@ the table declaration can be as follows:
[source,sql]
----
-A = LOAD '/resources/media.json' USING PigStorage() AS (json:chararray<1>);"
-STORE B INTO 'my-collection-{media_type}/doc'<2>
+A = LOAD '/resources/media.json' USING PigStorage() AS (json:chararray);" <1>
+STORE B INTO 'my-collection-{media_type}/doc'<2>
USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true');
----
@@ -278,8 +280,8 @@ As you would expect, loading the data is straight forward:
[source,sql]
----
-- execute Elasticsearch query and load data into Pig
-A = LOAD 'radio/artists'<1>
-USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?me*'<2>);
+A = LOAD 'radio/artists'<1>
+USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?me*'); <2>