
Commit ddf3bc2

Change how max_matches affects the target_field option (#47982)
Prior to this change, the `target_field` was always a JSON array field in the ingested document, to account for the possibility that multiple enrich documents could be inserted into the `target_field`. However, the default `max_matches` is `1`, which means that by default only a single enrich document was added to the `target_field` JSON array. This commit changes that behavior: if `max_matches` is set to `1`, the single enrich document is added to the `target_field` as a JSON object; if `max_matches` is set to a higher value, the enrich documents are added as a JSON array (even if only a single enrich document matched).
1 parent 6ed7d69 commit ddf3bc2


9 files changed (+73 additions, -75 deletions)


docs/reference/ingest/apis/enrich/put-enrich-policy.asciidoc

Lines changed: 7 additions & 9 deletions
@@ -322,15 +322,13 @@ The API returns the following response:
   "_seq_no": 55,
   "_primary_term": 1,
   "_source": {
-    "geo_data": [
-      {
-        "location": {
-          "type": "envelope",
-          "coordinates": [[13.0, 53.0], [14.0, 52.0]]
-        },
-        "postal_code": "96598"
-      }
-    ],
+    "geo_data": {
+      "location": {
+        "type": "envelope",
+        "coordinates": [[13.0, 53.0], [14.0, 52.0]]
+      },
+      "postal_code": "96598"
+    },
     "first_name": "Mardy",
     "last_name": "Brown",
     "geo_location": "POINT (13.5 52.5)"

docs/reference/ingest/enrich.asciidoc

Lines changed: 30 additions & 27 deletions
@@ -167,42 +167,47 @@ PUT /_ingest/pipeline/user_lookup
       "enrich" : {
         "policy_name": "users-policy",
         "field" : "email",
-        "target_field": "user"
+        "target_field": "user",
+        "max_matches": "1"
       }
     }
   ]
 }
 ----
 // TEST[continued]
 
+Because the enrich policy type is `match`,
+the enrich processor matches incoming documents
+to documents in the enrich index
+based on match field values.
+The enrich processor then appends the enrich field data
+from matching documents in the enrich index
+to the target field of incoming documents.
+
+Because the `max_matches` option for the enrich processor is `1`,
+the enrich processor appends the data from only the best matching document
+to each incoming document's target field as an object.
+
+If the `max_matches` option were greater than `1`,
+the processor could append data from up to the `max_matches` number of documents
+to the target field as an array.
+
+If the incoming document matches no documents in the enrich index,
+the processor appends no data.
+
 You also can add other <<ingest-processors,processors>>
 to your ingest pipeline.
 You can use these processors to change or drop incoming documents
 based on your criteria.
-
 See <<ingest-processors>> for a list of built-in processors.
 
+
 [float]
 [[ingest-enrich-docs]]
 ==== Ingest and enrich documents
 
 Index incoming documents using your ingest pipeline.
 
-Because the enrich policy type is `match`,
-the enrich processor matches incoming documents
-to documents in the enrich index
-based on match field values.
-The processor then appends the enrich field data
-from any matching document in the enrich index
-to target field of the incoming document.
-
-The enrich processor appends all data to the target field as an array.
-If the incoming document matches more than one document in the enrich index,
-the processor appends data from those documents to the array.
-
-If the incoming document matches no documents in the enrich index,
-the processor appends no data.
-
 The following <<docs-index_,index API>> request uses the ingest pipeline
 to index a document
 containing the `email` field
@@ -239,16 +244,14 @@ The API returns the following response:
   "_seq_no": 55,
   "_primary_term": 1,
   "_source": {
-    "user": [
-      {
-        "email": "[email protected]",
-        "first_name": "Mardy",
-        "last_name": "Brown",
-        "zip": 70116,
-        "city": "New Orleans",
-        "state": "LA"
-      }
-    ],
+    "user": {
+      "email": "[email protected]",
+      "first_name": "Mardy",
+      "last_name": "Brown",
+      "zip": 70116,
+      "city": "New Orleans",
+      "state": "LA"
+    },
     "email": "[email protected]"
   }
 }
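After this change, the shape of the target field depends on `max_matches`. Code that reads enriched documents back from `_source` may therefore see either a JSON object or a JSON array. The Java sketch below shows one way a client could normalize both shapes into a list. `EnrichTargetReader` and its `entries` method are hypothetical names used only for illustration and are not part of Elasticsearch. The sketch also assumes `_source` has already been parsed into plain `java.util` maps and lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper (not an Elasticsearch API): normalizes an
// enrich target field read from _source into a list of entries, whichever
// shape the enrich processor wrote.
final class EnrichTargetReader {

    private EnrichTargetReader() {}

    static List<Map<?, ?>> entries(Object targetValue) {
        List<Map<?, ?>> result = new ArrayList<>();
        if (targetValue == null) {
            // Missing or null field: no match, so nothing was appended.
            return result;
        }
        if (targetValue instanceof Map) {
            // max_matches == 1: the target field holds a single JSON object.
            result.add((Map<?, ?>) targetValue);
        } else if (targetValue instanceof List) {
            // max_matches > 1: the target field holds a JSON array,
            // which may contain only one element.
            for (Object element : (List<?>) targetValue) {
                result.add((Map<?, ?>) element);
            }
        } else {
            throw new IllegalArgumentException("unexpected enrich target value: " + targetValue);
        }
        return result;
    }
}
```

With this helper, callers can iterate over `entries(source.get("user"))` without knowing how the pipeline's `max_matches` was configured.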

docs/reference/ingest/processors/enrich.asciidoc

Lines changed: 3 additions & 12 deletions
@@ -16,18 +16,9 @@ check out the <<ingest-enriching-data,tutorial>> to get familiar with enrich pol
 | `field` | yes | - | The field in the input document that matches the policies match_field used to retrieve the enrichment data.
 | `target_field` | yes | - | The field that will be used for the enrichment data.
 | `ignore_missing` | no | false | If `true` and `field` does not exist, the processor quietly exits without modifying the document
-| `override` | no | true | If processor will update fields with pre-existing non-null-valued field. When set to `false`, such fields will not be touched.
-| `max_matches` | no | 1 | The maximum number of matched documents to include under the configured target field. In order to avoid documents getting too large, the maximum allowed value is 128.
-| `shape_relation` | no | `INTERSECTS` a| Spatial relation operator
-used to match the <<geo-shape,geo_shape>> of incoming documents
-to documents in the enrich index.
-+
-This option is only used for `geo_match` enrich policy types.
-+
-The <<spatial-strategy, geo_shape strategy>> mapping parameter determines
-which spatial relation operators are availlble.
-See <<_spatial_relations>>
-for operators and more information.
+| `override` | no | true | If processor will update fields with pre-existing non-null-valued field. When set to `false`, such fields will not be touched.
+| `max_matches` | no | 1 | The maximum number of matched documents to include under the configured target field. The `target_field` will be turned into a json array if `max_matches` is higher than 1, otherwise `target_field` will become a json object. In order to avoid documents getting too large, the maximum allowed value is 128.
+| `shape_relation` | no | `INTERSECTS` | A spatial relation operator used to match the <<geo-shape,geo_shape>> of incoming documents to documents in the enrich index. This option is only used for `geo_match` enrich policy types. The <<spatial-strategy, geo_shape strategy>> mapping parameter determines which spatial relation operators are available. See <<_spatial_relations>> for operators and more information.
 
 include::common-options.asciidoc[]
 |======
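The updated `max_matches` row now ties the option's value to the shape of `target_field`. The sketch below shows one way such an option might be read and validated. `MaxMatchesOption` is a hypothetical class and not the actual enrich processor factory. Two details are assumptions rather than facts from this commit: that 1 is the lowest accepted value, and that string values such as the `"1"` in the pipeline example are parsed as integers. The default of 1 and the limit of 128 come from the table above.

```java
import java.util.Map;

// Hypothetical sketch (not Elasticsearch's factory code) of reading and
// validating the enrich processor's max_matches option.
final class MaxMatchesOption {

    static final int DEFAULT_MAX_MATCHES = 1; // documented default
    static final int MAX_ALLOWED = 128;       // documented upper limit

    private MaxMatchesOption() {}

    static int read(Map<String, ?> config) {
        Object raw = config.get("max_matches");
        int value;
        if (raw == null) {
            value = DEFAULT_MAX_MATCHES;
        } else if (raw instanceof Number) {
            value = ((Number) raw).intValue();
        } else {
            // Assumption: string values such as "1" are parsed as integers.
            value = Integer.parseInt(raw.toString());
        }
        // Assumption: 1 is the lowest accepted value.
        if (value < 1 || value > MAX_ALLOWED) {
            throw new IllegalArgumentException(
                "max_matches should be between 1 and " + MAX_ALLOWED + " but was " + value);
        }
        return value;
    }

    // Per the updated docs: target_field becomes an array only when max_matches > 1.
    static boolean targetIsArray(int maxMatches) {
        return maxMatches > 1;
    }
}
```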

x-pack/plugin/enrich/qa/common/src/main/java/org/elasticsearch/test/enrich/CommonEnrichRestTestCase.java

Lines changed: 6 additions & 7 deletions
@@ -80,13 +80,12 @@ private void setupGenericLifecycleTest(boolean deletePipeilne) throws Exception
         // Check if document has been enriched
         Request getRequest = new Request("GET", "/my-index/_doc/1");
         Map<String, Object> response = toMap(client().performRequest(getRequest));
-        List<?> entries = (List<?>) ((Map<?, ?>) response.get("_source")).get("entry");
-        Map<?, ?> _source = (Map<?, ?>) entries.get(0);
-        assertThat(_source.size(), equalTo(4));
-        assertThat(_source.get("host"), equalTo("elastic.co"));
-        assertThat(_source.get("tld"), equalTo("co"));
-        assertThat(_source.get("globalRank"), equalTo(25));
-        assertThat(_source.get("tldRank"), equalTo(7));
+        Map<?, ?> entry = (Map<?, ?>) ((Map<?, ?>) response.get("_source")).get("entry");
+        assertThat(entry.size(), equalTo(4));
+        assertThat(entry.get("host"), equalTo("elastic.co"));
+        assertThat(entry.get("tld"), equalTo("co"));
+        assertThat(entry.get("globalRank"), equalTo(25));
+        assertThat(entry.get("tldRank"), equalTo(7));
 
         if (deletePipeilne) {
             // delete the pipeline so the policies can be deleted

x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/AbstractEnrichProcessor.java

Lines changed: 10 additions & 5 deletions
@@ -96,12 +96,17 @@ public void execute(IngestDocument ingestDocument, BiConsumer<IngestDocument, Ex
                 }
 
                 if (overrideEnabled || ingestDocument.hasField(targetField) == false) {
-                    List<Map<String, Object>> enrichDocuments = new ArrayList<>(searchHits.length);
-                    for (SearchHit searchHit : searchHits) {
-                        Map<String, Object> enrichDocument = searchHit.getSourceAsMap();
-                        enrichDocuments.add(enrichDocument);
+                    if (maxMatches == 1) {
+                        Map<String, Object> firstDocument = searchHits[0].getSourceAsMap();
+                        ingestDocument.setFieldValue(targetField, firstDocument);
+                    } else {
+                        List<Map<String, Object>> enrichDocuments = new ArrayList<>(searchHits.length);
+                        for (SearchHit searchHit : searchHits) {
+                            Map<String, Object> enrichDocument = searchHit.getSourceAsMap();
+                            enrichDocuments.add(enrichDocument);
+                        }
+                        ingestDocument.setFieldValue(targetField, enrichDocuments);
                     }
-                    ingestDocument.setFieldValue(targetField, enrichDocuments);
                 }
                 handler.accept(ingestDocument, null);
             });
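The new branch in `AbstractEnrichProcessor` can be modeled without any Elasticsearch dependencies, which makes the object-versus-array rule easy to check in isolation. The sketch below rests on several simplifying assumptions. A plain `Map` stands in for `IngestDocument`, and a list of maps stands in for the `SearchHit` sources. `hasField` is approximated with `containsKey`. The early return for an empty hit list is also an assumption: the diff reads `searchHits[0]` without a length check, so the empty case is presumably handled before this block. `EnrichShaping` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Dependency-free model of the target_field shaping in AbstractEnrichProcessor.
// Assumptions: Map stands in for IngestDocument, each list element stands in for
// SearchHit.getSourceAsMap(), and hasField(...) is approximated by containsKey(...).
final class EnrichShaping {

    private EnrichShaping() {}

    static void apply(Map<String, Object> document,
                      String targetField,
                      List<? extends Map<String, ?>> hitSources,
                      int maxMatches,
                      boolean overrideEnabled) {
        if (hitSources.isEmpty()) {
            // Assumed: no matching enrich documents means nothing is appended.
            return;
        }
        // Same gate as the diff: write only if override is enabled
        // or the target field does not exist yet.
        if (overrideEnabled || document.containsKey(targetField) == false) {
            if (maxMatches == 1) {
                // max_matches == 1: store the first match as a JSON object.
                document.put(targetField, hitSources.get(0));
            } else {
                // max_matches > 1: store all matches as a JSON array,
                // even when only one document matched.
                List<Map<String, ?>> enrichDocuments = new ArrayList<>(hitSources);
                document.put(targetField, enrichDocuments);
            }
        }
    }
}
```

With `maxMatches` set to `2` and a single hit, the target field still becomes a one-element list. This matches the commit message's note that an array is used even if only a single enrich document matched.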

x-pack/plugin/enrich/src/test/java/org/elasticsearch/xpack/enrich/BasicEnrichTests.java

Lines changed: 5 additions & 3 deletions
@@ -153,9 +153,11 @@ public void testIngestDataWithGeoMatchProcessor() {
 
         GetResponse getResponse = client().get(new GetRequest("my-index", "_id")).actionGet();
         Map<String, Object> source = getResponse.getSourceAsMap();
-        List<?> entries = (List<?>) source.get("enriched");
+        Map<?, ?> entries = (Map) source.get("enriched");
         assertThat(entries, notNullValue());
-        assertThat(entries.size(), equalTo(1));
+        assertThat(entries.size(), equalTo(2));
+        assertThat(entries.containsKey(matchField), is(true));
+        assertThat(entries.get(enrichField), equalTo("94040"));
 
         EnrichStatsAction.Response statsResponse =
             client().execute(EnrichStatsAction.INSTANCE, new EnrichStatsAction.Request()).actionGet();
@@ -204,7 +206,7 @@ public void testMultiplePolicies() {
             GetResponse getResponse = client().get(new GetRequest("my-index", Integer.toString(i))).actionGet();
             Map<String, Object> source = getResponse.getSourceAsMap();
             assertThat(source.size(), equalTo(2));
-            assertThat(source.get("target"), equalTo(List.of(Map.of("key", "key", "value", "val" + i))));
+            assertThat(source.get("target"), equalTo(Map.of("key", "key", "value", "val" + i)));
         }
     }
 
x-pack/plugin/enrich/src/test/java/org/elasticsearch/xpack/enrich/EnrichMultiNodeIT.java

Lines changed: 1 addition & 2 deletions
@@ -139,8 +139,7 @@ private static void enrich(List<String> keys, String coordinatingNode) {
         for (int i = 0; i < numDocs; i++) {
             GetResponse getResponse = client().get(new GetRequest("my-index", Integer.toString(i))).actionGet();
             Map<String, Object> source = getResponse.getSourceAsMap();
-            List<?> entries = (List<?>) source.get("user");
-            Map<?, ?> userEntry = (Map<?, ?>) entries.get(0);
+            Map<?, ?> userEntry = (Map<?, ?>) source.get("user");
             assertThat(userEntry.size(), equalTo(DECORATE_FIELDS.length + 1));
             assertThat(keys.contains(userEntry.get(MATCH_FIELD)), is(true));
             for (String field : DECORATE_FIELDS) {

x-pack/plugin/enrich/src/test/java/org/elasticsearch/xpack/enrich/GeoMatchProcessorTests.java

Lines changed: 7 additions & 2 deletions
@@ -90,8 +90,13 @@ private void testBasicsForFieldValue(Object fieldValue, Geometry expectedGeometr
         assertThat(shapeQueryBuilder.shape(), equalTo(expectedGeometry));
 
         // Check result
-        List<?> entries = ingestDocument.getFieldValue("entry", List.class);
-        Map<?, ?> entry = (Map<?, ?>) entries.get(0);
+        Map<?, ?> entry;
+        if (maxMatches == 1) {
+            entry = ingestDocument.getFieldValue("entry", Map.class);
+        } else {
+            List<?> entries = ingestDocument.getFieldValue("entry", List.class);
+            entry = (Map<?, ?>) entries.get(0);
+        }
         assertThat(entry.size(), equalTo(2));
         assertThat(entry.get("zipcode"), equalTo(94040));
 
x-pack/plugin/enrich/src/test/java/org/elasticsearch/xpack/enrich/MatchProcessorTests.java

Lines changed: 4 additions & 8 deletions
@@ -206,10 +206,9 @@ public void testExistingNullFieldWithOverrideDisabled() throws Exception {
     }
 
     public void testNumericValue() {
-        int maxMatches = randomIntBetween(1, 8);
         MockSearchFunction mockSearch = mockedSearchFunction(Map.of(2, Map.of("globalRank", 451, "tldRank", 23, "tld", "co")));
         MatchProcessor processor =
-            new MatchProcessor("_tag", mockSearch, "_name", "domain", "entry", false, true, "domain", maxMatches);
+            new MatchProcessor("_tag", mockSearch, "_name", "domain", "entry", false, true, "domain", 1);
         IngestDocument ingestDocument =
             new IngestDocument("_index", "_id", "_routing", 1L, VersionType.INTERNAL, Map.of("domain", 2));
 
@@ -227,20 +226,18 @@ public void testNumericValue() {
         assertThat(termQueryBuilder.value(), equalTo(2));
 
         // Check result
-        List<?> entries = ingestDocument.getFieldValue("entry", List.class);
-        Map<?, ?> entry = (Map<?, ?>) entries.get(0);
+        Map<?, ?> entry = ingestDocument.getFieldValue("entry", Map.class);
         assertThat(entry.size(), equalTo(3));
         assertThat(entry.get("globalRank"), equalTo(451));
         assertThat(entry.get("tldRank"), equalTo(23));
         assertThat(entry.get("tld"), equalTo("co"));
     }
 
     public void testArray() {
-        int maxMatches = randomIntBetween(1, 8);
         MockSearchFunction mockSearch =
             mockedSearchFunction(Map.of(List.of("1", "2"), Map.of("globalRank", 451, "tldRank", 23, "tld", "co")));
         MatchProcessor processor =
-            new MatchProcessor("_tag", mockSearch, "_name", "domain", "entry", false, true, "domain", maxMatches);
+            new MatchProcessor("_tag", mockSearch, "_name", "domain", "entry", false, true, "domain", 1);
         IngestDocument ingestDocument =
             new IngestDocument("_index", "_id", "_routing", 1L, VersionType.INTERNAL, Map.of("domain", List.of("1", "2")));
 
@@ -260,8 +257,7 @@ public void testArray() {
         assertThat(termQueryBuilder.values().get(1), equalTo("2"));
 
         // Check result
-        List<?> entries = ingestDocument.getFieldValue("entry", List.class);
-        Map<?, ?> entry = (Map<?, ?>) entries.get(0);
+        Map<?, ?> entry = ingestDocument.getFieldValue("entry", Map.class);
         assertThat(entry.size(), equalTo(3));
         assertThat(entry.get("globalRank"), equalTo(451));
         assertThat(entry.get("tldRank"), equalTo(23));
