Change how max_matches affects target_field option. #47982

Merged: 6 commits, Oct 14, 2019
16 changes: 7 additions & 9 deletions docs/reference/ingest/apis/enrich/put-enrich-policy.asciidoc
@@ -322,15 +322,13 @@ The API returns the following response:
"_seq_no": 55,
"_primary_term": 1,
"_source": {
"geo_data": [
{
"location": {
"type": "envelope",
"coordinates": [[13.0, 53.0], [14.0, 52.0]]
},
"postal_code": "96598"
}
],
"geo_data": {
"location": {
"type": "envelope",
"coordinates": [[13.0, 53.0], [14.0, 52.0]]
},
"postal_code": "96598"
},
"first_name": "Mardy",
"last_name": "Brown",
"geo_location": "POINT (13.5 52.5)"
57 changes: 30 additions & 27 deletions docs/reference/ingest/enrich.asciidoc
@@ -167,42 +167,47 @@ PUT /_ingest/pipeline/user_lookup
"enrich" : {
"policy_name": "users-policy",
"field" : "email",
"target_field": "user"
"target_field": "user",
"max_matches": "1"
}
}
]
}
----
// TEST[continued]

Because the enrich policy type is `match`,
the enrich processor matches incoming documents
to documents in the enrich index
based on match field values.
The enrich processor then appends the enrich field data
from matching documents in the enrich index
to the target field of incoming documents.

Because the `max_matches` option for the enrich processor is `1`,
the enrich processor appends the data from only the best matching document
to each incoming document's target field as an object.

If the `max_matches` option were greater than `1`,
the processor could append data from up to the `max_matches` number of documents
to the target field as an array.

If the incoming document matches no documents in the enrich index,
the processor appends no data.

You also can add other <<ingest-processors,processors>>
to your ingest pipeline.
You can use these processors to change or drop incoming documents
based on your criteria.

See <<ingest-processors>> for a list of built-in processors.


[float]
[[ingest-enrich-docs]]
==== Ingest and enrich documents

Index incoming documents using your ingest pipeline.

Because the enrich policy type is `match`,
the enrich processor matches incoming documents
to documents in the enrich index
based on match field values.
The processor then appends the enrich field data
from any matching document in the enrich index
to target field of the incoming document.

The enrich processor appends all data to the target field as an array.
If the incoming document matches more than one document in the enrich index,
the processor appends data from those documents to the array.

If the incoming document matches no documents in the enrich index,
the processor appends no data.

The following <<docs-index_,index API>> request uses the ingest pipeline
to index a document
containing the `email` field
@@ -239,16 +244,14 @@ The API returns the following response:
"_seq_no": 55,
"_primary_term": 1,
"_source": {
"user": [
{
"email": "[email protected]",
"first_name": "Mardy",
"last_name": "Brown",
"zip": 70116,
"city": "New Orleans",
"state": "LA"
}
],
"user": {
"email": "[email protected]",
"first_name": "Mardy",
"last_name": "Brown",
"zip": 70116,
"city": "New Orleans",
"state": "LA"
},
"email": "[email protected]"
}
}
15 changes: 3 additions & 12 deletions docs/reference/ingest/processors/enrich.asciidoc
@@ -16,18 +16,9 @@ check out the <<ingest-enriching-data,tutorial>> to get familiar with enrich pol
| `field` | yes | - | The field in the input document that matches the policies match_field used to retrieve the enrichment data.
| `target_field` | yes | - | The field that will be used for the enrichment data.
| `ignore_missing` | no | false | If `true` and `field` does not exist, the processor quietly exits without modifying the document
| `override` | no | true | If processor will update fields with pre-existing non-null-valued field. When set to `false`, such fields will not be touched.
| `max_matches` | no | 1 | The maximum number of matched documents to include under the configured target field. In order to avoid documents getting too large, the maximum allowed value is 128.
| `shape_relation` | no | `INTERSECTS` a| Spatial relation operator
used to match the <<geo-shape,geo_shape>> of incoming documents
to documents in the enrich index.
+
This option is only used for `geo_match` enrich policy types.
+
The <<spatial-strategy, geo_shape strategy>> mapping parameter determines
which spatial relation operators are availlble.
See <<_spatial_relations>>
for operators and more information.
| `override` | no | true | If processor will update fields with pre-existing non-null-valued field. When set to `false`, such fields will not be touched.
| `max_matches` | no | 1 | The maximum number of matched documents to include under the configured target field. The `target_field` will be turned into a json array if `max_matches` is higher than 1, otherwise `target_field` will become a json object. In order to avoid documents getting too large, the maximum allowed value is 128.
| `shape_relation` | no | `INTERSECTS` | A spatial relation operator used to match the <<geo-shape,geo_shape>> of incoming documents to documents in the enrich index. This option is only used for `geo_match` enrich policy types. The <<spatial-strategy, geo_shape strategy>> mapping parameter determines which spatial relation operators are available. See <<_spatial_relations>> for operators and more information.

include::common-options.asciidoc[]
|======
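Reviewer's sketch: the `max_matches` row above can be read as a small rule, shown here as standalone Java. This is not the processor's actual validation code. The class `MaxMatchesOption`, its method names, the error message, and the lower bound of 1 are assumptions made for illustration; only the default of 1 and the upper limit of 128 come from the documentation text.

```java
public class MaxMatchesOption {

    // Default and upper limit as stated in the option table.
    static final int DEFAULT_MAX_MATCHES = 1;
    static final int MAX_ALLOWED = 128;

    /** Returns the effective max_matches value, or throws if it is out of range. */
    static int resolve(Integer configured) {
        int value = configured == null ? DEFAULT_MAX_MATCHES : configured;
        // Assumption: values below 1 are rejected as well as values above 128.
        if (value < 1 || value > MAX_ALLOWED) {
            throw new IllegalArgumentException(
                "max_matches must be between 1 and " + MAX_ALLOWED + ", got " + value);
        }
        return value;
    }

    /** Describes the JSON shape of target_field for a given max_matches. */
    static String targetFieldShape(int maxMatches) {
        return maxMatches == 1 ? "object" : "array";
    }

    public static void main(String[] args) {
        System.out.println(targetFieldShape(resolve(null))); // prints "object"
        System.out.println(targetFieldShape(resolve(5)));    // prints "array"
    }
}
```

The shape follows the configured value, not the number of documents that happen to match.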
@@ -80,13 +80,12 @@ private void setupGenericLifecycleTest(boolean deletePipeilne) throws Exception
// Check if document has been enriched
Request getRequest = new Request("GET", "/my-index/_doc/1");
Map<String, Object> response = toMap(client().performRequest(getRequest));
List<?> entries = (List<?>) ((Map<?, ?>) response.get("_source")).get("entry");
Map<?, ?> _source = (Map<?, ?>) entries.get(0);
assertThat(_source.size(), equalTo(4));
assertThat(_source.get("host"), equalTo("elastic.co"));
assertThat(_source.get("tld"), equalTo("co"));
assertThat(_source.get("globalRank"), equalTo(25));
assertThat(_source.get("tldRank"), equalTo(7));
Map<?, ?> entry = (Map<?, ?>) ((Map<?, ?>) response.get("_source")).get("entry");
assertThat(entry.size(), equalTo(4));
assertThat(entry.get("host"), equalTo("elastic.co"));
assertThat(entry.get("tld"), equalTo("co"));
assertThat(entry.get("globalRank"), equalTo(25));
assertThat(entry.get("tldRank"), equalTo(7));

if (deletePipeilne) {
// delete the pipeline so the policies can be deleted
@@ -96,12 +96,17 @@ public void execute(IngestDocument ingestDocument, BiConsumer<IngestDocument, Ex
}

if (overrideEnabled || ingestDocument.hasField(targetField) == false) {
List<Map<String, Object>> enrichDocuments = new ArrayList<>(searchHits.length);
for (SearchHit searchHit : searchHits) {
Map<String, Object> enrichDocument = searchHit.getSourceAsMap();
enrichDocuments.add(enrichDocument);
if (maxMatches == 1) {
Map<String, Object> firstDocument = searchHits[0].getSourceAsMap();
ingestDocument.setFieldValue(targetField, firstDocument);
} else {
List<Map<String, Object>> enrichDocuments = new ArrayList<>(searchHits.length);
for (SearchHit searchHit : searchHits) {
Map<String, Object> enrichDocument = searchHit.getSourceAsMap();
enrichDocuments.add(enrichDocument);
}
ingestDocument.setFieldValue(targetField, enrichDocuments);
}
ingestDocument.setFieldValue(targetField, enrichDocuments);
}
handler.accept(ingestDocument, null);
});
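Reviewer's sketch: the branching in this hunk can be exercised in isolation with plain maps. The class `TargetFieldShape` and its `shape` method are hypothetical. The list of maps stands in for the source maps of `searchHits`, and the empty-input guard is an assumption, since the part of `execute` that handles zero hits is not visible in this hunk. The sketch also assumes the search has already limited the hits to at most `maxMatches`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TargetFieldShape {

    /**
     * Returns the value that would be written to the target field:
     * a single map when maxMatches is 1, otherwise a list of maps.
     * Returns null when there are no hits (assumed: nothing is written).
     */
    static Object shape(List<Map<String, Object>> hits, int maxMatches) {
        if (hits.isEmpty()) {
            return null;
        }
        if (maxMatches == 1) {
            // Only the first (best) match is kept, stored as an object.
            return hits.get(0);
        }
        // All hits are kept, stored as an array.
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        Map<String, Object> hit = Map.of("zip", 70116, "city", "New Orleans");
        System.out.println(shape(List.of(hit), 1) instanceof Map);  // prints "true"
        System.out.println(shape(List.of(hit), 2) instanceof List); // prints "true"
    }
}
```

Note that with `maxMatches` set to 2 and a single hit, the result is still a one-element list, which matches the `maxMatches == 1` check in the hunk.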
@@ -153,9 +153,11 @@ public void testIngestDataWithGeoMatchProcessor() {

GetResponse getResponse = client().get(new GetRequest("my-index", "_id")).actionGet();
Map<String, Object> source = getResponse.getSourceAsMap();
List<?> entries = (List<?>) source.get("enriched");
Map<?, ?> entries = (Map) source.get("enriched");
assertThat(entries, notNullValue());
assertThat(entries.size(), equalTo(1));
assertThat(entries.size(), equalTo(2));
assertThat(entries.containsKey(matchField), is(true));
assertThat(entries.get(enrichField), equalTo("94040"));

EnrichStatsAction.Response statsResponse =
client().execute(EnrichStatsAction.INSTANCE, new EnrichStatsAction.Request()).actionGet();
@@ -204,7 +206,7 @@ public void testMultiplePolicies() {
GetResponse getResponse = client().get(new GetRequest("my-index", Integer.toString(i))).actionGet();
Map<String, Object> source = getResponse.getSourceAsMap();
assertThat(source.size(), equalTo(2));
assertThat(source.get("target"), equalTo(List.of(Map.of("key", "key", "value", "val" + i))));
assertThat(source.get("target"), equalTo(Map.of("key", "key", "value", "val" + i)));
}
}

@@ -139,8 +139,7 @@ private static void enrich(List<String> keys, String coordinatingNode) {
for (int i = 0; i < numDocs; i++) {
GetResponse getResponse = client().get(new GetRequest("my-index", Integer.toString(i))).actionGet();
Map<String, Object> source = getResponse.getSourceAsMap();
List<?> entries = (List<?>) source.get("user");
Map<?, ?> userEntry = (Map<?, ?>) entries.get(0);
Map<?, ?> userEntry = (Map<?, ?>) source.get("user");
assertThat(userEntry.size(), equalTo(DECORATE_FIELDS.length + 1));
assertThat(keys.contains(userEntry.get(MATCH_FIELD)), is(true));
for (String field : DECORATE_FIELDS) {
@@ -90,8 +90,13 @@ private void testBasicsForFieldValue(Object fieldValue, Geometry expectedGeometr
assertThat(shapeQueryBuilder.shape(), equalTo(expectedGeometry));

// Check result
List<?> entries = ingestDocument.getFieldValue("entry", List.class);
Map<?, ?> entry = (Map<?, ?>) entries.get(0);
Map<?, ?> entry;
if (maxMatches == 1) {
entry = ingestDocument.getFieldValue("entry", Map.class);
} else {
List<?> entries = ingestDocument.getFieldValue("entry", List.class);
entry = (Map<?, ?>) entries.get(0);
}
assertThat(entry.size(), equalTo(2));
assertThat(entry.get("zipcode"), equalTo(94040));
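Reviewer's sketch: code that reads the target field after this change may see either an object or an array, depending on `max_matches`. The test above handles that with an `if` on `maxMatches`. A consumer that does not know the processor configuration could instead branch on the runtime type, as in the hypothetical helper below. `EnrichTargetReader` and `firstEntry` are illustrative names, not part of this PR.

```java
import java.util.List;
import java.util.Map;

public class EnrichTargetReader {

    /**
     * Returns the first enrich entry stored under a target field,
     * whether the field holds a single object or an array of objects.
     * Returns null if the value is absent, empty, or of another type.
     */
    static Map<?, ?> firstEntry(Object targetValue) {
        if (targetValue instanceof Map) {
            // max_matches == 1: the field is a single object.
            return (Map<?, ?>) targetValue;
        }
        if (targetValue instanceof List) {
            // max_matches > 1: the field is an array of objects.
            List<?> entries = (List<?>) targetValue;
            if (entries.isEmpty() == false && entries.get(0) instanceof Map) {
                return (Map<?, ?>) entries.get(0);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> entry = Map.of("zipcode", 94040);
        System.out.println(firstEntry(entry).get("zipcode"));          // prints "94040"
        System.out.println(firstEntry(List.of(entry)).get("zipcode")); // prints "94040"
    }
}
```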

@@ -206,10 +206,9 @@ public void testExistingNullFieldWithOverrideDisabled() throws Exception {
}

public void testNumericValue() {
int maxMatches = randomIntBetween(1, 8);
MockSearchFunction mockSearch = mockedSearchFunction(Map.of(2, Map.of("globalRank", 451, "tldRank", 23, "tld", "co")));
MatchProcessor processor =
new MatchProcessor("_tag", mockSearch, "_name", "domain", "entry", false, true, "domain", maxMatches);
new MatchProcessor("_tag", mockSearch, "_name", "domain", "entry", false, true, "domain", 1);
IngestDocument ingestDocument =
new IngestDocument("_index", "_id", "_routing", 1L, VersionType.INTERNAL, Map.of("domain", 2));

@@ -227,20 +226,18 @@ public void testNumericValue() {
assertThat(termQueryBuilder.value(), equalTo(2));

// Check result
List<?> entries = ingestDocument.getFieldValue("entry", List.class);
Map<?, ?> entry = (Map<?, ?>) entries.get(0);
Map<?, ?> entry = ingestDocument.getFieldValue("entry", Map.class);
assertThat(entry.size(), equalTo(3));
assertThat(entry.get("globalRank"), equalTo(451));
assertThat(entry.get("tldRank"), equalTo(23));
assertThat(entry.get("tld"), equalTo("co"));
}

public void testArray() {
int maxMatches = randomIntBetween(1, 8);
MockSearchFunction mockSearch =
mockedSearchFunction(Map.of(List.of("1", "2"), Map.of("globalRank", 451, "tldRank", 23, "tld", "co")));
MatchProcessor processor =
new MatchProcessor("_tag", mockSearch, "_name", "domain", "entry", false, true, "domain", maxMatches);
new MatchProcessor("_tag", mockSearch, "_name", "domain", "entry", false, true, "domain", 1);
IngestDocument ingestDocument =
new IngestDocument("_index", "_id", "_routing", 1L, VersionType.INTERNAL, Map.of("domain", List.of("1", "2")));

@@ -260,8 +257,7 @@ public void testArray() {
assertThat(termQueryBuilder.values().get(1), equalTo("2"));

// Check result
List<?> entries = ingestDocument.getFieldValue("entry", List.class);
Map<?, ?> entry = (Map<?, ?>) entries.get(0);
Map<?, ?> entry = ingestDocument.getFieldValue("entry", Map.class);
assertThat(entry.size(), equalTo(3));
assertThat(entry.get("globalRank"), equalTo(451));
assertThat(entry.get("tldRank"), equalTo(23));