Geo-Match Enrich Processor #47243
This commit introduces a geo-match enrich processor. It uses a `geo_point` field of the incoming document to look up all entries in the enrich index whose `geo_shape` match field satisfies a specific spatial relation with that point. For example, the enrich index may contain documents with zipcodes and their respective `geo_shape`. Documents ingested with a `geo_point` field can then be enriched with the zipcode whose shape contains them. This commit also refactors the MatchProcessor by moving much of the shared code into AbstractEnrichProcessor. Closes elastic#42639.
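To make the zipcode example concrete, here is a minimal in-memory sketch of what the lookup computes: given a point, return every enrich entry whose shape contains it. It is not how Elasticsearch evaluates the match (the processor queries the enrich index with a `geo_shape` query). The class and method names are hypothetical, shapes are assumed to be simple planar polygons without holes, and containment uses the classic ray-casting test.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: an in-memory stand-in for an enrich index built from a
 * geo_match policy. ZipcodeShape and matchingZipcodes are hypothetical names.
 */
final class GeoMatchSketch {

    /** A polygon ring as parallel arrays of longitudes (x) and latitudes (y), planar, no holes. */
    record ZipcodeShape(String zipcode, double[] lons, double[] lats) {}

    /** Ray-casting point-in-polygon test; points exactly on an edge may be classified either way. */
    static boolean contains(ZipcodeShape shape, double lon, double lat) {
        double[] xs = shape.lons();
        double[] ys = shape.lats();
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // The edge (j -> i) straddles the horizontal line through the point.
            boolean crosses = (ys[i] > lat) != (ys[j] > lat);
            // Flip "inside" each time a ray cast to the right of the point crosses an edge.
            if (crosses && lon < (xs[j] - xs[i]) * (lat - ys[i]) / (ys[j] - ys[i]) + xs[i]) {
                inside = !inside;
            }
        }
        return inside;
    }

    /** Returns the zipcodes of every shape that contains the incoming point, in index order. */
    static List<String> matchingZipcodes(List<ZipcodeShape> enrichIndex, double lon, double lat) {
        List<String> matches = new ArrayList<>();
        for (ZipcodeShape shape : enrichIndex) {
            if (contains(shape, lon, lat)) {
                matches.add(shape.zipcode());
            }
        }
        return matches;
    }
}
```

Because overlapping shapes can all contain the same point, the sketch returns a list rather than a single zipcode.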
Pinging @elastic/es-core-features
Pinging @elastic/es-analytics-geo
Note to reviewers:
Please let me know if this looks reasonable. If so, I will continue cleaning it up; otherwise, I will await further instructions on how to refactor!
This looks great @talevy! The current abstraction looks good. I left a few comments.
I was not sure how to extend EnrichPolicy to support passing in the shape_relation for the geo-enrich-processor without conflating the policy with things unrelated to Match
I think shape_relation should be a pipeline configuration option and not a policy configuration, because this setting doesn't affect the enrich index being created and it is merely a query parameter.
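A sketch of this suggestion, assuming `shape_relation` is read from the enrich processor's own configuration map while the policy stays untouched. The enum lists the relation names used by Elasticsearch `geo_shape` queries; the `INTERSECTS` default, the class name, and the helper method are assumptions for illustration, not the actual processor factory code.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: shape_relation lives in the pipeline (processor) config,
 * because it only changes the query sent to the enrich index, not the index itself.
 */
final class ProcessorConfigSketch {

    /** Relation names as used by geo_shape queries. */
    enum ShapeRelation { INTERSECTS, DISJOINT, WITHIN, CONTAINS }

    static ShapeRelation readShapeRelation(Map<String, Object> processorConfig) {
        Object value = processorConfig.get("shape_relation");
        if (value == null) {
            return ShapeRelation.INTERSECTS; // assumed default when the option is omitted
        }
        try {
            // Accept any casing, e.g. "within" or "WITHIN".
            return ShapeRelation.valueOf(value.toString().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("unknown shape_relation [" + value + "]", e);
        }
    }
}
```

Keeping the relation out of the policy means two pipelines could, in principle, query the same enrich index with different relations without re-executing the policy.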
```java
    // No need to also configure index_options, because keyword type defaults to 'docs'.
} else if (EnrichPolicy.GEO_MATCH_TYPE.equals(policy.getType())) {
    matchFieldMapping = (builder) -> builder.field("type", "geo_shape");
```
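The excerpt picks the enrich index's match field mapping from the policy type. The sketch below expresses the same branching as plain maps so the two resulting mappings can be compared side by side. The `doc_values: false` setting on the `keyword` branch is an assumption here (the excerpt only shows that branch's trailing comment), and the class and method names are hypothetical. The `geo_shape` branch sets no `doc_values`, matching the limitation raised in the review thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of choosing the match field mapping by policy type. */
final class MatchFieldMappingSketch {

    static Map<String, Object> matchFieldMapping(String policyType) {
        Map<String, Object> mapping = new LinkedHashMap<>();
        switch (policyType) {
            case "match" -> {
                // Exact-value lookups: keyword, without doc values (assumed; not needed for querying).
                mapping.put("type", "keyword");
                mapping.put("doc_values", false);
            }
            // Spatial lookups: geo_shape; doc_values cannot be set on it at this point.
            case "geo_match" -> mapping.put("type", "geo_shape");
            default -> throw new IllegalArgumentException("unsupported policy type [" + policyType + "]");
        }
        return mapping;
    }
}
```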
I don't believe the `geo_shape` field type uses doc values by default, right?
Do you know whether the default is going to change in the future?
I just want to make sure we never store more than what is needed for enrich (since doc values are not used for querying).
The problem is that, since doc values aren't supported, neither is `doc_values: false`. I was debating whether this should be fixed upstream so that one is allowed to set it. Currently, it errors out if you attempt to set it.
I will follow up in another PR to expose `doc_values` parsing for `geo_shape`. I think it should understand that parameter, even though it only allows it to be `false`. For now, I think this is the best that can be done. Thankfully, I know when doc values on shapes will be supported, so I will be sure to update this!
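One possible shape for that follow-up, sketched as a hypothetical standalone parser rather than real `geo_shape` mapper code: the parameter is recognized, `false` (or an absent value) is accepted, and `true` is rejected because shapes have no doc values yet. The class name, method name, and error messages are invented for illustration.

```java
/** Hypothetical sketch: geo_shape understands doc_values but only allows false. */
final class GeoShapeDocValuesSketch {

    static boolean parseDocValues(Object value) {
        if (value == null) {
            return false; // parameter omitted: shapes have no doc values
        }
        final boolean requested;
        if (value instanceof Boolean b) {
            requested = b;
        } else if ("true".equals(value) || "false".equals(value)) {
            requested = Boolean.parseBoolean((String) value);
        } else {
            throw new IllegalArgumentException("[doc_values] must be a boolean, got [" + value + "]");
        }
        if (requested) {
            throw new IllegalArgumentException("[doc_values] cannot be set to true for geo_shape fields");
        }
        return false; // explicit doc_values: false is accepted instead of erroring out
    }
}
```

Accepting an explicit `false` would let callers such as the enrich policy runner state their intent in the mapping without tripping the current error.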
👍 That sounds good to me.
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichPolicyRunner.java
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/GeoMatchProcessor.java
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/AbstractEnrichProcessor.java
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichProcessorFactory.java
run elasticsearch-ci/bwc
@elasticmachine update branch
LGTM, thanks for working on this!
```java
assertThat(statsResponse.getCoordinatorStats().size(), equalTo(1));
String localNodeId = getInstanceFromNode(ClusterService.class).localNode().getId();
assertThat(statsResponse.getCoordinatorStats().get(0).getNodeId(), equalTo(localNodeId));
assertThat(statsResponse.getCoordinatorStats().get(0).getRemoteRequestsTotal(), greaterThanOrEqualTo(1L));
```
nit: this should be `equalTo(1L)`, since only one index request is ingested.
This test suite also restarts the node between each test, so this should be OK.
ah good catch. didn't update this.
I've only added support for
run elasticsearch-ci/2
@talevy This merged into
@polyfractal I think the fact that #48039 is versioned is sufficient? Also, PRs on feature branches are typically not versioned and labelled.
@martijnvg 👍 Makes sense, that works for me. I was just trying to track down whether this made it in as part of the enrich feature, since it was also being tracked semi-independently of it. I was also a bit confused about what it did: I thought it was a separate processor rather than an enrich policy type. So that makes more sense now too :)
what Martijn said :)