Add enrich policy runner #41088

Merged
merged 29 commits into from
May 2, 2019
Member

@jbaiera jbaiera commented Apr 10, 2019

Note that this is a PR against the enrich branch and will be backported to enrich-7.x branch.

Adds a skeleton of the execution logic to execute an enrich policy. Validates the source index existence as well as mappings, creates a new enrich index for the policy, reindexes the source index into the new enrich index, and swaps the enrich alias for the policy to the new index.
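
The four steps described above (validate, create, reindex, swap alias) can be sketched roughly as follows. This is an in-memory illustration only, not the code in this PR: the `EnrichRunSketch` class, its maps, and the `.enrich-<policy>-<timestamp>` naming scheme are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a cluster; none of this is Elasticsearch API.
class EnrichRunSketch {
    final Map<String, Set<String>> mappings = new HashMap<>(); // index -> mapped field names
    final Map<String, Integer> docCounts = new HashMap<>();    // index -> number of documents
    final Map<String, String> aliases = new HashMap<>();       // alias -> index it points at

    String run(String policyName, String sourceIndex, String enrichKey, long nowMillis) {
        // Step 1: validate that the source index exists and that its mapping has the key field.
        Set<String> fields = mappings.get(sourceIndex);
        if (fields == null) {
            throw new IllegalArgumentException("Source index [" + sourceIndex + "] does not exist");
        }
        if (fields.contains(enrichKey) == false) {
            throw new IllegalArgumentException("Could not locate enrich key field [" + enrichKey
                + "] on mapping for index [" + sourceIndex + "]");
        }
        // Step 2: create a fresh enrich index (the naming scheme here is an assumption).
        String alias = ".enrich-" + policyName;
        String enrichIndex = alias + "-" + nowMillis;
        mappings.put(enrichIndex, Set.of(enrichKey));
        // Step 3: "reindex" by copying the source document count into the new index.
        docCounts.put(enrichIndex, docCounts.getOrDefault(sourceIndex, 0));
        // Step 4: repoint the policy alias at the new index.
        aliases.put(alias, enrichIndex);
        return enrichIndex;
    }
}
```

Because readers only ever go through the alias in this sketch, a failed run leaves the previous enrich index in place and still reachable.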

WIP: Comments are welcome at this point. The code really needs some testing and will most likely be rebased onto some other PRs as they come in (namely #41003 and #40997), but I'd like to get some feedback before then. Good to go for review.

@jbaiera jbaiera added the >non-issue, WIP, and :Data Management/Ingest Node labels Apr 10, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@martijnvg
Member

The general approach in the PR looks good to me.

@jbaiera jbaiera force-pushed the enrich-policy-runner branch from 90b124a to 6828919 Compare April 15, 2019 19:58
Member

@martijnvg martijnvg left a comment

I left a few nits. This change is going well!

import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.CoreMatchers.nullValue;

public class EnrichPolicyRunnerTests extends ESSingleNodeTestCase {
Member

would be great to test this without starting a node, but I'm not sure how :)

Member Author

Yeah, this was one of those cases where it felt like mocking out all of the dependencies needed was a bit much. Would be happy to change the test if there are any tools to make that easier, or if we feel that it's important enough that it doesn't spin up a node.

Member

Yes, I think this is the right trade-off. Otherwise the code for mocking becomes unmaintainable.

@jbaiera jbaiera removed the WIP label Apr 16, 2019
@jbaiera
Member Author

jbaiera commented Apr 16, 2019

Removed WIP label, should be good for a proper review at this point

@jbaiera jbaiera requested a review from martijnvg April 16, 2019 20:29
"Enrich policy execution for [{}] failed. Could not locate enrich key field [{}] on mapping for index [{}]",
policyName, policy.getEnrichKey(), sourceIndex));
}
for (String enrichField : policy.getEnrichValues()) {
Contributor

I think it is OK if not all of the values are present in the source index. Missing values should be handled by the processor.

However, we may want to log a warning if the same value is coming from multiple indexes since only 1 will win and it's not deterministic.

Member

I lean towards failing if a decorate field is missing in the mapping. I think this can lead to unexpected behaviour at enrich time? (why does this document not have all the enriched fields?)

Contributor

@jakelandis jakelandis Apr 17, 2019

Even if the mapping exists in the source index, the document we are using to enrich may not have that field, so we will need to handle partial decorations in the processor.

If the decision is to only allow full decorations, then the check here makes sense... but it will need to be changed slightly so that all enrich fields exist across the set of source indexes (as opposed to existing in every source index, as it is here).

I am in favor of removing this check and implementing ignore_missing (or the like) in the processor to allow partial decorations.
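
To make the two validation options in this thread concrete, here is a small sketch contrasting "every source index maps every enrich field" with "every enrich field is mapped by at least one source index". The `EnrichFieldCoverage` class and both method names are hypothetical and exist only for illustration; mappings are reduced to plain sets of field names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; mappings are modeled as sets of field names.
class EnrichFieldCoverage {
    // Strict check: each source index must map every enrich field on its own.
    static boolean presentInEveryIndex(Map<String, Set<String>> mappingsByIndex, List<String> enrichFields) {
        for (Set<String> fields : mappingsByIndex.values()) {
            if (fields.containsAll(enrichFields) == false) {
                return false;
            }
        }
        return true;
    }

    // Relaxed check: returns the enrich fields that no source index maps at all.
    // An empty result means the fields are covered across the set of source indexes.
    static Set<String> missingAcrossAllIndices(Map<String, Set<String>> mappingsByIndex, List<String> enrichFields) {
        Set<String> missing = new LinkedHashSet<>(enrichFields);
        for (Set<String> fields : mappingsByIndex.values()) {
            missing.removeAll(fields);
        }
        return missing;
    }
}
```

Neither check says anything about individual documents: a field can be mapped and still be absent from a given document, which is why partial decorations would still need handling in the processor.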

client.execute(ReindexAction.INSTANCE, reindexRequest, new ActionListener<BulkByScrollResponse>() {
@Override
public void onResponse(BulkByScrollResponse bulkByScrollResponse) {
// Do we want to fail the request if there were failures during the reindex process?
Contributor

I am not familiar with this API... does this get called once after the whole reindex is complete, or per scroll response?

Member

this gets invoked after the whole reindex has been completed.
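
Since the callback fires once with the final result, one possible answer to the open question in the code comment is to inspect the failure list at that point and turn a partially failed reindex into an error. The sketch below only illustrates that idea: `Listener` and `ReindexSummary` are simplified, hypothetical stand-ins, not the Elasticsearch `ActionListener` or `BulkByScrollResponse` types, and whether the runner should behave this way is not decided in this thread.

```java
import java.util.List;

class ReindexCompletionSketch {
    // Hypothetical, simplified stand-ins; not the Elasticsearch types with similar names.
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    static final class ReindexSummary {
        final long created;
        final List<String> failures;

        ReindexSummary(long created, List<String> failures) {
            this.created = created;
            this.failures = failures;
        }
    }

    // Wraps a listener so that a finished reindex reporting any failures is surfaced as an error.
    static Listener<ReindexSummary> failOnReindexFailures(Listener<ReindexSummary> delegate) {
        return new Listener<ReindexSummary>() {
            @Override
            public void onResponse(ReindexSummary summary) {
                // Called once with the final summary, so the failure list is complete here.
                if (summary.failures.isEmpty()) {
                    delegate.onResponse(summary);
                } else {
                    delegate.onFailure(new IllegalStateException(
                        "Reindex finished with " + summary.failures.size() + " failure(s)"));
                }
            }

            @Override
            public void onFailure(Exception e) {
                delegate.onFailure(e);
            }
        };
    }
}
```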

import static org.hamcrest.CoreMatchers.equalTo;
import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.CoreMatchers.nullValue;

Contributor

The tests here are good for the happy case, but I would like to see more testing around the unhappy cases. Investing in this test now will greatly help future developers test changes quickly.

Member

@martijnvg martijnvg left a comment

I did another review round.

"Enrich policy execution for [{}] failed. Could not read mapping for source [{}] included by pattern [{}]",
policyName, sourceIndex, policy.getIndexPattern()));
}
if (properties.containsKey(policy.getEnrichKey()) == false) {
Contributor

we should probably allow _id as the enrich key

Member Author

Agreed, though I think it's probably a good idea to put that into a separate PR. I don't think we should bog this one down with too many initial features.

@jbaiera
Member Author

jbaiera commented Apr 18, 2019

I made the runner object a Runnable and added a basic EnrichPolicyExecutor implementation. I figured the management threadpool would be fine to execute these on. I also simplified the enrich mapping to just define the key, and rely on the eventual meta mapping type to be added later.
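
The runner/executor split described here can be pictured with plain `java.util.concurrent` types. This is a rough sketch under stated assumptions, not the PR's classes: `PolicyExecutorSketch` and `PolicyRunner` are hypothetical names, a generic `Executor` stands in for the management threadpool, and a `CompletableFuture` stands in for whatever listener the real code uses to report completion.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

class PolicyExecutorSketch {
    // Hypothetical runner: one Runnable per policy execution.
    static final class PolicyRunner implements Runnable {
        private final String policyName;
        private final CompletableFuture<String> result;

        PolicyRunner(String policyName, CompletableFuture<String> result) {
            this.policyName = policyName;
            this.result = result;
        }

        @Override
        public void run() {
            try {
                if (policyName.isEmpty()) {
                    throw new IllegalArgumentException("policy name must not be empty");
                }
                // The real work (validate, create, reindex, swap alias) would happen here.
                result.complete("executed [" + policyName + "]");
            } catch (RuntimeException e) {
                // Report failures through the future instead of losing them on the pool thread.
                result.completeExceptionally(e);
            }
        }
    }

    private final Executor pool; // stand-in for the threadpool the runner is submitted to

    PolicyExecutorSketch(Executor pool) {
        this.pool = pool;
    }

    // The executor only decides where a runner runs; the runner owns the policy logic.
    CompletableFuture<String> runPolicy(String policyName) {
        CompletableFuture<String> result = new CompletableFuture<>();
        pool.execute(new PolicyRunner(policyName, result));
        return result;
    }
}
```

Keeping the runner a plain `Runnable` means it can also be invoked directly on the calling thread, which tends to make it easier to exercise in tests.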

Member

@martijnvg martijnvg left a comment

I did another review round. I like the split between EnrichPolicyExecutor and EnrichPolicyRunner.

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.fetchSource(retainFields.toArray(new String[0]), new String[0]);
if (policy.getQuery() != null) {
searchSourceBuilder.query(QueryBuilders.wrapperQuery(policy.getQuery().getQuery()));
Member

Why is the policy query wrapped in a wrapper query?

Member Author

I went with the wrapper query since the EnrichPolicy.QuerySource returns the raw query json as a byte sequence. Is there a more appropriate way to convert a raw query body into a query builder?

Member

I see, actually that does make sense to me :)

listener.onFailure(e);
}
});
}
Member

Should we also delete the old enrich index? Maybe not here and not in this PR, but just wondering. These old indices should be purged at some point.

Member Author

These should definitely be subject to some kind of background cleanup task.
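
As a sketch of what such a cleanup task might select, the helper below lists a policy's enrich indices that the alias no longer points at. Everything here is an assumption for illustration: the `StaleEnrichIndices` class is hypothetical, and the `.enrich-<policy>-<digits>` naming scheme is not confirmed by this PR.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; assumes enrich indices are named ".enrich-<policy>-<digits>".
class StaleEnrichIndices {
    static List<String> find(String policyName, Collection<String> allIndices, String activeIndex) {
        String prefix = ".enrich-" + policyName + "-";
        List<String> stale = new ArrayList<>();
        for (String index : allIndices) {
            if (index.startsWith(prefix) == false || index.equals(activeIndex)) {
                continue; // not this policy's index, or still the alias target
            }
            // Require an all-digit suffix so that policy "users" does not match ".enrich-users-v2-5".
            String suffix = index.substring(prefix.length());
            if (suffix.isEmpty() == false && suffix.chars().allMatch(Character::isDigit)) {
                stale.add(index);
            }
        }
        Collections.sort(stale); // lexicographic order, only to make the result deterministic
        return stale;
    }
}
```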

@jbaiera jbaiera requested review from jakelandis and martijnvg May 1, 2019 18:14
@jbaiera jbaiera merged commit a451292 into elastic:enrich May 2, 2019
@jbaiera jbaiera deleted the enrich-policy-runner branch May 2, 2019 16:38
jbaiera added a commit to jbaiera/elasticsearch that referenced this pull request May 2, 2019
Adds the foundation of the execution logic to execute an enrich policy. Validates
the source index existence as well as mappings, creates a new enrich index for
the policy, reindexes the source index into the new enrich index, and swaps the 
enrich alias for the policy to the new index.
jbaiera added a commit that referenced this pull request May 6, 2019
Backports #41088

Adds the foundation of the execution logic to execute an enrich policy. Validates
the source index existence as well as mappings, creates a new enrich index for
the policy, reindexes the source index into the new enrich index, and swaps the 
enrich alias for the policy to the new index.
Labels
:Data Management/Ingest Node, >non-issue
4 participants