Enrich processor: allow scheduling of policy executions #50071
Pinging @elastic/es-core-features (:Core/Features/Ingest)
I would like to add my two cents here. My enrich processor matches on pair and sets exchange.rate, and my enrich policy is built along the lines of the sketch below.
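Roughly, the policy and the processor look like this; the index name, pipeline name, and exact field layout are assumptions for illustration, not the original configuration:

PUT /_enrich/policy/exchange-rates-policy
{
  "match": {
    "indices": "exchange-rates",
    "match_field": "pair",
    "enrich_fields": ["rate"]
  }
}

PUT /_ingest/pipeline/add-exchange-rate
{
  "processors": [
    {
      "enrich": {
        "policy_name": "exchange-rates-policy",
        "field": "pair",
        "target_field": "exchange"
      }
    }
  ]
}

Once the policy has been executed, documents run through the pipeline get exchange.rate (and exchange.pair) copied in from the matching enrich document.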
Since I am using transforms to dynamically build a small index, I would expect the enrich index to stay in sync with it. Personally, this would definitely be needed for indices that change frequently and/or transforms that run often. I know that enrich policy building is costly. Scheduling the policy update alone is definitely a nice feature, but there might be a window where the source data has already been refreshed while the policy update has not run yet, so documents get enriched with old and possibly wrong data.
Hey! Has any work been done on this yet?
It's 2022 😸 and the issue is still open.
Would be great to get this feature!
Hello, is there any update on this feature? Is this still being considered?
+1
+1
+1
I was able to work around this issue by creating a watcher that performs an HTTP call to the cluster on a scheduled interval and re-executes the enrich policy. Initially the execution failed periodically, but that turned out to be related to the auto_expand_replicas setting of the .enrich indices and high disk utilization on one of our nodes. To get around that, I created an index template for the .enrich indices that turns off auto_expand_replicas and sets the replica count to 1. The auto-execution now works like a charm!
@kossde hello, can you share the watcher JSON configuration that you used?
@leandrojmp: In the meantime I use a watcher along these lines. This is for use with ECE, but it can easily be changed for other deployment methods.
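A minimal sketch of such a watch, assuming HTTPS access to the cluster and an enrich policy named my-policy; the host, port, interval, and credentials are placeholders to adapt to your deployment:

PUT _watcher/watch/execute-enrich-policy
{
  "trigger": {
    "schedule": { "interval": "1h" }
  },
  "input": { "simple": {} },
  "actions": {
    "execute_policy": {
      "webhook": {
        "scheme": "https",
        "host": "my-deployment.es.example.com",
        "port": 9243,
        "method": "put",
        "path": "/_enrich/policy/my-policy/_execute",
        "params": { "wait_for_completion": "false" },
        "auth": {
          "basic": {
            "username": "enrich_executor",
            "password": "<password>"
          }
        }
      }
    }
  }
}

The watch simply calls the cluster's own _execute endpoint on the given interval; with wait_for_completion=false the request returns a task immediately instead of blocking while the new enrich index is built.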
Thanks @smnschneider! It is pretty similar to the one I was testing, but I could not use it in production yet because it would require a restart of the nodes to apply the HTTP certificate configuration. In the end I'm using a simple script on crontab.
The watcher I wrote ended up being very similar as well. We now have several different enrich policies running in our environment, many of which are periodically executed via watcher as new values come into the indices. These policies are so useful; it eludes me why there isn't an easier way to auto-execute them. Anyway, I feel there needs to be a way to apply trusted CA updates without whole-cluster reboots. Maybe they could add an option to reapply the elasticsearch.yml, or at least parts of it, without forcing a full service restart. Surely it can't be that difficult to support splitting the yaml into multiple files, some of which can be reloaded on demand?
On version 7.x I was able to do it by increasing the priority of the index template. In 8.x, though, the template refuses to apply. I was attempting to do the same thing you are by decreasing the number of replica shards. I don't have a solution to this, though; even when I do get the template to take, it reverts to placing shards on all nodes as soon as the enrich policy re-executes.
On Wed, Sep 20, 2023 at 1:50 AM, bil151515 wrote:
Hello @kossde
How do you manage to apply an index template to the enrich indices?
I tried a template like this, but it won't apply to .enrich-*:
{ "template": { "settings": { "index": { "lifecycle": { "name":
"enrichment" }, "routing": { "allocation": { "include": {
"_tier_preference": "data_content" } } }, "auto_expand_replicas": "false",
"number_of_replicas": "1" } }, "aliases": {}, "mappings": {} } }
Pinging @elastic/es-data-management (Team:Data Management)
+1
+1
+1
Scheduling would be a great feature. Additionally, we could incorporate a continuous execution function, similar to the one in transforms. Although there may be some compromises, this would be particularly useful for small datasets. The ability of ESQL to enrich at query time further emphasizes the need for this feature. This suggestion could be associated with the partial update requests mentioned in the following issues:
+1
+1
+1
Is there any roadmap for when or if this feature will be available? It could help solve some headaches we're having with managing watchers simply to update all our enrich policies. All our source indices are updated constantly, and having the enrich policy refresh its index from the source every 1h/1d/1w would be very helpful.
+1
+1 from the @elastic/security-entity-analytics team. We will be executing the enrich policy we are using periodically from Kibana, but it would be great if this was a built-in feature.
Before an enrich processor can be used, an enrich policy must be executed. When executed, an enrich policy uses enrich data from the policy’s source indices to create a streamlined system index called the enrich index.
The policy is executed manually by running
PUT /_enrich/policy/my-policy/_execute
which gives the user control over when new data becomes part of the enrich index. In cases where the policy's source indices are constantly changing, it would be useful if the policy execution could also be scheduled.
Having the ability to schedule the execution (say daily or hourly) natively in Elasticsearch would make this more approachable and would benefit the case of constantly changing source indices.
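Purely as an illustration of the request (no such option exists today), the idea is that a policy could carry its own schedule so the cluster re-runs the execution itself; the schedule field below is hypothetical, and the index and field names are placeholders:

PUT /_enrich/policy/my-policy
{
  "match": {
    "indices": "source-index",
    "match_field": "key",
    "enrich_fields": ["field_a", "field_b"]
  },
  // hypothetical: not part of the current enrich policy API
  "schedule": "1h"
}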