Enrich processor: allow scheduling of policy executions #50071

Open
Tracked by #101654
tahaderouiche opened this issue Dec 11, 2019 · 26 comments
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement Team:Data Management Meta label for data/management team

Comments

@tahaderouiche

Before an enrich processor can be used, an enrich policy must be executed. When executed, an enrich policy uses enrich data from the policy’s source indices to create a streamlined system index called the enrich index.
The policy is executed manually by running PUT /_enrich/policy/my-policy/_execute, which gives the user control over when new data becomes part of the enrich index.

In cases where the policy’s source indices are constantly changing, it would help if the policy execution could also be scheduled.

Having the ability to schedule the execution (say, daily or hourly) natively in Elasticsearch would make the feature more approachable and would benefit this case of constantly changing source indices.
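Until something like this exists natively, the schedule has to live outside Elasticsearch. Below is a minimal sketch of what such an external scheduler does. It assumes a caller-supplied `execute_policy` callable that issues the `_execute` request; the function names and the injectable `sleep` are illustrative and not part of any Elasticsearch client.

```python
import time
from typing import Callable


def execute_path(policy_name: str) -> str:
    """Return the REST path used to execute an enrich policy."""
    return f"/_enrich/policy/{policy_name}/_execute"


def run_on_schedule(
    execute_policy: Callable[[str], None],
    policy_name: str,
    interval_seconds: float,
    runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call execute_policy(path) `runs` times, sleeping between calls.

    Returns the number of executions that completed without raising.
    """
    succeeded = 0
    for i in range(runs):
        try:
            execute_policy(execute_path(policy_name))
            succeeded += 1
        except Exception as exc:
            # Keep the schedule alive even if a single execution fails.
            print(f"execution {i + 1} failed: {exc}")
        if i < runs - 1:
            sleep(interval_seconds)
    return succeeded
```

Passing `sleep` as a parameter keeps the loop testable without waiting; a real deployment would more likely rely on cron or Watcher than on a long-running loop.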

@tahaderouiche tahaderouiche added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Dec 11, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Ingest)

@rjernst rjernst added the Team:Data Management Meta label for data/management team label May 4, 2020
@philippkahr
Contributor

philippkahr commented May 12, 2021

I would like to add my two cents here.
Let's assume the following example with currencies and exchange rates.

My enrich processor matches on `pair` and sets `exchange.rate`.
I have the following indices:

- index: exchange-rate
  - contains {pair : EURO-DOLLAR and exchange.rate: 1.1}
  - is updated every 30 minutes using Filebeat.
- index: latest-exchange-rate
  - contains: {pair : .... } as above
  - is updated every hour using transforms.
  - only includes the latest values for `pair`
- index: currencies
  - contains { pair: EURO-DOLLAR }
  - uses an ingest pipeline with an enrich processor to populate the `exchange.rate` value.

My enrich policy, append-exchange-rate:

{
  "match": {
    "indices": "latest-exchange-rate",
    "match_field": "pair",
    "enrich_fields": ["exchange.rate"]
  }
}

Since I am using transforms to dynamically build a small index, I would expect the enrich processor to pick up the change to the original index and do the POST /_enrich/policy/append-exchange-rate/_execute automagically.

Personally this would definitely be needed for indices that have frequent changes and/or transforms that run often. I know that enrich policy building is costly.

Scheduling the policy update alone is definitely a nice feature, but there might be a window where the source data has already been refreshed but the policy has not been re-executed yet. Documents ingested in that window are populated with old and possibly wrong data.
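The stale window can be illustrated with a toy in-memory model. This is only a sketch of the behaviour described above, not of how Elasticsearch implements enrich indices; the class name and the second rate value (1.2) are made up for the example.

```python
from typing import Dict


class EnrichModel:
    """Toy model: the enrich index is a snapshot of the source index."""

    def __init__(self) -> None:
        self.source: Dict[str, float] = {}
        self.enrich_index: Dict[str, float] = {}

    def update_source(self, pair: str, rate: float) -> None:
        """Simulate the transform writing a new latest value."""
        self.source[pair] = rate

    def execute_policy(self) -> None:
        """Simulate policy execution: rebuild the snapshot from the source."""
        self.enrich_index = dict(self.source)

    def enrich(self, doc: dict) -> dict:
        """Simulate the enrich processor: look up `pair` in the snapshot."""
        enriched = dict(doc)
        rate = self.enrich_index.get(doc["pair"])
        if rate is not None:
            enriched["exchange.rate"] = rate
        return enriched


model = EnrichModel()
model.update_source("EURO-DOLLAR", 1.1)
model.execute_policy()

# The source is refreshed, but the policy has not been re-executed yet.
model.update_source("EURO-DOLLAR", 1.2)
stale = model.enrich({"pair": "EURO-DOLLAR"})

# After re-execution the lookup sees the new value.
model.execute_policy()
fresh = model.enrich({"pair": "EURO-DOLLAR"})

print(stale["exchange.rate"], fresh["exchange.rate"])
```

Any fixed schedule only shrinks this window; closing it would require the execution to be triggered by the source update itself.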

@ar-mi

ar-mi commented Sep 17, 2021

Hey! Has any work already been done on this?

@puppylpg
Contributor

puppylpg commented Jul 15, 2022

It's 2022 already 😸 and the issue is still open

@smnschneider

Would be great to get this feature!

@leandrojmp
Contributor

Hello, is there any update on this feature? Is this still being considered?

@timor-raiman

+1
Alternatively - #58925

@Rick25-dev

+1

@Rick25-dev

+1

@kossde

kossde commented Apr 24, 2023

I was able to work around this issue by creating a watcher that performs an http call to the cluster on a scheduled interval and re-executes the enrich policy.

Initially, the execution was failing periodically, but that turned out to be related to the auto_expand_replicas setting of the .enrich indices and high disk utilization on one of our nodes. To get around that, I created an index template for .enrich indices that turns off auto_expand_replicas and sets the replica count to 1. The auto-execution now works like a charm!
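For readers who want to try the same workaround, here is a rough sketch of what such a template body might look like. The comment does not say which template API or index pattern was used, so the composable template format (`PUT /_index_template/<name>`), the `.enrich-*` pattern, and the function name are all assumptions, and whether enrich index creation honours template settings may depend on the Elasticsearch version.

```python
import json


def enrich_index_template(pattern: str = ".enrich-*", replicas: int = 1) -> dict:
    """Build a body for PUT /_index_template/<name>.

    The pattern and the use of a composable template are assumptions;
    the two settings mirror what the comment above describes.
    """
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "index": {
                    # Disable automatic replica expansion.
                    "auto_expand_replicas": "false",
                    # Use a fixed replica count instead.
                    "number_of_replicas": replicas,
                }
            }
        },
    }


print(json.dumps(enrich_index_template(), indent=2))
```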

@leandrojmp
Contributor

@kossde hello, can you share the watcher json configuration that you used?

@smnschneider

@leandrojmp: In the meantime I use a watcher like this. It is written for ECE, but can easily be changed for other deployment methods.

{
  "trigger": {
    "schedule": {
      "interval": "1d"
    }
  },
  "condition" : {
    "always" : {}
  },
  "actions": {
    "webhook-execute_enrich_policy": {
      "webhook": {
        "scheme": "https",
        "host": "1.2.3.4",
        "port": 9243,
        "method": "PUT",
        "path": "/_enrich/policy/<enrich-policy>/_execute",
        "params": {
            "wait_for_completion": "false"
        },
        "headers": {
          "X-found-cluster": "<cluster-id>"
        },
        "auth": {
          "basic": {
            "username": "<enrich_executer>",
            "password": "<enrich_executer_password>"
          }
        }
      }
    }
  }
}

@leandrojmp
Contributor

Thanks @smnschneider!

It is pretty similar to the one I was testing, but I could not use it in production yet because applying the HTTP certificate configuration would require a restart of the nodes.

In the end I'm using a simple script run from crontab.
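The script itself was not shared, so the following is only a sketch of what a cron-driven script could look like, using the Python standard library. The host, policy name, and credentials are placeholders, and the helper names are hypothetical.

```python
import base64
import urllib.request


def build_execute_request(
    base_url: str, policy: str, username: str, password: str
) -> urllib.request.Request:
    """Build the PUT request that executes an enrich policy.

    wait_for_completion=false returns immediately instead of blocking
    until the enrich index has been rebuilt.
    """
    url = (
        f"{base_url.rstrip('/')}/_enrich/policy/{policy}/_execute"
        "?wait_for_completion=false"
    )
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="PUT", headers={"Authorization": f"Basic {token}"}
    )


def execute_policy(
    base_url: str, policy: str, username: str, password: str, timeout: float = 30
) -> int:
    """Send the request and return the HTTP status code."""
    request = build_execute_request(base_url, policy, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (placeholders, not run here):
# execute_policy("https://localhost:9200", "my-policy", "enrich_executer", "secret")
```

A crontab entry that runs a script calling `execute_policy` once an hour or once a day then gives the same effect as the watcher, without needing Watcher's HTTP certificate configuration.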

@kossde

kossde commented Aug 1, 2023

The watcher I wrote ended up being very similar as well. We now have several different enrich policies running in our environment, many of which are periodically executed via watcher as new values come into the indices. These policies are so useful… it eludes me as to why there isn’t an easier way to auto-execute them.
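With several policies, one watch per policy can be generated rather than written by hand. The sketch below mirrors the structure of the watch shared earlier in this thread; the function names, the watch ID prefix, and the example policy names and intervals are hypothetical.

```python
from typing import Dict, Optional


def build_enrich_watch(
    policy: str,
    interval: str,
    host: str,
    port: int,
    username: str,
    password: str,
    cluster_id: Optional[str] = None,
) -> dict:
    """Build a watch body that re-executes one enrich policy on an interval."""
    webhook = {
        "scheme": "https",
        "host": host,
        "port": port,
        "method": "PUT",
        "path": f"/_enrich/policy/{policy}/_execute",
        "params": {"wait_for_completion": "false"},
        "auth": {"basic": {"username": username, "password": password}},
    }
    if cluster_id is not None:
        # Only needed for ECE-style routing, as in the watch above.
        webhook["headers"] = {"X-found-cluster": cluster_id}
    return {
        "trigger": {"schedule": {"interval": interval}},
        "condition": {"always": {}},
        "actions": {"webhook-execute_enrich_policy": {"webhook": webhook}},
    }


def build_watches(policies: Dict[str, str], **connection) -> Dict[str, dict]:
    """Map a hypothetical watch ID to a watch body for each policy."""
    return {
        f"execute-enrich-{policy}": build_enrich_watch(policy, interval, **connection)
        for policy, interval in policies.items()
    }
```

Each resulting body would then be registered with `PUT _watcher/watch/<watch_id>`.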

Anyway, I feel personally that there needs to be a way to apply trusted CA updates without whole cluster reboots. Maybe they could add an option to reapply the elasticsearch yaml or, at least parts of it, without forcing a full service restart. Surely it can’t be that difficult to set up a sort of configuration that lets us split the yaml into multiple files, some of which can be reloaded upon demand..?

@kossde

kossde commented Sep 20, 2023 via email

@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@matabar

matabar commented Jan 18, 2024

+1

@carlopuri

+1
I have several situations where the ability to schedule automatic policy execution would eliminate a lot of management work currently done by a human (me...).
I have several policies to run and maintain, and having to run this task manually overcomplicates a simple and well-designed process like index enrichment.
Please add this feature

@dnegrescu

+1

@clement-fouque

Scheduling would be a great feature. Additionally, we could incorporate a continuous execution function, similar to the one in transform. Although there may be some compromises, this would be particularly useful for small datasets.
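One way to approximate continuous execution from the outside is to re-execute only when the source has changed. The sketch below assumes a caller-supplied `read_fingerprint` callable (for example, something derived from the source index's document count and latest timestamp); it is an illustration of the idea, not a description of how transforms track changes.

```python
from typing import Callable, Hashable


class ChangeTriggeredExecutor:
    """Re-execute an enrich policy only when the source fingerprint changes."""

    def __init__(
        self,
        read_fingerprint: Callable[[], Hashable],
        execute_policy: Callable[[], None],
    ) -> None:
        self._read_fingerprint = read_fingerprint
        self._execute_policy = execute_policy
        self._last: Hashable = None
        self._has_run = False

    def poll(self) -> bool:
        """Check the source once; return True if the policy was executed."""
        current = self._read_fingerprint()
        if self._has_run and current == self._last:
            return False  # nothing changed since the last execution
        self._execute_policy()
        # Record the fingerprint only after a successful execution.
        self._last = current
        self._has_run = True
        return True
```

Calling `poll()` frequently is cheap when nothing has changed, which is where this approach would suit small, frequently updated datasets.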

The ability of ESQL to enrich at query time further emphasizes the need for this feature.

This suggestion could be associated with the partial update requests mentioned in the following issues:

@supu2

supu2 commented Apr 11, 2024

+1
Pinging @elastic/es-data-management (Team:Data Management)

@Requium

Requium commented Apr 23, 2024

+1

@dominicbirch

+1

@morgan-atwood

Is there a roadmap for when, or if, this feature will be available? This could help solve some headaches we're having with managing watchers simply to update all our enrich policies. All our source indices are being updated constantly, and having the enrich policy refresh its index from the source every 1h/1d/1w would be very helpful.

@webbersharhan

+1

@hop-dev
Contributor

hop-dev commented Sep 18, 2024

+1 from the @elastic/security-entity-analytics team. We will be periodically executing the enrich policy we use from Kibana, but it would be great if this were a built-in feature.
