-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Ingest: Add conditional per processor #32398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-core-infra |
Still WIP, missing tests and some cleanup. I'd just like to get the ok from everyone that this is what we're looking for. With this one you can now add a field E.g.:
->
|
@original-brownbear yup. that is what I had in mind! thanks! I was wondering why we don't extend AbstractProcessor to understand conditionals and bake it into the framework instead of conditionally overriding the type to be "conditional", I feel like it should have the conditional support by default, but only invoked if the |
Yea that sounds nice in theory (though just in terms of code quality performance wise it probably doesn't matter in any way, either way you'll have the same number of interface(ish) calls in there).
Yea this, but worse yet, you'd have to change the => I think wrapping the processor as the implementation is probably the smallest/safest change we're going to get here. |
true, I didn't even think it would 😄 OK, this sounds good. One thing we should double-check is that when exceptions occurs in either the internal or conditional (if), the correct processor type is shown, since both the wrapping conditional and the inner processor share the same tag. |
The general approach here seems ok. My main concern before doing a deeper review is the object passed into the conditional script needs to be immutable. I think we should look at what variables are available too, as ctx only exists for legacy reasons (because that is what update scripts exposed when ingest was added). |
@rjernst why don't we just expose a new kind of interface there instead of |
I like the 'if' syntax and general direction. I share @talevy 's concern over handling messaging. For example today, if you change your example from
It's a bit misleading where the error is happening due to the
If we add per processor metrics, there would be a similar concern to accurately represent the combination of the inner processor and outer processor in the same metric. I am abit concerned by allowing arbitrary processing inside the if condition. Meaning the if condition can be much more then just a true/false single expression evaluation. When you combine this change with the ability to call other processors from scripting it can compound this concern ? For example, you could call grok from inside an If we don't want to allow that kind of arbitrary processing , could implement a much a simpler dsl that only allows single expression boolean evaluations ? Perhaps a custom subset of painless ? Also, since we are using scripting for the true/false evaluation, would we also support alternative scripting languages ? |
I think this isn't an issue afaik the bucket selector aggregation scripts will behave the same way.
We can also use the other languages here,
I'm a little torn here too. Obviously something like the LS config "language" has a much lower barrier to entry and is really short to write. Then again ... we already have the ES scripting, it's super powerful and existing users are probably proficient in it to a degree too. That said, one strong argument I would have for "this is a good thing" is that when looking at LS you often see the pattern of:
... those cases would become a lot simpler to implement on the user side with the flexibility of Painless.
As far as I understand the code this would be very hard. Also, it becomes a very strange situation where you're essentially just arbitrarily taking away flexibility from the user (without making our lives easier in return, if anything constraining what subset of Painless or the other languages we're supporting is more effort than just outright using them). @rjernst what do you think about moving forward with wrapping the |
IMO we should think about what we want the variables to be for the ingest script context long term. If ctx is what we want, then keeping it in conditional ingest scripts is fine. But if we think another script signature would work better long term, then the new conditional script should match the future signature we want. This will make it easier to migrate to the new signature, rather than adding new uses of |
fair point :) IMO, with the way we are approaching the calling of processor from scripts as of right now (having static methods that do what the processors do as opposed to setting up and invoking actual |
@rjernst added the lazy wrapping (urgh, that's quite a bit of code to handle all the cases that we have to look out for because they could leak a mutable view of the Can you take a quick look if that's an approach you're ok with? If you're ok with it then I think we only need some tests for the lazy wrapping (+ whatever your my find) here :) |
@original-brownbear - apologizes for the late response... I just made this correlation.
Beats already has a form of this implemented: https://www.elastic.co/guide/en/beats/filebeat/master/defining-processors.html Is it worth exploring emulating that syntax (in Json of course) ? It removes concerns (and features) of arbitrary processing and eliminates the need for immutable wrappers while providing a familiar dsl. |
I (~50%) agree it may be worth exploring a simpler language :) The upsides definitely are:
The downsides are:
|
I don't think we should have another language. This is a huge burden to maintain. We have been trying very hard to reduce the number of languages we must support within elasticsearch. For example, we got rid of groovy, python, and javascript scripting languages, and we have been working towards making expressions on par performance-wise with expressions so we can remove that too. IMO the overhead of wrapping these objects to make them immutable should be small, and the consistency of interacting with documents in the same way across conditionals or general script processors is hugely beneficial. |
@rjernst sweet, so add tests + be happy with this? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @original-brownbear. Tests for the immutability would be good. I left a few more comments as well.
@Override | ||
public void execute(IngestDocument ingestDocument) throws Exception { | ||
if (scriptService.compile(condition, ProcessorConditionalScript.CONTEXT) | ||
.newInstance(condition.getParams()).execute(new UnmodifiableIngestData(ingestDocument.getSourceAndMetadata()))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break this long line into compiling the script in a separate line from executing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure will do :)
@@ -82,6 +82,7 @@ public IngestCommonPlugin() { | |||
processors.put(KeyValueProcessor.TYPE, new KeyValueProcessor.Factory()); | |||
processors.put(URLDecodeProcessor.TYPE, new URLDecodeProcessor.Factory()); | |||
processors.put(BytesProcessor.TYPE, new BytesProcessor.Factory()); | |||
processors.put(ConditionalProcessor.TYPE, new ConditionalProcessor.Factory(parameters.scriptService)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we were only exposing the conditional via each processor as if
, not as a processor on its own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #32398 (comment) and discussion leading up to it. This is just an implementation to prevent us from having to adjust every processor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I don't understand. Can you reiterate? I don't understand why using it from parsing if
requires it be generally available. As it is here, it would be available to construct directly by a user, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me try to make it short:
In order to get the conditional we have to either add a conditional evaluation to every execute
method for every processor or make some abstract parent have an execute
implementation that holds the conditional evaluation. (I mean there's other options, but I think the two mentioned are the simplest and others will be even more noisy/risky/...)
Either option requires us to change all existing processors and also their factories (as a result of us parsing the configuration in each processor factory).
=> I went for this implementation since it was shortest and didn't have any functional/performance downsides over alternatives anyway (as far as I can see).
As it is here, it would be available to construct directly by a user, right?
Yea true, if you think that's a problem I can prevent that in the parser easily though :) Should I?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think we should prevent that. But I'm still not understanding why it is necessary for this processor to have a factory or be registered. It can be constructed completely locally, and directly via its ctor, within ConfigurationUtils
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst that's more of a convenience thing because org.elasticsearch.ingest.ConfigurationUtils#readProcessor(java.util.Map<java.lang.String,org.elasticsearch.ingest.Processor.Factory>, java.lang.String, java.lang.Object)
(which is called from like 3 places in prod. code) would to also get the script factory as an input then (which will trigger a pretty big change if you factor in test code).
Just tried to keep this less noisy again :) => bad idea?, better to add the script factory as an input here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With what I'm suggesting, the script factory is not needed. I don't think the ConditionalProcessor should be constructed from the script factory at all. Instead, have maybeWrapConditional
or something like that which takes the processor we have already constructed, and creates/wraps it with a conditional processor (or returns the input if there is no conditional). There is no reason to have a factory for the conditional processor, just parse the config directly in that method, and construct there based on the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst you will still need the ScriptFactory
as a method input to org.elasticsearch.ingest.ConfigurationUtils#readProcessorConfigs
in some form or another won't you?
I 100% agree, that the current solution looks kind of convoluted :), but it was the only way I saw out of blowing up the change-set with passing the ScriptFactory
into the methods inside ConfigurationUtils
.
Looking at your discussion so far it seems it's probably preferable to go with the bigger changeset + cleaner solution I guess? :)
Sorry again about the confusion I caused here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right, that method would have to take ScriptFactory. But I think this makes sense? And I think the signature changes should be mostly minimal? Maybe I am underestimating the extent, but I think it is fairly well contained to a handful of methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I first tried it, it looked like a fairly massive change :) (but this is also really subjective).
I'll just code it up today and we can take a look, the result is def. nicer than what we have here ... so probably worth it
/** | ||
* A script used by the Ingest Script Processor. | ||
*/ | ||
public abstract class ProcessorConditionalScript { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this IngestConditionalScript?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure :)
} | ||
|
||
private static Object wrapUnmodifiable(Object raw) { | ||
if (raw instanceof Map) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here that these types must match what the json parser can create, and that anything not handled here must be immutable already (eg boxed numerics)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure :)
@rjernst alright, handled this by passing down the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @original-brownbear! LGTM
} catch (Exception e) { | ||
throw newConfigurationException(type, tag, null, e); | ||
} | ||
} | ||
throw newConfigurationException(type, tag, null, "No processor type exists with name [" + type + "]"); | ||
} | ||
|
||
private static Script maybeExtractConditional(Map<String, Object> config) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you are returning the script, I think this can just be called extractConditional
LoggingDeprecationHandler.INSTANCE, stream)) { | ||
return Script.parse(parser); | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for an else, it can just be outside the if
@rjernst thanks! Will merge once green :) |
* Ingest: Add conditional per processor * closes elastic#21248
if
setting to all processors in a pipeline