-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Add Runtime Fields Contexts to Painless Execute API #71374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-core-infra (Team:Core/Infra) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but @nik9000 may have more thoughts.
@@ -105,7 +114,15 @@ private PainlessExecuteAction() { | |||
static final Map<String, ScriptContext<?>> SUPPORTED_CONTEXTS = Map.of( | |||
"painless_test", PainlessTestScript.CONTEXT, | |||
"filter", FilterScript.CONTEXT, | |||
"score", ScoreScript.CONTEXT); | |||
"score", ScoreScript.CONTEXT, | |||
"boolean_script_field_script_field", BooleanFieldScript.CONTEXT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik9000 This "bug" with doubling the "script_field" suffix is unfortunate. WDYT about doing a hard break and fixing it? The only place the name is used is for stored scripts (but only in the api, to know which context to compile against) and in the node settings for script compilation limits, and with the latter it's not applicable here since unlimited is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We disallowed stored scripts for runtime fields when we first built them and never re-allowed them. So I think that's mostly safe. I'm quite ok renaming things. I hadn't realized these names got out to users. Also - thing were different when we picked the names. We thought there'd be more different kinds of runtime fields. We never renamed them later. Anyway, what do you think of boolean_field_script
? They are the script that runtime fields use for boolean fields and they are the script that we'll use to calculate boolean fields at index time. Same script context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it is odd to make users type this, but I would take this a step forward. I am not entirely comfortable even asking users to type context: "boolean_script_field" when they want to test a script for a boolean runtime field. It seems like there is a gap between how a runtime field is simulated and when it gets used in practice. There may be other ways to achieve this, but would it be a possibility to expose some artificial "runtime_field" context instead, and have another way to specify the type, for instance as part of the context setup? Any other ideas on this?
BooleanFieldScript booleanFieldScript = leafFactory.newInstance(leafReaderContext); | ||
booleanFieldScript.setDocument(0); | ||
booleanFieldScript.execute(); | ||
return new Response(Map.of("trues", booleanFieldScript.trues(), "falses", booleanFieldScript.falses())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems odd but I realize it is a result of the existing output of BooleanFieldScript. @nik9000 What about using a list of values like the other runtime field types do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked to @jdconrad about this yesterday and I don't have a great answer here. I like that BooleanFieldScript
can do what it wants, at least on the theoretical level. And I don't think it's worth modifying BooleanFieldScript
just for this API. The ordering of the emitted booleans isn't really important in any other context.
But I think it'd be entirely reasonable for this API to make a list out of the counts just to make it consistent. That is also consistent with the way doc values and the fields
API work for this.
I do think its nice if we give these scripts the freedom to change their minds about they accumulate values without it breaking folks. I can't think of how they might do that right now but that don't mean there aren't good ways. If you "work like doc values" in this API then we have a consistent way to send the results to the callers.
As it stands now the scripts preserve ordering of values for everything but booleans. But the rest of ES leaves us the option to change that ordering in the future. And I'd like to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just stumbled upon the same as part of adding support for script to BooleanFieldMapper and I went for doing what doc_values do, meaning returning false values first, then true values.Given that doc_values are resorted, and the fields API does not guarantee ordering, it looks like it is not needed to hold an array of values for this specific case. What we could do is expose a method that returns the values in whatever order we decide to return them, so that consumers don't have to do it themselves. And I would not expose in the response the trues vs falses concept, but rather expose an array of values, regardless of whether we have it internally or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like I said, I don't think we should be exposing trues and falses here, but an array of values like we do for other types. I am adding a method to achieve the same here, can we do something similar?
@@ -105,7 +114,15 @@ private PainlessExecuteAction() { | |||
static final Map<String, ScriptContext<?>> SUPPORTED_CONTEXTS = Map.of( | |||
"painless_test", PainlessTestScript.CONTEXT, | |||
"filter", FilterScript.CONTEXT, | |||
"score", ScoreScript.CONTEXT); | |||
"score", ScoreScript.CONTEXT, | |||
"boolean_script_field_script_field", BooleanFieldScript.CONTEXT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We disallowed stored scripts for runtime fields when we first built them and never re-allowed them. So I think that's mostly safe. I'm quite ok renaming things. I hadn't realized these names got out to users. Also - thing were different when we picked the names. We thought there'd be more different kinds of runtime fields. We never renamed them later. Anyway, what do you think of boolean_field_script
? They are the script that runtime fields use for boolean fields and they are the script that we'll use to calculate boolean fields at index time. Same script context.
BooleanFieldScript booleanFieldScript = leafFactory.newInstance(leafReaderContext); | ||
booleanFieldScript.setDocument(0); | ||
booleanFieldScript.execute(); | ||
return new Response(Map.of("trues", booleanFieldScript.trues(), "falses", booleanFieldScript.falses())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked to @jdconrad about this yesterday and I don't have a great answer here. I like that BooleanFieldScript
can do what it wants, at least on the theoretical level. And I don't think it's worth modifying BooleanFieldScript
just for this API. The ordering of the emitted booleans isn't really important in any other context.
But I think it'd be entirely reasonable for this API to make a list out of the counts just to make it consistent. That is also consistent with the way doc values and the fields
API work for this.
I do think its nice if we give these scripts the freedom to change their minds about they accumulate values without it breaking folks. I can't think of how they might do that right now but that don't mean there aren't good ways. If you "work like doc values" in this API then we have a consistent way to send the results to the callers.
As it stands now the scripts preserve ordering of values for everything but booleans. But the rest of ES leaves us the option to change that ordering in the future. And I'd like to keep it.
...nless/src/yamlRestTest/resources/rest-api-spec/test/painless/70_execute_painless_scripts.yml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of comments, the main one around how users will refer to the runtime field contexts.
factory.newFactory("boolean_runtime_field", request.getScript().getParams(), context.lookup()); | ||
BooleanFieldScript booleanFieldScript = leafFactory.newInstance(leafReaderContext); | ||
booleanFieldScript.setDocument(0); | ||
booleanFieldScript.execute(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we could call script.runForDoc(0)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you address this please? does it make sense?
BooleanFieldScript booleanFieldScript = leafFactory.newInstance(leafReaderContext); | ||
booleanFieldScript.setDocument(0); | ||
booleanFieldScript.execute(); | ||
return new Response(Map.of("trues", booleanFieldScript.trues(), "falses", booleanFieldScript.falses())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just stumbled upon the same as part of adding support for script to BooleanFieldMapper and I went for doing what doc_values do, meaning returning false values first, then true values.Given that doc_values are resorted, and the fields API does not guarantee ordering, it looks like it is not needed to hold an array of values for this specific case. What we could do is expose a method that returns the values in whatever order we decide to return them, so that consumers don't have to do it themselves. And I would not expose in the response the trues vs falses concept, but rather expose an array of values, regardless of whether we have it internally or not.
@@ -105,7 +114,15 @@ private PainlessExecuteAction() { | |||
static final Map<String, ScriptContext<?>> SUPPORTED_CONTEXTS = Map.of( | |||
"painless_test", PainlessTestScript.CONTEXT, | |||
"filter", FilterScript.CONTEXT, | |||
"score", ScoreScript.CONTEXT); | |||
"score", ScoreScript.CONTEXT, | |||
"boolean_script_field_script_field", BooleanFieldScript.CONTEXT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it is odd to make users type this, but I would take this a step forward. I am not entirely comfortable even asking users to type context: "boolean_script_field" when they want to test a script for a boolean runtime field. It seems like there is a gap between how a runtime field is simulated and when it gets used in practice. There may be other ways to achieve this, but would it be a possibility to expose some artificial "runtime_field" context instead, and have another way to specify the type, for instance as part of the context setup? Any other ideas on this?
@javanna I agree about not wanting the user to type out that full name -- we could add an alias as you suggest for the names of the runtime field contexts. Maybe something like
|
++ @jdconrad I would only rename runtime_type to type in the context options ;) |
Just a note that @alisonelizabeth has elastic/kibana#96424 ready to go for Painless Lab dependent on this PR. |
Users won't need to type this: that is entirely up to how Kibana exposes this. I do not think we should complicate the execute api (or any other scripting api) to make runtime fields special here. From the scripting level, runtime fields are many contexts. We need to expose those contexts. In fact, we still want to move the execute API out so any context can be included, programmatically, and doing so means associating the handling code for parsing the context setup to the named context. Having a non real "runtime_field" here would make that effort much more difficult, and unable to use the most natural api, keyed by the context. So I think we should fix the names, but not try to coerce them any farther than necessary. Note that the other contexts do not have the "field" suffix. For example, score scripts are just "score". But I'm not opposed to keep field in this case, just thought I would point out this isn't something user friendly exactly to begin with, it is an opaque id. |
@rjernst @javanna While we have used these as ids - I do wonder if it would make sense to also have aliases even given the added layer of complexity as part of the execute api. Although, I do agree that most users probably just go through Kibana and will never see this, it is still available as an external api. Maybe aliases are something we could add as a follow up. |
While the UI can help abstracting internal concepts away, I would prefer that we try to also simplify what concepts we expose to the direct users of our API. In this case, users know about runtime fields, and they specify a script and a type. They have no idea what a script context is and the fact that each type has its own. I don't think we should expose this internal detail in our API. I'd like to try and expose this one runtime field context for which you can specify the type as an additional configuration parameter. That is not perfect but close enough to how a runtime field is specified in the mapping or in the search request. @jimczi what do you think? |
This API is not for runtime fields. It is an existing API, that has already defined semantics. That is, the context key matches a script context. Users will not be calling this API directly, they will be using a UI which will make a request to it behind the scenes, hidden to them. @javanna Can you clarify why given those facts how the context is specified by Kibana would matter? |
To add some thoughts here in terms of the UI - I imagine this API will be used in two places in Kibana:
If we do feel strongly about having a "runtime fields" context, followed by a "type" selection, we could in theory implement that in the UI and derive the correct context behind the scenes when making the ES request. One hesitation I have though is it might be confusing to users who also look at our docs. // Note the TODOs are just placeholder text for now.
|
@elasticmachine run elasticsearch-ci/2 |
I appreciate that we can hide internal details through the UI, but I don't think that it is a good reason not to make our API easier to use when we can. Painless execute is a public API, it is documented and can be used directly, hence I think it is more user-friendly to expose one runtime field context instead of seven. That simplifies the docs and the UI as well (otherwise it has to maintain the mapping between type and required context, and so does every consumer of the API). I cannot comment on the effort required, but my expectation was that this would require a small conversion layer, which should not be on the way of opening the API up to any context later on. |
The API is currently a hardcoded list of handlers, through a chain of if/else if looking at the context value specified. This hardcoded list is what we want to eventually change, moving the API outside of painless, so that any context created for scripting automatically gets supported. The way that would work is through the registration of scripting contexts that we already have (either directly in the context definition, or indirectly in the ScriptingPlugin interface). Using something other than the context id here will make such a transition difficult: it would mean defining more indirection layers (some "virtual" context called runtime_field and how that maps to concrete contexts).
Yes it is, but this API is specifically for scripting. Runtime fields are one use of scripting. I am strongly against trying to hide details of how the scripting api works. Context, while only exposed in a few places, is already exposed to users. We do lack good documentation on which contexts are available, but this is the means by which we have to define the signature of a type of script. Runtime fields has several because there are different signatures, depending on the type of the field.
The amount of work involved is not my concern, per se. It is the overhead of this conversion layer for both developers and users. I find being direct is much simpler than adding indirection, especially when the vast majority of users will never see this, yet it would considerably complicate documentation (the API currently takes context, but now what is "context" if not the script context name?), and implementation (the indirection i already mentioned above). |
+1, I think it's inlined with the intent. We have strongly typed contexts for fields that produce values through a Painless script. |
@javanna Are you all right with us proceeding with specifying the full context name for now given this is how the execute api already operates, and we can revisit api specification for this at some point in the future if we get feedback that it's not working well? |
@jdconrad I am ok to not have a unified context for runtime fields and expose one per type. We were chatting with @jimczi and it turns out that the same contexts are also used for indexed fields that hold a script (see #68984), so using e.g. boolean_field would be good enough. Is then the plan to rename the existing contexts as a followup? |
@javanna Cool, thanks Luca! I will address the rest of Nik's and your requests to make sure the return values are in doc-values order. And I do intend to follow this up with the name change, but after speaking with @nik9000 and @jimczi, it's possible users have added stored scripts with runtime field contexts, even though they're unusable. I would like to switch the names while also adding code that prevents this from happening and drops any that have been stored already with some kind of warning in the logs. (Please do let me know if you see an obvious flaw in this plan.) |
Note that stored scripts don't store the context, it is only passed in to trigger compilation. So even if a stored script was passed in with the old name, changing the name does not break the script. It only breaks someone trying to use that name for stored scripts in the future. |
This changes all the script context names specifically for runtime fields to be *_field such as long_field and geo_point_field, etc. This change is internal detail that will only be exposed through the Painless execute API as part of (#71374) and should not have bwc issues. I tested this change locally on a mixed cluster to ensure scripts stored with the old runtime fields context names are both still retrievable and delete-able. This works because the context name is only used during the request to check for valid compilation, but never actually stored as part of the cluster state.
…c#71581) This changes all the script context names specifically for runtime fields to be *_field such as long_field and geo_point_field, etc. This change is internal detail that will only be exposed through the Painless execute API as part of (elastic#71374) and should not have bwc issues. I tested this change locally on a mixed cluster to ensure scripts stored with the old runtime fields context names are both still retrievable and delete-able. This works because the context name is only used during the request to check for valid compilation, but never actually stored as part of the cluster state.
#71586) This changes all the script context names specifically for runtime fields to be *_field such as long_field and geo_point_field, etc. This change is internal detail that will only be exposed through the Painless execute API as part of (#71374) and should not have bwc issues. I tested this change locally on a mixed cluster to ensure scripts stored with the old runtime fields context names are both still retrievable and delete-able. This works because the context name is only used during the request to check for valid compilation, but never actually stored as part of the cluster state.
…elastic#71599) This adds utility methods to each type of runtime field to return the results of a document in an ordered array based on the same order that doc values are ordered in. This is useful for supporting execute api in this elastic#71374.
@nik9000 Would you please take one last pass at this? This last set of commits has the following:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @javanna should have another look too.
@elasticmachine run elasticsearch-ci/2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks @jdconrad !
Thank you for the all the feedback, @javanna! |
This change adds support for the 7 different runtime fields contexts to the Painless Execute API. Each context can accept the standard script input (source and params) along with a user-defined document and an index name to pull mappings from. The results depend on the output of the runtime field type. Closes #70467
This change adds support for the 7 different runtime fields contexts to the Painless Execute API. Each context can accept the standard script input (source and params) along with a user-defined document and an index name to pull mappings from. The results depend on the output of the runtime field type.
Closes #70467