Add Runtime Fields Contexts to Painless Execute API #71374

jdconrad · 2021-04-06T21:09:42Z

This change adds support for the 7 different runtime fields contexts to the Painless Execute API. Each context can accept the standard script input (source and params) along with a user-defined document and an index name to pull mappings from. The results depend on the output of the runtime field type.

Closes #70467

elasticmachine · 2021-04-06T21:09:48Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

rjernst

LGTM but @nik9000 may have more thoughts.

rjernst · 2021-04-06T23:49:03Z

...les/lang-painless/src/main/java/org/elasticsearch/painless/action/PainlessExecuteAction.java

@@ -105,7 +114,15 @@ private PainlessExecuteAction() {
        static final Map<String, ScriptContext<?>> SUPPORTED_CONTEXTS = Map.of(
                "painless_test", PainlessTestScript.CONTEXT,
                "filter", FilterScript.CONTEXT,
-                "score", ScoreScript.CONTEXT);
+                "score", ScoreScript.CONTEXT,
+                "boolean_script_field_script_field", BooleanFieldScript.CONTEXT,


@nik9000 This "bug" with doubling the "script_field" suffix is unfortunate. WDYT about doing a hard break and fixing it? The only place the name is used is for stored scripts (but only in the api, to know which context to compile against) and in the node settings for script compilation limits, and with the latter it's not applicable here since unlimited is used.

rjernst · 2021-04-06T23:49:54Z

...les/lang-painless/src/main/java/org/elasticsearch/painless/action/PainlessExecuteAction.java

+                    BooleanFieldScript booleanFieldScript = leafFactory.newInstance(leafReaderContext);
+                    booleanFieldScript.setDocument(0);
+                    booleanFieldScript.execute();
+                    return new Response(Map.of("trues", booleanFieldScript.trues(), "falses", booleanFieldScript.falses()));


This seems odd but I realize it is a result of the existing output of BooleanFieldScript. @nik9000 What about using a list of values like the other runtime field types do?

Like I said, I don't think we should be exposing trues and falses here, but an array of values like we do for other types. I am adding a method to achieve the same here, can we do something similar?

nik9000 · 2021-04-07T12:24:16Z

We disallowed stored scripts for runtime fields when we first built them and never re-allowed them, so I think that's mostly safe. I'm quite OK with renaming things. I hadn't realized these names got out to users. Also, things were different when we picked the names: we thought there'd be more kinds of runtime fields, and we never renamed them later. Anyway, what do you think of boolean_field_script? They are the scripts that runtime fields use for boolean fields, and they are the scripts that we'll use to calculate boolean fields at index time. Same script context.

nik9000 · 2021-04-07T12:32:44Z

I talked to @jdconrad about this yesterday and I don't have a great answer here. I like that BooleanFieldScript can do what it wants, at least on the theoretical level. And I don't think it's worth modifying BooleanFieldScript just for this API. The ordering of the emitted booleans isn't really important in any other context.

But I think it'd be entirely reasonable for this API to make a list out of the counts just to make it consistent. That is also consistent with the way doc values and the fields API work for this.

I do think it's nice if we give these scripts the freedom to change their minds about how they accumulate values without it breaking folks. I can't think of how they might do that right now, but that doesn't mean there aren't good ways. If you "work like doc values" in this API then we have a consistent way to send the results to the callers.

As it stands now the scripts preserve ordering of values for everything but booleans. But the rest of ES leaves us the option to change that ordering in the future. And I'd like to keep it.

...nless/src/yamlRestTest/resources/rest-api-spec/test/painless/70_execute_painless_scripts.yml

javanna

I left a couple of comments, the main one around how users will refer to the runtime field contexts.

javanna · 2021-04-07T13:44:49Z

...les/lang-painless/src/main/java/org/elasticsearch/painless/action/PainlessExecuteAction.java

+                            factory.newFactory("boolean_runtime_field", request.getScript().getParams(), context.lookup());
+                    BooleanFieldScript booleanFieldScript = leafFactory.newInstance(leafReaderContext);
+                    booleanFieldScript.setDocument(0);
+                    booleanFieldScript.execute();


nit: we could call script.runForDoc(0)?

can you address this please? does it make sense?

javanna · 2021-04-07T13:47:49Z

I just stumbled upon the same thing while adding script support to BooleanFieldMapper, and I went with what doc_values do: returning false values first, then true values. Given that doc_values are re-sorted and the fields API does not guarantee ordering, it does not look necessary to hold an array of values for this specific case. What we could do is expose a method that returns the values in whatever order we decide to return them, so that consumers don't have to do it themselves. And I would not expose the trues vs. falses concept in the response, but rather an array of values, regardless of whether we hold one internally or not.
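
To make the suggestion concrete, here is a minimal sketch of flattening the two counters into one list, false values first and then true values, as described above. The class and method names are hypothetical and are not taken from the PR; only the idea of a trues count and a falses count comes from the diff under review.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Elasticsearch: flattens boolean counts into one list. */
final class BooleanCountsSketch {
    private BooleanCountsSketch() {}

    /**
     * Returns {@code falses} copies of false followed by {@code trues} copies of true,
     * which is the false-then-true order the comment above attributes to doc values.
     */
    static List<Boolean> asOrderedValues(int trues, int falses) {
        if (trues < 0 || falses < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        List<Boolean> values = new ArrayList<>(trues + falses);
        for (int i = 0; i < falses; i++) {
            values.add(false);
        }
        for (int i = 0; i < trues; i++) {
            values.add(true);
        }
        return values;
    }
}
```

With something like this, a response could carry a plain array of values and callers would never see the internal counters.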

javanna · 2021-04-07T13:56:40Z

I agree that it is odd to make users type this, but I would take this a step further. I am not entirely comfortable even asking users to type context: "boolean_script_field" when they want to test a script for a boolean runtime field. It seems like there is a gap between how a runtime field is simulated and how it gets used in practice. There may be other ways to achieve this, but would it be possible to expose some artificial "runtime_field" context instead, and have another way to specify the type, for instance as part of the context setup? Any other ideas on this?

jdconrad · 2021-04-07T14:24:15Z

@javanna I agree about not wanting the user to type out that full name -- we could add an alias as you suggest for the names of the runtime field contexts. Maybe something like

"context": "runtime_field",
"context_setup": {
    "runtime_type": "long",
    ...
}

javanna · 2021-04-07T14:34:26Z

++ @jdconrad I would only rename runtime_type to type in the context options ;)

jdconrad · 2021-04-07T15:05:56Z

Just a note that @alisonelizabeth has elastic/kibana#96424 ready to go for Painless Lab dependent on this PR.

rjernst · 2021-04-07T15:07:45Z

> I am not entirely comfortable even asking users to type context: "boolean_script_field"

Users won't need to type this: that is entirely up to how Kibana exposes this. I do not think we should complicate the execute api (or any other scripting api) to make runtime fields special here. From the scripting level, runtime fields are many contexts. We need to expose those contexts. In fact, we still want to move the execute API out so any context can be included, programmatically, and doing so means associating the handling code for parsing the context setup to the named context. Having a non real "runtime_field" here would make that effort much more difficult, and unable to use the most natural api, keyed by the context.

So I think we should fix the names, but not try to coerce them any further than necessary. Note that the other contexts do not have the "field" suffix. For example, score scripts are just "score". I'm not opposed to keeping "field" in this case; I just wanted to point out that this isn't exactly user-friendly to begin with, since it is an opaque id.

jdconrad · 2021-04-07T15:47:03Z

@javanna @rjernst @nik9000 I'd like to reach a consensus on this quickly if possible. The simplest change to make is what @rjernst has suggested. How would you all feel about renaming each of the runtime fields contexts to type_field, i.e. long_field, geo_point_field, etc.?

jdconrad · 2021-04-07T15:50:24Z

@rjernst @javanna While we have used these as ids - I do wonder if it would make sense to also have aliases even given the added layer of complexity as part of the execute api. Although, I do agree that most users probably just go through Kibana and will never see this, it is still available as an external api. Maybe aliases are something we could add as a follow up.

javanna · 2021-04-07T16:06:24Z

While the UI can help abstracting internal concepts away, I would prefer that we try to also simplify what concepts we expose to the direct users of our API. In this case, users know about runtime fields, and they specify a script and a type. They have no idea what a script context is and the fact that each type has its own. I don't think we should expose this internal detail in our API. I'd like to try and expose this one runtime field context for which you can specify the type as an additional configuration parameter. That is not perfect but close enough to how a runtime field is specified in the mapping or in the search request. @jimczi what do you think?

rjernst · 2021-04-07T16:22:10Z

> I don't think we should expose this internal detail in our API. I'd like to try and expose this one runtime field context for which you can specify the type as an additional configuration parameter.

This API is not for runtime fields. It is an existing API that already has defined semantics: the context key matches a script context. Users will not be calling this API directly; they will be using a UI that makes a request to it behind the scenes, hidden from them. @javanna Can you clarify why, given those facts, it would matter how Kibana specifies the context?

alisonelizabeth · 2021-04-07T16:37:40Z

To add some thoughts here in terms of the UI -

I imagine this API will be used in two places in Kibana:

1. Painless Lab (WIP PR: "[Painless Lab] Add support for runtime field contexts", kibana#96424), where we already have a notion of a user selecting an execution context. We can expose a user-friendly string for the context name (e.g., boolean instead of boolean_script_field_script).

   If we do feel strongly about having a "runtime fields" context, followed by a "type" selection, we could in theory implement that in the UI and derive the correct context behind the scenes when making the ES request. One hesitation I have, though, is that it might be confusing to users who also look at our docs.

   // Note the TODOs are just placeholder text for now.

2. Runtime fields editor. I haven't been involved in recent discussions around this, but I don't foresee that we would need to expose the context to the user. We would determine what context to pass to the execute API based on the "type" the user selected.
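
As a rough illustration of deriving the context from the selected type, the sketch below assumes the type_field naming proposed earlier in this thread (long_field, geo_point_field). The class name, the method name, and the exact list of types are assumptions made for the example; the PR description only states that there are seven runtime fields contexts.

```java
import java.util.Set;

/** Hypothetical client-side helper: maps a runtime field type to a script context name. */
final class RuntimeFieldContextSketch {
    // Assumed list of runtime field types, used only for illustration.
    private static final Set<String> TYPES =
        Set.of("boolean", "date", "double", "geo_point", "ip", "keyword", "long");

    private RuntimeFieldContextSketch() {}

    /** Returns the context name for a type, assuming the "<type>_field" naming scheme. */
    static String contextFor(String runtimeFieldType) {
        if (runtimeFieldType == null || !TYPES.contains(runtimeFieldType)) {
            throw new IllegalArgumentException("unknown runtime field type [" + runtimeFieldType + "]");
        }
        return runtimeFieldType + "_field";
    }
}
```

A consumer would call `RuntimeFieldContextSketch.contextFor("geo_point")` and pass the result as the `context` value of the execute request, so the user only ever picks a type.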

jdconrad · 2021-04-07T16:58:45Z

@elasticmachine run elasticsearch-ci/2

javanna · 2021-04-07T19:36:53Z

I appreciate that we can hide internal details through the UI, but I don't think that it is a good reason not to make our API easier to use when we can. Painless execute is a public API, it is documented and can be used directly, hence I think it is more user-friendly to expose one runtime field context instead of seven. That simplifies the docs and the UI as well (otherwise it has to maintain the mapping between type and required context, and so does every consumer of the API).

I cannot comment on the effort required, but my expectation was that this would require a small conversion layer, which should not be on the way of opening the API up to any context later on.

rjernst · 2021-04-07T19:51:35Z

> which should not be on the way of opening the API up to any context later on.

The API is currently a hardcoded list of handlers, through a chain of if/else if looking at the context value specified. This hardcoded list is what we want to eventually change, moving the API outside of painless, so that any context created for scripting automatically gets supported. The way that would work is through the registration of scripting contexts that we already have (either directly in the context definition, or indirectly in the ScriptingPlugin interface). Using something other than the context id here will make such a transition difficult: it would mean defining more indirection layers (some "virtual" context called runtime_field and how that maps to concrete contexts).
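
A minimal sketch of the "keyed by the context" idea, for illustration only: handlers are registered under the script context id itself, so a lookup replaces the if/else chain and no virtual name needs to be translated first. The class, its methods, and the string-in/string-out handler signature are hypothetical and do not reflect the actual PainlessExecuteAction or ScriptingPlugin code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: execute handlers registered under the script context id. */
final class ExecuteHandlerRegistrySketch {
    // Maps a context id (for example "score") to the code that runs a script for it.
    private final Map<String, Function<String, String>> handlers = new LinkedHashMap<>();

    /** Registers a handler; each context id may be registered only once. */
    void register(String contextId, Function<String, String> handler) {
        if (handlers.putIfAbsent(contextId, handler) != null) {
            throw new IllegalArgumentException("context [" + contextId + "] already registered");
        }
    }

    /** Looks up the handler by context id and runs it on the script source. */
    String execute(String contextId, String scriptSource) {
        Function<String, String> handler = handlers.get(contextId);
        if (handler == null) {
            throw new IllegalArgumentException("unsupported context [" + contextId + "]");
        }
        return handler.apply(scriptSource);
    }
}
```

In this shape, a name such as "runtime_field" that is not itself a registered context would need an extra translation step before the lookup, which is the indirection described above.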

> Painless execute is a public API, it is documented and can be used directly, hence I think it is more user-friendly to expose one runtime field context instead of seven

Yes it is, but this API is specifically for scripting. Runtime fields are one use of scripting. I am strongly against trying to hide details of how the scripting api works. Context, while only exposed in a few places, is already exposed to users. We do lack good documentation on which contexts are available, but this is the means by which we have to define the signature of a type of script. Runtime fields has several because there are different signatures, depending on the type of the field.

> my expectation was that this would require a small conversion layer

The amount of work involved is not my concern, per se. It is the overhead of this conversion layer for both developers and users. I find being direct is much simpler than adding indirection, especially when the vast majority of users will never see this, yet it would considerably complicate documentation (the API currently takes context, but now what is "context" if not the script context name?) and implementation (the indirection I already mentioned above).

jimczi · 2021-04-08T08:27:38Z

> How would you all feel about renaming each of the runtime fields contexts to type_field ie long_field or geo_point_field, etc?

+1, I think it's in line with the intent. We have strongly typed contexts for fields that produce values through a Painless script.
We added support for these contexts in the indexed fields, so I like the idea of having generic typed contexts for the mapping.
The context_setup will be useful for adding more customizations (additional runtime mappings for fields that depend on other scripts), although I don't think it should be used to magically reconstruct the context name. These contexts exist, so we need to expose them clearly; a simple name would help in that regard.

jdconrad · 2021-04-08T16:32:50Z

@javanna Are you all right with us proceeding with specifying the full context name for now given this is how the execute api already operates, and we can revisit api specification for this at some point in the future if we get feedback that it's not working well?

javanna · 2021-04-08T20:04:05Z

@jdconrad I am ok to not have a unified context for runtime fields and expose one per type. We were chatting with @jimczi and it turns out that the same contexts are also used for indexed fields that hold a script (see #68984), so using e.g. boolean_field would be good enough. Is then the plan to rename the existing contexts as a followup?

jdconrad · 2021-04-08T20:12:26Z

@javanna Cool, thanks Luca!

I will address the rest of Nik's and your requests to make sure the return values are in doc-values order.

And I do intend to follow this up with the name change, but after speaking with @nik9000 and @jimczi, it's possible users have added stored scripts with runtime field contexts, even though they're unusable. I would like to switch the names while also adding code that prevents this from happening and drops any that have been stored already with some kind of warning in the logs. (Please do let me know if you see an obvious flaw in this plan.)

rjernst · 2021-04-08T21:14:53Z

> it's possible users have added stored scripts with runtime field contexts, even though they're unusable

Note that stored scripts don't store the context, it is only passed in to trigger compilation. So even if a stored script was passed in with the old name, changing the name does not break the script. It only breaks someone trying to use that name for stored scripts in the future.

This changes all the script context names specifically for runtime fields to be *_field, such as long_field, geo_point_field, etc. This change is an internal detail that will only be exposed through the Painless execute API as part of (#71374) and should not have bwc issues. I tested this change locally on a mixed cluster to ensure scripts stored with the old runtime fields context names are both still retrievable and deletable. This works because the context name is only used during the request to check for valid compilation, and is never actually stored as part of the cluster state.

…#71599) This adds utility methods to each type of runtime field to return the results of a document in an ordered array based on the same order that doc values are ordered in. This is useful for supporting execute api in this #71374.

jdconrad · 2021-04-13T20:42:42Z

@nik9000 Would you please take one last pass at this?

This last set of commits has the following:

Updated context names to *_field (long_field, etc.)
Updated to use the new asDocValues methods on the contexts for results
Updated to use runForDoc instead of execute
Updated unit tests
Updated yaml tests w/ single value and multi value for each type

nik9000

LGTM. @javanna should have another look too.

jdconrad · 2021-04-13T20:46:38Z

Thanks @nik9000. @javanna Would you please take one last pass at this? :)

jdconrad · 2021-04-13T21:09:35Z

@elasticmachine run elasticsearch-ci/2

javanna

LGTM thanks @jdconrad !

jdconrad · 2021-04-14T15:56:34Z

Thank you for all the feedback, @javanna!

jdconrad added 4 commits April 5, 2021 16:48

add rt fields to painless execute action

7fa4599

add some tests

df768ba

finish single node tests

105c7cd

add yaml tests

ab58a91

jdconrad added >enhancement :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache v8.0.0 v7.13.0 labels Apr 6, 2021

jdconrad requested review from nik9000 and rjernst April 6, 2021 21:09

elasticmachine added the Team:Core/Infra Meta label for core/infra team label Apr 6, 2021

Merge branch 'master' into rtexec

1f4e585

rjernst approved these changes Apr 6, 2021

nik9000 reviewed Apr 7, 2021

javanna self-requested a review April 7, 2021 13:26

javanna requested changes Apr 7, 2021

jdconrad mentioned this pull request Apr 12, 2021

Change script context names for run time fields to type_field #71581

Merged

jdconrad mentioned this pull request Apr 12, 2021

Add utility methods to return runtime field values in doc value order #71599

Merged

jdconrad added 6 commits April 13, 2021 08:06

Merge branch 'master' into rtexec

e26157b

update based on pr feedback

9634407

Merge branch 'master' into rtexec

20746ac

some yaml tests updated

6b40ed1

Merge branch 'master' into rtexec

662f78b

finish updating yaml tests

0c78bea

nik9000 approved these changes Apr 13, 2021

javanna approved these changes Apr 14, 2021

jdconrad merged commit 301dcb6 into elastic:master Apr 14, 2021

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
