Skip to content

Add Runtime Fields Contexts to Painless Execute API #71374

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 14, 2021

Conversation

jdconrad
Copy link
Contributor

@jdconrad jdconrad commented Apr 6, 2021

This change adds support for the 7 different runtime fields contexts to the Painless Execute API. Each context can accept the standard script input (source and params) along with a user-defined document and an index name to pull mappings from. The results depend on the output of the runtime field type.

Closes #70467

@jdconrad jdconrad added >enhancement :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache v8.0.0 v7.13.0 labels Apr 6, 2021
@jdconrad jdconrad requested review from nik9000 and rjernst April 6, 2021 21:09
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Apr 6, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but @nik9000 may have more thoughts.

@@ -105,7 +114,15 @@ private PainlessExecuteAction() {
static final Map<String, ScriptContext<?>> SUPPORTED_CONTEXTS = Map.of(
"painless_test", PainlessTestScript.CONTEXT,
"filter", FilterScript.CONTEXT,
"score", ScoreScript.CONTEXT);
"score", ScoreScript.CONTEXT,
"boolean_script_field_script_field", BooleanFieldScript.CONTEXT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 This "bug" with doubling the "script_field" suffix is unfortunate. WDYT about doing a hard break and fixing it? The only place the name is used is for stored scripts (but only in the api, to know which context to compile against) and in the node settings for script compilation limits, and with the latter it's not applicable here since unlimited is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We disallowed stored scripts for runtime fields when we first built them and never re-allowed them. So I think that's mostly safe. I'm quite ok renaming things. I hadn't realized these names got out to users. Also - thing were different when we picked the names. We thought there'd be more different kinds of runtime fields. We never renamed them later. Anyway, what do you think of boolean_field_script? They are the script that runtime fields use for boolean fields and they are the script that we'll use to calculate boolean fields at index time. Same script context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is odd to make users type this, but I would take this a step forward. I am not entirely comfortable even asking users to type context: "boolean_script_field" when they want to test a script for a boolean runtime field. It seems like there is a gap between how a runtime field is simulated and when it gets used in practice. There may be other ways to achieve this, but would it be a possibility to expose some artificial "runtime_field" context instead, and have another way to specify the type, for instance as part of the context setup? Any other ideas on this?

BooleanFieldScript booleanFieldScript = leafFactory.newInstance(leafReaderContext);
booleanFieldScript.setDocument(0);
booleanFieldScript.execute();
return new Response(Map.of("trues", booleanFieldScript.trues(), "falses", booleanFieldScript.falses()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems odd but I realize it is a result of the existing output of BooleanFieldScript. @nik9000 What about using a list of values like the other runtime field types do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to @jdconrad about this yesterday and I don't have a great answer here. I like that BooleanFieldScript can do what it wants, at least on the theoretical level. And I don't think it's worth modifying BooleanFieldScript just for this API. The ordering of the emitted booleans isn't really important in any other context.

But I think it'd be entirely reasonable for this API to make a list out of the counts just to make it consistent. That is also consistent with the way doc values and the fields API work for this.

I do think its nice if we give these scripts the freedom to change their minds about they accumulate values without it breaking folks. I can't think of how they might do that right now but that don't mean there aren't good ways. If you "work like doc values" in this API then we have a consistent way to send the results to the callers.

As it stands now the scripts preserve ordering of values for everything but booleans. But the rest of ES leaves us the option to change that ordering in the future. And I'd like to keep it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just stumbled upon the same as part of adding support for script to BooleanFieldMapper and I went for doing what doc_values do, meaning returning false values first, then true values.Given that doc_values are resorted, and the fields API does not guarantee ordering, it looks like it is not needed to hold an array of values for this specific case. What we could do is expose a method that returns the values in whatever order we decide to return them, so that consumers don't have to do it themselves. And I would not expose in the response the trues vs falses concept, but rather expose an array of values, regardless of whether we have it internally or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said, I don't think we should be exposing trues and falses here, but an array of values like we do for other types. I am adding a method to achieve the same here, can we do something similar?

@@ -105,7 +114,15 @@ private PainlessExecuteAction() {
static final Map<String, ScriptContext<?>> SUPPORTED_CONTEXTS = Map.of(
"painless_test", PainlessTestScript.CONTEXT,
"filter", FilterScript.CONTEXT,
"score", ScoreScript.CONTEXT);
"score", ScoreScript.CONTEXT,
"boolean_script_field_script_field", BooleanFieldScript.CONTEXT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We disallowed stored scripts for runtime fields when we first built them and never re-allowed them. So I think that's mostly safe. I'm quite ok renaming things. I hadn't realized these names got out to users. Also - thing were different when we picked the names. We thought there'd be more different kinds of runtime fields. We never renamed them later. Anyway, what do you think of boolean_field_script? They are the script that runtime fields use for boolean fields and they are the script that we'll use to calculate boolean fields at index time. Same script context.

BooleanFieldScript booleanFieldScript = leafFactory.newInstance(leafReaderContext);
booleanFieldScript.setDocument(0);
booleanFieldScript.execute();
return new Response(Map.of("trues", booleanFieldScript.trues(), "falses", booleanFieldScript.falses()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked to @jdconrad about this yesterday and I don't have a great answer here. I like that BooleanFieldScript can do what it wants, at least on the theoretical level. And I don't think it's worth modifying BooleanFieldScript just for this API. The ordering of the emitted booleans isn't really important in any other context.

But I think it'd be entirely reasonable for this API to make a list out of the counts just to make it consistent. That is also consistent with the way doc values and the fields API work for this.

I do think its nice if we give these scripts the freedom to change their minds about they accumulate values without it breaking folks. I can't think of how they might do that right now but that don't mean there aren't good ways. If you "work like doc values" in this API then we have a consistent way to send the results to the callers.

As it stands now the scripts preserve ordering of values for everything but booleans. But the rest of ES leaves us the option to change that ordering in the future. And I'd like to keep it.

@javanna javanna self-requested a review April 7, 2021 13:26
Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments, the main one around how users will refer to the runtime field contexts.

factory.newFactory("boolean_runtime_field", request.getScript().getParams(), context.lookup());
BooleanFieldScript booleanFieldScript = leafFactory.newInstance(leafReaderContext);
booleanFieldScript.setDocument(0);
booleanFieldScript.execute();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could call script.runForDoc(0)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you address this please? does it make sense?

BooleanFieldScript booleanFieldScript = leafFactory.newInstance(leafReaderContext);
booleanFieldScript.setDocument(0);
booleanFieldScript.execute();
return new Response(Map.of("trues", booleanFieldScript.trues(), "falses", booleanFieldScript.falses()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just stumbled upon the same as part of adding support for script to BooleanFieldMapper and I went for doing what doc_values do, meaning returning false values first, then true values.Given that doc_values are resorted, and the fields API does not guarantee ordering, it looks like it is not needed to hold an array of values for this specific case. What we could do is expose a method that returns the values in whatever order we decide to return them, so that consumers don't have to do it themselves. And I would not expose in the response the trues vs falses concept, but rather expose an array of values, regardless of whether we have it internally or not.

@@ -105,7 +114,15 @@ private PainlessExecuteAction() {
static final Map<String, ScriptContext<?>> SUPPORTED_CONTEXTS = Map.of(
"painless_test", PainlessTestScript.CONTEXT,
"filter", FilterScript.CONTEXT,
"score", ScoreScript.CONTEXT);
"score", ScoreScript.CONTEXT,
"boolean_script_field_script_field", BooleanFieldScript.CONTEXT,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is odd to make users type this, but I would take this a step forward. I am not entirely comfortable even asking users to type context: "boolean_script_field" when they want to test a script for a boolean runtime field. It seems like there is a gap between how a runtime field is simulated and when it gets used in practice. There may be other ways to achieve this, but would it be a possibility to expose some artificial "runtime_field" context instead, and have another way to specify the type, for instance as part of the context setup? Any other ideas on this?

@jdconrad
Copy link
Contributor Author

jdconrad commented Apr 7, 2021

@javanna I agree about not wanting the user to type out that full name -- we could add an alias as you suggest for the names of the runtime field contexts. Maybe something like

"context": "runtime_field",
"context_setup": {
    "runtime_type": "long",
    ...
}

@javanna
Copy link
Member

javanna commented Apr 7, 2021

++ @jdconrad I would only rename runtime_type to type in the context options ;)

@jdconrad
Copy link
Contributor Author

jdconrad commented Apr 7, 2021

Just a note that @alisonelizabeth has elastic/kibana#96424 ready to go for Painless Lab dependent on this PR.

@rjernst
Copy link
Member

rjernst commented Apr 7, 2021

I am not entirely comfortable even asking users to type context: "boolean_script_field

Users won't need to type this: that is entirely up to how Kibana exposes this. I do not think we should complicate the execute api (or any other scripting api) to make runtime fields special here. From the scripting level, runtime fields are many contexts. We need to expose those contexts. In fact, we still want to move the execute API out so any context can be included, programmatically, and doing so means associating the handling code for parsing the context setup to the named context. Having a non real "runtime_field" here would make that effort much more difficult, and unable to use the most natural api, keyed by the context.

So I think we should fix the names, but not try to coerce them any farther than necessary. Note that the other contexts do not have the "field" suffix. For example, score scripts are just "score". But I'm not opposed to keep field in this case, just thought I would point out this isn't something user friendly exactly to begin with, it is an opaque id.

@jdconrad
Copy link
Contributor Author

jdconrad commented Apr 7, 2021

@javanna @rjernst @nik9000 I'd like to reach a consensus on this quickly if possible. The simplest change to make is what @rjernst has suggested. How would you all feel about renaming each of the runtime fields contexts to type_field ie long_field or geo_point_field, etc?

@jdconrad
Copy link
Contributor Author

jdconrad commented Apr 7, 2021

@rjernst @javanna While we have used these as ids - I do wonder if it would make sense to also have aliases even given the added layer of complexity as part of the execute api. Although, I do agree that most users probably just go through Kibana and will never see this, it is still available as an external api. Maybe aliases are something we could add as a follow up.

@javanna
Copy link
Member

javanna commented Apr 7, 2021

While the UI can help abstracting internal concepts away, I would prefer that we try to also simplify what concepts we expose to the direct users of our API. In this case, users know about runtime fields, and they specify a script and a type. They have no idea what a script context is and the fact that each type has its own. I don't think we should expose this internal detail in our API. I'd like to try and expose this one runtime field context for which you can specify the type as an additional configuration parameter. That is not perfect but close enough to how a runtime field is specified in the mapping or in the search request. @jimczi what do you think?

@rjernst
Copy link
Member

rjernst commented Apr 7, 2021

I don't think we should expose this internal detail in our API. I'd like to try and expose this one runtime field context for which you can specify the type as an additional configuration parameter.

This API is not for runtime fields. It is an existing API, that has already defined semantics. That is, the context key matches a script context. Users will not be calling this API directly, they will be using a UI which will make a request to it behind the scenes, hidden to them. @javanna Can you clarify why given those facts how the context is specified by Kibana would matter?

@alisonelizabeth
Copy link
Contributor

alisonelizabeth commented Apr 7, 2021

To add some thoughts here in terms of the UI -

I imagine this API will be used in two places in Kibana:

  1. Painless Lab (WIP PR - [Painless Lab] Add support for runtime field contexts kibana#96424) where we already have a notion of a user selecting an execution context. We can expose a user-friendly string for the context name (e.g., boolean instead of boolean_script_field_script).

If we do feel strongly about having a "runtime fields" context, followed by a "type" selection, we could in theory implement that in the UI and derive the correct context behind the scenes when making the ES request. One hesitation I have though is it might be confusing to users who also look at our docs.

// Note the TODOs are just placeholder text for now.
Screen Shot 2021-04-07 at 12 33 24 PM

  1. Runtime fields editor. I haven't been involved in recent discussions around this, but I don't foresee that we would need to expose the context to the user. We would determine what context to pass to the execute API based on the "type" the user selected.

@jdconrad
Copy link
Contributor Author

jdconrad commented Apr 7, 2021

@elasticmachine run elasticsearch-ci/2

@javanna
Copy link
Member

javanna commented Apr 7, 2021

I appreciate that we can hide internal details through the UI, but I don't think that it is a good reason not to make our API easier to use when we can. Painless execute is a public API, it is documented and can be used directly, hence I think it is more user-friendly to expose one runtime field context instead of seven. That simplifies the docs and the UI as well (otherwise it has to maintain the mapping between type and required context, and so does every consumer of the API).

I cannot comment on the effort required, but my expectation was that this would require a small conversion layer, which should not be on the way of opening the API up to any context later on.

@rjernst
Copy link
Member

rjernst commented Apr 7, 2021

which should not be on the way of opening the API up to any context later on.

The API is currently a hardcoded list of handlers, through a chain of if/else if looking at the context value specified. This hardcoded list is what we want to eventually change, moving the API outside of painless, so that any context created for scripting automatically gets supported. The way that would work is through the registration of scripting contexts that we already have (either directly in the context definition, or indirectly in the ScriptingPlugin interface). Using something other than the context id here will make such a transition difficult: it would mean defining more indirection layers (some "virtual" context called runtime_field and how that maps to concrete contexts).

Painless execute is a public API, it is documented and can be used directly, hence I think it is more user-friendly to expose one runtime field context instead of seven

Yes it is, but this API is specifically for scripting. Runtime fields are one use of scripting. I am strongly against trying to hide details of how the scripting api works. Context, while only exposed in a few places, is already exposed to users. We do lack good documentation on which contexts are available, but this is the means by which we have to define the signature of a type of script. Runtime fields has several because there are different signatures, depending on the type of the field.

my expectation was that this would require a small conversion layer

The amount of work involved is not my concern, per se. It is the overhead of this conversion layer for both developers and users. I find being direct is much simpler than adding indirection, especially when the vast majority of users will never see this, yet it would considerably complicate documentation (the API currently takes context, but now what is "context" if not the script context name?), and implementation (the indirection i already mentioned above).

@jimczi
Copy link
Contributor

jimczi commented Apr 8, 2021

How would you all feel about renaming each of the runtime fields contexts to type_field ie long_field or geo_point_field, etc?

+1, I think it's inlined with the intent. We have strongly typed contexts for fields that produce values through a Painless script.
We added the support for these contexts in the indexed fields so I liked the idea of having generic typed contexts for the mapping.
The context_setup will be useful to add more customizations (additional runtime mapping for fields that depend on other scripts), although I don't think it should be used it to magically reconstruct the context name. These contexts exist so we need to expose them clearly, a simple name would help for that matter.

@jdconrad
Copy link
Contributor Author

jdconrad commented Apr 8, 2021

@javanna Are you all right with us proceeding with specifying the full context name for now given this is how the execute api already operates, and we can revisit api specification for this at some point in the future if we get feedback that it's not working well?

@javanna
Copy link
Member

javanna commented Apr 8, 2021

@jdconrad I am ok to not have a unified context for runtime fields and expose one per type. We were chatting with @jimczi and it turns out that the same contexts are also used for indexed fields that hold a script (see #68984), so using e.g. boolean_field would be good enough. Is then the plan to rename the existing contexts as a followup?

@jdconrad
Copy link
Contributor Author

jdconrad commented Apr 8, 2021

@javanna Cool, thanks Luca!

I will address the rest of Nik's and your requests to make sure the return values are in doc-values order.

And I do intend to follow this up with the name change, but after speaking with @nik9000 and @jimczi, it's possible users have added stored scripts with runtime field contexts, even though they're unusable. I would like to switch the names while also adding code that prevents this from happening and drops any that have been stored already with some kind of warning in the logs. (Please do let me know if you see an obvious flaw in this plan.)

@rjernst
Copy link
Member

rjernst commented Apr 8, 2021

it's possible users have added stored scripts with runtime field contexts, even though they're unusable

Note that stored scripts don't store the context, it is only passed in to trigger compilation. So even if a stored script was passed in with the old name, changing the name does not break the script. It only breaks someone trying to use that name for stored scripts in the future.

jdconrad added a commit that referenced this pull request Apr 12, 2021
This changes all the script context names specifically for runtime fields to be *_field such as long_field 
and geo_point_field, etc. This change is internal detail that will only be exposed through the Painless 
execute API as part of (#71374) and should not have bwc issues. I tested this change locally on a 
mixed cluster to ensure scripts stored with the old runtime fields context names are both still 
retrievable and delete-able. This works because the context name is only used during the request to 
check for valid compilation, but never actually stored as part of the cluster state.
jdconrad added a commit to jdconrad/elasticsearch that referenced this pull request Apr 12, 2021
…c#71581)

This changes all the script context names specifically for runtime fields to be *_field such as long_field 
and geo_point_field, etc. This change is internal detail that will only be exposed through the Painless 
execute API as part of (elastic#71374) and should not have bwc issues. I tested this change locally on a 
mixed cluster to ensure scripts stored with the old runtime fields context names are both still 
retrievable and delete-able. This works because the context name is only used during the request to 
check for valid compilation, but never actually stored as part of the cluster state.
jdconrad added a commit that referenced this pull request Apr 12, 2021
#71586)

This changes all the script context names specifically for runtime fields to be *_field such as long_field 
and geo_point_field, etc. This change is internal detail that will only be exposed through the Painless 
execute API as part of (#71374) and should not have bwc issues. I tested this change locally on a 
mixed cluster to ensure scripts stored with the old runtime fields context names are both still 
retrievable and delete-able. This works because the context name is only used during the request to 
check for valid compilation, but never actually stored as part of the cluster state.
jdconrad added a commit that referenced this pull request Apr 12, 2021
…#71599)

This adds utility methods to each type of runtime field to return the results of a document in an ordered array based on the same order that doc values are ordered in. This is useful for supporting execute api in this #71374.
jdconrad added a commit to jdconrad/elasticsearch that referenced this pull request Apr 13, 2021
…elastic#71599)

This adds utility methods to each type of runtime field to return the results of a document in an ordered array based on the same order that doc values are ordered in. This is useful for supporting execute api in this elastic#71374.
jdconrad added a commit that referenced this pull request Apr 13, 2021
…#71599) (#71604)

This adds utility methods to each type of runtime field to return the results of a document in an ordered array based on the same order that doc values are ordered in. This is useful for supporting execute api in this #71374.
@jdconrad
Copy link
Contributor Author

@nik9000 Would you please take one last pass at this?

This last set of commits has the following:

  • Updated context names to *_type (long_field, etc.)
  • Updated to use the new asDocValues methods on the contexts for results
  • Updated to use runAsDoc instead of execute
  • Updated unit tests
  • Updated yaml tests w/ single value and multi value for each type

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @javanna should have another look too.

@jdconrad
Copy link
Contributor Author

Thanks @nik9000. @javanna Would you please take one last pass at this? :)

@jdconrad
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/2

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks @jdconrad !

@jdconrad
Copy link
Contributor Author

Thank you for the all the feedback, @javanna!

@jdconrad jdconrad merged commit 301dcb6 into elastic:master Apr 14, 2021
jdconrad added a commit that referenced this pull request Apr 14, 2021
This change adds support for the 7 different runtime fields contexts to the Painless Execute API. Each
context can accept the standard script input (source and params) along with a user-defined document
and an index name to pull mappings from. The results depend on the output of the runtime field
type.

Closes #70467
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement Team:Core/Infra Meta label for core/infra team v7.13.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make script contexts for runtime fields available as part of the Painless execute api
8 participants