Kibana scripted fields to runtime fields #69631

stu-elastic · 2021-02-25T19:40:01Z

The goal is to translate a kibana scripted field to a runtime field.

Differences

emit vs return

Scripted fields return values, runtime fields emit values.
- emit appends a value to an internally managed list and does not halt execution
- returning a value in a scripted field halts execution. There is no implicit list management.

Types

Scripted field returns are coerced into a javascript type: number, string, date or boolean. The coereced type is selected by the user. The return type for the painless context is Object.
- The user may return a type that does not match the javascript types (eg a List of Objects).
Runtime fields can emit boolean, date (from a long), double, long, String, GeoPoint, IP (from a String).

Contexts

Scripted fields uses the painless Field Context.
Runtime fields uses a context based on the emited type, eg ip_script_field.

Required Return Value

Scripted fields are required to return a value, which may be null.
Runtime fields may not emit a value.

Implicit returns

Scripted fields will implicitly return the result of the last statement, if that statement is an expression and there is no explicit return.
There is no implicit emit from runtime fields.

Options

Have kibana perform a string replacement. return x; -> emit(x); return;
Provide kibana the AST from a parsed script to perform a more intelligent replacement.
Provide an elasticsearch API that performs the replacement.

cc: @mattkime @jdconrad

The text was updated successfully, but these errors were encountered:

elasticmachine · 2021-02-25T19:40:16Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

stu-elastic · 2021-03-01T16:35:12Z

Have kibana perform a string replacement. return x; -> emit(x); return;
- cost: Small? (needs confirmation from @mattkime)
- 80% solution, may have difficulties in handling cases where x is an expression. Would have to match parens, curly braces, double and single quotes (including escapes) for a 99% solution.
Provide kibana the AST from a parsed script to perform a more intelligent replacement.
- cost: X-Large
- High backward compatibility cost, exposes internal datastructures that change regularly.
Provide an elasticsearch API that performs the replacement.
- cost: Large
- One-off API that must be removed in next major. Adds costs to AST changes, as they must consider this feature.

mattkime · 2021-03-16T22:26:30Z

At least initially I'm not feeling very confident about the first option but perhaps I can get there. Obviously return x; -> emit(x); return; isn't too bad.

but parens, curly braces, double and single quotes (including escapes) makes me nervous.

If nothing else I'd need a complete set of examples to work from as a set of test cases.

Perhaps @stu-elastic or someone else knows how to simplify the problem.

stu-elastic · 2021-03-16T22:52:16Z

Ahh curlies are not relevant, that was mistyped.

The rest is standard expression parsing to do it perfectly.

I suspect there'll be a ton of mileage gained from simply grabbing until the next ;. The next level would be to detect ; inside a string, but starts to get vanishingly unlikely.

stu-elastic · 2021-03-17T00:49:51Z

Assuming there's no implicit return, here's an idea of the flavors of return statements one could encounter:
return (1 + 2 + Math.floor(foo));
return 1 + 2 + Math.floor(foo);
return "abc";
return foo;
return 5;
return "this String has a ; in the middle";
return 'this string is single quoted and has a double quote (") inside';
return 'this string has an escaped single quote(\\')';

mattkime · 2021-03-17T02:10:34Z

@stu-elastic Thanks, thats looking a lot more approachable.

Might it be possible for return to reference a multiline construct?

stu-elastic · 2021-03-17T14:49:43Z

Yes.
Newline \n, along with tab \t, carriage return \r and space are all considered whitespace. Elements of an expression may be separated by whitespace.

stu-elastic · 2021-03-24T17:05:09Z

Here's another idea, why not do the same thing kibana does when it takes kibana scripted fields and uses them as filters in discover?

In that case, kibana generates some methods and embeds the old script in a lambda:

boolean gte(Supplier s, def v) {return s.get() >= v }
boolean lt(Supplier s, def v) { return s.get() < v }
gte(() -> { 
  // previous script here
})

Instead of doing the string transform with return, why not just wrap the entire user script in a method and emit the result of the method?

def exec() {
  // previous script here
}
emit(exec())

This reuses the script transform infrastructure already in kibana while avoiding any parsing. The same limitations in these transforms as already apply to kibana scripted fields used as filters in discover.

If there's a syntax error, it'd show up in a call to the runtime fields execute api #70467

mattkime · 2021-03-24T17:15:51Z

This reuses the script transform infrastructure already in kibana while avoiding any parsing. The same limitations in these transforms as already apply to kibana scripted fields used as filters in discover.

If I remember correctly, this doesn't work if there are function definitions as part of the script, correct? What other code structures can't be dropped inside a function?

stu-elastic · 2021-03-24T17:18:49Z

~~That's the only limitation.~~ one other, see below

jdconrad · 2021-03-24T17:21:43Z

One other caveat, that is a long term fix @stu-elastic is working on actually is our "static" methods that require the Script pointer will not work once we put them inside a user-defined function, but most of these aren't used often.

Example:

...
return cosineSimilarity(my_vectors, "my_field");

stu-elastic · 2021-03-24T17:34:51Z

Ahh good catch @jdconrad. The nice thing is both of those limitations would fail at compile time, so they would be detectable via the execute api.

mattkime · 2021-03-24T17:58:43Z

is our "static" methods that require the Script pointer

Can you say more about this or point me to the docs? Looking at the example code I just see a function invocation with two arguments.

The nice thing is both of those limitations would fail at compile time, so they would be detectable via the execute api.

Compilation failures strike me as an ideal way to catch failed scripts.

jdconrad · 2021-03-24T18:16:23Z

@mattkime I'll go through our allow lists and see if I can gather up the specific methods. Our documentation is likely out-of-date for all them, and probably a bit unwieldy to gather them through there.

mattkime · 2021-03-26T14:35:24Z

One thing worth pointing out is that wrapping scripts doesn't provide a very good introduction to writing a runtime script. I'm not sure if that will affect which route we choose but its worth mentioning.

jdconrad · 2021-03-31T17:58:15Z

@mattkime @jimczi @rdnm @stu-elastic @rjernst

I wanted to get a better understanding of what can be provided client side for this migration, so I took a pass at this problem with the constraints of doing a string to string translation using standard data structures.

The idea is the Java code I have is a good approximation of the equivalent TypeScript code. Doing better on the server-side may be possible, but would require a decent amount of investigation.

I have stored the code in a test case class on this branch in this file (https://github.com/jdconrad/elasticsearch/blob/kmig/modules/lang-painless/src/test/java/org/elasticsearch/painless/AdditionTests.java#L24). The implementation is in replaceReturnWithEmit.

Notes:

It's a bit on the longer side, since as a short-term one off I did not want to invest too much time here. replaceReturnWithEmit is 624 lines of Java, including comments and empty lines. The rest of the file is driver code and tests.
It's purely string to string conversion based on sentinel characters used to collect enough information to convert return to emit in the correct places.
This should be directly portable to TypeScript without anything special to the best of my knowledge.
It uses simple data structures such as maps, and lists and relies on the ability to extract individual characters from a string.

Caveats:

This only works for scripts that are syntactically valid.
This will not handle a declaration as the last statement. While this is semantically valid, it doesn't make sense in practice.
This only works for scripts that are guaranteed to return a value including implicit returns for the last statement.
This only works for single-valued scripted fields. Any scripted field that would emit multiple values is not covered here, such as returning a list or object. Luckily, to my knowledge, Kibana does not provide affordances returning these.
This definitely needs some more testing.

Some examples:

doc['return'] + doc['my_return_field']
 emit(doc['return'] + doc['my_return_field'])

The above ignores strings as part of the field names. It then takes an implicit return value and adds an emit to the last statement.

Another more complex example:

int get(/* blah */){return 5;}org.elastic.type [] method() {   //test
 return null}  if (x) {return 1} else return 2; /*hello hello*/ get(x -> x, () -> {return 9;}); // comment
int get(/* blah */){return 5;}org.elastic.type [] method() {   //test
 return null}  if (x) {emit(1); return;} else {emit(2); return;} emit(/*hello hello*/ get(x -> x, () -> {return 9;})); // comment

The above ignores the returns as part of user-defined functions and lambdas. It ignores comments. It converts main body statements with return to emit(...); return; with appropriate braces around single statement else. It converts the last statement to emit a value for an implicit return.

mattkime · 2021-03-31T18:39:19Z

My initial reaction looking at that code - its what I thought this would require but I lack the experience to do it confidently. The code looks easy to translate into typescript although I'd want a pretty extensive test suite to go with it.

This only works for single-valued scripted fields. Any scripted field that would emit multiple values is not covered here, such as returning a list or object. Luckily, to my knowledge, Kibana does not provide affordances returning these.

Kibana scripted fields does support this. That said, we'd probably be okay not supporting it.

jdconrad · 2021-04-05T21:21:14Z

Moved this code to (https://github.com/jdconrad/elasticsearch/blob/kmig/modules/lang-painless/src/test/java/org/elasticsearch/painless/MigrationTests.java) for now. Added 72 tests. TODOs embedded to add more. Pausing for now.

mattkime · 2021-04-06T03:05:46Z

@jdconrad Our current plan is to not provide an automated method of converting scripted fields. Still good to have this in case something changes.

stu-elastic · 2021-06-10T16:32:59Z

Let's close for now, if the plans change please reopen.

stu-elastic added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement labels Feb 25, 2021

elasticmachine added the Team:Core/Infra Meta label for core/infra team label Feb 25, 2021

stu-elastic closed this as completed Jun 10, 2021

Kibana scripted fields to runtime fields #69631

Kibana scripted fields to runtime fields #69631

Comments

stu-elastic commented Feb 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Differences

emit vs return

Types

Contexts

Required Return Value

Implicit returns

Options

elasticmachine commented Feb 25, 2021

Uh oh!

stu-elastic commented Mar 1, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mattkime commented Mar 16, 2021

Uh oh!

stu-elastic commented Mar 16, 2021

Uh oh!

stu-elastic commented Mar 17, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mattkime commented Mar 17, 2021

Uh oh!

stu-elastic commented Mar 17, 2021

Uh oh!

stu-elastic commented Mar 24, 2021

Uh oh!

mattkime commented Mar 24, 2021

Uh oh!

stu-elastic commented Mar 24, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jdconrad commented Mar 24, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stu-elastic commented Mar 24, 2021

Uh oh!

mattkime commented Mar 24, 2021

Uh oh!

jdconrad commented Mar 24, 2021

Uh oh!

mattkime commented Mar 26, 2021

Uh oh!

jdconrad commented Mar 31, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mattkime commented Mar 31, 2021

Uh oh!

jdconrad commented Apr 5, 2021

Uh oh!

mattkime commented Apr 6, 2021

Uh oh!

stu-elastic commented Jun 10, 2021

Uh oh!

stu-elastic commented Feb 25, 2021 •

edited

Loading

stu-elastic commented Mar 1, 2021 •

edited

Loading

stu-elastic commented Mar 17, 2021 •

edited

Loading

stu-elastic commented Mar 24, 2021 •

edited

Loading

jdconrad commented Mar 24, 2021 •

edited

Loading

jdconrad commented Mar 31, 2021 •

edited

Loading