Skip to content

Kibana scripted fields to runtime fields #69631

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stu-elastic opened this issue Feb 25, 2021 · 20 comments
Closed

Kibana scripted fields to runtime fields #69631

stu-elastic opened this issue Feb 25, 2021 · 20 comments
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement Team:Core/Infra Meta label for core/infra team

Comments

@stu-elastic
Copy link
Contributor

stu-elastic commented Feb 25, 2021

The goal is to translate a kibana scripted field to a runtime field.

Differences

emit vs return

  • Scripted fields return values, runtime fields emit values.
    • emit appends a value to an internally managed list and does not halt execution
    • returning a value in a scripted field halts execution. There is no implicit list management.

Types

  • Scripted field returns are coerced into a javascript type: number, string, date or boolean. The coereced type is selected by the user. The return type for the painless context is Object.
    • The user may return a type that does not match the javascript types (eg a List of Objects).
  • Runtime fields can emit boolean, date (from a long), double, long, String, GeoPoint, IP (from a String).

Contexts

  • Scripted fields uses the painless Field Context.
  • Runtime fields uses a context based on the emited type, eg ip_script_field.

Required Return Value

  • Scripted fields are required to return a value, which may be null.
  • Runtime fields may not emit a value.

Implicit returns

  • Scripted fields will implicitly return the result of the last statement, if that statement is an expression and there is no explicit return.
  • There is no implicit emit from runtime fields.

Options

  • Have kibana perform a string replacement. return x; -> emit(x); return;
  • Provide kibana the AST from a parsed script to perform a more intelligent replacement.
  • Provide an elasticsearch API that performs the replacement.

cc: @mattkime @jdconrad

@stu-elastic stu-elastic added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement labels Feb 25, 2021
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Feb 25, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@stu-elastic
Copy link
Contributor Author

stu-elastic commented Mar 1, 2021

  • Have kibana perform a string replacement. return x; -> emit(x); return;
    • cost: Small? (needs confirmation from @mattkime)
    • 80% solution, may have difficulties in handling cases where x is an expression. Would have to match parens, curly braces, double and single quotes (including escapes) for a 99% solution.
  • Provide kibana the AST from a parsed script to perform a more intelligent replacement.
    • cost: X-Large
    • High backward compatibility cost, exposes internal datastructures that change regularly.
  • Provide an elasticsearch API that performs the replacement.
    • cost: Large
    • One-off API that must be removed in next major. Adds costs to AST changes, as they must consider this feature.

@mattkime
Copy link

At least initially I'm not feeling very confident about the first option but perhaps I can get there. Obviously return x; -> emit(x); return; isn't too bad.

but parens, curly braces, double and single quotes (including escapes) makes me nervous.

If nothing else I'd need a complete set of examples to work from as a set of test cases.

Perhaps @stu-elastic or someone else knows how to simplify the problem.

@stu-elastic
Copy link
Contributor Author

Ahh curlies are not relevant, that was mistyped.

The rest is standard expression parsing to do it perfectly.

I suspect there'll be a ton of mileage gained from simply grabbing until the next ;. The next level would be to detect ; inside a string, but starts to get vanishingly unlikely.

@stu-elastic
Copy link
Contributor Author

stu-elastic commented Mar 17, 2021

Assuming there's no implicit return, here's an idea of the flavors of return statements one could encounter:
return (1 + 2 + Math.floor(foo));
return 1 + 2 + Math.floor(foo);
return "abc";
return foo;
return 5;
return "this String has a ; in the middle";
return 'this string is single quoted and has a double quote (") inside';
return 'this string has an escaped single quote(\\')';

@mattkime
Copy link

@stu-elastic Thanks, thats looking a lot more approachable.

Might it be possible for return to reference a multiline construct?

@stu-elastic
Copy link
Contributor Author

Yes.
Newline \n, along with tab \t, carriage return \r and space are all considered whitespace. Elements of an expression may be separated by whitespace.

@stu-elastic
Copy link
Contributor Author

Here's another idea, why not do the same thing kibana does when it takes kibana scripted fields and uses them as filters in discover?

In that case, kibana generates some methods and embeds the old script in a lambda:

boolean gte(Supplier s, def v) {return s.get() >= v }
boolean lt(Supplier s, def v) { return s.get() < v }
gte(() -> { 
  // previous script here
})

Instead of doing the string transform with return, why not just wrap the entire user script in a method and emit the result of the method?

def exec() {
  // previous script here
}
emit(exec())

This reuses the script transform infrastructure already in kibana while avoiding any parsing. The same limitations in these transforms as already apply to kibana scripted fields used as filters in discover.

If there's a syntax error, it'd show up in a call to the runtime fields execute api #70467

@mattkime
Copy link

This reuses the script transform infrastructure already in kibana while avoiding any parsing. The same limitations in these transforms as already apply to kibana scripted fields used as filters in discover.

If I remember correctly, this doesn't work if there are function definitions as part of the script, correct? What other code structures can't be dropped inside a function?

@stu-elastic
Copy link
Contributor Author

stu-elastic commented Mar 24, 2021

That's the only limitation. one other, see below

@jdconrad
Copy link
Contributor

jdconrad commented Mar 24, 2021

One other caveat, that is a long term fix @stu-elastic is working on actually is our "static" methods that require the Script pointer will not work once we put them inside a user-defined function, but most of these aren't used often.

Example:

...
return cosineSimilarity(my_vectors, "my_field");

@stu-elastic
Copy link
Contributor Author

Ahh good catch @jdconrad. The nice thing is both of those limitations would fail at compile time, so they would be detectable via the execute api.

@mattkime
Copy link

is our "static" methods that require the Script pointer

Can you say more about this or point me to the docs? Looking at the example code I just see a function invocation with two arguments.

The nice thing is both of those limitations would fail at compile time, so they would be detectable via the execute api.

Compilation failures strike me as an ideal way to catch failed scripts.

@jdconrad
Copy link
Contributor

@mattkime I'll go through our allow lists and see if I can gather up the specific methods. Our documentation is likely out-of-date for all them, and probably a bit unwieldy to gather them through there.

@mattkime
Copy link

One thing worth pointing out is that wrapping scripts doesn't provide a very good introduction to writing a runtime script. I'm not sure if that will affect which route we choose but its worth mentioning.

@jdconrad
Copy link
Contributor

jdconrad commented Mar 31, 2021

@mattkime @jimczi @rdnm @stu-elastic @rjernst

I wanted to get a better understanding of what can be provided client side for this migration, so I took a pass at this problem with the constraints of doing a string to string translation using standard data structures.

The idea is the Java code I have is a good approximation of the equivalent TypeScript code. Doing better on the server-side may be possible, but would require a decent amount of investigation.

I have stored the code in a test case class on this branch in this file (https://github.com/jdconrad/elasticsearch/blob/kmig/modules/lang-painless/src/test/java/org/elasticsearch/painless/AdditionTests.java#L24). The implementation is in replaceReturnWithEmit.

Notes:

  • It's a bit on the longer side, since as a short-term one off I did not want to invest too much time here. replaceReturnWithEmit is 624 lines of Java, including comments and empty lines. The rest of the file is driver code and tests.
  • It's purely string to string conversion based on sentinel characters used to collect enough information to convert return to emit in the correct places.
  • This should be directly portable to TypeScript without anything special to the best of my knowledge.
  • It uses simple data structures such as maps, and lists and relies on the ability to extract individual characters from a string.

Caveats:

  • This only works for scripts that are syntactically valid.
  • This will not handle a declaration as the last statement. While this is semantically valid, it doesn't make sense in practice.
  • This only works for scripts that are guaranteed to return a value including implicit returns for the last statement.
    This only works for single-valued scripted fields. Any scripted field that would emit multiple values is not covered here, such as returning a list or object. Luckily, to my knowledge, Kibana does not provide affordances returning these.
  • This definitely needs some more testing.

Some examples:

doc['return'] + doc['my_return_field']
 emit(doc['return'] + doc['my_return_field'])

The above ignores strings as part of the field names. It then takes an implicit return value and adds an emit to the last statement.

Another more complex example:

int get(/* blah */){return 5;}org.elastic.type [] method() {   //test
 return null}  if (x) {return 1} else return 2; /*hello hello*/ get(x -> x, () -> {return 9;}); // comment
int get(/* blah */){return 5;}org.elastic.type [] method() {   //test
 return null}  if (x) {emit(1); return;} else {emit(2); return;} emit(/*hello hello*/ get(x -> x, () -> {return 9;})); // comment

The above ignores the returns as part of user-defined functions and lambdas. It ignores comments. It converts main body statements with return to emit(...); return; with appropriate braces around single statement else. It converts the last statement to emit a value for an implicit return.

@mattkime
Copy link

My initial reaction looking at that code - its what I thought this would require but I lack the experience to do it confidently. The code looks easy to translate into typescript although I'd want a pretty extensive test suite to go with it.

This only works for single-valued scripted fields. Any scripted field that would emit multiple values is not covered here, such as returning a list or object. Luckily, to my knowledge, Kibana does not provide affordances returning these.

Kibana scripted fields does support this. That said, we'd probably be okay not supporting it.

@jdconrad
Copy link
Contributor

jdconrad commented Apr 5, 2021

Moved this code to (https://github.com/jdconrad/elasticsearch/blob/kmig/modules/lang-painless/src/test/java/org/elasticsearch/painless/MigrationTests.java) for now. Added 72 tests. TODOs embedded to add more. Pausing for now.

@mattkime
Copy link

mattkime commented Apr 6, 2021

@jdconrad Our current plan is to not provide an automated method of converting scripted fields. Still good to have this in case something changes.

@stu-elastic
Copy link
Contributor Author

Let's close for now, if the plans change please reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

No branches or pull requests

4 participants